A General Framework for a Class of Primal-Dual Algorithms for TV Minimization


1 A General Framework for a Class of Primal-Dual Algorithms for TV Minimization
Ernie Esser, UCLA

2 Outline
- A Model Convex Minimization Problem
- Main Idea Behind the Primal-Dual Hybrid Gradient (PDHG) Method
- Application to TV Minimization Problems
- Equivalent Primal, Dual and Saddle Point Formulations
- Moreau Decomposition
- Connections Between PDHG and Related Algorithms: Proximal Forward-Backward Splitting (PFBS), Alternating Minimization Algorithm (AMA), Split Bregman, Alternating Direction Method of Multipliers (ADMM), Douglas-Rachford, Split Inexact Uzawa (SIU)
- Modified PDHG
- Application to Convex Problems with Separable Structure
- Projection Interpretations
- Example Applications: Constrained TV Deblurring, Compressive Sensing

3 A Model Convex Minimization Problem
\min_{u \in R^m} J(Au) + H(u)    (P)
J, H closed proper convex, H : R^m \to (-\infty, \infty], J : R^n \to (-\infty, \infty], A \in R^{n \times m}.
We will also assume there exists an optimal solution u^* to (P).

4 PDHG Main Idea
Let J = \|\cdot\|, an arbitrary norm on R^n, with dual norm \|p\|_* := \max_{\|w\| \le 1} \langle w, p \rangle. Since \|\cdot\|_{**} = \|\cdot\|,
\|Au\| = \max_{\|p\|_* \le 1} \langle p, Au \rangle.
Rewrite (P) as the saddle point problem
\min_u \max_{\|p\|_* \le 1} \langle p, Au \rangle + H(u).
PDHG alternates dual and primal steps of the form
p^{k+1} = \arg\max_{\|p\|_* \le 1} \langle p, Au^k \rangle - \frac{1}{2\delta_k} \|p - p^k\|_2^2
u^{k+1} = \arg\min_{u \in R^m} \langle p^{k+1}, Au \rangle + H(u) + \frac{1}{2\alpha_k} \|u - u^k\|_2^2
for appropriate parameters \alpha_k and \delta_k.
Ref: M. Zhu and T. F. Chan, An Efficient Primal-Dual Hybrid Gradient Algorithm for Total Variation Image Restoration, UCLA CAM Report 08-34, May 2008.

5 Application to TV Minimization Problems
Discretize \|u\|_{TV} using forward differences and assuming Neumann boundary conditions:
\|u\|_{TV} = \sum_{r=1}^{M_r} \sum_{c=1}^{M_c} \sqrt{(D_c^+ u_{r,c})^2 + (D_r^+ u_{r,c})^2}
Vectorize the M_r \times M_c matrix by stacking columns: the (r, c) element of the matrix becomes element (c-1)M_r + r of the vector.
Define a directed grid-shaped graph with m = M_r M_c nodes corresponding to the elements (r, c). Index the m nodes by (c-1)M_r + r and the e = 2 M_r M_c - M_r - M_c edges arbitrarily.
(The slide shows a 3 x 3 example of this graph.)

6 Definition of D
Define D \in R^{e \times m} to be the edge-node adjacency matrix for the graph. For each edge \eta with endpoint indices (i, j), i < j, define
D_{\eta,k} = -1 for k = i,  1 for k = j,  0 for k \ne i, j.
Interpret D as a discretization of the gradient.
To identify the edges used in each forward difference, also define E \in R^{e \times m} such that
E_{\eta,k} = 1 if D_{\eta,k} = -1,  0 otherwise.
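
Below is a minimal sketch, not from the talk, of how D and E could be assembled with SciPy sparse matrices for the M_r x M_c grid just described (0-based column-stacking index c*M_r + r below); the last lines check that \|Du\|_E reproduces the forward-difference TV sum from the previous slide.

import numpy as np
import scipy.sparse as sp

def grid_D_and_E(Mr, Mc):
    tails, heads = [], []
    for c in range(Mc):
        for r in range(Mr):
            k = c * Mr + r                       # node index of pixel (r, c), column stacking
            if r + 1 < Mr:                       # vertical forward difference D_r^+
                tails.append(k); heads.append(c * Mr + (r + 1))
            if c + 1 < Mc:                       # horizontal forward difference D_c^+
                tails.append(k); heads.append((c + 1) * Mr + r)
    e, m = len(tails), Mr * Mc
    rows = np.concatenate([np.arange(e), np.arange(e)])
    cols = np.concatenate([tails, heads])
    vals = np.concatenate([-np.ones(e), np.ones(e)])     # -1 at the tail node i, +1 at the head node j
    D = sp.csr_matrix((vals, (rows, cols)), shape=(e, m))
    E = sp.csr_matrix((np.ones(e), (np.arange(e), tails)), shape=(e, m))   # marks entries where D = -1
    return D, E

# Check that ||Du||_E reproduces the forward-difference TV sum on a random 3x3 image.
rng = np.random.default_rng(0)
U = rng.standard_normal((3, 3))
D, E = grid_D_and_E(3, 3)                        # 9 nodes, 12 edges
w = D @ U.flatten(order="F")                     # column stacking of U
tv_graph = np.sqrt(E.T @ (w ** 2)).sum()
dx = np.diff(U, axis=1, append=U[:, -1:])        # Neumann BC: zero difference at the boundary
dy = np.diff(U, axis=0, append=U[-1:, :])
tv_direct = np.sqrt(dx ** 2 + dy ** 2).sum()
print(np.isclose(tv_graph, tv_direct))           # True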

7 Definition of J
Can use E to define the norm \|\cdot\|_E on R^e:
\|w\|_E = \sum_{i=1}^{m} \sqrt{\big(E^T (w^2)\big)_i} = \sum_{i=1}^{m} \|w_i\|_2,
where the square w^2 is componentwise and w_i is the vector of edge values for the directed edges coming out of node i.
For TV regularization, let J = \|\cdot\|_E and A = D. Then J(Au) = \|Du\|_E = \|u\|_{TV}.
Note that the dual norm is \|p\|_{E,\infty} = \max_i \sqrt{\big(E^T (p^2)\big)_i}.

8 Nonlocal TV Extension
By redefining the graph, the same notation extends to nonlocal TV regularization. Between two nodes i and j there are in general double edges, e_{ij} and e_{ji}.
For a general graph:
D^{NL}_{e_{ij}, i} = -1,  D^{NL}_{e_{ij}, j} = 1,  D^{NL}_{e_{ji}, j} = -1,  D^{NL}_{e_{ji}, i} = 1.
Define E^{NL} in the same way E was defined earlier.
By defining a diagonal matrix W of nonnegative edge weights and letting A = W D^{NL}, we can write \|u\|_{NLTV} = \|Au\|_{E^{NL}}.

9 PDHG for Unconstrained TV Deblurring
\min_{u \in R^m} \|u\|_{TV} + \frac{\lambda}{2} \|Ku - f\|_2^2
\min_{u \in R^m} \max_{p \in X} \langle p, Du \rangle + \frac{\lambda}{2} \|Ku - f\|_2^2,  where X = \{p \in R^e : \|p\|_{E,\infty} \le 1\}
Apply PDHG:
p^{k+1} = \arg\max_{p \in X} \langle p, Du^k \rangle - \frac{1}{2\lambda\tau_k} \|p - p^k\|_2^2
u^{k+1} = \arg\min_{u \in R^m} \langle p^{k+1}, Du \rangle + \frac{\lambda}{2} \|Ku - f\|_2^2 + \frac{\lambda(1 - \theta_k)}{2\theta_k} \|u - u^k\|_2^2
where \theta and \tau are related to \alpha and \delta by \theta_k = \frac{\lambda \alpha_k}{1 + \alpha_k \lambda} and \tau_k = \frac{\delta_k}{\lambda}.

10 PDHG Iterations for TV Deblurring
The u^{k+1} and p^{k+1} updates have closed form solutions:
p^{k+1} = \Pi_X\big(p^k + \tau_k \lambda D u^k\big)
u^{k+1} = \big((1 - \theta_k) I + \theta_k K^T K\big)^{-1} \big((1 - \theta_k) u^k + \theta_k (K^T f - \frac{1}{\lambda} D^T p^{k+1})\big)
where \Pi_X is the orthogonal projection onto X defined by
\Pi_X(q) = \arg\min_{p \in X} \|p - q\|_2^2 = \frac{q}{E \max\big(\sqrt{E^T (q^2)}, 1\big)}.
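
A minimal runnable sketch of the iteration above, specialized, as an illustrative assumption rather than anything stated on the slide, to TV denoising (K = I) so that the u-update needs no linear solve: u^{k+1} = (1-\theta)u^k + \theta(f - \frac{1}{\lambda}D^T p^{k+1}). The grad/div pair implements D and -D^T with forward differences and Neumann BC, and project_X applies \Pi_X to the two stacked difference images.

import numpy as np

def grad(u):                                     # D u: forward differences, Neumann BC
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def div(px, py):                                 # negative adjoint of grad, so D^T p = -div(p)
    dx = np.zeros_like(px); dy = np.zeros_like(py)
    dx[:, 0] = px[:, 0]; dx[:, 1:-1] = px[:, 1:-1] - px[:, :-2]; dx[:, -1] = -px[:, -2]
    dy[0, :] = py[0, :]; dy[1:-1, :] = py[1:-1, :] - py[:-2, :]; dy[-1, :] = -py[-2, :]
    return dx + dy

def project_X(px, py):                           # Pi_X: per-pixel projection onto unit 2-balls
    nrm = np.maximum(np.sqrt(px ** 2 + py ** 2), 1.0)
    return px / nrm, py / nrm

def pdhg_tv_denoise(f, lam, theta=0.2, tau=0.5, iters=300):
    u = f.copy()
    px, py = np.zeros_like(f), np.zeros_like(f)
    for _ in range(iters):
        gx, gy = grad(u)
        px, py = project_X(px + tau * lam * gx, py + tau * lam * gy)   # p-update
        u = (1 - theta) * u + theta * (f + div(px, py) / lam)          # u-update with K = I
    return u

# Usage: denoise a noisy piecewise-constant image.
rng = np.random.default_rng(0)
f = np.zeros((64, 64)); f[:, 32:] = 1.0
f = f + 0.2 * rng.standard_normal(f.shape)
u = pdhg_tv_denoise(f, lam=8.0)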

11 Saddle Point Form via Legendre Transform
The Legendre transform of J is defined by
J^*(p) = \sup_w \langle w, p \rangle - J(w).
Since J^{**} = J for arbitrary closed proper convex J, we can define an analogous saddle point version of (P) even when J is not a norm:
J(Au) = J^{**}(Au) = \sup_p \langle p, Au \rangle - J^*(p)
(P):  \min_u F_P(u) := J(Au) + H(u)
(PD): \min_u \sup_p L_{PD}(u, p) := \langle p, Au \rangle - J^*(p) + H(u)

12 Special Case When J Is a Norm
When J(w) = \|w\|, J^*(p) is the indicator function for the unit ball in the dual norm:
J^*(p) = \sup_w \langle w, p \rangle - \|w\|
 = 0 if \langle w, p \rangle \le \|w\| for all w, \infty otherwise
 = 0 if \sup_{\|w\| \le 1} \langle w, p \rangle \le 1, \infty otherwise
 = 0 if \|p\|_* \le 1 (by the dual norm definition), \infty otherwise

13 Dual Problem and Strong Duality
The dual problem is
\max_{p \in R^n} F_D(p)    (D)
where the dual functional F_D(p) is a concave function defined by
F_D(p) = \inf_{u \in R^m} L_{PD}(u, p) = \inf_{u \in R^m} \langle p, Au \rangle - J^*(p) + H(u) = -J^*(p) - H^*(-A^T p).
Having assumed u^* is an optimal solution to (P), it follows that there exists an optimal solution p^* to (D).
Strong duality holds, meaning F_P(u^*) = F_D(p^*).
u^* solves (P) and p^* solves (D) iff (u^*, p^*) is a saddle point of L_{PD}.
Ref: R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.

14 Split Primal Saddle Point Formulation
Introduce the constraint w = Au in (P) and form the Lagrangian
L_P(u, w, p) = J(w) + H(u) + \langle p, Au - w \rangle.
The corresponding saddle point problem is
\max_{p \in R^n} \inf_{u \in R^m, w \in R^n} L_P(u, w, p)    (SPP)
If (u^*, w^*, p^*) is a saddle point for (SPP), then (u^*, p^*) is a saddle point for (PD).
Although p was introduced in L_P as a Lagrange multiplier for the constraint Au = w, it has the same interpretation as the dual variable p in (PD).

15 Split Dual Saddle Point Formulation
Introduce the constraint y = -A^T p in (D) and form the corresponding Lagrangian
L_D(p, y, u) = J^*(p) + H^*(y) + \langle u, -A^T p - y \rangle.
Obtain yet another saddle point problem,
\max_{u \in R^m} \inf_{p \in R^n, y \in R^m} L_D(p, y, u)    (SPD)
If (p^*, y^*, u^*) is a saddle point for (SPD), then (u^*, p^*) is a saddle point for (PD).
Although u was introduced in (SPD) as a Lagrange multiplier for the constraint y = -A^T p, it has the same interpretation as the primal variable u in (P).

16 Summary of Formulations and Algorithm Connections
(P):    \min_u F_P(u),  F_P(u) = J(Au) + H(u)
(D):    \max_p F_D(p),  F_D(p) = -J^*(p) - H^*(-A^T p)
(PD):   \min_u \sup_p L_{PD}(u, p),  L_{PD}(u, p) = \langle p, Au \rangle - J^*(p) + H(u)
(SP_P): \max_p \inf_{u,w} L_P(u, w, p),  L_P(u, w, p) = J(w) + H(u) + \langle p, Au - w \rangle
(SP_D): \max_u \inf_{p,y} L_D(p, y, u),  L_D(p, y, u) = J^*(p) + H^*(y) + \langle u, -A^T p - y \rangle
Diagram of connections (arrows in the original slide):
- AMA on (SP_P) = PFBS on (D), and AMA on (SP_D) = PFBS on (P).
- Adding the proximal penalties \frac{1}{2\alpha}\|u - u^k\|_2^2 and \frac{1}{2\delta}\|p - p^k\|_2^2 gives Relaxed AMA on (SP_P) and (SP_D); together these amount to primal-dual proximal point on (PD), which is PDHG.
- Adding the augmented Lagrangian penalties \frac{\delta}{2}\|Au - w\|_2^2 and \frac{\alpha}{2}\|A^T p + y\|_2^2 instead gives ADMM on (SP_P) = Douglas-Rachford on (D) and ADMM on (SP_D) = Douglas-Rachford on (P).
- Adding \frac{1}{2}\langle u - u^k, (\frac{1}{\alpha} - \delta A^T A)(u - u^k) \rangle to the u-step of ADMM on (SP_P) gives Split Inexact Uzawa on (SP_P), i.e. PDHGMp (PDHG with p^{k+1} replaced by 2p^{k+1} - p^k); adding \frac{1}{2}\langle p - p^k, (\frac{1}{\delta} - \alpha A A^T)(p - p^k) \rangle to the p-step of ADMM on (SP_D) gives Split Inexact Uzawa on (SP_D), i.e. PDHGMu (PDHG with u^k replaced by 2u^k - u^{k-1}).
Legend: (P) Primal, (D) Dual, (PD) Primal-Dual, (SP_P) Split Primal, (SP_D) Split Dual; AMA: Alternating Minimization Algorithm (4.2.1); PFBS: Proximal Forward-Backward Splitting (4.2.1); ADMM: Alternating Direction Method of Multipliers (4.2.2); PDHG: Primal-Dual Hybrid Gradient (4.2); PDHGM: Modified PDHG (4.2.3). Bold in the original diagram: well understood convergence properties.

17 Some Convex Optimization References
- D. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, Athena Scientific, 1996.
- D. Bertsekas, Nonlinear Programming, Second Edition, Athena Scientific, 1999.
- D. Bertsekas and J. Tsitsiklis, Parallel and Distributed Computation, Prentice Hall, 1989.
- S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.
- I. Ekeland and R. Temam, Convex Analysis and Variational Problems, SIAM Classics in Applied Mathematics 28, 1999.
- R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.
- R. T. Rockafellar and R. Wets, Variational Analysis, Springer, Vol. 317, 1998.

18 Moreau Decomposition
Moreau's decomposition will be a main tool in demonstrating connections between the algorithms that follow.
Let f \in R^m, J a closed proper convex function on R^n, and A \in R^{n \times m}. Then:
f = \arg\min_{u \in R^m} J(Au) + \frac{1}{2\alpha}\|u - f\|_2^2 + \alpha A^T \arg\min_{p \in R^n} J^*(p) + \frac{\alpha}{2}\big\|A^T p - \frac{f}{\alpha}\big\|_2^2
Ref: J. J. Moreau, Proximité et dualité dans un espace hilbertien, Bull. Soc. Math. France, 93, 1965, pp. 273-299.

19 PDHG on (PD)
p^{k+1} = \arg\min_p J^*(p) - \langle p, Au^k \rangle + \frac{1}{2\delta_k}\|p - p^k\|_2^2
u^{k+1} = \arg\min_u H(u) + \langle A^T p^{k+1}, u \rangle + \frac{1}{2\alpha_k}\|u - u^k\|_2^2
Initialization: p^0 and u^0 arbitrary.
Parameters: \alpha_k, \delta_k > 0.
Interesting special cases: \alpha_k = \infty or \delta_k = \infty. Example: the \alpha_k = \infty case corresponds to PFBS on (D).
Ref: E. Esser, X. Zhang, and T. F. Chan, A General Framework for a Class of First Order Primal-Dual Algorithms for TV Minimization, UCLA CAM Report 09-67, August 2009.

20 PFBS on (D)
PFBS alternates a gradient descent step with a proximal step:
p^{k+1} = \arg\min_{p \in R^n} J^*(p) + \frac{1}{2\delta_k}\|p - (p^k + \delta_k Au^{k+1})\|_2^2,
where u^{k+1} = \nabla H^*(-A^T p^k).
Since u^{k+1} = \nabla H^*(-A^T p^k) \iff -A^T p^k \in \partial H(u^{k+1}), which is equivalent to
u^{k+1} = \arg\min_{u \in R^m} H(u) + \langle A^T p^k, u \rangle,
PFBS on (D) can be rewritten as
u^{k+1} = \arg\min_{u \in R^m} H(u) + \langle A^T p^k, u \rangle
p^{k+1} = \arg\min_{p \in R^n} J^*(p) - \langle p, Au^{k+1} \rangle + \frac{1}{2\delta_k}\|p - p^k\|_2^2

21 Convergence of PFBS
Convergence requires H^* differentiable, \nabla\big(H^*(-A^T p)\big) Lipschitz continuous with Lipschitz constant \frac{1}{\beta}, and 0 < \inf \delta_k \le \sup \delta_k < 2\beta.
Then \{p^k\} converges to a solution of (D) and \{u^k\} converges to the unique solution of (P).
Note that when J is a norm, PFBS on (D) can be interpreted as a gradient projection algorithm.
Ref: P. L. Combettes and V. R. Wajs, Signal Recovery by Proximal Forward-Backward Splitting, Multiscale Model. Simul., 2005.

22 AMA on Split Primal
AMA applied to (SPP) alternately minimizes first the Lagrangian L_P(u, w, p) with respect to u and then the augmented Lagrangian L_P + \frac{\delta_k}{2}\|Au - w\|_2^2 with respect to w before updating the Lagrange multiplier p:
u^{k+1} = \arg\min_{u \in R^m} H(u) + \langle A^T p^k, u \rangle
w^{k+1} = \arg\min_{w \in R^n} J(w) - \langle p^k, w \rangle + \frac{\delta_k}{2}\|Au^{k+1} - w\|_2^2
p^{k+1} = p^k + \delta_k (Au^{k+1} - w^{k+1})
Convergence: requires the same assumptions as for PFBS, which here translate to H being strongly convex with modulus \beta \|A\|_2^2.
AMA on (SPP) is in fact equivalent to PFBS on (D).
Ref: P. Tseng, Applications of a Splitting Algorithm to Decomposition in Convex Programming and Variational Inequalities, SIAM J. Control Optim., Vol. 29, No. 1, 1991.

23 Equivalence by Moreau Decomposition
PFBS on (D) and AMA on (SPP) have the same first step, namely
u^{k+1} = \arg\min_{u \in R^m} H(u) + \langle A^T p^k, u \rangle,
so it suffices to show that the last two steps of AMA, combined into one,
p^{k+1} = (p^k + \delta_k Au^{k+1}) - \delta_k \arg\min_w J(w) + \frac{\delta_k}{2}\big\|w - \frac{p^k + \delta_k Au^{k+1}}{\delta_k}\big\|_2^2,
are equivalent to the second step of PFBS,
p^{k+1} = \arg\min_p J^*(p) + \frac{1}{2\delta_k}\|p - (p^k + \delta_k Au^{k+1})\|_2^2.
This is true by direct application of the Moreau decomposition.

24 Analogous \delta_k = \infty Case of PDHG
PFBS on (P),
p^{k+1} = \arg\min_{p \in R^n} J^*(p) - \langle Au^k, p \rangle
u^{k+1} = \arg\min_{u \in R^m} H(u) + \langle u, A^T p^{k+1} \rangle + \frac{1}{2\alpha_k}\|u - u^k\|_2^2,
is analogously equivalent to AMA applied to (SPD),
p^{k+1} = \arg\min_{p \in R^n} J^*(p) - \langle Au^k, p \rangle
y^{k+1} = \arg\min_{y \in R^m} H^*(y) - \langle u^k, y \rangle + \frac{\alpha_k}{2}\|y + A^T p^{k+1}\|_2^2
u^{k+1} = u^k - \alpha_k (A^T p^{k+1} + y^{k+1})

25 Connection to PDHG
PFBS on (P) with an additional proximal penalty,
p^{k+1} = \arg\min_{p \in R^n} J^*(p) - \langle Au^k, p \rangle + \frac{1}{2\delta_k}\|p - p^k\|_2^2
u^{k+1} = \arg\min_{u \in R^m} H(u) + \langle u, A^T p^{k+1} \rangle + \frac{1}{2\alpha_k}\|u - u^k\|_2^2,
is equivalent to AMA applied to (SPD) with the first step relaxed by the same additional penalty,
p^{k+1} = \arg\min_{p \in R^n} J^*(p) - \langle Au^k, p \rangle + \frac{1}{2\delta_k}\|p - p^k\|_2^2
y^{k+1} = \arg\min_{y \in R^m} H^*(y) - \langle u^k, y \rangle + \frac{\alpha_k}{2}\|y + A^T p^{k+1}\|_2^2
u^{k+1} = u^k - \alpha_k (A^T p^{k+1} + y^{k+1})
PFBS on (D) and AMA on (SPP) are connected to PDHG in the analogous way.

26 Connection to ADMM and Split Bregman
Add the augmented Lagrangian penalty to the first step of AMA to get ADMM.
ADMM on (SPP):
u^{k+1} = \arg\min_{u \in R^m} H(u) + \langle A^T p^k, u \rangle + \frac{\delta_k}{2}\|Au - w^k\|_2^2
w^{k+1} = \arg\min_{w \in R^n} J(w) - \langle p^k, w \rangle + \frac{\delta_k}{2}\|w - Au^{k+1}\|_2^2
p^{k+1} = p^k + \delta_k (Au^{k+1} - w^{k+1})
ADMM on (SPD):
p^{k+1} = \arg\min_{p \in R^n} J^*(p) - \langle Au^k, p \rangle + \frac{\alpha_k}{2}\|A^T p + y^k\|_2^2
y^{k+1} = \arg\min_{y \in R^m} H^*(y) - \langle u^k, y \rangle + \frac{\alpha_k}{2}\|y + A^T p^{k+1}\|_2^2
u^{k+1} = u^k - \alpha_k (A^T p^{k+1} + y^{k+1})
Ref: D. Bertsekas and J. Tsitsiklis, Parallel and Distributed Computation, Prentice Hall, 1989.
Ref: T. Goldstein and S. Osher, The Split Bregman Algorithm for L1 Regularized Problems, UCLA CAM Report 08-29, April 2008.

27 Convergence of ADMM
For (u^k, w^k, p^k) satisfying the ADMM iterations for (SPP) to converge to a saddle point of L_P(u, w, p), require H(u) + \|Au\|_2^2 to be strictly convex and \delta_k = \delta > 0.
For (p^k, y^k, u^k) satisfying the ADMM iterations for (SPD) to converge to a saddle point of L_D(p, y, u), require J^*(p) + \|A^T p\|_2^2 to be strictly convex and \alpha_k = \alpha > 0.
The minimization steps can be computed inexactly as long as the sum of the errors is finite.
Ref: J. Eckstein and D. Bertsekas, On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators, Math. Program., 55, 1992, pp. 293-318.

28 Equivalence to Douglas-Rachford Splitting
One can apply the Moreau decomposition twice, along with an appropriate change of variables, to show that ADMM on (SPP) or (SPD) is equivalent to Douglas-Rachford splitting on (D) or (P), respectively.
Example: Douglas-Rachford splitting on (D) with t^k = p^k + \delta_k w^k:
t^{k+1} = \arg\min_q H^*(-A^T q) + \frac{1}{2\delta_k}\|q - (2p^k - t^k)\|_2^2 + t^k - p^k
p^{k+1} = \arg\min_p J^*(p) + \frac{1}{2\delta_k}\|p - t^{k+1}\|_2^2
Ref: S. Setzer, Split Bregman Algorithm, Douglas-Rachford Splitting and Frame Shrinkage, ssetzer/pub/setzer_fba_fbs_frames08.pdf, 2008.
Ref: J. Eckstein, Splitting Methods for Monotone Operators with Applications to Parallel Optimization, Ph.D. Thesis, Massachusetts Institute of Technology, Dept. of Civil Engineering, 1989.

29 Summary of formulations and algorithm connections (same diagram as slide 16).

30 Decoupling Variables
One can add additional proximal-like penalties to the ADMM iterations and obtain a more explicit algorithm that still converges.
Given a step of the ADMM algorithm of the form
u^{k+1} = \arg\min_u J(u) + \frac{\delta}{2}\|Ku - f^k\|_2^2,
modify the objective functional by adding
\frac{1}{2}\langle u - u^k, (\frac{1}{\alpha} - \delta K^T K)(u - u^k) \rangle,
where \alpha is chosen such that 0 < \alpha < \frac{1}{\delta \|K^T K\|}. The modified update is given by
u^{k+1} = \arg\min_u J(u) + \frac{1}{2\alpha}\|u - u^k + \alpha\delta K^T(Ku^k - f^k)\|_2^2.
Ref: X. Zhang, M. Burger, X. Bresson, and S. Osher, Bregmanized Nonlocal Regularization for Deconvolution and Sparse Reconstruction, UCLA CAM Report 09-03, 2009.
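
As a small worked instance, assuming for illustration that J = \|\cdot\|_1 (not a case singled out on this slide), the modified update is a single explicit soft-thresholding step, since the prox of the l1 norm is soft thresholding:

import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def linearized_l1_step(u_k, K, f_k, alpha, delta):
    # arg min_u ||u||_1 + (1/(2 alpha)) || u - u_k + alpha * delta * K^T (K u_k - f_k) ||_2^2
    return soft_threshold(u_k - alpha * delta * (K.T @ (K @ u_k - f_k)), alpha)

# Example with a random K; u_k and f_k stand in for the iterates an outer ADMM loop would supply.
rng = np.random.default_rng(0)
K = rng.standard_normal((20, 50)); u_k = rng.standard_normal(50); f_k = K @ u_k + 0.1
delta = 1.0
alpha = 0.9 / (delta * np.linalg.norm(K, 2) ** 2)    # satisfies 0 < alpha < 1 / (delta ||K^T K||)
u_next = linearized_l1_step(u_k, K, f_k, alpha, delta)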

31 Split Inexact Uzawa Method
Consider adding \frac{1}{2}\langle p - p^k, (\frac{1}{\delta_k} - \alpha_k A A^T)(p - p^k) \rangle to the first step of ADMM on (SPD), with 0 < \delta_k < \frac{1}{\alpha_k \|A\|^2}.
Split Inexact Uzawa on (SPD):
p^{k+1} = \arg\min_{p \in R^n} J^*(p) + \frac{1}{2\delta_k}\|p - p^k - \delta_k Au^k + \alpha_k \delta_k A(A^T p^k + y^k)\|_2^2
y^{k+1} = \arg\min_{y \in R^m} H^*(y) - \langle u^k, y \rangle + \frac{\alpha_k}{2}\|y + A^T p^{k+1}\|_2^2
u^{k+1} = u^k - \alpha_k (A^T p^{k+1} + y^{k+1})
Note: in general we could similarly modify both minimization steps in ADMM, but by modifying only the first step we obtain an interesting PDHG-like interpretation.
Ref: X. Zhang, A Unified Primal-Dual Algorithm Based on l1 and Bregman Iteration, private communication, April 2009.

32 Convergence of SIU on (SPD)
Convergence requires \alpha_k = \alpha > 0, \delta_k = \delta > 0 and 0 < \delta < \frac{1}{\alpha \|A\|^2}.
Then if p^* is optimal for (D) and y^* = -A^T p^*,
\|A^T p^k + y^k\|_2 \to 0,  J^*(p^k) \to J^*(p^*),  H^*(y^k) \to H^*(y^*),
and all convergent subsequences of (p^k, y^k, u^k) converge to a saddle point of L_D.
Ref: X. Zhang, A Unified Primal-Dual Algorithm Based on l1 and Bregman Iteration, private communication, April 2009.

33 Equivalence to Modified PDHG (PDHGMu)
From the multiplier update,
y^k = \frac{u^{k-1} - u^k}{\alpha_{k-1}} - A^T p^k.
Substitute this into the first step of SIU on (SPD) and apply the Moreau decomposition to the remaining steps to obtain PDHGMu:
p^{k+1} = \arg\min_{p \in R^n} J^*(p) - \big\langle p, A\big((1 + \frac{\alpha_k}{\alpha_{k-1}}) u^k - \frac{\alpha_k}{\alpha_{k-1}} u^{k-1}\big) \big\rangle + \frac{1}{2\delta_k}\|p - p^k\|_2^2
u^{k+1} = \arg\min_{u \in R^m} H(u) + \langle A^T p^{k+1}, u \rangle + \frac{1}{2\alpha_k}\|u - u^k\|_2^2
When \alpha_k = \alpha, PDHGMu is just PDHG with u^k replaced by 2u^k - u^{k-1}.
Ref: T. Pock, D. Cremers, H. Bischof, and A. Chambolle, An Algorithm for Minimizing the Mumford-Shah Functional, ICCV, 2009.

34 Analogous SIU on (SPP) and PDHGMp
Split Inexact Uzawa applied to (SPP):
u^{k+1} = \arg\min_{u \in R^m} H(u) + \frac{1}{2\alpha_k}\|u - u^k + \alpha_k A^T p^k + \delta_k \alpha_k A^T(Au^k - w^k)\|_2^2
w^{k+1} = \arg\min_{w \in R^n} J(w) - \langle p^k, w \rangle + \frac{\delta_k}{2}\|Au^{k+1} - w\|_2^2
p^{k+1} = p^k + \delta_k (Au^{k+1} - w^{k+1})
PDHGMp:
u^{k+1} = \arg\min_{u \in R^m} H(u) + \big\langle A^T\big((1 + \frac{\delta_k}{\delta_{k-1}}) p^k - \frac{\delta_k}{\delta_{k-1}} p^{k-1}\big), u \big\rangle + \frac{1}{2\alpha_k}\|u - u^k\|_2^2
p^{k+1} = \arg\min_{p \in R^n} J^*(p) - \langle p, Au^{k+1} \rangle + \frac{1}{2\delta_k}\|p - p^k\|_2^2

35 Modified PDHG Pros and Cons
Pros:
- As simple as PDHG
- Explicit: avoids solving the linear systems that arise in ADMM
- Convergence theory for SIU methods applies
- Requires no extra variables beyond the original primal and dual variables
Cons:
- The stability restriction may require small step size parameters
- Dynamic step size schemes are not theoretically justified

36 Summary of formulations and algorithm connections (same diagram as slide 16).

37 Convex Problems with Separable Structure
Many seemingly more complicated problems can be written in the form (P). Example:
\sum_{i=1}^{N} \phi_i(B_i A_i u + b_i) + H(u) = \sum_{i=1}^{N} J_i(A_i u) + H(u) = J(Au) + H(u),
where A is the matrix with blocks A_1, ..., A_N stacked vertically and J_i(z_i) = \phi_i(B_i z_i + b_i). Let p be the correspondingly stacked dual variable with blocks p_1, ..., p_N. Then
J^*(p) = \sum_{i=1}^{N} J_i^*(p_i).

38 Decoupling Minimization Steps with PDHG
Applying PDHGMp to \min_u \sum_{i=1}^{N} \phi_i(B_i A_i u + b_i) + H(u) yields:
u^{k+1} = \arg\min_u H(u) + \frac{1}{2\alpha}\big\|u - u^k + \alpha \sum_{i=1}^{N} A_i^T (2p_i^k - p_i^{k-1})\big\|_2^2
p_i^{k+1} = \arg\min_{p_i} J_i^*(p_i) + \frac{1}{2\delta}\big\|p_i - (p_i^k + \delta A_i u^{k+1})\big\|_2^2,   i = 1, ..., N
Need 0 < \alpha\delta < \frac{1}{\|A\|^2} for stability.

39 Convex Constraints as Indicator Functions
Given a constraint of the form u \in S where S is convex, we can enforce the constraint by adding to the objective functional the indicator function for S,
g_S(u) = 0 if u \in S, \infty otherwise.
In this way even constrained minimization problems can be written in the form of (P).
If PDHG or a related method is used to decouple the terms of the functional into separate minimization steps, the step involving g_S can be interpreted as an orthogonal projection onto S:
\arg\min_u g_S(u) + \frac{1}{2\alpha}\|u - f\|_2^2 = \Pi_S(f)
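
A tiny illustration, using an assumed example set S (the box [0, 1]^m) rather than one from the slides: the proximal step above reduces to componentwise clipping, independent of \alpha.

import numpy as np

f = np.array([-0.4, 0.3, 1.7])
proj = np.clip(f, 0.0, 1.0)    # arg min_u g_S(u) + (1/(2 alpha)) ||u - f||_2^2 = Pi_S(f), for any alpha > 0
print(proj)                     # [0.  0.3 1. ]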

40 Effective Operator Splitting
Modified PDHG and related algorithms are most effective for problems with separable structure where the decoupled minimization steps are easy to solve.
The main examples are when H and the J_i, or H^* and the J_i^*, are:
- Indicator functions for convex sets that are easy to project onto
- l2, l-infinity or l1-like norms
- The l2 norm squared
- Any of the above composed with linear operators

41 Soft Thresholding for \|\cdot\|_1
When J(u) = \|u\|_1, define the soft thresholding operator S_\alpha(f) componentwise by
S_\alpha(f) = \arg\min_u J(u) + \frac{1}{2\alpha}\|u - f\|_2^2 = f - \alpha\,\mathrm{sign}(f) if |f| > \alpha, and 0 otherwise.
Can use the Moreau decomposition to reinterpret S_\alpha(f) in terms of a projection:
S_\alpha(f) = f - \alpha \arg\min_p J^*(p) + \frac{\alpha}{2}\big\|p - \frac{f}{\alpha}\big\|_2^2 = f - \alpha \Pi_{\{p : \|p\|_\infty \le 1\}}\big(\frac{f}{\alpha}\big) = f - \Pi_{\{p : \|p\|_\infty \le \alpha\}}(f),
where \Pi(p) = \frac{p}{\max(|p|, 1)} (componentwise) is the orthogonal projection onto \{p : \|p\|_\infty \le 1\}.
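
A quick numerical check of the last identity, as an illustration rather than code from the talk: soft thresholding equals the input minus its projection onto the l-infinity ball of radius \alpha, i.e. componentwise clipping.

import numpy as np

def soft_threshold(f, alpha):
    return np.sign(f) * np.maximum(np.abs(f) - alpha, 0.0)

rng = np.random.default_rng(1)
f, alpha = rng.standard_normal(10), 0.3
lhs = soft_threshold(f, alpha)
rhs = f - np.clip(f, -alpha, alpha)     # f - Pi_{{p : ||p||_inf <= alpha}}(f)
print(np.allclose(lhs, rhs))            # True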

42 Soft Thresholding for \|\cdot\|_E
When J(u) = \|u\|_{TV} = \|Du\|_E, problems arise of the form
S_\alpha(z) = \arg\min_w \|w\|_E + \frac{1}{2\alpha}\|w - z\|_2^2,
which via the Moreau decomposition can again be written as a projection
S_\alpha(z) = z - \alpha \Pi_X\big(\frac{z}{\alpha}\big) = z - \Pi_{\alpha X}(z),
where X = \{p : \|p\|_{E,\infty} \le 1\} and
\Pi_X(p) = \frac{p}{E \max\big(\sqrt{E^T(p^2)}, 1\big)}.

43 Not All Projections Are So Easy
Consider TV denoising:
u = \arg\min_u \|Du\|_E + \frac{\lambda}{2}\|u - f\|_2^2.
By the Moreau decomposition,
u = f - \frac{1}{\lambda} D^T \arg\min_{p \in X} \frac{1}{2\lambda}\|D^T p - \lambda f\|_2^2.
Let z = D^T p and define K = \{z : z = D^T p, p \in X\}. Then
u = f - \Pi_{K/\lambda}(f),
but the projection is nontrivial to compute.
Ref: A. Chambolle, An Algorithm for Total Variation Minimization and Applications, J. Math. Imaging Vision, Vol. 20, 2004, pp. 89-97.

44 A Few More Easy Examples
Using g_S to denote the indicator function for the set S:
J(u) = \|u\|_2                     =>  J^*(p) = g_{\{p : \|p\|_2 \le 1\}}
J(u) = \frac{1}{2\alpha}\|u\|_2^2  =>  J^*(p) = \frac{\alpha}{2}\|p\|_2^2
J(u) = \|u\|_\infty                =>  J^*(p) = g_{\{p : \|p\|_1 \le 1\}}
J(u) = \max(u)                     =>  J^*(p) = g_{\{p : p \ge 0 \text{ and } \|p\|_1 = 1\}}
Note: although there is no simple formula for projecting a vector onto the l1 unit ball (or its positive face) in R^n, this can be computed with O(n log n) complexity.
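
A sketch of the kind of O(n log n) routine the note refers to: the standard sort-based projection onto the l1 ball of a given radius. The function name and interface are assumptions for illustration, not code from the talk.

import numpy as np

def project_l1_ball(v, radius=1.0):
    a = np.abs(v)
    if a.sum() <= radius:
        return v.copy()
    u = np.sort(a)[::-1]                              # sorted magnitudes, descending
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u * k > css - radius)[0][-1]     # largest k with u_k > (css_k - radius) / k
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(a - theta, 0.0)    # shrink all magnitudes by theta

print(np.abs(project_l1_ball(np.array([0.8, -0.6, 0.4]), 1.0)).sum())   # approximately 1.0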

45 Example: Constrained TV Deblurring
The constrained problem
\min_{\|Ku - f\|_2 \le \epsilon} \|u\|_{TV}
can be rewritten as
\min_u \|Du\|_E + g_T(Ku),
where g_T is the indicator function for T = \{z : \|z - f\|_2 \le \epsilon\}.
In order to treat both D and K explicitly, let H(u) = 0 and J(Au) = J_1(Du) + J_2(Ku), where A is D and K stacked vertically.
Write the dual variable as p = [p_1; p_2] and apply PDHGMp.

46 PDHGMp for Constrained TV Deblurring
u^{k+1} = u^k - \alpha_k \big(D^T(2p_1^k - p_1^{k-1}) + K^T(2p_2^k - p_2^{k-1})\big)
p_1^{k+1} = \Pi_X\big(p_1^k + \delta_k D u^{k+1}\big)
p_2^{k+1} = p_2^k + \delta_k K u^{k+1} - \delta_k \Pi_T\big(\frac{p_2^k}{\delta_k} + K u^{k+1}\big),
where \Pi_T is defined by
\Pi_T(z) = f + \frac{z - f}{\max\big(\frac{\|z - f\|_2}{\epsilon}, 1\big)}.
Note that \lambda in the unconstrained problem, \min_u \|u\|_{TV} + \frac{\lambda}{2}\|Ku - f\|_2^2, is related to \epsilon by \lambda = \frac{\|p_2^*\|_2}{\epsilon}.
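
The projection \Pi_T above in a few NumPy lines (a sketch; z and f are assumed to be flattened arrays and eps > 0).

import numpy as np

def project_T(z, f, eps):
    r = z - f
    return f + r / max(np.linalg.norm(r) / eps, 1.0)
# If z is already within the epsilon-ball around f, it is returned unchanged.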

47 Deblurring Numerical Result
K: convolution operator for a normalized Gaussian blur with standard deviation 3. h: clean image. f = Kh + \eta, with \eta zero mean Gaussian noise of standard deviation 1. Parameters: \alpha = 0.2, \delta = 0.55.
(Figures: \|u - u^*\| versus iterations for PDHG and PDHGMp; the original, the blurry/noisy image, and the image recovered from 300 PDHGMp iterations.)

48 Compressive Sensing
\min_u \|u\|_1 s.t. \|Ku - f\|_2 \le \epsilon, where K = R \Gamma \Psi^T:
- R is a row selector (selects 10% low frequency DCT measurements)
- \Gamma is the DCT transform
- \Psi is an orthogonal Haar wavelet transform
Similar to the constrained deblurring example, except here K K^T = I (so K^+ = K^T), which makes it reasonable to satisfy the constraint at each iteration.

49 Two PDHGMu Formulations
1. J(u) = \|u\|_1, A = I, H(u) = g_S(u), S = \{u : \|Ku - f\|_2 \le \epsilon\}
\Pi_S(u) = (I - K^T K)u + K^T f + \frac{K^T (Ku - f)}{\max\big(\frac{\|Ku - f\|_2}{\epsilon}, 1\big)}
(stays on the constraint set)
2. J(u) = g_T(Ku), H(u) = \|u\|_1, T = \{z : \|z - f\|_2 \le \epsilon\}
\Pi_T(z) = f + \frac{z - f}{\max\big(\frac{\|z - f\|_2}{\epsilon}, 1\big)}
(simpler projection, but doesn't stay on the constraint set)
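
A sketch of \Pi_S from formulation 1, valid under the slide's assumption K K^T = I; the K built below from a few orthonormal rows is a stand-in for R \Gamma \Psi^T, not the talk's actual operator.

import numpy as np

def project_S(u, K, f, eps):
    r = K @ u - f
    return u - K.T @ r + K.T @ (r / max(np.linalg.norm(r) / eps, 1.0))

# Quick check that the output satisfies the constraint when K K^T = I.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))
K = Q[:5, :]                                          # 5 orthonormal rows, so K K^T = I
u, f, eps = rng.standard_normal(50), rng.standard_normal(5), 0.1
print(np.linalg.norm(K @ project_S(u, K, f, eps) - f) <= eps + 1e-12)   # True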

50 Numerical Results for Constrained l1
(Figures: the original, damaged and benchmark recovered images; constraint value \|Ku - f\|_2 - \epsilon versus iterations for PDHGMu and PDHGRMu with \alpha = 1, \delta = 1; comparison of \|u\|_1 against \|u^*\|_1 versus iterations with \alpha = 1, \delta = 1.)
