A primal dual fixed point algorithm for convex separable minimization with applications to image restoration


Inverse Problems 29 (2013) (33pp)

A primal dual fixed point algorithm for convex separable minimization with applications to image restoration

Peijun Chen 1,2, Jianguo Huang 1,3 and Xiaoqun Zhang 1,4

1 Department of Mathematics, and MOE-LSC, Shanghai Jiao Tong University, Shanghai, People's Republic of China
2 Department of Mathematics, Taiyuan University of Science and Technology, Taiyuan, People's Republic of China
3 Division of Computational Science, E-Institute of Shanghai Universities, Shanghai Normal University, People's Republic of China
4 Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, People's Republic of China

E-mail: chenpeijun@sjtu.edu.cn, jghuang@sjtu.edu.cn and xqzhang@sjtu.edu.cn

Received 11 August 2012, in final form 25 November 2012
Published 17 January 2013
Online at stacks.iop.org/ip/29/

Abstract
Recently, the minimization of a sum of two convex functions has received considerable interest in a variational image restoration model. In this paper, we propose a general algorithmic framework for solving a separable convex minimization problem from the point of view of fixed point algorithms based on proximity operators (Moreau 1962 C. R. Acad. Sci., Paris I). Motivated by the proximal forward-backward splitting proposed in Combettes and Wajs (2005 Multiscale Model. Simul.) and the fixed point algorithm based on the proximity operator (FP2O) for image denoising (Micchelli et al 2011 Inverse Problems), we design a primal dual fixed point algorithm based on the proximity operator (PDFP2O_kappa for kappa in [0, 1)) and obtain a scheme with a closed-form solution for each iteration. Using the firmly nonexpansive properties of the proximity operator and with the help of a special norm over a product space, we achieve the convergence of the proposed PDFP2O_kappa algorithm. Moreover, under some stronger assumptions, we can prove the global linear convergence of the proposed algorithm. We also give the connection of the proposed algorithm with other existing first-order methods. Finally, we illustrate the efficiency of PDFP2O_kappa through some numerical examples on image super-resolution, computerized tomographic reconstruction and parallel magnetic resonance imaging. Generally speaking, our method PDFP2O (kappa = 0) is comparable with other state-of-the-art methods in numerical performance, while it has some advantages on parameter selection in real applications.

(Some figures may appear in colour only in the online journal)

1. Introduction

This paper is devoted to designing and discussing an efficient algorithmic framework for minimizing the sum of two proper lower semi-continuous convex functions, i.e.

x* = arg min_{x in R^n} (f_1 o B)(x) + f_2(x),    (1.1)

where f_1 in Gamma_0(R^m), f_2 in Gamma_0(R^n), f_2 is differentiable on R^n with a 1/beta-Lipschitz continuous gradient for some beta in (0, +infty), and B : R^n -> R^m is a linear transform. This parameter beta is related to the convergence conditions of algorithms 3-5 presented in the following section. Here and in what follows, for a real Hilbert space X, Gamma_0(X) denotes the collection of all proper lower semi-continuous convex functions from X to (-infty, +infty].

Despite its simplicity, many problems in image processing can be formulated in the form of (1.1). For instance, the following variational sparse recovery models are often considered in image restoration and medical image reconstruction:

x* = arg min_{x in R^n} mu ||Bx||_1 + (1/2) ||Ax - b||_2^2,    (1.2)

where ||.||_2 denotes the usual Euclidean norm for a vector, A is a p x n matrix representing a linear transform, b in R^p and mu > 0 is the regularization parameter. The term ||Bx||_1 is the usual l_1-based regularization used to promote sparsity under the transform B. For example, for the well-known Rudin-Osher-Fatemi (ROF) model [30], ||Bx||_1 represents the total-variation semi-norm, which aims to recover piecewise constant images, with B being a 2n x n discrete differential matrix (cf [16, 24]). More precisely, ||Bx||_1 and ||Bx||_{1,2} are for anisotropic total variation and isotropic total variation, respectively, and here we simply write both as ||Bx||_1. Problem (1.2) can be expressed in the form of (1.1) by setting f_1 = mu ||.||_1 and f_2(x) = (1/2)||Ax - b||_2^2. One of the main difficulties in solving it is that f_1 is non-differentiable, a situation that occurs in many problems we are interested in.

Another general problem often considered in the literature takes the following form:

x* = arg min_{x in X} f(x) + h(x),    (1.3)

where f, h in Gamma_0(X) and h is differentiable on X with a 1/beta-Lipschitz continuous gradient for some beta in (0, +infty). Problem (1.1), which we are interested in in this paper, can be viewed as a special case of problem (1.3) for X = R^n, f = f_1 o B and h = f_2. On the other hand, problem (1.3) can also be considered as a special case of problem (1.1) for X = R^n, f_2 = h, f_1 = f and B = I, where I denotes the usual identity operator.

For problem (1.3), Combettes and Wajs proposed in [12] a proximal forward-backward splitting (PFBS) algorithm, i.e.

x_{k+1} = prox_{gamma f}(x_k - gamma grad h(x_k)),    (1.4)

where 0 < gamma < 2 beta is a stepsize parameter, and the operator prox_f, called the proximity operator of f, is defined by

prox_f : X -> X,  x -> arg min_{y in X} f(y) + (1/2)||x - y||_2^2.

Note that this type of splitting method was originally studied in [23, 28] for solving partial differential equations, and the notion of proximity operators was first introduced by Moreau in [25] as a generalization of projection operators. The iteration (1.4) consists of two sequential steps. The first performs a forward (explicit) step involving the evaluation of the gradient of h; the other then performs a backward (implicit) step involving f. This numerical scheme is very simple and efficient when the proximity operator used in the second step can be carried out efficiently. For example, when f = ||.||_1 for sparse

3 regularization, the proximity operator prox γ f (x) can be written as the famous componentwise soft-thresholding (also known as a shrinkage) operation. However, the proximity operators for the general form f = f 1 B as in (1.1) do not have an explicit expression, leading to the numerical solution of a difficult subproblem. In fact, the subproblem of (1.2)isprox μ 1 B (b) and often formulated as the ROF denoising problem: x = arg min μ Bx x b 2 2, (1.5) x R n where b R n denotes a corrupted image to be denoised. In recent years, many splitting methods have been designed to solve the last subproblem in order to take advantage of the efficiency of the soft-thresholding operator. For example, Goldstein and Osher proposed in [18] a splitting algorithm based on the Bregman iteration, namely the split Bregman, to implement the action of prox f1 B, in particular for total variation minimization. This algorithm has shown to be very efficient and useful for a large class of convex separable programming problems. Theoretically, it is shown to be equivalent to the Douglas Rachford splitting algorithm (see [31, 14]) and alternating direction of multiplier method (ADMM, see [15, 7]), and the convergence was then analyzed based on such equivalence. The split Bregman proposed in [18] is also designed to solve the convex separable problem (1.1). In particular, for the variational model (1.2), the subproblem involves solving a quadratic minimization, which sometimes can be time consuming. To overcome this, a primal dual inexact split Uzawa method was proposed in [35] to maximally decouple the subproblems so that each iteration step is precise and explicit. In [16, 9], more theoretical analysis on the variants of the primal dual-type method and the connection with existing methods were examined to bridge the gap between different types of methods. Also, the convergence of ADMM was further analyzed in [19] based on proximal point algorithm (PPA) formulation. In this paper, we will follow a different point of view. In [24], Micchelli Shen Xu designed an algorithm called FP 2 O to solve prox f1 B (x). We aim to extend FP2 Otosolvethe general problem (1.1) with a maximally decoupled iteration scheme. One obvious advantage of the proposed scheme is that it is very easy for parallel implementation. Then, we will show that the proposed algorithm is convergent in a general setting. Under some assumptions of the convex function f 2 and the linear transform B, we can further prove the linear convergence rate of the method under the framework of fixed point iteration. Note that most of the existing works based on ADMM have shown a sub-linear convergence rate O(1/k) on the objective function and O(1/k 2 ) on the accelerated version, where k is the iteration number. Recently, in [19], the ergodic and non-ergodic convergence on the difference of two sequential primal dual sequences were analyzed. In this paper, we will prove the convergence rate of the iterations directly from the point of view of fixed point theory under some common assumptions. We note that, during the preparation of this paper, Deng and Yin [13] also considered the global linear convergence of the ADMM and its variants based on similar assumptions. In addition, we will reformulate our fixed point type of methods and show their connections with some existing first-order methods for (1.1) and (1.2). The rest of the paper is organized as follows. 
In section 2, we recall the fixed point algorithm FP 2 O and some related works and then deduce the proposed PDFP 2 O algorithm and its extension PDFP 2 O κ from our intuitions. In section 3, we first deduce PDFP 2 O κ again in the setting of fixed point iteration; we then establish its convergence under a general setting and the convergence rate under some stronger assumptions on f 2 and B. In section 4, we give the equivalent form of PDFP 2 O, and the relationships and differences with other first-order algorithms. In section 5, we show the numerical performance and efficiency of PDFP 2 O κ through some examples on image super-resolution, tomographic reconstruction and 3

4 parallel magnetic resonance imaging (pmri). In the final section, we give a discussion and some perspectives on our method and other state-of-the-art methods frequently used in image restoration. 2. Fixed point algorithms Similar to the fixed point algorithm on the dual for the ROF denoising model (1.5) proposed by Chambolle [8], Micchelli et al proposed an algorithm called FP 2 Oin[24] tosolvethe proximity operator prox f1 B (b) for b Rn, especially for the total-variation-based image denoising. Let λ max (BB T ) be the largest eigenvalue of BB T.For0<λ<2/λ max (BB T ),we define the operator H(v) = (I prox f 1λ )(Bb + (I λbb T )v) for all v R m ; (2.1) then the FP 2 O algorithm is described as algorithm 1, where H κ is the κ-averaged operator of H, i.e. H κ = κi + (1 κ)h for κ (0, 1); see definition 3.3 in the following section. Algorithm 1 Fixed point algorithm based on proximity operator, FP 2 O[24]. Step 1: set v 0 R m,0<λ<2/λ max (BB T ), κ (0, 1). Step 2: calculate v, which is the fixed point of H, with iteration v k+1 = H κ (v k ). Step 3: prox f1 B(b) = b λb T v. The key technique to obtain the FP 2 O scheme relies on the relation of the subdifferential of a convex function and its proximity operator, as described in the result (3.1). An advantage of FP 2 O is that its iteration does not require solving the subproblem and the convergence is analyzed in the classical framework of the fixed point iteration. This algorithm has been extended in [2, 10]tosolve x = arg min ( f 1 B)(x) + 1 x R n 2 xt Qx b T x, where Q M n, with M n being the collection of all symmetric positive definite n n matrices, b R n. Define H(v) = (I prox f 1λ )(BQ 1 b + (I λbq 1 B T )v) for all v R m. Then, the corresponding algorithm is given below, called algorithm 2, which can be viewed as a fixed point algorithm based on the inverse matrix and proximity operator or FP 2 O based on the inverse matrix (IFP 2 O). Here the matrix Q is assumed to be invertible and the inverse can be easily calculated, which is unfortunately not the case in most of the applications in imaging science. Moreover, there is no theoretical guarantee of convergence if the linear system is only solved approximately. Algorithm 2 FP 2 O based on inverse matrix, IFP 2 O[2]. Step 1: set v 0 R m and 0 <λ<2/λ max (BQ 1 B T ), κ (0, 1). Step 2: calculate ṽ, which is the fixed point of H, with iteration ṽ k+1 = H κ (ṽ k ). Step 3: x = Q 1 (b λb T ṽ ). Further, the authors in [2] combined PFBS and FP 2 O for solving problem (1.1), for which we call PFBS_FP 2 O (cf algorithm 3 below). Precisely speaking, at step k in PFBS, after one forward iteration x k+1/2 = x k γ f 2 (x k ), we need to solve for x k+1 = prox γ f1 B (x k+1/2). 4

5 FP 2 O is then used to solve this subproblem, i.e. the fixed point v k+1 of H x k+1/2 is obtained by the fixed iteration form v i+1 = (H xk+1/2 ) κ (v i ), where H xk+1/2 (v) = (I prox γ λ f )(Bx 1 k+1/2 + (I λbb T )v) for all v R m. Then x k+1 is given by setting x k+1 = x k+1/2 λb T v k+1. The acceleration combining with the Nesterov method [17, 26, 32, 33] was also considered in [2]. We note that algorithm 3 involves inner and outer iterations, and it is often problematic to set the appropriate inner stopping conditions to balance computational time and precision. In our algorithm developed later on, instead of using many number of inner fixed point iterations for solving prox γ f1 B (x), we use only one inner fixed point iteration. Algorithm 3 Proximal forward backward splitting based on FP 2 O, PFBS_FP 2 O[2]. Step 1: set x 0 R n,0<γ <2β. Step 2: for k = 0, 1, 2,... x k+1/2 = x k γ f 2 (x k ), calculate the fixed point v k+1 of H x k+1/2 with iteration v i+1 = (H xk+1/2 ) κ (v i ), x k+1 = x k+1/2 λb T v k+1. end for Suppose κ = 0inFP 2 O. A very natural idea is to take the numerical solution v k of the fixed point of H x(k 1)+1/2 as the initial value, and only perform one iteration for solving the fixed point of H xk+1/2 ; then we can obtain the following iteration scheme: { vk+1 = (I prox (PDFP 2 γ λ O) f )(B(x 1 k γ f 2 (x k )) + (I λbb T )v k ), (2.2a) x k+1 = x k γ f 2 (x k ) λb T v k+1, (2.2b) which produces our proposed method algorithm 4, described below. This algorithm can also be deduced from the fixed point formulation, whose detail we will give in the following section. On the other hand, since x is the primal variable related to (1.1), it is very natural to ask what role the variable v plays in our algorithm. After a thorough study, we find out as given in section 4.1 that v is actually the dual variable of the primal dual form related to (1.1). Based on these observations, we call our method a primal dual fixed point algorithm based on the proximity operator, and abbreviate it as PDFP 2 O, inheriting the notion of FP 2 O in [24]. If B = I, λ = 1, then form (2.2) is equivalent to form (1.4). So PFPS can be seen as a special case of PDFP 2 O. Also, when f 2 (x) = 1 2 x b 2 2 and γ = 1, then PDFP2 O reduces to FP 2 O for solving prox f1 B (b) with κ = 0. For general B and f 2, each step of the proposed algorithm is explicit when prox γ λ f is easy to compute. Note that the technique of approximating the 1 subproblem by only one iteration is also proposed in a primal dual inexact Uzawa framework in [35]. We will show the connection to this algorithm and other ones in section 4. Algorithm 4 Primal dual fixed point algorithm based on proximity operator, PDFP 2 O. Step 1: set x 0 R n, v 0 R m,0<λ 1/λ max (BB T ),0<γ <2β. Step 2: for k = 0, 1, 2,... x k+ 1 = x k γ f 2 (x k ), 2 v k+1 = (I prox γ λ f )(Bx 1 k+ 1 + (I λbb T )v k ), 2 x k+1 = x k+ 1 λb T v k+1. 2 end for Borrowing the fixed point formulation of PDFP 2 O, we can introduce a relaxation parameter κ [0, 1) to obtain algorithm 5, which is exactly a Picard method with parameters. 5

6 The rule for parameter selection will be illustrated in section 3. Ifκ = 0, then PDFP 2 O κ reduces to PDFP 2 O. Our theoretical analysis for PDFP 2 O κ given in the following section is mainly based on this fixed point setting. Algorithm 5 PDFP 2 O κ. Step 1: set x 0 R n, v 0 R m,0<λ 1/λ max (BB T ),0<γ <2β, κ [0, 1). Step 2: for k = 0, 1, 2,... x k+ 1 = x k γ f 2 (x k ), 2 ṽ k+1 = (I prox γ λ f )(Bx 1 k+ 1 + (I λbb T )v k ), 2 x k+1 = x k+ 1 λb T ṽ k+1, 2 v k+1 = κv k + (1 κ)ṽ k+1, x k+1 = κx k + (1 κ) x k+1. end for 3. Convergence analysis 3.1. General convergence First of all, let us mention some related definitions and lemmas for later requirements. From now on, we use X to denote a finite-dimensional real Hilbert space. Moreover, we always assume that problem (1.1) has at least one solution. As shown in [12], if the objective function ( f 1 B)(x) + f 2 (x) is coercive, i.e. lim (( f 1 B)(x) + f 2 (x)) =+, x 2 + then the existence of solution can be ensured for (1.1). Definition 3.1 (Subdifferential [30]). Let f be a function in Ɣ 0 (X ). The subdifferential of f is the set-valued operator f : X 2 X, the value of which at x X is f (x) ={v X v, y x + f (x) f (y) for all y X }, where, denotes the inner-product over X. Definition 3.2 (Nonexpansive operators and firmly nonexpansive operators [30]). An operator T : X X is nonexpansive if and only if it satisfies Tx Ty 2 x y 2 for all (x, y) X 2. T is firmly nonexpansive if and only if it satisfies one of the following equivalent conditions: (i) Tx Ty 2 2 Tx Ty, x y for all (x, y) X 2. (ii) Tx Ty 2 2 x y 2 2 (I T )x (I T )y 2 2 for all (x, y) X 2. It is easy to show from the above definitions that a firmly nonexpansive operator T is nonexpansive. Definition 3.3 (Picard sequence, κ-averaged operator [27]). Let T : X X be an operator. For a given initial point u 0 X, the Picard sequence of the operator T is defined by u k+1 = T (u k ),fork N. For a real number κ (0, 1), theκ-averaged operator T κ of T is defined by T κ = κi + (1 κ)t. We also write T 0 = T. 6
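To make the scheme concrete before turning to the lemmas, algorithm 5 can be written in a few lines: the forward gradient step, the (I - prox) step, the primal update and the kappa-averaging of definition 3.3, with kappa = 0 recovering algorithm 4 (PDFP2O). The NumPy sketch below is only an illustration under simplifying assumptions (a dense B given as an array, a hypothetical toy instance of (1.2) with f_1 = mu ||.||_1 at the end); it is not the authors' implementation, and the experiments reported in section 5 were run in MATLAB.

```python
import numpy as np

def pdfp2o_kappa(grad_f2, prox_scaled_f1, B, x0, v0, gamma, lam, kappa=0.0, n_iter=500):
    """Illustrative sketch of PDFP2O_kappa (algorithm 5).

    grad_f2(x)        -- gradient of the differentiable term f2
    prox_scaled_f1(w) -- proximity operator of (gamma/lam)*f1
    B                 -- linear transform as a dense array (B.T is its adjoint)
    Parameter ranges from theorems 3.4 and 3.5: 0 < gamma < 2*beta,
    0 < lam <= 1/lambda_max(B B^T), kappa in [0, 1); kappa = 0 gives PDFP2O.
    """
    x, v = x0.astype(float), v0.astype(float)
    for _ in range(n_iter):
        x_half = x - gamma * grad_f2(x)                 # forward (gradient) step
        w = B @ x_half + v - lam * (B @ (B.T @ v))      # B x_{k+1/2} + (I - lam B B^T) v_k
        v_new = w - prox_scaled_f1(w)                   # (I - prox_{(gamma/lam) f1})(w)
        x_new = x_half - lam * (B.T @ v_new)            # primal update
        v = kappa * v + (1 - kappa) * v_new             # kappa-averaging (definition 3.3)
        x = kappa * x + (1 - kappa) * x_new
    return x, v

# Toy use on a hypothetical problem of the form (1.2):
# f2(x) = 0.5*||A x - b||_2^2, f1 = mu*||.||_1, B a 1D finite-difference matrix.
rng = np.random.default_rng(0)
n, p, mu = 50, 30, 0.1
A, b = rng.standard_normal((p, n)), rng.standard_normal(p)
B = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]                # forward differences, (n-1) x n
beta = 1.0 / np.linalg.norm(A, 2) ** 2                  # beta = 1/lambda_max(A^T A)
lam = 1.0 / np.linalg.norm(B @ B.T, 2)                  # lam = 1/lambda_max(B B^T)
gamma = 1.9 * beta                                      # 0 < gamma < 2*beta

soft = lambda w, t: np.sign(w) * np.maximum(np.abs(w) - t, 0.0)  # prox of t*||.||_1
x_star, _ = pdfp2o_kappa(lambda x: A.T @ (A @ x - b),
                         lambda w: soft(w, gamma * mu / lam),
                         B, np.zeros(n), np.zeros(n - 1), gamma, lam, kappa=0.0)
```

Since prox_{(gamma/lam) f_1} enters only through a single evaluation per iteration, every step is explicit whenever that proximity operator has a closed form, which is the decoupling emphasized in the text.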

7 Lemma 3.1. Suppose f Ɣ 0 (R m ) and x R m. Then there holds y f (x) x = prox f (x + y). (3.1) Furthermore, if f has 1/β-Lipschitz continuous gradient, then f (x) f (y), x y β f (x) f (y) 2 for all (x, y) R m. (3.2) Proof. The first result is nothing but proposition 2.6 of [24]. If f has 1/β-Lipschitz continuous gradient, we have from [12] that β f is firmly nonexpansive, which implies (3.2) readily. Lemma 3.2 (Lemma 2.4 of [12]). Let f be a function in Ɣ 0 (R m ). Then prox f and I prox f are both firmly nonexpansive operators. Lemma 3.3 (Opial κ-averaged theorem, theorem 3 of [27]). If S is a closed and convex set in X and T : S S is a nonexpansive mapping having at least one fixed point, then for κ (0, 1), T κ is nonexpansive, maps S to itself and has the same set of fixed points as T. Furthermore, for any u 0 S and κ (0, 1), the Picard sequence of T κ converges to a fixed point of T. Now, we are ready to obtain a fixed point formulation for the solution of problem (1.1) and discuss the convergence of PDFP 2 O κ. To this end, for any two positive numbers λ and γ, define T 1 : R m R n R m as T 1 (v, x) = (I prox γ λ f )(B(x γ f 1 2(x)) + (I λbb T )v) (3.3) and T 2 : R m R n R n as T 2 (v, x) = x γ f 2 (x) λb T T 1. (3.4) Denote T : R m R n R m R n as T (v, x) = (T 1 (v, x), T 2 (v, x)). (3.5) Theorem 3.1. Let λ and γ be two positive numbers. Suppose that x is a solution of (1.1). Then there exists v R m such that { v = T 1 (v, x ), x = T 2 (v, x ). In other words, u = (v, x ) is a fixed point of T. Conversely, if u R m R n is a fixed point of T, with u = (v, x ), v R m,x R n, then x is a solution of (1.1). Proof. By the first-order optimality condition of problem (1.1), we have x = arg min ( f 1 B)(x) + f 2 (x) x R n 0 f 2 (x ) (f 1 B)(x ) 0 γ f 2 (x ) γ (f 1 B)(x ) ( x x γ f 2 (x ) λ B T γ ) λ f 1 B (x ). Let Then v ( γ ( γ ) λ f 1 B) (x ) = λ f 1 (Bx ). (3.6) x = x γ f 2 (x ) λb T v. (3.7) 7

8 Moreover, it follows from result (3.1) that (3.6) is equivalent to Bx = prox γ λ f (Bx + v ) 1 (Bx + v ) v = prox γ λ f (Bx + v ) 1 v = (I prox γ λ f )(Bx + v ). (3.8) 1 Inserting (3.7)into(3.8)gives v = ( ) I prox γ λ f 1 (B(x γ f 2 (x )) + (I λbb T )v ). This shows v = T 1 (v, x ). Next, replacing v in (3.7) byt 1 (v, x ), we readily have x = T 2 (v, x ). Therefore, for u = (v, x ), u = T (u ). On the other hand, if u = T (u ), then we can derive that x satisfies the first-order optimality condition of (1.1). Therefore, we conclude that x is a minimizer of (1.1). In the following, we will show that the algorithm PDFP 2 O κ is a Picard method related to the operator T κ. Theorem 3.2. Suppose κ [0, 1). Set T κ = κi + (1 κ)t. Then the Picard sequence {u k } of T κ is exactly the one obtained by the algorithm PDFP 2 O κ. Proof. According to the definitions in (3.3) (3.5), the component form of u k+1 = T (u k ) can be expressed as { vk+1 = T 1 (v k, x k ) = (I prox γ λ f )(B(x 1 k γ f 2 (x k )) + (I λbb T )v k ) x k+1 = T 2 (v k, x k ) = x k γ f 2 (x k ) λb T T 1 (v k, x k ) = x k γ f 2 (x k ) λb T v k+1. Therefore, the iteration u k+1 = T (u k ) is equivalent to (2.2). Employing the similar argument, we can obtain the conclusion for general T κ with κ [0, 1). Remark 3.1. From the last result, we find out that algorithm PDFP 2 O κ can also be obtained in the setting of fixed point iteration immediately. For the convergence analysis for PDFP 2 O κ, we will first prove a key inequality for general cases (cf equation (3.13)). Denote g(x) = x γ f 2 (x) for all x R n, (3.9) M = I λbb T. (3.10) When 0 <λ 1/λ max (BB T ), M is a symmetric positive semi-definite matrix, so we can define the semi-norm v M = v, Mv for all v R m. (3.11) For an element u = (v, x) R m R n, with v R m and x R n,let u λ = x λ v 2 2. (3.12) We can easily see that λ is a norm over the produce space R m R n whenever λ>0. Theorem 3.3. For any two elements u 1 = (v 1, x 1 ),u 2 = (v 2, x 2 ) in R m R n, there holds T (u 1 ) T (u 2 ) 2 λ u 1 u 2 2 λ γ(2β γ) f 2(x 1 ) f 2 (x 2 ) 2 2 λb T (v 1 v 2 ) 2 2 λ (T 1(u 1 ) T 1 (u 2 )) (v 1 v 2 ) 2 M. (3.13) 8

9 Proof. By lemma 3.2, I prox γ λ f is a firmly nonexpansive operator. This together 1 with (3.3), (3.9) and (3.10) yields T 1 (u 1 ) T 1 (u 2 ) 2 2 T 1(u 1 ) T 1 (u 2 ), B(g(x 1 ) g(x 2 )) + M(v 1 v 2 ) = T 1 (u 1 ) T 1 (u 2 ), B(g(x 1 ) g(x 2 )) + T 1 (u 1 ) T 1 (u 2 ), M(v 1 v 2 ). (3.14) It follows from (3.4), (3.9), (3.10) and (3.11) that T 2 (u 1 ) T 2 (u 2 ) 2 2 = (g(x 1) g(x 2 )) λb T (T 1 (u 1 ) T 1 (u 2 )) 2 2 = g(x 1 ) g(x 2 ) 2 2 2λ BT (T 1 (u 1 ) T 1 (u 2 )), g(x 1 ) g(x 2 ) + λb T (T 1 (u 1 ) T 1 (u 2 )) 2 2 = g(x 1 ) g(x 2 ) 2 2 2λ T 1(u 1 ) T 1 (u 2 ), B(g(x 1 ) g(x 2 )) λ T 1 (u 1 ) T 1 (u 2 ) 2 M + λ T 1(u 1 ) T 1 (u 2 ) 2 2. (3.15) Observing the definitions in (3.5) and (3.9) (3.12), we have by (3.14) (3.15) T (u 1 ) T (u 2 ) 2 λ = T 2(u 1 ) T 2 (u 2 ) λ T 1(u 1 ) T 1 (u 2 ) 2 2 = g(x 1 ) g(x 2 ) 2 2 2λ T 1(u 1 ) T 1 (u 2 ), B(g(x 1 ) g(x 2 )) λ T 1 (u 1 ) T 1 (u 2 ) 2 M + 2λ T 1(u 1 ) T 1 (u 2 ) 2 2 g(x 1 ) g(x 2 ) 2 2 λ T 1(u 1 ) T 1 (u 2 ) 2 M + 2λ T 1(u 1 ) T 1 (u 2 ), M(v 1 v 2 ) = g(x 1 ) g(x 2 ) λ v 1 v 2 2 M λ (T 1(u 1 ) T 1 (u 2 )) (v 1 v 2 ) 2 M. (3.16) Using the definition in (3.9) and estimate (3.2), we know g(x 1 ) g(x 2 ) 2 2 = x 1 x γ f 2(x 1 ) f 2 (x 2 ), x 1 x 2 +γ 2 f 2 (x 1 ) f 2 (x 2 ) 2 2 x 1 x γ(2β γ) f 2(x 1 ) f 2 (x 2 ) 2 2. (3.17) By the definitions in (3.10) and (3.11), λ v 1 v 2 2 M = λ v 1 v λbt (v 1 v 2 ) 2 2. (3.18) Recalling the definition in (3.12), we easily know that (3.13) is a direct consequence of (3.16) (3.18). From theorem 3.3, we can derive the following result. Corollary 3.1. If 0 <γ <2β, 0 <λ 1/λ max (BB T ), then T is nonexpansive under the norm λ. Since T is nonexpansive, we are able to show the convergence of PDFP 2 O κ for κ (0, 1), in view of lemma 3.3. Theorem 3.4. Suppose 0 <γ <2β, 0 <λ 1/λ max (BB T ) and κ (0, 1). Let u k = (v k, x k ) be a sequence generated by PDFP 2 O κ. Then {u k } converges to a fixed point of T and {x k } converges to a solution of problem (1.1). Proof. In view of theorem 3.2, we know u k+1 = T κ (u k ),so{u k } is the Picard sequence of T κ. By assumption, problem (1.1) has a solution, and hence operator T has a fixed point from theorem 3.1. According to corollary 3.1, T is nonexpansive. Therefore, by letting S = R m,we find from lemma 3.3 that {u k } converges to a fixed point of T for κ (0, 1). With this result in mind, {x k } converges to a solution of problem (1.1) from theorem 3.1. Now, let us proceed with the convergence analysis of PDFP 2 O using some novel technique. 9

10 Theorem 3.5. Suppose 0 <γ <2β and 0 <λ 1/λ max (BB T ). Let u k = (v k, x k ) be the sequence generated by PDFP 2 O. Then the sequence {u k } converges to a fixed point of T, and the sequence {x k } converges to a solution of problem (1.1). Proof. Let u = (v, x ) R m R n be a fixed point of T. Using theorem 3.3,wehave u k+1 u 2 λ u k u 2 λ γ(2β γ) f 2(x k ) f 2 (x ) 2 2 λb T (v k v ) 2 2 λ v k+1 v k 2 M. (3.19) Summing (3.19) over k from 0 to + gives + k=0 { γ(2β γ) f2 (x k ) f 2 (x ) λbt (v k v ) λ v k+1 v k 2 } M u0 u 2 λ. So { lim γ(2β γ) f2 (x k ) f 2 (x ) 2 2 k + + λbt (v k v ) λ v k+1 v k 2 } M = 0, which together with 0 <γ <2β implies lim f 2(x k ) f 2 (x ) 2 = 0, (3.20) k + lim k + BT (v k v ) 2 = 0, (3.21) lim v k+1 v k M = 0. (3.22) k + By the definitions in (3.10) and (3.11), v k+1 v k 2 2 = v k+1 v k 2 M + λ BT (v k+1 v k ) 2 2, which when combined with (3.21) and (3.22)gives lim v k+1 v k 2 = 0. (3.23) k + On the other hand, from (3.7) wehave γ f 2 (x ) λb T v = 0, and from (2.2b) x k+1 x k = γ f 2 (x k ) λb T v k+1. Hence, x k+1 x k = γ( f 2 (x k ) f 2 (x )) λ(b T v k+1 B T v ). Now, using (3.20) and (3.21) we immediately obtain lim x k+1 x k 2 = 0. (3.24) k + By the definition in (3.12) and (3.23) (3.24), lim u k+1 u k λ = 0. (3.25) k + From (3.19), we know that the sequence { u k u λ } is non-increasing, so the sequence {u k } is bounded and there exists a convergent subsequence {u k j } such that lim u k j u λ = 0 (3.26) j + for some u R m R n. Next, let us show that u is a fixed point of T. In fact, T (u k j ) u λ = (u k j +1 u k j ) (u k j u ) λ u k j +1 u k j λ + u k j u λ 10

11 which, in conjunction with (3.25) and (3.26), leads to lim T (u k j ) u λ = 0. (3.27) j + The operator T is continuous since it is nonexpansive, so it follows from (3.26) and (3.27) that u is a fixed point of T. Moreover, we know that { u k u λ } is non-increasing for any fixed point u of T. In particular, by choosing u = u, we see that { u k u λ } is non-increasing. Combining this and (3.26) yields lim u k = u. k + Writing u = (v, x ) with v R m, x R n, we find from theorem 3.1 that x is the solution of problem (1.1). Note that if f 2 (x) = 1 2 x b 2 2 and γ = 1, then PDFP2 O reduces to FP 2 Oforκ = 0. As a consequence of the above theorem, we can achieve the convergence of FP 2 Oforκ = 0even when BB T is singular, for which no convergence is available from theorem 3.12 of[24]. Corollary 3.2. Suppose 0 < λ 1/λ max (BB T ). Let {v k } be the sequence generated by FP 2 Oforκ = 0. Set x k = b λb T v k. Then the sequence {v k } converges to the fixed point of H(see (2.1)), the sequence {x k } converges to the solution of problem (1.1) with f 2 (x) = 1 2 x b Linear convergence rate for special cases In this section, we will give some stronger theoretical results about the convergence rate in some special cases. For this, we present the following condition. Condition 3.1. For any two real numbers λ and γ satisfying that 0 < γ < 2β and 0 <λ 1/λ max (BB T ), there exist η 1, η 2 [0, 1) such that I λbb T 2 η1 2 and g(x) g(y) 2 η 2 x y 2 for all x, y R n, where g(x) is given in (3.9). Remark 3.2. If B has full row rank, f 2 is strongly convex, i.e. there exists some σ>0such that f 2 (x) f 2 (y), x y σ x y 2 2 for all x, y R n, (3.28) then this condition can be satisfied. In fact, when B has a full row rank, we can choose η1 2 = 1 λλ min(bb T ), where λ min (BB T ) denotes the smallest eigenvalue of BB T. In this case, η1 2 takes its minimum ( η 2 1 )min = 1 λ min(bb T ) λ max (BB T ) at λ = 1/λ max (BB T ). On the other hand, since f 2 has 1/β-Lipschitz continuous gradient and is strongly convex, it follows from (3.2) and (3.28) that g(x) g(y) 2 2 = x y 2 2 2γ f 2(x) f 2 (y), x y +γ 2 f 2 (x) f 2 (y) 2 2 x y 2 γ(2β γ) 2 f 2 (x) f 2 (y), x y β ( ) σγ(2β γ) 1 x y 2 2 β. 11

12 Hence we can choose η2 2 σγ(2β γ) = 1. β In particular, if we choose γ = β, then η 2 takes its minimum in the present form: (η 2 2 ) min = 1 σβ. As a typical example, consider f 2 (x) = 1 2 Ax b 2 2 with AT A full rank. Then we can find that β = 1/λ max (A T A) and σ = λ min (A T A), and hence ( η 2 2 )min = 1 λ min(a T A) λ max (A T A). Despite most of our interesting problems not belonging to these special cases, and there will be more efficient algorithms if condition 3.1 is satisfied, the following results still have some theoretical values where the best performance of PDFP 2 O κ can be achieved. First of all, we show that T κ is contractive under condition 3.1. Theorem 3.6. Suppose condition 3.1 holds true. Let the operator T be given in (3.5) and T κ = κi + (1 κ)tforκ [0, 1). Then T κ is contractive under the norm λ. Proof. Let η = max{η 1,η 2 }. It is clear that 0 η 1. Then, owing to the condition 3.1, for all u 1 = (v 1, x 1 ), u 2 = (v 2, x 2 ) R m R n, there holds g(x 1 ) g(x 2 ) 2 η x 1 x 2 2, v 1 v 2 M η v 1 v 2 2, from which, (3.12) and (3.16) it follows that T (u 1 ) T (u 2 ) 2 λ g(x 1) g(x 2 ) 2 2 +λ v 1 v 2 2 M λ (T 1(u 1 ) T 1 (u 2 )) (v 1 v 2 ) 2 M η 2 ( x 1 x λ v 1 v ) = η 2 u 1 u 2 2 λ. On the other hand, it is easy to check from the last estimate and the triangle inequality that T κ (u 1 ) T κ (u 2 ) λ κ u 1 u 2 λ + (1 κ) T (u 1 ) T (u 2 ) λ θ u 1 u 2 λ, with θ = κ + (1 κ)η [0, 1). So, operator T κ is contractive. Now, we are ready to analyze the convergence rate of PDFP 2 O κ. Theorem 3.7. Suppose condition 3.1 holds true. Let the operator T be given in (3.5) and T κ = κi + (1 κ)tforκ [0, 1). Let u k = (v k, x k ) be a Picard sequence of the operator T κ (or equivalently, a sequence obtained by algorithm PDFP 2 O κ ). Then the sequence {u k } must converge to the unique fixed point u = (v, x ) R m R n of T, with x being the unique solution of problem (1.1). Furthermore, there holds the estimate x k x 2 cθ k 1 θ, (3.29) where c = u 1 u 0 λ, θ = κ + (1 κ)η [0, 1) and η = max{η 1,η 2 }, with η 1 and η 2 given in condition

13 Proof. Since the operator T κ is contractive, by the Banach contraction mapping theorem, it has a unique fixed point, denoted by u = (v, x ). It is obvious that T κ has the same fixed points as T,sox is the unique solution of problem (1.1) from theorem 3.1. Moreover, it is routine that the sequence {u k } converges to u. On the other hand, it follows from theorem 3.6 that So for all 0 < l N, u k+1 u k λ θ u k u k 1 λ θ k u 1 u 0 λ = cθ k. u k+l u k λ which immediately implies l u k+i u k+i 1 λ = cθ k i=1 x k x 2 u k u λ cθ k 1 θ by letting l +. The desired estimate (3.29) is then obtained. l θ i 1 cθ k 1 θ, i=1 If B = I, λ = 1, then form (2.2) is equivalent to form (1.4), so as a special case of theorem 3.7, we can obtain the convergence rate for PFBS. Corollary 3.3. Suppose 0 <γ <2β and there exists η [0, 1) such that g(x) g(y) 2 η x y 2 for all x, y R n. Let {x k } be a sequence generated by PFBS and x be the solution of problem (1.3)forX = R n. Set c = x 1 x 0 2. Then x k x 2 cηk 1 η. As a conclusion of theorem 3.7, we can also obtain the convergence rate of FP 2 Ofor κ = 0 under the assumption I λbb T < 1. Corollary 3.4. Suppose 0 <λ 1/λ max (BB T ), the matrix B has full row rank and η 1 is given by condition 3.1. Let v be the fixed point of H(cf (2.1)). Let {v k } be a sequence generated by FP 2 Oforκ = 0, with x k = b λb T v k. Set c = λb T (v 1 v 0 ) λ v 1 v Then v k v 2 cη k 1 λ(1 η1 ), x k x 2 cηk 1 1 η Connections to other algorithms We will further investigate the proposed algorithm PDFP 2 O from the perspective of primal dual forms and establish the connections to other existing methods. 13

14 4.1. Primal dual and proximal point algorithms For problem (1.1), we can write its primal dual form using the Fenchel duality [29]as min maxl(x, v) := f 2 (x) + Bx, v f x 1 (v), (4.1) v where f1 is the convex conjugate function of f 1 defined by f1 (v) = sup v, w f 1 (w). w R m By introducing a new intermediate variable y k+1, equations (2.2) are reformulated as y k+1 = x k γ f 2 (x k ) λb T v k, (4.2a) v k+1 = (I prox γ λ f )(By 1 k+1 + v k ), (4.2b) x k+1 = x k γ f 2 (x k ) λb T v k+1. (4.2c) According to Moreau decomposition (see equation (2.21) in [12]), for all v R m,wehave v = v γ + v γ, where v γ = prox γ λ λ λ λ f v, v γ = γ ( ) λ 1 λ λ prox λ γ f 1 γ v, from which we know ( ) I prox γ λ f (Byk+1 + v 1 k ) = γ ( λ λ prox λ γ f 1 γ By k+1 + λ ) γ v k. Let v k = γ λ v k. Then (4.2) can be reformulated as y k+1 = x k γ f 2 (x k ) γ B T v k, (4.3a) ( ) λ v k+1 = prox λ γ f 1 γ By k+1 + v k, (4.3b) x k+1 = x k γ f 2 (x k ) γ B T v k+1. (4.3c) In terms of the saddle point formulation (4.1), we have by a direct manipulation that f 2 (x k ) + B T v k = x L(x k, v k ) ( ) λ prox λ γ f 1 γ By k+1 + v k = arg min L(y k+1, v) + γ v R m 2λ v v k 2 2, f 2 (x k ) + B T v k+1 = x L(x k, v k+1 ). Hence, (4.3) can be expressed as y k+1 = x k γ x L(x k, v k ), (4.4a) v k+1 = arg min L(y k+1, v) + γ v R m 2λ v v k 2 2, (4.4b) x k+1 = x k γ x L(x k, v k+1 ). (4.4c) From (4.3a) and (4.3c), we can find out that y k+1 = x k+1 + γ B T (v k+1 v k ). Then equation (4.4b) becomes v k+1 = arg min L(x k+1, v) + γ v R m 2λ v v k 2 M, where M = 1 λbb T. Together with (4.4c), the iterations (4.4)are v k+1 = arg maxl(x k+1, v) γ v R m 2λ v v k 2 M, (4.5a) x k+1 = x k γ x L(x k, v k+1 ). (4.5b) 14

15 Table 1. Comparison between CP (θ = 1) and PDFP 2 O. CP (θ = 1) PDFP 2 O Form v k+1 = (I + σ f1 ) 1 (v k + σ By k ) v k+1 = (I + λ f γ 1 ) 1 (v k + λ By γ k) x k+1 = (I + τ f 2 ) 1 (x k τb T v k+1 ) x k+1 = x k γ f 2 (x k ) γ B T v k+1 y k+1 = 2x k+1 x k y k+1 = x k+1 γ f 2 (x k+1 ) γ B T v k+1 Convergence 0 <στ<1/λ max (BB T ) 0 <γ <2β,0<λ 1/λ max (BB T ) Relation σ = λ/γ, τ = γ Thus the proposed algorithm can be interpreted as an inexact Uzawa method [3] applied on the dual formulation. Compared to the classical Uzawa method, (4.5) is more implicit since the update of v k+1 involves x k+1 and a proximal point iteration matrix M is used. This leads to a close connection with a class of primal dual method studied in [35, 16, 9, 19]. For example, in [9], Chambolle and Pock proposed the following scheme for solving (4.1): v k+1 = (I + σ f1 ) 1 (v k + σ By k ), (4.6a) (CP) x k+1 = (I + τ f 2 ) 1 (x k τb T v k+1 ), (4.6b) y k+1 = x k+1 + θ(x k+1 x k ), (4.6c) where σ,τ > 0, θ [0, 1] is a parameter. For θ = 0, we can obtain the classical Arrow Hurwicz Uzawa (AHU) method in [3]. The convergence of AHU with very small step length isshownin[16]. Under some assumptions on f1 or strong convexity of f 2, global convergence of the primal dual gap can also be shown with specific chosen adaptive steplength [9]. Note that in the case of the ROF model, Chan and Zhu proposed in [36] a clever adaptive step lengths σ and τ for acceleration, and recently the convergence was shown in [6]. According to equation (4.3), using the relation prox λ γ f = (I + λ 1 γ f 1 ) 1, and changing the order of these equations, we know that PDFP 2 O is equivalent to ( v k+1 = I + λ ) 1 ( ) λ γ f 1 γ By k + v k, (4.7a) x k+1 = x k γ f 2 (x k ) γ B T v k+1, (4.7b) y k+1 = x k+1 γ f 2 (x k+1 ) γ B T v k+1. (4.7c) Let σ = λ/γ, τ = γ, then we can see that equations (4.6b) and (4.6c) are approximated by two explicit steps (4.7b) (4.7c). In summary, we list the comparisons of CP for θ = 1 with the fixed step length and PDFP 2 O in table 1. For f 2 (x) = 1 2 Ax b 2 2, (4.4) can be further expressed as y k+1 = arg minl(x, v k ) + 1 x R n 2γ x x k 2 (I γ A T A), (4.8a) v k+1 = arg min L(y k+1, v) + γ v R m 2λ v v k 2 2, (4.8b) x k+1 = arg minl(x, v k+1 ) + 1 x R n 2γ x x k 2 (I γ A T A). (4.8c) Note that by introducing the proximal iteration norm through the matrix I γ A T A M n for 0 <γ <βwith β = 1/λ max (A T A), (4.8a) and (4.8c) become explicit. This is particularly useful when the inverse of A T A is not easy to obtain in most of the imaging applications, such as super-resolution, tomographic reconstruction and parallel MRI [11]. Meanwhile, it is worthwhile pointing out that the condition on γ by this formulation is stricter than theorem 3.5, where γ is required as 0 <γ <2β for the convergence. Furthermore, if we 15

16 ) and P = ( γ λ (I λbbt ) 0 denote û k = ( v T k, x ) T T k and F(ûk ) = ( f1 (v k) Bx k B T v k + f 2 (x k ) we can also easily write the algorithm in the PPA framework [19] as 0 1 γ (I γ AT A) ), then 0 F(û k+1 ) + P(û k+1 û k ). (4.9) We note that in [19], the Chambolle Pock algorithm (4.6) forθ = 1 was also rewritten in the PPA structure as (4.9) with the same F, while ( 1 P = σ I B ). B T 1 τ I In [19, 9], a more general class of algorithms taking this form are studied. In particular, an extra extrapolation step can be applied to the algorithm (4.9) for acceleration Splitting type of methods There are other types of methods which are designed to solve problem (1.1) based on the notion of an augmented Lagrangian. For simplicity, we only list these algorithms for f 2 (x) = 1 2 Ax b 2 2. Among them, the alternating split Bregman (ASB) method proposed by Goldstein and Osher [18] is very popular for imaging applications. This method has been proved to be equivalent to the Douglas Rachford method and the alternating direction of multiplier method (ADMM). In [34, 35], based on PFBS and Bregman iteration, a split inexact Uzawa (SIU) method is proposed to maximally decouple the iterations, so that each iteration is explicit. Further analysis and connections to primal dual methods algorithm are given in [16, 35]. In particular, it is shown that the primal dual algorithm scheme (4.6) with θ = 1 can be interpreted as SIU. In the following, we study the connections and differences between these two methods. ASB can be described as follows: x k+1 = (A T A + νb T B) 1 (A T b + νb T (d k v k )), (4.10a) (ASB) d k+1 = prox 1 ν f (Bx 1 k+1 + v k ), (4.10b) v k+1 = v k (d k+1 Bx k+1 ), (4.10c) where ν>0isaparameter. The explicit SIU method proposed in the literature [35] can be described as x k+1 = x k δa T (Ax k b) δνb T (Bx k d k + v k ), (4.11a) (SIU) d k+1 = prox 1 ν f (Bx 1 k+1 + v k ), (4.11b) v k+1 = v k (d k+1 Bx k+1 ), (4.11c) where δ>0 is a parameter. We can easily see that we approximate the implicit step (4.10a) in ASB by an explicit step (4.11a)inSIU. From (4.2a) and (4.2c), we can find out a relation between y k and x k, given by x k = y k λb T (v k v k 1 ). Then eliminating x k,pdfp 2 O can be expressed as { yk+1 = y k λb T (2v k v k 1 ) γ f 2 (y k λb T (v k v k 1 )), (4.12a) v k+1 = (I prox γ λ f )(By 1 k+1 + v k ). (4.12b) By introducing the splitting variable d k+1 in (4.12b), (4.12) can be further expressed as 16

17 Table 2. The comparisons among ASB, SIU and PDFP 2 O. ASB SIU PDFP 2 O Form x k+1 = (A T A + νb T B) 1 x k+1 = x k δa T (Ax k b) x k+1 = x k δa T (Ax k b) (A T b + νb T (d k v k )) δνb T (Bx k d k + v k ) δνb T (Bx k d k + v k ) δ 2 νa T AB T (d k Bx k ) d k+1 = prox 1 ν f 1 (Bx k+1 + v k ) d k+1 = prox 1 ν f 1 (Bx k+1 + v k ) d k+1 = prox 1 ν f 1 (Bx k+1 + v k ) v k+1 = v k (d k+1 Bx k+1 ) v k+1 = v k (d k+1 Bx k+1 ) v k+1 = v k (d k+1 Bx k+1 ) Convergence ν>0 ν>0 0 <δ<2/λ max (A T A) 0 <δ 1/λ max (A T A + νb T B) 0 <δν 1/λ max (BB T ) y k+1 = y k λb T (By k d k + v k ) γ f 2 (y k λb T (By k d k )), d k+1 = prox γ λ f (By 1 k+1 + v k ), v k+1 = v k (d k+1 By k+1 ). (4.13) For f 2 (x) = 1 2 Ax b 2 2, f 2(x) = A T (Ax b). By changing the order and letting γ = δ, λ = δν, (4.13) becomes y k+1 = y k δa T (Ay k b) δνb T (By k d k + v k ) δ 2 νa T AB T (d k By k ) (4.14a) d k+1 = (prox 1 ν f )(By 1 k+1 + v k ), (4.14b) v k+1 = v k (d k+1 By k+1 ). (4.14c) We can easily see that equation (4.10a) in ASB is approximated by (4.14a). Although it seems that PDFP 2 O requires more computation in (4.14a) than SIU in (4.11a), PDFP 2 O has the same computation cost as that of SIU if the iterations are implemented cleverly. For the reason of comparison, we can change the variable y k to x k in (4.14). Table 2 gives the summarized comparisons among ASB, SIU and PDFP 2 O. We note that the only difference of SIU and PDFP 2 O is in the first step. As two algorithms converge, the algorithm PDFP 2 O behaves asymptotically the same as SIU since d k Bx k converges to 0. The parameters δ and ν satisfy respectively different conditions to ensure the convergence. 5. Numerical experiments In this section, we illustrate the numerical performance of PDFP 2 O κ for κ [0, 1) through three applications: image super-resolution, computerized tomography (CT) reconstruction and pmri. Both the first two applications can be described as problem (1.2), where A is a linear operator representing the subsampling and tomographic projection operator respectively. In pmri, 2 1 Ax b 2 2 is replaced by 2 1 N j=1 A jx b j 2 2, and a detailed description will be given in section 5.3. Here, we use the total variation as the regularization functional, where the operator B : R n R 2n is a discrete gradient operator. Furthermore, the isotropic definition is adopted, i.e. f 1 (w) = μ w 1,2, for all w = (w 1,...,w n,w n+1,...,w 2n ) T R 2n, n w 1,2 = wi 2 + wn+i 2. i=1 Let w i = (w i,w n+i ) T, w i 2 = expressed as w 2 i + w 2 n+i and ɛ = μγ λ. Then prox ɛ 1,2 (w) can be (prox ɛ 1,2 (w)) i,n+i = max{ w i 2 ɛ,0}, w i 2 w i i = 1,...,n. 17
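The componentwise formula above translates directly into code. The sketch below is an illustration (not the authors' implementation): it assumes the 2n-vector w is stored as a 2 x n array whose rows hold (w_1, ..., w_n) and (w_{n+1}, ..., w_{2n}), and the guard against division by zero for vanishing groups is an added implementation detail. The second function is the complementary operator I - prox_{eps ||.||_{1,2}}, which, as described next, is what the PDFP2O implementation evaluates directly.

```python
import numpy as np

def prox_iso_tv(w, eps):
    """Groupwise shrinkage realizing the prox formula above.

    w   -- array of shape (2, n): row 0 holds (w_1, ..., w_n), row 1 holds (w_{n+1}, ..., w_{2n})
    eps -- threshold, eps = mu*gamma/lam in the paper's notation
    """
    norms = np.linalg.norm(w, axis=0)                   # ||w_i||_2 = sqrt(w_i^2 + w_{n+i}^2)
    shrink = np.maximum(norms - eps, 0.0)
    scale = np.divide(shrink, norms, out=np.zeros_like(norms), where=norms > 0)  # 0 where ||w_i|| = 0
    return w * scale

def proj_dual_ball(w, eps):
    """(I - prox_{eps ||.||_{1,2}})(w): groupwise projection onto {w : ||w_i||_2 <= eps for all i}."""
    return w - prox_iso_tv(w, eps)
```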

18 For the implementation of PDFP 2 O, we use the scheme presented in algorithm 4, where we compute directly (I prox γ λ f )(w). In fact, we can deduce that (I prox 1 γ λ f )(w) = Proj 1 ɛ (w), where Proj ɛ is the projection operator from R 2n to l 2, ball of radius ɛ, i.e. w i (Proj ɛ (w)) i,n+i = min{ w i 2,ɛ}, i = 1,...,n. w i 2 In the numerical experiments, we compare our proposed algorithm PDFP 2 O with the three methods: ASB (cf (4.10)), CP (cf (4.6)) and SIU (cf (4.11)). Both ASB and CP involve linear system inversion (A T A + νb T B) 1 and (I + τa T A) 1 respectively. In the experiments, we use the conjugate gradient (CG) method for quadratic subproblems. The maximal number of CG iterations is denoted as N I and the stopping criteria are set as the residual error is less than To numerically measure the convergence speed of various methods, we compute the relative error between the energy at kth outer iteration and the optimal value E. In practice, we run each method 5000 (outer iteration) steps with a large range of parameters and set the minimum of all the tests as the optimal minimum energy E. We denote ε k = (E k E)/E. (5.1) In the following, we will use (5.1) as a criterion to compare the performance among ASB, CP, SIU and PDFP 2 O. To guarantee the quality of recovered images, we also use the criterion peak signal-to-noise ratio (PSNR) ( ) PSNR = 10 log 10, with MSE = I m Ĩ m 2 F, MSE s 1 s 2 where I m denotes the original image of s 1 s 2, Ĩ m denotes the recovered images obtained from various algorithms and F denotes the Frobenius norm. All the experiments are implemented under MATLAB7.11(R2010b) and conducted on a computer with Intel(R) core(tm) i5 CPU 750@ 2.67G Image super-resolution In the numerical simulation, the subsampling operator A is implemented by taking the average of every d d pixels and sampling the average, if a zoom-in ratio d is desired. The experiment is performed on the test image lena of size and the subsampling ratio is d = 4. White Gaussian noise of mean 0 and variance 1 is added to the observed low-resolution image of The regularization parameter μ is set as 0.1 for the best image quality. First we show the impacts of the parameters κ, γ and λ for the proposed algorithm in figure 1. The conditions for theoretical convergence are 0 <γ <2β, 0<λ 1/λ max (BB T ) and κ [0, 1) (see theorems 3.4 and 3.5). The constant β is given by 1/λ max (A T A), and the maximal eigenvalue of A T A is 1/16, so 0 <γ <32. It is well known in total variation application that λ max (BB T ) = 8forB being the usual gradient operator (see [16]), and then 0 < λ 1/8. Figures 1(a) and (b) show that for most cases κ = 0 achieves the fastest convergence compared to other κ (0, 1). Thus we choose κ = 0 for the following comparison. In figures 1(c) and (d), the parameter λ has relatively smaller impact on the performance of this algorithm. We compare the results for λ = 1/5, 1/6, 1/8, 1/16, 1/32. When λ = 1/6 > 1/8, the algorithm is convergent. While for λ = 1/5, the algorithm does not appear to converge, which shows that we cannot extend the range of λ to (0, 2/λ max (BB T )) generally, as given in [24] for denoising case (see algorithm 1). Hence, we only consider 0 <λ 1/λ max (BB T ) as indicated in theorem 3.5, for which the upper bound λ = 1/8 achieves the best performance. The parameter γ has relatively larger impact for the algorithm. We test γ = 8, 16, 24, 30, 32 for κ = 0, λ = 1/8. We observe that numerically larger γ leads to a faster convergence. 
For this reason, we can choose γ close to 2β.
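To connect the bounds above with the operator A of this experiment, the sketch below shows one way to realize the block-averaging subsampling described earlier (helper names are hypothetical and the random data serve only an adjoint check; this is an illustration, not the authors' code). For such an A one has A A^T = (1/d^2) I, so lambda_max(A^T A) = 1/d^2 = 1/16 for d = 4, which is exactly the constant behind beta = 16 and the condition 0 < gamma < 32 used above.

```python
import numpy as np

def downsample_avg(img, d):
    """Subsampling operator A: average every d x d block (zoom-out by a factor d)."""
    s1, s2 = img.shape
    return img.reshape(s1 // d, d, s2 // d, d).mean(axis=(1, 3))

def upsample_adj(low, d):
    """Adjoint A^T: spread each low-resolution value over its d x d block, scaled by 1/d^2.
    This adjoint is what the gradient of f2, namely A^T(Ax - b), needs in the iteration."""
    return np.kron(low, np.ones((d, d))) / d**2

# Adjoint check on hypothetical random data, and the constants behind 0 < gamma < 32:
d = 4
rng = np.random.default_rng(0)
x, y = rng.standard_normal((8, 8)), rng.standard_normal((2, 2))
assert np.isclose(np.vdot(downsample_avg(x, d), y), np.vdot(x, upsample_adj(y, d)))
# A A^T = (1/d^2) I, so lambda_max(A^T A) = 1/d^2 = 1/16 and beta = 1/lambda_max(A^T A) = 16.
```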

Figure 1. PSNR and energy versus iterations with different parameters. (a) and (b) are PSNR and energy versus iterations for κ = 0.5, 0.1, 0.01, 0.001, 0 (λ = 1/8, γ = 30). (c) and (d) are PSNR and energy versus iterations for λ = 1/5, 1/6, 1/8, 1/16, 1/32 (κ = 0, γ = 30). (e) and (f) are PSNR and energy versus iterations for γ = 8, 16, 24, 30, 32 (κ = 0, λ = 1/8).

As mentioned above, the optimal value E of the optimization model (1.2) for this example is obtained by taking the minimum over a large range of parameter settings on each method with 5000 iterations. The performance of each method with different parameter sets is listed in tables 3-6 for ε = 10^{-i}, i = 1, ..., 6. For a given ε, the first column gives the least (outer) iteration number k such that ε_k < ε, and the second column, in brackets, gives the corresponding running time in seconds. Empty entries indicate that the algorithm fails to drop the error below ε within the maximum of 5000 iterations.

Table 3 shows that the number of inner iteration steps N_I in CG affects the speed of ASB. We highlight the best performance for each given tolerance ε. The parameter ν also plays an important role in the performance of ASB. For this example, ν = 0.01 generally gives the smallest number of iterations and the least computation time for the different tolerance levels ε.

For the CP algorithm, we run the tests with different σ, θ = 1 and τ = 1/(8σ), according to the convergence condition in table 1. The second quadratic subproblem is solved with CG. After a simple analysis, we observe that (I + τ A^T A) has only two eigenvalues, (1 + τ/16) and 1; thus, this subproblem can be solved within two CG steps theoretically. Therefore we only list the results with N_I = 1, 2 in table 4 for comparison, and we can see that N_I = 1 has even better performance in terms of total iteration steps and computation time. For this example, σ has some impact on the performance of CP, and a single choice of σ gives the best performance across all the tolerance levels.

Similar results for the SIU algorithm are given in table 5; a larger δ yields better performance while respecting the convergence condition, together with a correspondingly best ν. For this example, δ = 24 and ν = 0.06 give the best performance among the tested parameter sets.

We also test various γ and λ and list the results for PDFP2O in terms of computation time and relative error to the optimal minimum in table 6. As we observed previously, γ and λ being close to the upper bounds 2β and 1/λ_max(BB^T) gives a nearly optimal convergence

20 Table 3. Performance evaluation for different choices of ν and N I in ASB for image superresolution. N I ν ε = 10 1 ε = 10 2 ε = 10 3 ε = 10 4 ε = 10 5 ε = (2.39) 165 (9.55) 486 (28.12) 1800 (104.53) (0.49) 45 (2.64) 144 (8.37) 469 (27.34) 1599 (93.29) (0.36) 41 (2.39) 153 (8.94) 464 (27.05) 1572 (93.14) 3633 (216.50) (0.52) 103 (7.12) 544 (32.81) 1776 (105.88) (0.94) 205 (13.21) 1062 (63.32) 3485 (208.60) (1.79) 101 (7.53) 459 (34.25) 1772 (132.31) (0.59) 31 (2.33) 110 (8.28) 397 (29.89) 1502 (112.96) (0.32) 21 (1.58) 74 (5.55) 254 (19.14) 908 (69.84) 2930 (225.19) (0.30) 45 (3.39) 203 (15.28) 801 (61.61) 2266 (173.80) (0.53) 89 (6.70) 399 (30.02) 1590 (120.95) 4474 (343.83) (1.60) 93 (11.60) 454 (56.57) 1770 (220.55) (0.61) 24 (2.99) 101 (12.60) 367 (45.79) 1446 (180.41) 4968 (620.12) (0.50) 19 (2.37) 71 (8.91) 248 (32.75) 884 (114.19) 2866 (369.36) (0.49) 45 (5.63) 181 (22.69) 769 (99.59) 2108 (271.02) (0.86) 88 (11.01) 356 (46.29) 1524 (195.54) 4156 (533.58) (2.69) 93 (19.27) 454 (94.18) 1770 (367.16) (1.03) 24 (4.96) 101 (20.87) 367 (78.11) 1446 (309.09) 4967 ( ) (0.84) 19 (3.99) 72 (15.04) 248 (53.50) 886 (189.75) 2871 (613.37) (0.83) 45 (9.59) 182 (38.08) 770 (163.58) 2105 (449.95) (1.42) 88 (18.21) 359 (76.37) 1530 (326.83) 4159 (887.33) Table 4. Performance evaluation for different choices of σ and N I in CP for image super-resolution. N I σ ε = 10 1 ε = 10 2 ε = 10 3 ε = 10 4 ε = 10 5 ε = (4.64) 335 (14.51) 953 (42.64) 3567 (157.65) (1.89) 171 (7.19) 487 (20.56) 1803 (77.75) (0.25) 46 (1.94) 150 (6.35) 481 (20.37) 1616 (69.77) (0.13) 57 (2.42) 217 (10.22) 650 (28.45) 2084 (90.45) 4186 (180.86) (0.25) 262 (11.09) 1049 (45.82) 3110 (134.20) (1.61) 185 (9.11) 909 (44.93) 3537 (178.11) (1.06) 99 (4.89) 458 (24.12) 1773 (90.94) (0.24) 45 (2.23) 149 (7.37) 480 (23.72) 1617 (81.60) (0.14) 56 (2.77) 216 (10.69) 648 (31.95) 2080 (104.27) 4181 (211.21) (0.28) 262 (12.90) 1048 (51.63) 3109 (156.60) Table 5. Performance evaluation for different choices of δ and ν in SIU for image super-resolution. The impacts of different ν for δ = 8, 16, 30 are similar to δ = 24; thus, we only list the cases with different ν for δ = 24. δ ν ε = 10 1 ε = 10 2 ε = 10 3 ε = 10 4 ε = 10 5 ε = (0.11) 81 (2.17) 327 (8.71) 975 (26.53) 3106 (83.28) (0.15) 46 (1.26) 173 (4.65) 519 (13.86) 1675 (44.64) 3542 (94.95) (0.20) 42 (1.14) 141 (3.78) 446 (12.45) 1472 (39.77) 4342 (116.73) (0.23) 47 (1.27) 153 (4.23) 490 (13.22) 1625 (43.96) (1.45) 191 (5.14) 507 (13.60) 1817 (48.52) (3.46) 371 (10.12) 980 (26.77) 3584 (96.88) (0.86) 109 (2.94) 323 (8.65) 1147 (30.73) 4496 (121.20) 20

21 Table 6. Performance evaluation for different choices of γ and λ in PDFP 2 O for image superresolution. The impacts of λ for γ = 8, 16 are similar with γ = 24, 30. γ λ ε = 10 1 ε = 10 2 ε = 10 3 ε = 10 4 ε = 10 5 ε = /8 3 (0.09) 81 (2.05) 328 (8.30) 977 (24.75) 3115 (79.08) 16 1/8 3 (0.09) 49 (1.24) 178 (4.50) 540 (13.64) 1750 (44.10) 3982 (100.71) 24 1/6 3 (0.09) 38 (0.97) 134 (3.41) 417 (10.56) 1373 (34.68) 3826 (96.96) 1/8 5 (0.14) 45 (1.15) 149 (3.80) 478 (12.14) 1587 (40.15) 4940 (125.14) 1/16 15 (0.38) 76 (1.95) 224 (5.74) 768 (19.93) 2811 (71.70) 1/32 41 (1.03) 144 (3.64) 397 (10.03) 1413 (36.15) 30 1/6 5 (0.14) 38 (0.97) 128 (3.25) 417 (10.59) 1412 (35.81) 4579 (116.08) 1/8 6 (0.18) 46 (1.20) 150 (3.83) 508 (13.26) 1788 (45.60) 1/16 17 (0.43) 82 (2.06) 252 (6.38) 897 (22.90) 3465 (88.06) 1/32 46 (1.15) 160 (4.00) 470 (11.88) 1733 (44.13) Table 7. Performance comparison among ASB, CP, SIU and PDFP 2 O for image super-resolution. For a given error tolerance ε, the first column in the bracket gives the first outer iteration number k such that ε k <ε, the second column in the bracket gives the corresponding run time in second and the third column in the bracket gives the corresponding PSNR. For ASB, N I = 2, ν = For CP, N I = 1, σ = For SIU, δ = 24, ν = For PDFP 2 O, γ = 30, λ = 1/6. ε = 10 2 ε = 10 3 ε = 10 4 ε = 10 5 ASB (21, 1.58, 29.34) (74, 5.55, 29.38) (254, 19.14, 29.37) (908, 69.84, 29.36) CP (46, 1.94, 28.97) (150, 6.35, 29.24) (481, 20.37, 29.32) (1616, 69.77, 29.35) SIU (42, 1.14, 28.91) (141, 3.78, 29.22) (446, 12.45, 29.31) (1472, 39.77, 29.35) PDFP 2 O (38, 0.97, 28.98) (128, 3.25, 29.25) (417, 10.59, 29.32) (1412, 35.81, 29.35) speed. Table 6 shows that γ = 24, λ = 1/8 has slightly better convergence speed, while γ = 30, λ = 1/8 can get slightly higher PSNR in the first steps, as shown in figure 1(e). Finally, we compare the four methods with their optimal parameter sets (averagely best for all the tolerance levels) in table 7. We also compare their corresponding values of PSNR to measure the recovered image quality. From table 7, we can see that PDFP 2 O is better than ASB and CP in terms of the computation time, especially for a higher accuracy level. The performance of the two explicit methods SIU and PDFP 2 O is similar. However, the choice of parameters for PDFP 2 O is relatively easier compared to SIU. Also, we point out that ASB can attain higher PSNR at the first few steps for some good choices of ν, which can be interesting in practice when a crude approximation is needed in a short time. Figure 2 shows the images recovered with the four methods for ε = 10 4 and the images look similar as expected CT reconstruction In a simplified parallel beam tomographic problem, an observed body slice is modeled as a two-dimensional function, and projections modeled by line integrals represent the total attenuation of a beam of x-rays when it traverses the object. The operator for this application can be represented by a discrete Radon transform, and the tomographic reconstruction problem is then to estimate a function from a finite number of measured line integrals (see [4]). The standard reconstruction algorithm in clinical applications is the so-called filtered back projection (FBP) algorithm. In the presence of noise, this problem becomes difficult since the inverse of Radon transform is unbounded and ill-posed. In the literature, the model (1.2) is often used for iterative reconstruction. Here, A is the Radon transform matrix and b is the 21

Figure 2. Super-resolution results by ASB, CP, SIU and PDFP2O corresponding to tolerance error ε = 10^{-4} with noise level 1. Panels: Original, Zooming, ASB (PSNR = 29.37), CP (PSNR = 29.32), SIU (PSNR = 29.31), PDFP2O (PSNR = 29.32).

measured projections vector. Generally, the size of A is huge and it is not easy to compute the inverse directly. We note that total variation regularization has become a standard tool in tomographic reconstruction. Recently, first-order methods have been applied for faster implementations; for example, SIU is applied in [34] for CT and the Chambolle-Pock algorithm is applied in [1] for PET and cone-beam CT reconstruction. Here, we use the same example tested in [35], i.e. 50 uniformly oriented projections are simulated for a Shepp-Logan phantom image and then white Gaussian noise of mean 0 and variance 1 is added to the data. For this example, we compute λ_max(AA^T) numerically, so we can set 0 < γ < 2/λ_max(AA^T).

Figure 3. PSNR and energy versus iterations for different parameters in CT reconstruction. (a) and (b) are PSNR and energy versus iterations for κ = 0.5, 0.1, 0.01, 0.001, 0 (λ = 1/8, γ = 1.3). (c) and (d) are PSNR and energy versus iterations for λ = 1/5, 1/6, 1/8, 1/16, 1/32 (κ = 0, γ = 1.3). (e) and (f) are PSNR and energy versus iterations for γ = 0.4, 0.7, 1, 1.2, 1.3 (κ = 0, λ = 1/8).

As in the previous example, we first test the impacts of the parameters κ, γ and λ. The impact of κ shows the same behavior as for the super-resolution example, i.e. κ = 0 is the best choice among κ in [0, 1) (see figures 3(a) and (b)). Similarly, the parameter λ has relatively small impact on the performance of the algorithm (see figures 3(c) and (d)). It seems that the algorithm still converges with λ = 1/5, but it cannot achieve high accuracy (table 11). As in the previous example, the parameter γ has a larger impact on the convergence rate of the algorithm (see figures 3(e) and (f)). Theoretically, it should satisfy 0 < γ < 2/λ_max(AA^T). Numerically, we test γ = 0.4, 0.7, 1, 1.2, 1.3 for κ = 0, λ = 1/8. Better performance with a larger γ is observed (see figures 3(e) and (f)), while for γ = 1.4 the algorithm diverges.

As in the previous application, we show the performance of ASB, CP, SIU and PDFP2O with different parameter sets in tables 8, 9, 10 and 11, respectively. For the algorithm ASB, table 8 shows that N_I = 5 is the best choice for ε = 10^{-1}, 10^{-2}, while N_I = 2 is the best one for ε = 10^{-i}, i = 3, 4, 5, 6. We choose N_I = 2 for an overall good performance. Similarly, ν = 0.01 is the best for ε = 10^{-i}, i = 1, 2, 3, 4 and ν = 0.05 is the best for ε = 10^{-5} and 10^{-6}. Thus, we use the two parameter sets for comparison (see table 12). Similar to ASB, the best

Table 8. Performance evaluation for different choices of ν and N_I in ASB for CT reconstruction.

ε = 10^-1 | ε = 10^-2 | ε = 10^-3 | ε = 10^-4 | ε = 10^-5 | ε = 10^-6
(3.24) | 551 (8.43) | 995 (14.92) | 2380 (35.15) | |
(2.11) | 362 (5.20) | 662 (11.13) | 1070 (17.24) | 2310 (35.08) |
(2.22) | 366 (5.27) | 657 (9.46) | 1002 (14.43) | 1564 (24.64) | 4893 (72.57)
(2.13) | 360 (5.24) | 646 (9.35) | 960 (13.86) | 1304 (18.81) | 1794 (25.86)
(2.12) | 367 (5.28) | 654 (9.40) | 967 (13.92) | 1298 (18.66) | 1692 (24.35)
(1.58) | 206 (4.32) | 422 (8.87) | 2085 (43.81) | |
(0.66) | 73 (1.52) | 149 (4.13) | 478 (11.83) | 2083 (45.54) |
(0.60) | 67 (1.42) | 129 (2.72) | 279 (5.87) | 1068 (24.49) | 4670 (102.17)
(0.99) | 124 (2.59) | 229 (4.79) | 345 (7.24) | 492 (10.33) | 1011 (21.23)
(1.42) | 208 (4.35) | 421 (8.82) | 653 (13.69) | 899 (18.86) | 1204 (25.28)
(0.74) | 59 (2.40) | 355 (14.56) | 1985 (83.22) | |
(0.38) | 21 (0.86) | 77 (3.16) | 407 (16.63) | 1897 (79.30) |
(0.36) | 26 (1.06) | 63 (3.33) | 219 (11.03) | 977 (41.97) | 4287 (181.05)
(1.38) | 115 (4.68) | 232 (9.44) | 356 (16.26) | 508 (22.99) | 982 (42.31)
(2.68) | 227 (9.25) | 457 (18.62) | 698 (30.16) | 952 (40.51) | 1261 (53.13)
(0.75) | 44 (3.25) | 352 (27.50) | 1974 (149.35) | |
(0.51) | 19 (1.40) | 76 (5.59) | 401 (29.54) | 1872 (142.26) |
(0.67) | 27 (1.99) | 63 (4.65) | 217 (16.01) | 963 (73.15) | 4236 (322.62)
(2.42) | 115 (8.47) | 231 (17.02) | 356 (26.24) | 508 (37.45) | 981 (74.43)
(5.95) | 228 (18.76) | 458 (35.73) | 699 (53.50) | 952 (74.23) | 1261 (97.04)

Table 9. Performance evaluation for different choices of σ and N_I in CP for CT reconstruction.

ε = 10^-1 | ε = 10^-2 | ε = 10^-3 | ε = 10^-4 | ε = 10^-5 | ε = 10^-6
(2.42) | 440 (6.18) | 847 (11.90) | 2314 (32.57) | |
(2.18) | 362 (7.23) | 652 (11.26) | 998 (16.05) | 1559 (23.82) | 4879 (71.22)
(2.10) | 364 (5.02) | 650 (8.97) | 972 (15.55) | 1375 (21.11) | 2628 (38.39)
(2.11) | 364 (5.00) | 647 (8.88) | 960 (13.17) | 1305 (17.90) | 1791 (24.57)
(2.27) | 387 (5.32) | 686 (9.41) | 1015 (13.92) | 1365 (19.41) | 1779 (26.15)
(1.57) | 175 (3.55) | 415 (8.40) | 2098 (42.52) | |
(0.81) | 94 (2.04) | 174 (4.76) | 326 (8.72) | 1083 (24.03) | 4671 (98.46)
(0.81) | 93 (1.89) | 167 (3.37) | 267 (5.40) | 588 (12.73) | 2401 (50.60)
(1.62) | 195 (3.88) | 346 (6.89) | 515 (10.26) | 714 (14.21) | 1156 (23.05)
(3.21) | 385 (7.68) | 684 (13.65) | 1013 (20.56) | 1363 (29.14) | 1777 (37.38)
(0.69) | 56 (2.18) | 354 (13.80) | 1985 (79.67) | |
(0.75) | 44 (1.72) | 87 (3.75) | 249 (11.54) | 1053 (42.90) | 4636 (186.29)
(1.34) | 80 (3.13) | 144 (5.63) | 236 (9.22) | 573 (24.49) | 2389 (97.59)
(3.13) | 193 (7.45) | 344 (13.29) | 513 (19.80) | 712 (27.50) | 1154 (46.67)
(6.20) | 384 (14.80) | 683 (27.40) | 1011 (40.80) | 1361 (54.30) | 1775 (70.24)
(1.44) | 46 (3.93) | 353 (25.53) | 1977 (143.57) | |
(1.36) | 44 (3.11) | 87 (6.14) | 249 (19.95) | 1054 (78.92) | 4637 (339.05)
(2.36) | 80 (5.59) | 144 (10.04) | 236 (16.46) | 573 (41.86) | 2388 (172.31)
(5.63) | 193 (13.45) | 344 (23.97) | 513 (37.65) | 712 (51.50) | 1154 (82.26)
(11.41) | 384 (26.90) | 683 (47.70) | 1011 (70.79) | 1361 (92.93) | 1775 (116.60)

Similar to ASB, the best parameter sets chosen in terms of computation time over the different tolerances are N_I = 2 and σ = 0.02 for CP (see table 9), δ = 1.3 and ν = 0.1 for SIU (see table 10), and γ = 1.3 and λ = 1/8 for PDFP²O (see table 11). All these results are compared in table 12. Figure 4 gives the corresponding images recovered for ε = 10^-4.

Table 10. Performance evaluation for different choices of δ and ν in SIU for CT reconstruction. The impacts of ν for δ = 0.4, 0.7, 1, 1.2 are similar to those for δ = 1.3, and we only list the results with different ν for δ = 1.3.

ε = 10^-1 | ε = 10^-2 | ε = 10^-3 | ε = 10^-4 | ε = 10^-5 | ε = 10^-6
(3.56) | 1196 (8.52) | 2127 (15.17) | 3146 (22.43) | 4210 (30.01) |
(2.00) | 684 (4.79) | 1216 (8.52) | 1798 (12.60) | 2408 (16.87) | 3071 (21.61)
(1.47) | 479 (3.52) | 852 (6.20) | 1261 (9.08) | 1695 (12.12) | 2195 (15.64)
(1.17) | 399 (2.80) | 710 (4.98) | 1051 (7.38) | 1415 (9.94) | 1842 (12.93)
(1.29) | | | | |
(1.08) | 368 (2.59) | 655 (4.61) | 971 (6.84) | 1307 (9.20) | 1707 (12.03)
(1.09) | 369 (2.59) | 656 (4.61) | 975 (6.85) | 1326 (9.32) | 1816 (12.77)
(1.10) | 373 (2.62) | 668 (4.70) | 1019 (7.16) | 1580 (11.10) | 4881 (34.56)
(1.14) | 380 (2.67) | 685 (4.81) | 1098 (7.72) | 2336 (16.58) |

Table 11. Performance evaluation for different choices of γ and λ in PDFP²O for CT reconstruction. The impacts of different λ for γ = 0.4, 0.7, 1, 1.2 are similar to those for γ = 1.3, and we only list the cases with different λ for γ = 1.3.

γ | λ | ε = 10^-1 | ε = 10^-2 | ε = 10^-3 | ε = 10^-4 | ε = 10^-5 | ε = 10^-6
0.4 | 1/8 | 501 (3.51) | 1196 (8.40) | 2127 (15.10) | 3145 (22.24) | 4208 (29.66) |
0.7 | 1/8 | 286 (2.01) | 684 (4.79) | 1216 (8.53) | 1799 (12.61) | 2409 (16.89) | 3074 (21.56)
1 | 1/8 | 201 (1.41) | 479 (3.34) | 851 (5.93) | 1260 (8.77) | 1692 (11.79) | 2178 (15.16)
1.2 | 1/8 | 167 (1.16) | 399 (2.78) | 710 (4.94) | 1051 (7.33) | 1414 (9.85) | 1838 (12.81)
1.3 | 1/5 | 157 (1.10) | 590 (4.11) | | | |
 | 1/6 | 154 (1.07) | 368 (2.57) | 655 (4.57) | 970 (6.76) | 1303 (9.16) | 1687 (11.94)
 | 1/8 | 154 (1.11) | 368 (2.61) | 655 (4.61) | 971 (6.81) | 1307 (9.15) | 1709 (11.95)
 | 1/16 | (1.09) | 369 (2.57) | 656 (4.57) | 975 (6.80) | 1327 (9.26) | 1825 (12.72)
 | 1/32 | (1.08) | 370 (2.58) | 659 (4.59) | 984 (6.85) | 1371 (9.56) | 2320 (16.16)

Table 12. Performance comparison among ASB, CP, SIU and PDFP²O for CT reconstruction. For a given error tolerance ε, the first entry in the bracket gives the first outer iteration number k such that ε_k < ε, the second gives the corresponding run time in seconds and the third gives the corresponding PSNR. For ASB 1, N_I = 2, ν = 0.01; for ASB 2, N_I = 2, ν = 0.05. For CP, N_I = 2, σ = 0.02. For SIU, δ = 1.3, ν = 0.1. For PDFP²O, γ = 1.3, λ = 1/8.

Method | ε = 10^-2 | ε = 10^-3 | ε = 10^-4 | ε = 10^-5
ASB 1 | (67, 1.42, 30.29) | (129, 2.72, 31.97) | (279, 5.87, 32.43) | (1068, 24.49, 32.45)
ASB 2 | (124, 2.59, 30.06) | (229, 4.79, 31.75) | (345, 7.24, 32.26) | (492, 10.33, 32.41)
CP | (44, 1.72, 30.48) | (167, 3.37, 31.80) | (267, 5.40, 32.32) | (588, 12.73, 32.45)
SIU | (368, 2.59, 30.03) | (655, 4.61, 31.70) | (971, 6.84, 32.23) | (1307, 9.20, 32.38)
PDFP²O | (368, 2.61, 30.03) | (655, 4.61, 31.70) | (971, 6.81, 32.23) | (1307, 9.15, 32.38)

From table 12, we can observe that the evolution of PSNR and energy is very close for PDFP²O and SIU, although the iterative schemes of the two algorithms are different. We note that both ASB and CP converge better than PDFP²O and SIU in the early steps, while PDFP²O and SIU are slightly better if higher accuracy is required. One explanation for this behavior is that, when the condition number of A^T A is large, explicit methods such as PDFP²O and SIU approximate the inverse of A^T A slowly. For ASB and CP, the regularized inverses (A^T A + ν B^T B)^{-1} and (I + τ A^T A)^{-1}, respectively, act as preconditioners that partially avoid the bad conditioning of A^T A in the early steps, while they take longer later on due to unnecessary inner iterations.
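To illustrate the kind of inner iteration such preconditioning involves, the regularized system (I + τ A^T A) x = y appearing in CP can be solved approximately by a few conjugate-gradient steps. This is a hedged sketch, not the authors' implementation; A and At are hypothetical callables applying A and A^T to vectorized images, and CG is only one possible choice of inner solver.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def regularized_solve(y, A, At, tau, inner_iters=10):
    """Approximately solve (I + tau * A^T A) x = y with a few CG iterations."""
    n = y.size
    def matvec(v):
        return v + tau * At(A(v))                    # apply (I + tau A^T A)
    op = LinearOperator((n, n), matvec=matvec, dtype=y.dtype)
    x, info = cg(op, y, maxiter=inner_iters)         # inexact inner solve
    return x
```

Truncating the inner loop keeps each outer step cheap, at the price of the extra inner iterations discussed above.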

Figure 4. A tomographic reconstruction example with 50 projections, corresponding to the tolerance error ε = 10^-4 with noise level 1. Panels: Original, FBP, ASB (PSNR = 32.43), CP (PSNR = 32.32), SIU (PSNR = 32.23), PDFP²O (PSNR = 32.23).

For the badly conditioned case, we can consider an efficient preconditioning technique for PDFP²O, which will be presented in a forthcoming work.

5.3. Parallel MRI

Magnetic resonance imaging (MRI) is a medical imaging technique largely used in clinical radiology to visualize the internal structure and function of the body by noninvasive and nonionizing means. It provides better contrast between the different soft tissues than other modalities such as CT and PET. MRI images are obtained through an inversion of the Fourier data acquired by the receiver coils. Parallel MRI (pMRI) is a recent technique for accelerating the sampling speed of conventional MRI. Instead of relying on increased gradient performance for increased imaging speed, pMRI extracts extra spatial information from an array of surface coils surrounding the scanned object, with multiple receivers collecting in parallel part of the Fourier components at each receiver, resulting in an accelerated image acquisition. There are two general approaches for removing the aliasing artifacts due to Fourier-space subsampling: image-domain methods and k-space-based methods (see [5]). On the other hand, total variation regularization has also been considered in the literature in order to obtain better image quality, as in [21, 22, 11]. In this paper, we employ image-domain methods and coil sensitivity maps to reconstruct the underlying image. Sensitivity encoding (SENSE) is the most common image-domain-based parallel imaging method. It is based on the following model, which relates the partial k-space data b_j acquired by the jth receiver to the unknown image x:

b_j = D F S_j x + n,

where b_j is the vector of measured Fourier coefficients at receiver j, D is a diagonal downsampling operator, F is the Fourier transform, S_j corresponds to the diagonal coil sensitivity mapping for receiver j, and n is Gaussian noise.
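For concreteness, here is a minimal numpy sketch of the SENSE encoding operators (an illustration under stated assumptions, not the authors' or PULSAR's code): sens is an array holding the coil sensitivity maps S_j, mask is the binary Cartesian undersampling pattern playing the role of D, and an orthonormal FFT is used so that its adjoint is the inverse FFT. The last routine is the gradient of the least-squares data-fidelity term, Σ_j A_j^*(A_j x − b_j), i.e. the smooth part that the compared algorithms handle explicitly.

```python
import numpy as np

def A_forward(x, sens, mask):
    """Apply A_j x = D F S_j x for every coil; returns the stacked k-space data."""
    return np.stack([mask * np.fft.fft2(s * x, norm='ortho') for s in sens])

def A_adjoint(y, sens, mask):
    """Apply sum_j A_j^* y_j = sum_j S_j^* F^{-1} D^T y_j (zero-filling for D^T)."""
    return sum(np.conj(s) * np.fft.ifft2(mask * yj, norm='ortho')
               for s, yj in zip(sens, y))

def data_grad(x, b, sens, mask):
    """Gradient of 0.5 * sum_j ||A_j x - b_j||_2^2, i.e. sum_j A_j^*(A_j x - b_j)."""
    return A_adjoint(A_forward(x, sens, mask) - b, sens, mask)
```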

Figure 5. In vivo MR images acquired from (a) the four-channel spine data (Coil 1 to Coil 4) and (b) the eight-channel head data (Coil 1 to Coil 8).

In practice, S_j can often be estimated in advance. Let A_j = DFS_j; then we can recover the image x by solving the least-squares problem with total variation regularization

x^* = \arg\min_{x \in \mathbb{R}^n} \; \mu \|Bx\|_1 + \frac{1}{2} \sum_{j=1}^{N} \|A_j x - b_j\|_2^2,    (5.2)

where B is the discrete gradient matrix and N is the total number of receivers. Conventionally, the downsampling operator D is implemented with a sampling ratio R = 2, 4 along one dimension (corresponding to phase encoding). In our experiments, we use the test data provided by the online MATLAB toolbox PULSAR [20]. The toolbox contains two sets of real data acquired by MR systems with a coil array. The first is a spine data set acquired on a 3 Tesla whole-body GE scanner using a four-channel CTL spine array, and the second is a brain data set acquired using an eight-channel head array. For the details of the machine configuration, see [20]. Figure 5 shows the images of the multichannel data, four coils for the spine and eight coils for the brain. We use the sensitivity maps S_j estimated by the built-in function in PULSAR. The square root of the sum of squares (SOS) image of the N fully acquired coil data sets (without the downsampling D) is used as a reference image, for which pixel (i, j) is given by

\mathrm{SOS}(i,j) = \Big( \sum_{k=1}^{N} |(F^{-1} b_k)(i,j)|^2 \Big)^{1/2}.
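A short sketch of this SOS reference image (illustrative; the variable names are assumptions, not PULSAR's): b_full is an N x ny x nx array containing the fully sampled k-space data of the N coils.

```python
import numpy as np

def sos_image(b_full):
    """Square root of the sum of squares of the coil images F^{-1} b_k."""
    coil_images = np.fft.ifft2(b_full, axes=(-2, -1))         # per-coil inverse Fourier transform
    return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=0))  # pixelwise SOS combination
```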

Given a region of interest (ROI), a measure of the artifact power (AP) of a reconstructed image I_rec with respect to a reference image I_ref is defined as

\mathrm{AP} = \frac{\sum_{(i,j) \in \mathrm{ROI}} |I_{\mathrm{ref}}(i,j) - c\, I_{\mathrm{rec}}(i,j)|^2}{\sum_{(i,j) \in \mathrm{ROI}} |I_{\mathrm{ref}}(i,j)|^2},
\qquad c = \frac{\sum_{(i,j) \in \mathrm{ROI}} |I_{\mathrm{ref}}(i,j)|^2}{\sum_{(i,j) \in \mathrm{ROI}} |I_{\mathrm{rec}}(i,j)|^2}.

The factor c is used to minimize the scaling effect that might be introduced during the reconstruction process. Another useful index for evaluating image quality when a reference image is not available is the two-region SNR, calculated from a ROS (region of signal) and a RON (region of noise) by

\mathrm{SNR} = 20 \log_{10} \frac{\text{Mean of ROS}}{\text{Standard Deviation of RON}}.

This SNR measure strongly depends on the locations chosen for the ROS and RON. The RON can usually be selected from background areas where no object features are present. The ROS and ROI used for the SNR evaluation are shown on the first (SOS) image in figures 6 and 7. In the experiments, we fix the regularization parameter μ and use zero as the initial value for the reconstruction algorithms ASB, CP, SIU and PDFP²O. Now we estimate the maximum eigenvalue of the system matrix A^*A = \sum_{j=1}^{N} A_j^* A_j, where A^* is the conjugate transpose of A. Since A_j = DFS_j, we have

A^*A = \sum_{j=1}^{N} S_j^* (F^* D^T D F) S_j = S^* (F^* D^T D F) S, \qquad S = \begin{pmatrix} S_1 \\ \vdots \\ S_N \end{pmatrix},

where S_j^* denotes the conjugate of the sensitivity mapping and F^* = F^{-1} is the inverse Fourier transform. Since D is a downsampling operator with diagonal elements 1 and 0, we have λ_max(F^{-1} D^T D F) = 1. Thus

\lambda_{\max}(A^*A) = \lambda_{\max}(S^* F^* D^T D F S) = \|DFS\|_2^2 \le \|DF\|_2^2 \, \|S\|_2^2 = \lambda_{\max}(S^*S) = \lambda_{\max}\Big( \sum_{i=1}^{N} S_i^* S_i \Big).

The matrix \sum_{i=1}^{N} S_i^* S_i is diagonal and its maximum diagonal element is approximately 1. Based on this simple calculation, we can set γ = 2, λ = 1/8 by using our nearly optimal rule for applying PDFP²O. We also observe that a slightly larger γ may yield slightly better performance for R = 4 on the two data sets, but we still set the universal parameter γ = 2 according to this simple rule for the different test data and sampling ratios. For the other three methods, we run a large number of parameter sets and iteration numbers and choose the optimal ones according to the AP and SNR criteria. Figures 6 and 7 show the images recovered with the different methods for the two test data sets. According to the numerical results in [20], we choose the best three methods, SENSE, SPACE-RIP and GRAPPA, for comparison. The reconstruction results of SENSE, SPACE-RIP and GRAPPA are obtained using the PULSAR toolbox. In figure 6, on the spine data with sampling ratio R = 2, all four methods based on the TV model (5.2) outperform SENSE, SPACE-RIP and GRAPPA in terms of both computation time and SNR. Among the four TV methods, SIU and PDFP²O perform closely, as expected, and PDFP²O takes slightly less time than ASB and CP to attain similar AP and SNR. For R = 4 on the spine data, SENSE fails to reconstruct a clean image, and GRAPPA has the best AP value. Among the four TV methods, ASB uses the least amount of time with the optimal parameter sets. Similar results are obtained for the brain data (figure 7). The four TV-based methods (5.2) outperform SENSE,

Figure 6. Recovery results from the four-channel in vivo spine data with the subsampling ratio R = 2, 4 (panels: SOS with the ROI, ROS and RON marked, SENSE, SPACE-RIP, GRAPPA, ASB, CP, SIU and PDFP²O for each R). The AP, SNR and run time are shown under each image. ROI: [15, 220] × [35, 165]; ROS: [50, 120] × [130, 160]; RON: [190, 250] × [185, 215]. N_O denotes the outer iteration number and N_I the inner iteration number. For R = 2: ν = 0.1, N_I = 2, N_O = 4 for ASB; θ = 1, τ = 1/(8σ), σ = 0.05, N_I = 1, N_O = 8 for CP; δ = 1.5, ν = 0.05, N_O = 8 for SIU; γ = 2, λ = 1/8, N_O = 8 for PDFP²O. For R = 4: ν = 0.01, N_I = 10, N_O = 20 for ASB; θ = 1, τ = 1/(8σ), σ = 0.005, N_I = 5, N_O = 100 for CP; δ = 2.4, N_O = 500 for SIU; γ = 2, λ = 1/8, N_O = 500 for PDFP²O.

Figure 7. Recovery results from the eight-channel in vivo brain data with the subsampling ratio R = 2, 4 (panels: SOS with the ROI, ROS and RON marked, SENSE, SPACE-RIP, GRAPPA, ASB, CP, SIU and PDFP²O for each R). The AP, SNR and run time are shown for each image. ROI: [65, 190] × [15, 220]; ROS: [70, 120] × [110, 180]; RON: [220, 250] × [200, 250]. N_O denotes the outer iteration number and N_I the inner iteration number. For R = 2: ν = 0.1, N_I = 5, N_O = 4 for ASB; θ = 1, τ = 1/(8σ), σ = 0.05, N_I = 1, N_O = 25 for CP; δ = 1.5, ν = 0.05, N_O = 25 for SIU; γ = 2, λ = 1/8, N_O = 25 for PDFP²O. For R = 4: ν = 0.05, N_I = 10, N_O = 10 for ASB; θ = 1, τ = 1/(8σ), σ = 0.01, N_I = 2, N_O = 50 for CP; δ = 2.2, ν = 0.008, N_O = 150 for SIU; γ = 2, λ = 1/8, N_O = 160 for PDFP²O.
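The AP and two-region SNR quoted in these captions can be computed directly from the reconstructed and reference images. Below is a short numpy sketch (illustrative, not the PULSAR implementation); roi, ros and ron are boolean masks, built for instance from the index ranges listed in the captions of figures 6 and 7.

```python
import numpy as np

def artifact_power(I_rec, I_ref, roi):
    """AP with the scaling factor c defined in the previous subsection."""
    c = np.sum(np.abs(I_ref[roi]) ** 2) / np.sum(np.abs(I_rec[roi]) ** 2)
    num = np.sum(np.abs(I_ref[roi] - c * I_rec[roi]) ** 2)
    den = np.sum(np.abs(I_ref[roi]) ** 2)
    return num / den

def two_region_snr(I_rec, ros, ron):
    """Two-region SNR: 20*log10(mean of ROS / standard deviation of RON)."""
    return 20.0 * np.log10(np.mean(np.abs(I_rec[ros])) / np.std(np.abs(I_rec[ron])))
```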
