A primal dual fixed point algorithm for convex separable minimization with applications to image restoration


Inverse Problems 29 (2013) (33pp)

A primal dual fixed point algorithm for convex separable minimization with applications to image restoration

Peijun Chen 1,2, Jianguo Huang 1,3 and Xiaoqun Zhang 1,4

1 Department of Mathematics, and MOE-LSC, Shanghai Jiao Tong University, Shanghai, People's Republic of China
2 Department of Mathematics, Taiyuan University of Science and Technology, Taiyuan, People's Republic of China
3 Division of Computational Science, E-Institute of Shanghai Universities, Shanghai Normal University, People's Republic of China
4 Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, People's Republic of China

E-mail: chenpeijun@sjtu.edu.cn, jghuang@sjtu.edu.cn and xqzhang@sjtu.edu.cn

Received 11 August 2012, in final form 25 November 2012
Published 17 January 2013
Online at stacks.iop.org/ip/29/

Abstract
Recently, the minimization of a sum of two convex functions has received considerable interest in a variational image restoration model. In this paper, we propose a general algorithmic framework for solving a separable convex minimization problem from the point of view of fixed point algorithms based on proximity operators (Moreau 1962 C. R. Acad. Sci., Paris I). Motivated by the proximal forward-backward splitting proposed in Combettes and Wajs (2005 Multiscale Model. Simul.) and the fixed point algorithm based on the proximity operator (FP2O) for image denoising (Micchelli et al 2011 Inverse Problems), we design a primal dual fixed point algorithm based on the proximity operator (PDFP2O_kappa for kappa in [0, 1)) and obtain a scheme with a closed-form solution for each iteration. Using the firmly nonexpansive properties of the proximity operator and with the help of a special norm over a product space, we achieve the convergence of the proposed PDFP2O_kappa algorithm. Moreover, under some stronger assumptions, we can prove the global linear convergence of the proposed algorithm. We also give the connection of the proposed algorithm with other existing first-order methods. Finally, we illustrate the efficiency of PDFP2O_kappa through some numerical examples on image super-resolution, computerized tomographic reconstruction and parallel magnetic resonance imaging. Generally speaking, our method PDFP2O (kappa = 0) is comparable with other state-of-the-art methods in numerical performance, while it has some advantages on parameter selection in real applications.

(Some figures may appear in colour only in the online journal)

1. Introduction

This paper is devoted to designing and discussing an efficient algorithmic framework for minimizing the sum of two proper lower semi-continuous convex functions, i.e.

x* = arg min_{x in R^n} (f_1 o B)(x) + f_2(x),    (1.1)

where f_1 in Gamma_0(R^m), f_2 in Gamma_0(R^n), f_2 is differentiable on R^n with a 1/beta-Lipschitz continuous gradient for some beta in (0, +infty), and B : R^n -> R^m is a linear transform. This parameter beta is related to the convergence conditions of algorithms 3-5 presented in the following section. Here and in what follows, for a real Hilbert space X, Gamma_0(X) denotes the collection of all proper lower semi-continuous convex functions from X to (-infty, +infty].

Despite its simplicity, many problems in image processing can be formulated in the form of (1.1). For instance, the following variational sparse recovery models are often considered in image restoration and medical image reconstruction:

x* = arg min_{x in R^n} mu ||Bx||_1 + (1/2) ||Ax - b||_2^2,    (1.2)

where ||.||_2 denotes the usual Euclidean norm for a vector, A is a p x n matrix representing a linear transform, b in R^p and mu > 0 is the regularization parameter. The term ||Bx||_1 is the usual l_1-based regularization used to promote sparsity under the transform B. For example, for the well-known Rudin-Osher-Fatemi (ROF) model [30], ||Bx||_1 represents the total-variation semi-norm, which aims to recover piecewise constant images, with B being a 2n x n discrete differential matrix (cf [16, 24]). More precisely, ||Bx||_1 and ||Bx||_{1,2} are for anisotropic total variation and isotropic total variation, respectively, and here we simply write both as ||Bx||_1. Problem (1.2) can be expressed in the form of (1.1) by setting f_1 = mu ||.||_1 and f_2(x) = (1/2)||Ax - b||_2^2. One of the main difficulties in solving it is that f_1 is non-differentiable, a situation that occurs in many problems we are interested in.

Another general problem often considered in the literature takes the following form:

x* = arg min_{x in X} f(x) + h(x),    (1.3)

where f, h in Gamma_0(X) and h is differentiable on X with a 1/beta-Lipschitz continuous gradient for some beta in (0, +infty). Problem (1.1), which we are interested in in this paper, can be viewed as a special case of problem (1.3) for X = R^n, f = f_1 o B and h = f_2. On the other hand, problem (1.3) can also be considered as a special case of problem (1.1) for X = R^n, f_2 = h, f_1 = f and B = I, where I denotes the usual identity operator.

For problem (1.3), Combettes and Wajs proposed in [12] a proximal forward-backward splitting (PFBS) algorithm, i.e.

x_{k+1} = prox_{gamma f}(x_k - gamma grad h(x_k)),    (1.4)

where 0 < gamma < 2 beta is a stepsize parameter, and the operator prox_f, called the proximity operator of f, is defined by

prox_f : X -> X,  x -> arg min_{y in X} f(y) + (1/2)||x - y||_2^2.

Note that this type of splitting method was originally studied in [23, 28] for solving partial differential equations, and the notion of proximity operators was first introduced by Moreau in [25] as a generalization of projection operators. The iteration (1.4) consists of two sequential steps. The first performs a forward (explicit) step involving the evaluation of the gradient of h; the other then performs a backward (implicit) step involving f. This numerical scheme is very simple and efficient when the proximity operator used in the second step can be carried out efficiently. For example, when f = ||.||_1 for sparse

3 regularization, the proximity operator prox γ f (x) can be written as the famous componentwise soft-thresholding (also known as a shrinkage) operation. However, the proximity operators for the general form f = f 1 B as in (1.1) do not have an explicit expression, leading to the numerical solution of a difficult subproblem. In fact, the subproblem of (1.2)isprox μ 1 B (b) and often formulated as the ROF denoising problem: x = arg min μ Bx x b 2 2, (1.5) x R n where b R n denotes a corrupted image to be denoised. In recent years, many splitting methods have been designed to solve the last subproblem in order to take advantage of the efficiency of the soft-thresholding operator. For example, Goldstein and Osher proposed in [18] a splitting algorithm based on the Bregman iteration, namely the split Bregman, to implement the action of prox f1 B, in particular for total variation minimization. This algorithm has shown to be very efficient and useful for a large class of convex separable programming problems. Theoretically, it is shown to be equivalent to the Douglas Rachford splitting algorithm (see [31, 14]) and alternating direction of multiplier method (ADMM, see [15, 7]), and the convergence was then analyzed based on such equivalence. The split Bregman proposed in [18] is also designed to solve the convex separable problem (1.1). In particular, for the variational model (1.2), the subproblem involves solving a quadratic minimization, which sometimes can be time consuming. To overcome this, a primal dual inexact split Uzawa method was proposed in [35] to maximally decouple the subproblems so that each iteration step is precise and explicit. In [16, 9], more theoretical analysis on the variants of the primal dual-type method and the connection with existing methods were examined to bridge the gap between different types of methods. Also, the convergence of ADMM was further analyzed in [19] based on proximal point algorithm (PPA) formulation. In this paper, we will follow a different point of view. In [24], Micchelli Shen Xu designed an algorithm called FP 2 O to solve prox f1 B (x). We aim to extend FP2 Otosolvethe general problem (1.1) with a maximally decoupled iteration scheme. One obvious advantage of the proposed scheme is that it is very easy for parallel implementation. Then, we will show that the proposed algorithm is convergent in a general setting. Under some assumptions of the convex function f 2 and the linear transform B, we can further prove the linear convergence rate of the method under the framework of fixed point iteration. Note that most of the existing works based on ADMM have shown a sub-linear convergence rate O(1/k) on the objective function and O(1/k 2 ) on the accelerated version, where k is the iteration number. Recently, in [19], the ergodic and non-ergodic convergence on the difference of two sequential primal dual sequences were analyzed. In this paper, we will prove the convergence rate of the iterations directly from the point of view of fixed point theory under some common assumptions. We note that, during the preparation of this paper, Deng and Yin [13] also considered the global linear convergence of the ADMM and its variants based on similar assumptions. In addition, we will reformulate our fixed point type of methods and show their connections with some existing first-order methods for (1.1) and (1.2). The rest of the paper is organized as follows. 
In section 2, we recall the fixed point algorithm FP 2 O and some related works and then deduce the proposed PDFP 2 O algorithm and its extension PDFP 2 O κ from our intuitions. In section 3, we first deduce PDFP 2 O κ again in the setting of fixed point iteration; we then establish its convergence under a general setting and the convergence rate under some stronger assumptions on f 2 and B. In section 4, we give the equivalent form of PDFP 2 O, and the relationships and differences with other first-order algorithms. In section 5, we show the numerical performance and efficiency of PDFP 2 O κ through some examples on image super-resolution, tomographic reconstruction and 3

4 parallel magnetic resonance imaging (pmri). In the final section, we give a discussion and some perspectives on our method and other state-of-the-art methods frequently used in image restoration. 2. Fixed point algorithms Similar to the fixed point algorithm on the dual for the ROF denoising model (1.5) proposed by Chambolle [8], Micchelli et al proposed an algorithm called FP 2 Oin[24] tosolvethe proximity operator prox f1 B (b) for b Rn, especially for the total-variation-based image denoising. Let λ max (BB T ) be the largest eigenvalue of BB T.For0<λ<2/λ max (BB T ),we define the operator H(v) = (I prox f 1λ )(Bb + (I λbb T )v) for all v R m ; (2.1) then the FP 2 O algorithm is described as algorithm 1, where H κ is the κ-averaged operator of H, i.e. H κ = κi + (1 κ)h for κ (0, 1); see definition 3.3 in the following section. Algorithm 1 Fixed point algorithm based on proximity operator, FP 2 O[24]. Step 1: set v 0 R m,0<λ<2/λ max (BB T ), κ (0, 1). Step 2: calculate v, which is the fixed point of H, with iteration v k+1 = H κ (v k ). Step 3: prox f1 B(b) = b λb T v. The key technique to obtain the FP 2 O scheme relies on the relation of the subdifferential of a convex function and its proximity operator, as described in the result (3.1). An advantage of FP 2 O is that its iteration does not require solving the subproblem and the convergence is analyzed in the classical framework of the fixed point iteration. This algorithm has been extended in [2, 10]tosolve x = arg min ( f 1 B)(x) + 1 x R n 2 xt Qx b T x, where Q M n, with M n being the collection of all symmetric positive definite n n matrices, b R n. Define H(v) = (I prox f 1λ )(BQ 1 b + (I λbq 1 B T )v) for all v R m. Then, the corresponding algorithm is given below, called algorithm 2, which can be viewed as a fixed point algorithm based on the inverse matrix and proximity operator or FP 2 O based on the inverse matrix (IFP 2 O). Here the matrix Q is assumed to be invertible and the inverse can be easily calculated, which is unfortunately not the case in most of the applications in imaging science. Moreover, there is no theoretical guarantee of convergence if the linear system is only solved approximately. Algorithm 2 FP 2 O based on inverse matrix, IFP 2 O[2]. Step 1: set v 0 R m and 0 <λ<2/λ max (BQ 1 B T ), κ (0, 1). Step 2: calculate ṽ, which is the fixed point of H, with iteration ṽ k+1 = H κ (ṽ k ). Step 3: x = Q 1 (b λb T ṽ ). Further, the authors in [2] combined PFBS and FP 2 O for solving problem (1.1), for which we call PFBS_FP 2 O (cf algorithm 3 below). Precisely speaking, at step k in PFBS, after one forward iteration x k+1/2 = x k γ f 2 (x k ), we need to solve for x k+1 = prox γ f1 B (x k+1/2). 4

5 FP 2 O is then used to solve this subproblem, i.e. the fixed point v k+1 of H x k+1/2 is obtained by the fixed iteration form v i+1 = (H xk+1/2 ) κ (v i ), where H xk+1/2 (v) = (I prox γ λ f )(Bx 1 k+1/2 + (I λbb T )v) for all v R m. Then x k+1 is given by setting x k+1 = x k+1/2 λb T v k+1. The acceleration combining with the Nesterov method [17, 26, 32, 33] was also considered in [2]. We note that algorithm 3 involves inner and outer iterations, and it is often problematic to set the appropriate inner stopping conditions to balance computational time and precision. In our algorithm developed later on, instead of using many number of inner fixed point iterations for solving prox γ f1 B (x), we use only one inner fixed point iteration. Algorithm 3 Proximal forward backward splitting based on FP 2 O, PFBS_FP 2 O[2]. Step 1: set x 0 R n,0<γ <2β. Step 2: for k = 0, 1, 2,... x k+1/2 = x k γ f 2 (x k ), calculate the fixed point v k+1 of H x k+1/2 with iteration v i+1 = (H xk+1/2 ) κ (v i ), x k+1 = x k+1/2 λb T v k+1. end for Suppose κ = 0inFP 2 O. A very natural idea is to take the numerical solution v k of the fixed point of H x(k 1)+1/2 as the initial value, and only perform one iteration for solving the fixed point of H xk+1/2 ; then we can obtain the following iteration scheme: { vk+1 = (I prox (PDFP 2 γ λ O) f )(B(x 1 k γ f 2 (x k )) + (I λbb T )v k ), (2.2a) x k+1 = x k γ f 2 (x k ) λb T v k+1, (2.2b) which produces our proposed method algorithm 4, described below. This algorithm can also be deduced from the fixed point formulation, whose detail we will give in the following section. On the other hand, since x is the primal variable related to (1.1), it is very natural to ask what role the variable v plays in our algorithm. After a thorough study, we find out as given in section 4.1 that v is actually the dual variable of the primal dual form related to (1.1). Based on these observations, we call our method a primal dual fixed point algorithm based on the proximity operator, and abbreviate it as PDFP 2 O, inheriting the notion of FP 2 O in [24]. If B = I, λ = 1, then form (2.2) is equivalent to form (1.4). So PFPS can be seen as a special case of PDFP 2 O. Also, when f 2 (x) = 1 2 x b 2 2 and γ = 1, then PDFP2 O reduces to FP 2 O for solving prox f1 B (b) with κ = 0. For general B and f 2, each step of the proposed algorithm is explicit when prox γ λ f is easy to compute. Note that the technique of approximating the 1 subproblem by only one iteration is also proposed in a primal dual inexact Uzawa framework in [35]. We will show the connection to this algorithm and other ones in section 4. Algorithm 4 Primal dual fixed point algorithm based on proximity operator, PDFP 2 O. Step 1: set x 0 R n, v 0 R m,0<λ 1/λ max (BB T ),0<γ <2β. Step 2: for k = 0, 1, 2,... x k+ 1 = x k γ f 2 (x k ), 2 v k+1 = (I prox γ λ f )(Bx 1 k+ 1 + (I λbb T )v k ), 2 x k+1 = x k+ 1 λb T v k+1. 2 end for Borrowing the fixed point formulation of PDFP 2 O, we can introduce a relaxation parameter κ [0, 1) to obtain algorithm 5, which is exactly a Picard method with parameters. 5

6 The rule for parameter selection will be illustrated in section 3. Ifκ = 0, then PDFP 2 O κ reduces to PDFP 2 O. Our theoretical analysis for PDFP 2 O κ given in the following section is mainly based on this fixed point setting. Algorithm 5 PDFP 2 O κ. Step 1: set x 0 R n, v 0 R m,0<λ 1/λ max (BB T ),0<γ <2β, κ [0, 1). Step 2: for k = 0, 1, 2,... x k+ 1 = x k γ f 2 (x k ), 2 ṽ k+1 = (I prox γ λ f )(Bx 1 k+ 1 + (I λbb T )v k ), 2 x k+1 = x k+ 1 λb T ṽ k+1, 2 v k+1 = κv k + (1 κ)ṽ k+1, x k+1 = κx k + (1 κ) x k+1. end for 3. Convergence analysis 3.1. General convergence First of all, let us mention some related definitions and lemmas for later requirements. From now on, we use X to denote a finite-dimensional real Hilbert space. Moreover, we always assume that problem (1.1) has at least one solution. As shown in [12], if the objective function ( f 1 B)(x) + f 2 (x) is coercive, i.e. lim (( f 1 B)(x) + f 2 (x)) =+, x 2 + then the existence of solution can be ensured for (1.1). Definition 3.1 (Subdifferential [30]). Let f be a function in Ɣ 0 (X ). The subdifferential of f is the set-valued operator f : X 2 X, the value of which at x X is f (x) ={v X v, y x + f (x) f (y) for all y X }, where, denotes the inner-product over X. Definition 3.2 (Nonexpansive operators and firmly nonexpansive operators [30]). An operator T : X X is nonexpansive if and only if it satisfies Tx Ty 2 x y 2 for all (x, y) X 2. T is firmly nonexpansive if and only if it satisfies one of the following equivalent conditions: (i) Tx Ty 2 2 Tx Ty, x y for all (x, y) X 2. (ii) Tx Ty 2 2 x y 2 2 (I T )x (I T )y 2 2 for all (x, y) X 2. It is easy to show from the above definitions that a firmly nonexpansive operator T is nonexpansive. Definition 3.3 (Picard sequence, κ-averaged operator [27]). Let T : X X be an operator. For a given initial point u 0 X, the Picard sequence of the operator T is defined by u k+1 = T (u k ),fork N. For a real number κ (0, 1), theκ-averaged operator T κ of T is defined by T κ = κi + (1 κ)t. We also write T 0 = T. 6
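To make the scheme concrete before turning to the lemmas, algorithm 5 can be written in a few lines: the forward gradient step, the (I - prox) step, the primal update and the kappa-averaging of definition 3.3, with kappa = 0 recovering algorithm 4 (PDFP2O). The NumPy sketch below is only an illustration under simplifying assumptions (a dense B given as an array, a hypothetical toy instance of (1.2) with f_1 = mu ||.||_1 at the end); it is not the authors' implementation, and the experiments reported in section 5 were run in MATLAB.

```python
import numpy as np

def pdfp2o_kappa(grad_f2, prox_scaled_f1, B, x0, v0, gamma, lam, kappa=0.0, n_iter=500):
    """Illustrative sketch of PDFP2O_kappa (algorithm 5).

    grad_f2(x)        -- gradient of the differentiable term f2
    prox_scaled_f1(w) -- proximity operator of (gamma/lam)*f1
    B                 -- linear transform as a dense array (B.T is its adjoint)
    Parameter ranges from theorems 3.4 and 3.5: 0 < gamma < 2*beta,
    0 < lam <= 1/lambda_max(B B^T), kappa in [0, 1); kappa = 0 gives PDFP2O.
    """
    x, v = x0.astype(float), v0.astype(float)
    for _ in range(n_iter):
        x_half = x - gamma * grad_f2(x)                 # forward (gradient) step
        w = B @ x_half + v - lam * (B @ (B.T @ v))      # B x_{k+1/2} + (I - lam B B^T) v_k
        v_new = w - prox_scaled_f1(w)                   # (I - prox_{(gamma/lam) f1})(w)
        x_new = x_half - lam * (B.T @ v_new)            # primal update
        v = kappa * v + (1 - kappa) * v_new             # kappa-averaging (definition 3.3)
        x = kappa * x + (1 - kappa) * x_new
    return x, v

# Toy use on a hypothetical problem of the form (1.2):
# f2(x) = 0.5*||A x - b||_2^2, f1 = mu*||.||_1, B a 1D finite-difference matrix.
rng = np.random.default_rng(0)
n, p, mu = 50, 30, 0.1
A, b = rng.standard_normal((p, n)), rng.standard_normal(p)
B = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]                # forward differences, (n-1) x n
beta = 1.0 / np.linalg.norm(A, 2) ** 2                  # beta = 1/lambda_max(A^T A)
lam = 1.0 / np.linalg.norm(B @ B.T, 2)                  # lam = 1/lambda_max(B B^T)
gamma = 1.9 * beta                                      # 0 < gamma < 2*beta

soft = lambda w, t: np.sign(w) * np.maximum(np.abs(w) - t, 0.0)  # prox of t*||.||_1
x_star, _ = pdfp2o_kappa(lambda x: A.T @ (A @ x - b),
                         lambda w: soft(w, gamma * mu / lam),
                         B, np.zeros(n), np.zeros(n - 1), gamma, lam, kappa=0.0)
```

Since prox_{(gamma/lam) f_1} enters only through a single evaluation per iteration, every step is explicit whenever that proximity operator has a closed form, which is the decoupling emphasized in the text.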

7 Lemma 3.1. Suppose f Ɣ 0 (R m ) and x R m. Then there holds y f (x) x = prox f (x + y). (3.1) Furthermore, if f has 1/β-Lipschitz continuous gradient, then f (x) f (y), x y β f (x) f (y) 2 for all (x, y) R m. (3.2) Proof. The first result is nothing but proposition 2.6 of [24]. If f has 1/β-Lipschitz continuous gradient, we have from [12] that β f is firmly nonexpansive, which implies (3.2) readily. Lemma 3.2 (Lemma 2.4 of [12]). Let f be a function in Ɣ 0 (R m ). Then prox f and I prox f are both firmly nonexpansive operators. Lemma 3.3 (Opial κ-averaged theorem, theorem 3 of [27]). If S is a closed and convex set in X and T : S S is a nonexpansive mapping having at least one fixed point, then for κ (0, 1), T κ is nonexpansive, maps S to itself and has the same set of fixed points as T. Furthermore, for any u 0 S and κ (0, 1), the Picard sequence of T κ converges to a fixed point of T. Now, we are ready to obtain a fixed point formulation for the solution of problem (1.1) and discuss the convergence of PDFP 2 O κ. To this end, for any two positive numbers λ and γ, define T 1 : R m R n R m as T 1 (v, x) = (I prox γ λ f )(B(x γ f 1 2(x)) + (I λbb T )v) (3.3) and T 2 : R m R n R n as T 2 (v, x) = x γ f 2 (x) λb T T 1. (3.4) Denote T : R m R n R m R n as T (v, x) = (T 1 (v, x), T 2 (v, x)). (3.5) Theorem 3.1. Let λ and γ be two positive numbers. Suppose that x is a solution of (1.1). Then there exists v R m such that { v = T 1 (v, x ), x = T 2 (v, x ). In other words, u = (v, x ) is a fixed point of T. Conversely, if u R m R n is a fixed point of T, with u = (v, x ), v R m,x R n, then x is a solution of (1.1). Proof. By the first-order optimality condition of problem (1.1), we have x = arg min ( f 1 B)(x) + f 2 (x) x R n 0 f 2 (x ) (f 1 B)(x ) 0 γ f 2 (x ) γ (f 1 B)(x ) ( x x γ f 2 (x ) λ B T γ ) λ f 1 B (x ). Let Then v ( γ ( γ ) λ f 1 B) (x ) = λ f 1 (Bx ). (3.6) x = x γ f 2 (x ) λb T v. (3.7) 7

8 Moreover, it follows from result (3.1) that (3.6) is equivalent to Bx = prox γ λ f (Bx + v ) 1 (Bx + v ) v = prox γ λ f (Bx + v ) 1 v = (I prox γ λ f )(Bx + v ). (3.8) 1 Inserting (3.7)into(3.8)gives v = ( ) I prox γ λ f 1 (B(x γ f 2 (x )) + (I λbb T )v ). This shows v = T 1 (v, x ). Next, replacing v in (3.7) byt 1 (v, x ), we readily have x = T 2 (v, x ). Therefore, for u = (v, x ), u = T (u ). On the other hand, if u = T (u ), then we can derive that x satisfies the first-order optimality condition of (1.1). Therefore, we conclude that x is a minimizer of (1.1). In the following, we will show that the algorithm PDFP 2 O κ is a Picard method related to the operator T κ. Theorem 3.2. Suppose κ [0, 1). Set T κ = κi + (1 κ)t. Then the Picard sequence {u k } of T κ is exactly the one obtained by the algorithm PDFP 2 O κ. Proof. According to the definitions in (3.3) (3.5), the component form of u k+1 = T (u k ) can be expressed as { vk+1 = T 1 (v k, x k ) = (I prox γ λ f )(B(x 1 k γ f 2 (x k )) + (I λbb T )v k ) x k+1 = T 2 (v k, x k ) = x k γ f 2 (x k ) λb T T 1 (v k, x k ) = x k γ f 2 (x k ) λb T v k+1. Therefore, the iteration u k+1 = T (u k ) is equivalent to (2.2). Employing the similar argument, we can obtain the conclusion for general T κ with κ [0, 1). Remark 3.1. From the last result, we find out that algorithm PDFP 2 O κ can also be obtained in the setting of fixed point iteration immediately. For the convergence analysis for PDFP 2 O κ, we will first prove a key inequality for general cases (cf equation (3.13)). Denote g(x) = x γ f 2 (x) for all x R n, (3.9) M = I λbb T. (3.10) When 0 <λ 1/λ max (BB T ), M is a symmetric positive semi-definite matrix, so we can define the semi-norm v M = v, Mv for all v R m. (3.11) For an element u = (v, x) R m R n, with v R m and x R n,let u λ = x λ v 2 2. (3.12) We can easily see that λ is a norm over the produce space R m R n whenever λ>0. Theorem 3.3. For any two elements u 1 = (v 1, x 1 ),u 2 = (v 2, x 2 ) in R m R n, there holds T (u 1 ) T (u 2 ) 2 λ u 1 u 2 2 λ γ(2β γ) f 2(x 1 ) f 2 (x 2 ) 2 2 λb T (v 1 v 2 ) 2 2 λ (T 1(u 1 ) T 1 (u 2 )) (v 1 v 2 ) 2 M. (3.13) 8

9 Proof. By lemma 3.2, I prox γ λ f is a firmly nonexpansive operator. This together 1 with (3.3), (3.9) and (3.10) yields T 1 (u 1 ) T 1 (u 2 ) 2 2 T 1(u 1 ) T 1 (u 2 ), B(g(x 1 ) g(x 2 )) + M(v 1 v 2 ) = T 1 (u 1 ) T 1 (u 2 ), B(g(x 1 ) g(x 2 )) + T 1 (u 1 ) T 1 (u 2 ), M(v 1 v 2 ). (3.14) It follows from (3.4), (3.9), (3.10) and (3.11) that T 2 (u 1 ) T 2 (u 2 ) 2 2 = (g(x 1) g(x 2 )) λb T (T 1 (u 1 ) T 1 (u 2 )) 2 2 = g(x 1 ) g(x 2 ) 2 2 2λ BT (T 1 (u 1 ) T 1 (u 2 )), g(x 1 ) g(x 2 ) + λb T (T 1 (u 1 ) T 1 (u 2 )) 2 2 = g(x 1 ) g(x 2 ) 2 2 2λ T 1(u 1 ) T 1 (u 2 ), B(g(x 1 ) g(x 2 )) λ T 1 (u 1 ) T 1 (u 2 ) 2 M + λ T 1(u 1 ) T 1 (u 2 ) 2 2. (3.15) Observing the definitions in (3.5) and (3.9) (3.12), we have by (3.14) (3.15) T (u 1 ) T (u 2 ) 2 λ = T 2(u 1 ) T 2 (u 2 ) λ T 1(u 1 ) T 1 (u 2 ) 2 2 = g(x 1 ) g(x 2 ) 2 2 2λ T 1(u 1 ) T 1 (u 2 ), B(g(x 1 ) g(x 2 )) λ T 1 (u 1 ) T 1 (u 2 ) 2 M + 2λ T 1(u 1 ) T 1 (u 2 ) 2 2 g(x 1 ) g(x 2 ) 2 2 λ T 1(u 1 ) T 1 (u 2 ) 2 M + 2λ T 1(u 1 ) T 1 (u 2 ), M(v 1 v 2 ) = g(x 1 ) g(x 2 ) λ v 1 v 2 2 M λ (T 1(u 1 ) T 1 (u 2 )) (v 1 v 2 ) 2 M. (3.16) Using the definition in (3.9) and estimate (3.2), we know g(x 1 ) g(x 2 ) 2 2 = x 1 x γ f 2(x 1 ) f 2 (x 2 ), x 1 x 2 +γ 2 f 2 (x 1 ) f 2 (x 2 ) 2 2 x 1 x γ(2β γ) f 2(x 1 ) f 2 (x 2 ) 2 2. (3.17) By the definitions in (3.10) and (3.11), λ v 1 v 2 2 M = λ v 1 v λbt (v 1 v 2 ) 2 2. (3.18) Recalling the definition in (3.12), we easily know that (3.13) is a direct consequence of (3.16) (3.18). From theorem 3.3, we can derive the following result. Corollary 3.1. If 0 <γ <2β, 0 <λ 1/λ max (BB T ), then T is nonexpansive under the norm λ. Since T is nonexpansive, we are able to show the convergence of PDFP 2 O κ for κ (0, 1), in view of lemma 3.3. Theorem 3.4. Suppose 0 <γ <2β, 0 <λ 1/λ max (BB T ) and κ (0, 1). Let u k = (v k, x k ) be a sequence generated by PDFP 2 O κ. Then {u k } converges to a fixed point of T and {x k } converges to a solution of problem (1.1). Proof. In view of theorem 3.2, we know u k+1 = T κ (u k ),so{u k } is the Picard sequence of T κ. By assumption, problem (1.1) has a solution, and hence operator T has a fixed point from theorem 3.1. According to corollary 3.1, T is nonexpansive. Therefore, by letting S = R m,we find from lemma 3.3 that {u k } converges to a fixed point of T for κ (0, 1). With this result in mind, {x k } converges to a solution of problem (1.1) from theorem 3.1. Now, let us proceed with the convergence analysis of PDFP 2 O using some novel technique. 9

10 Theorem 3.5. Suppose 0 <γ <2β and 0 <λ 1/λ max (BB T ). Let u k = (v k, x k ) be the sequence generated by PDFP 2 O. Then the sequence {u k } converges to a fixed point of T, and the sequence {x k } converges to a solution of problem (1.1). Proof. Let u = (v, x ) R m R n be a fixed point of T. Using theorem 3.3,wehave u k+1 u 2 λ u k u 2 λ γ(2β γ) f 2(x k ) f 2 (x ) 2 2 λb T (v k v ) 2 2 λ v k+1 v k 2 M. (3.19) Summing (3.19) over k from 0 to + gives + k=0 { γ(2β γ) f2 (x k ) f 2 (x ) λbt (v k v ) λ v k+1 v k 2 } M u0 u 2 λ. So { lim γ(2β γ) f2 (x k ) f 2 (x ) 2 2 k + + λbt (v k v ) λ v k+1 v k 2 } M = 0, which together with 0 <γ <2β implies lim f 2(x k ) f 2 (x ) 2 = 0, (3.20) k + lim k + BT (v k v ) 2 = 0, (3.21) lim v k+1 v k M = 0. (3.22) k + By the definitions in (3.10) and (3.11), v k+1 v k 2 2 = v k+1 v k 2 M + λ BT (v k+1 v k ) 2 2, which when combined with (3.21) and (3.22)gives lim v k+1 v k 2 = 0. (3.23) k + On the other hand, from (3.7) wehave γ f 2 (x ) λb T v = 0, and from (2.2b) x k+1 x k = γ f 2 (x k ) λb T v k+1. Hence, x k+1 x k = γ( f 2 (x k ) f 2 (x )) λ(b T v k+1 B T v ). Now, using (3.20) and (3.21) we immediately obtain lim x k+1 x k 2 = 0. (3.24) k + By the definition in (3.12) and (3.23) (3.24), lim u k+1 u k λ = 0. (3.25) k + From (3.19), we know that the sequence { u k u λ } is non-increasing, so the sequence {u k } is bounded and there exists a convergent subsequence {u k j } such that lim u k j u λ = 0 (3.26) j + for some u R m R n. Next, let us show that u is a fixed point of T. In fact, T (u k j ) u λ = (u k j +1 u k j ) (u k j u ) λ u k j +1 u k j λ + u k j u λ 10

11 which, in conjunction with (3.25) and (3.26), leads to lim T (u k j ) u λ = 0. (3.27) j + The operator T is continuous since it is nonexpansive, so it follows from (3.26) and (3.27) that u is a fixed point of T. Moreover, we know that { u k u λ } is non-increasing for any fixed point u of T. In particular, by choosing u = u, we see that { u k u λ } is non-increasing. Combining this and (3.26) yields lim u k = u. k + Writing u = (v, x ) with v R m, x R n, we find from theorem 3.1 that x is the solution of problem (1.1). Note that if f 2 (x) = 1 2 x b 2 2 and γ = 1, then PDFP2 O reduces to FP 2 Oforκ = 0. As a consequence of the above theorem, we can achieve the convergence of FP 2 Oforκ = 0even when BB T is singular, for which no convergence is available from theorem 3.12 of[24]. Corollary 3.2. Suppose 0 < λ 1/λ max (BB T ). Let {v k } be the sequence generated by FP 2 Oforκ = 0. Set x k = b λb T v k. Then the sequence {v k } converges to the fixed point of H(see (2.1)), the sequence {x k } converges to the solution of problem (1.1) with f 2 (x) = 1 2 x b Linear convergence rate for special cases In this section, we will give some stronger theoretical results about the convergence rate in some special cases. For this, we present the following condition. Condition 3.1. For any two real numbers λ and γ satisfying that 0 < γ < 2β and 0 <λ 1/λ max (BB T ), there exist η 1, η 2 [0, 1) such that I λbb T 2 η1 2 and g(x) g(y) 2 η 2 x y 2 for all x, y R n, where g(x) is given in (3.9). Remark 3.2. If B has full row rank, f 2 is strongly convex, i.e. there exists some σ>0such that f 2 (x) f 2 (y), x y σ x y 2 2 for all x, y R n, (3.28) then this condition can be satisfied. In fact, when B has a full row rank, we can choose η1 2 = 1 λλ min(bb T ), where λ min (BB T ) denotes the smallest eigenvalue of BB T. In this case, η1 2 takes its minimum ( η 2 1 )min = 1 λ min(bb T ) λ max (BB T ) at λ = 1/λ max (BB T ). On the other hand, since f 2 has 1/β-Lipschitz continuous gradient and is strongly convex, it follows from (3.2) and (3.28) that g(x) g(y) 2 2 = x y 2 2 2γ f 2(x) f 2 (y), x y +γ 2 f 2 (x) f 2 (y) 2 2 x y 2 γ(2β γ) 2 f 2 (x) f 2 (y), x y β ( ) σγ(2β γ) 1 x y 2 2 β. 11

12 Hence we can choose η2 2 σγ(2β γ) = 1. β In particular, if we choose γ = β, then η 2 takes its minimum in the present form: (η 2 2 ) min = 1 σβ. As a typical example, consider f 2 (x) = 1 2 Ax b 2 2 with AT A full rank. Then we can find that β = 1/λ max (A T A) and σ = λ min (A T A), and hence ( η 2 2 )min = 1 λ min(a T A) λ max (A T A). Despite most of our interesting problems not belonging to these special cases, and there will be more efficient algorithms if condition 3.1 is satisfied, the following results still have some theoretical values where the best performance of PDFP 2 O κ can be achieved. First of all, we show that T κ is contractive under condition 3.1. Theorem 3.6. Suppose condition 3.1 holds true. Let the operator T be given in (3.5) and T κ = κi + (1 κ)tforκ [0, 1). Then T κ is contractive under the norm λ. Proof. Let η = max{η 1,η 2 }. It is clear that 0 η 1. Then, owing to the condition 3.1, for all u 1 = (v 1, x 1 ), u 2 = (v 2, x 2 ) R m R n, there holds g(x 1 ) g(x 2 ) 2 η x 1 x 2 2, v 1 v 2 M η v 1 v 2 2, from which, (3.12) and (3.16) it follows that T (u 1 ) T (u 2 ) 2 λ g(x 1) g(x 2 ) 2 2 +λ v 1 v 2 2 M λ (T 1(u 1 ) T 1 (u 2 )) (v 1 v 2 ) 2 M η 2 ( x 1 x λ v 1 v ) = η 2 u 1 u 2 2 λ. On the other hand, it is easy to check from the last estimate and the triangle inequality that T κ (u 1 ) T κ (u 2 ) λ κ u 1 u 2 λ + (1 κ) T (u 1 ) T (u 2 ) λ θ u 1 u 2 λ, with θ = κ + (1 κ)η [0, 1). So, operator T κ is contractive. Now, we are ready to analyze the convergence rate of PDFP 2 O κ. Theorem 3.7. Suppose condition 3.1 holds true. Let the operator T be given in (3.5) and T κ = κi + (1 κ)tforκ [0, 1). Let u k = (v k, x k ) be a Picard sequence of the operator T κ (or equivalently, a sequence obtained by algorithm PDFP 2 O κ ). Then the sequence {u k } must converge to the unique fixed point u = (v, x ) R m R n of T, with x being the unique solution of problem (1.1). Furthermore, there holds the estimate x k x 2 cθ k 1 θ, (3.29) where c = u 1 u 0 λ, θ = κ + (1 κ)η [0, 1) and η = max{η 1,η 2 }, with η 1 and η 2 given in condition

13 Proof. Since the operator T κ is contractive, by the Banach contraction mapping theorem, it has a unique fixed point, denoted by u = (v, x ). It is obvious that T κ has the same fixed points as T,sox is the unique solution of problem (1.1) from theorem 3.1. Moreover, it is routine that the sequence {u k } converges to u. On the other hand, it follows from theorem 3.6 that So for all 0 < l N, u k+1 u k λ θ u k u k 1 λ θ k u 1 u 0 λ = cθ k. u k+l u k λ which immediately implies l u k+i u k+i 1 λ = cθ k i=1 x k x 2 u k u λ cθ k 1 θ by letting l +. The desired estimate (3.29) is then obtained. l θ i 1 cθ k 1 θ, i=1 If B = I, λ = 1, then form (2.2) is equivalent to form (1.4), so as a special case of theorem 3.7, we can obtain the convergence rate for PFBS. Corollary 3.3. Suppose 0 <γ <2β and there exists η [0, 1) such that g(x) g(y) 2 η x y 2 for all x, y R n. Let {x k } be a sequence generated by PFBS and x be the solution of problem (1.3)forX = R n. Set c = x 1 x 0 2. Then x k x 2 cηk 1 η. As a conclusion of theorem 3.7, we can also obtain the convergence rate of FP 2 Ofor κ = 0 under the assumption I λbb T < 1. Corollary 3.4. Suppose 0 <λ 1/λ max (BB T ), the matrix B has full row rank and η 1 is given by condition 3.1. Let v be the fixed point of H(cf (2.1)). Let {v k } be a sequence generated by FP 2 Oforκ = 0, with x k = b λb T v k. Set c = λb T (v 1 v 0 ) λ v 1 v Then v k v 2 cη k 1 λ(1 η1 ), x k x 2 cηk 1 1 η Connections to other algorithms We will further investigate the proposed algorithm PDFP 2 O from the perspective of primal dual forms and establish the connections to other existing methods. 13

14 4.1. Primal dual and proximal point algorithms For problem (1.1), we can write its primal dual form using the Fenchel duality [29]as min maxl(x, v) := f 2 (x) + Bx, v f x 1 (v), (4.1) v where f1 is the convex conjugate function of f 1 defined by f1 (v) = sup v, w f 1 (w). w R m By introducing a new intermediate variable y k+1, equations (2.2) are reformulated as y k+1 = x k γ f 2 (x k ) λb T v k, (4.2a) v k+1 = (I prox γ λ f )(By 1 k+1 + v k ), (4.2b) x k+1 = x k γ f 2 (x k ) λb T v k+1. (4.2c) According to Moreau decomposition (see equation (2.21) in [12]), for all v R m,wehave v = v γ + v γ, where v γ = prox γ λ λ λ λ f v, v γ = γ ( ) λ 1 λ λ prox λ γ f 1 γ v, from which we know ( ) I prox γ λ f (Byk+1 + v 1 k ) = γ ( λ λ prox λ γ f 1 γ By k+1 + λ ) γ v k. Let v k = γ λ v k. Then (4.2) can be reformulated as y k+1 = x k γ f 2 (x k ) γ B T v k, (4.3a) ( ) λ v k+1 = prox λ γ f 1 γ By k+1 + v k, (4.3b) x k+1 = x k γ f 2 (x k ) γ B T v k+1. (4.3c) In terms of the saddle point formulation (4.1), we have by a direct manipulation that f 2 (x k ) + B T v k = x L(x k, v k ) ( ) λ prox λ γ f 1 γ By k+1 + v k = arg min L(y k+1, v) + γ v R m 2λ v v k 2 2, f 2 (x k ) + B T v k+1 = x L(x k, v k+1 ). Hence, (4.3) can be expressed as y k+1 = x k γ x L(x k, v k ), (4.4a) v k+1 = arg min L(y k+1, v) + γ v R m 2λ v v k 2 2, (4.4b) x k+1 = x k γ x L(x k, v k+1 ). (4.4c) From (4.3a) and (4.3c), we can find out that y k+1 = x k+1 + γ B T (v k+1 v k ). Then equation (4.4b) becomes v k+1 = arg min L(x k+1, v) + γ v R m 2λ v v k 2 M, where M = 1 λbb T. Together with (4.4c), the iterations (4.4)are v k+1 = arg maxl(x k+1, v) γ v R m 2λ v v k 2 M, (4.5a) x k+1 = x k γ x L(x k, v k+1 ). (4.5b) 14

15 Table 1. Comparison between CP (θ = 1) and PDFP 2 O. CP (θ = 1) PDFP 2 O Form v k+1 = (I + σ f1 ) 1 (v k + σ By k ) v k+1 = (I + λ f γ 1 ) 1 (v k + λ By γ k) x k+1 = (I + τ f 2 ) 1 (x k τb T v k+1 ) x k+1 = x k γ f 2 (x k ) γ B T v k+1 y k+1 = 2x k+1 x k y k+1 = x k+1 γ f 2 (x k+1 ) γ B T v k+1 Convergence 0 <στ<1/λ max (BB T ) 0 <γ <2β,0<λ 1/λ max (BB T ) Relation σ = λ/γ, τ = γ Thus the proposed algorithm can be interpreted as an inexact Uzawa method [3] applied on the dual formulation. Compared to the classical Uzawa method, (4.5) is more implicit since the update of v k+1 involves x k+1 and a proximal point iteration matrix M is used. This leads to a close connection with a class of primal dual method studied in [35, 16, 9, 19]. For example, in [9], Chambolle and Pock proposed the following scheme for solving (4.1): v k+1 = (I + σ f1 ) 1 (v k + σ By k ), (4.6a) (CP) x k+1 = (I + τ f 2 ) 1 (x k τb T v k+1 ), (4.6b) y k+1 = x k+1 + θ(x k+1 x k ), (4.6c) where σ,τ > 0, θ [0, 1] is a parameter. For θ = 0, we can obtain the classical Arrow Hurwicz Uzawa (AHU) method in [3]. The convergence of AHU with very small step length isshownin[16]. Under some assumptions on f1 or strong convexity of f 2, global convergence of the primal dual gap can also be shown with specific chosen adaptive steplength [9]. Note that in the case of the ROF model, Chan and Zhu proposed in [36] a clever adaptive step lengths σ and τ for acceleration, and recently the convergence was shown in [6]. According to equation (4.3), using the relation prox λ γ f = (I + λ 1 γ f 1 ) 1, and changing the order of these equations, we know that PDFP 2 O is equivalent to ( v k+1 = I + λ ) 1 ( ) λ γ f 1 γ By k + v k, (4.7a) x k+1 = x k γ f 2 (x k ) γ B T v k+1, (4.7b) y k+1 = x k+1 γ f 2 (x k+1 ) γ B T v k+1. (4.7c) Let σ = λ/γ, τ = γ, then we can see that equations (4.6b) and (4.6c) are approximated by two explicit steps (4.7b) (4.7c). In summary, we list the comparisons of CP for θ = 1 with the fixed step length and PDFP 2 O in table 1. For f 2 (x) = 1 2 Ax b 2 2, (4.4) can be further expressed as y k+1 = arg minl(x, v k ) + 1 x R n 2γ x x k 2 (I γ A T A), (4.8a) v k+1 = arg min L(y k+1, v) + γ v R m 2λ v v k 2 2, (4.8b) x k+1 = arg minl(x, v k+1 ) + 1 x R n 2γ x x k 2 (I γ A T A). (4.8c) Note that by introducing the proximal iteration norm through the matrix I γ A T A M n for 0 <γ <βwith β = 1/λ max (A T A), (4.8a) and (4.8c) become explicit. This is particularly useful when the inverse of A T A is not easy to obtain in most of the imaging applications, such as super-resolution, tomographic reconstruction and parallel MRI [11]. Meanwhile, it is worthwhile pointing out that the condition on γ by this formulation is stricter than theorem 3.5, where γ is required as 0 <γ <2β for the convergence. Furthermore, if we 15

16 ) and P = ( γ λ (I λbbt ) 0 denote û k = ( v T k, x ) T T k and F(ûk ) = ( f1 (v k) Bx k B T v k + f 2 (x k ) we can also easily write the algorithm in the PPA framework [19] as 0 1 γ (I γ AT A) ), then 0 F(û k+1 ) + P(û k+1 û k ). (4.9) We note that in [19], the Chambolle Pock algorithm (4.6) forθ = 1 was also rewritten in the PPA structure as (4.9) with the same F, while ( 1 P = σ I B ). B T 1 τ I In [19, 9], a more general class of algorithms taking this form are studied. In particular, an extra extrapolation step can be applied to the algorithm (4.9) for acceleration Splitting type of methods There are other types of methods which are designed to solve problem (1.1) based on the notion of an augmented Lagrangian. For simplicity, we only list these algorithms for f 2 (x) = 1 2 Ax b 2 2. Among them, the alternating split Bregman (ASB) method proposed by Goldstein and Osher [18] is very popular for imaging applications. This method has been proved to be equivalent to the Douglas Rachford method and the alternating direction of multiplier method (ADMM). In [34, 35], based on PFBS and Bregman iteration, a split inexact Uzawa (SIU) method is proposed to maximally decouple the iterations, so that each iteration is explicit. Further analysis and connections to primal dual methods algorithm are given in [16, 35]. In particular, it is shown that the primal dual algorithm scheme (4.6) with θ = 1 can be interpreted as SIU. In the following, we study the connections and differences between these two methods. ASB can be described as follows: x k+1 = (A T A + νb T B) 1 (A T b + νb T (d k v k )), (4.10a) (ASB) d k+1 = prox 1 ν f (Bx 1 k+1 + v k ), (4.10b) v k+1 = v k (d k+1 Bx k+1 ), (4.10c) where ν>0isaparameter. The explicit SIU method proposed in the literature [35] can be described as x k+1 = x k δa T (Ax k b) δνb T (Bx k d k + v k ), (4.11a) (SIU) d k+1 = prox 1 ν f (Bx 1 k+1 + v k ), (4.11b) v k+1 = v k (d k+1 Bx k+1 ), (4.11c) where δ>0 is a parameter. We can easily see that we approximate the implicit step (4.10a) in ASB by an explicit step (4.11a)inSIU. From (4.2a) and (4.2c), we can find out a relation between y k and x k, given by x k = y k λb T (v k v k 1 ). Then eliminating x k,pdfp 2 O can be expressed as { yk+1 = y k λb T (2v k v k 1 ) γ f 2 (y k λb T (v k v k 1 )), (4.12a) v k+1 = (I prox γ λ f )(By 1 k+1 + v k ). (4.12b) By introducing the splitting variable d k+1 in (4.12b), (4.12) can be further expressed as 16

17 Table 2. The comparisons among ASB, SIU and PDFP 2 O. ASB SIU PDFP 2 O Form x k+1 = (A T A + νb T B) 1 x k+1 = x k δa T (Ax k b) x k+1 = x k δa T (Ax k b) (A T b + νb T (d k v k )) δνb T (Bx k d k + v k ) δνb T (Bx k d k + v k ) δ 2 νa T AB T (d k Bx k ) d k+1 = prox 1 ν f 1 (Bx k+1 + v k ) d k+1 = prox 1 ν f 1 (Bx k+1 + v k ) d k+1 = prox 1 ν f 1 (Bx k+1 + v k ) v k+1 = v k (d k+1 Bx k+1 ) v k+1 = v k (d k+1 Bx k+1 ) v k+1 = v k (d k+1 Bx k+1 ) Convergence ν>0 ν>0 0 <δ<2/λ max (A T A) 0 <δ 1/λ max (A T A + νb T B) 0 <δν 1/λ max (BB T ) y k+1 = y k λb T (By k d k + v k ) γ f 2 (y k λb T (By k d k )), d k+1 = prox γ λ f (By 1 k+1 + v k ), v k+1 = v k (d k+1 By k+1 ). (4.13) For f 2 (x) = 1 2 Ax b 2 2, f 2(x) = A T (Ax b). By changing the order and letting γ = δ, λ = δν, (4.13) becomes y k+1 = y k δa T (Ay k b) δνb T (By k d k + v k ) δ 2 νa T AB T (d k By k ) (4.14a) d k+1 = (prox 1 ν f )(By 1 k+1 + v k ), (4.14b) v k+1 = v k (d k+1 By k+1 ). (4.14c) We can easily see that equation (4.10a) in ASB is approximated by (4.14a). Although it seems that PDFP 2 O requires more computation in (4.14a) than SIU in (4.11a), PDFP 2 O has the same computation cost as that of SIU if the iterations are implemented cleverly. For the reason of comparison, we can change the variable y k to x k in (4.14). Table 2 gives the summarized comparisons among ASB, SIU and PDFP 2 O. We note that the only difference of SIU and PDFP 2 O is in the first step. As two algorithms converge, the algorithm PDFP 2 O behaves asymptotically the same as SIU since d k Bx k converges to 0. The parameters δ and ν satisfy respectively different conditions to ensure the convergence. 5. Numerical experiments In this section, we illustrate the numerical performance of PDFP 2 O κ for κ [0, 1) through three applications: image super-resolution, computerized tomography (CT) reconstruction and pmri. Both the first two applications can be described as problem (1.2), where A is a linear operator representing the subsampling and tomographic projection operator respectively. In pmri, 2 1 Ax b 2 2 is replaced by 2 1 N j=1 A jx b j 2 2, and a detailed description will be given in section 5.3. Here, we use the total variation as the regularization functional, where the operator B : R n R 2n is a discrete gradient operator. Furthermore, the isotropic definition is adopted, i.e. f 1 (w) = μ w 1,2, for all w = (w 1,...,w n,w n+1,...,w 2n ) T R 2n, n w 1,2 = wi 2 + wn+i 2. i=1 Let w i = (w i,w n+i ) T, w i 2 = expressed as w 2 i + w 2 n+i and ɛ = μγ λ. Then prox ɛ 1,2 (w) can be (prox ɛ 1,2 (w)) i,n+i = max{ w i 2 ɛ,0}, w i 2 w i i = 1,...,n. 17
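The componentwise formula above translates directly into code. The sketch below is an illustration (not the authors' implementation): it assumes the 2n-vector w is stored as a 2 x n array whose rows hold (w_1, ..., w_n) and (w_{n+1}, ..., w_{2n}), and the guard against division by zero for vanishing groups is an added implementation detail. The second function is the complementary operator I - prox_{eps ||.||_{1,2}}, which, as described next, is what the PDFP2O implementation evaluates directly.

```python
import numpy as np

def prox_iso_tv(w, eps):
    """Groupwise shrinkage realizing the prox formula above.

    w   -- array of shape (2, n): row 0 holds (w_1, ..., w_n), row 1 holds (w_{n+1}, ..., w_{2n})
    eps -- threshold, eps = mu*gamma/lam in the paper's notation
    """
    norms = np.linalg.norm(w, axis=0)                   # ||w_i||_2 = sqrt(w_i^2 + w_{n+i}^2)
    shrink = np.maximum(norms - eps, 0.0)
    scale = np.divide(shrink, norms, out=np.zeros_like(norms), where=norms > 0)  # 0 where ||w_i|| = 0
    return w * scale

def proj_dual_ball(w, eps):
    """(I - prox_{eps ||.||_{1,2}})(w): groupwise projection onto {w : ||w_i||_2 <= eps for all i}."""
    return w - prox_iso_tv(w, eps)
```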

18 For the implementation of PDFP 2 O, we use the scheme presented in algorithm 4, where we compute directly (I prox γ λ f )(w). In fact, we can deduce that (I prox 1 γ λ f )(w) = Proj 1 ɛ (w), where Proj ɛ is the projection operator from R 2n to l 2, ball of radius ɛ, i.e. w i (Proj ɛ (w)) i,n+i = min{ w i 2,ɛ}, i = 1,...,n. w i 2 In the numerical experiments, we compare our proposed algorithm PDFP 2 O with the three methods: ASB (cf (4.10)), CP (cf (4.6)) and SIU (cf (4.11)). Both ASB and CP involve linear system inversion (A T A + νb T B) 1 and (I + τa T A) 1 respectively. In the experiments, we use the conjugate gradient (CG) method for quadratic subproblems. The maximal number of CG iterations is denoted as N I and the stopping criteria are set as the residual error is less than To numerically measure the convergence speed of various methods, we compute the relative error between the energy at kth outer iteration and the optimal value E. In practice, we run each method 5000 (outer iteration) steps with a large range of parameters and set the minimum of all the tests as the optimal minimum energy E. We denote ε k = (E k E)/E. (5.1) In the following, we will use (5.1) as a criterion to compare the performance among ASB, CP, SIU and PDFP 2 O. To guarantee the quality of recovered images, we also use the criterion peak signal-to-noise ratio (PSNR) ( ) PSNR = 10 log 10, with MSE = I m Ĩ m 2 F, MSE s 1 s 2 where I m denotes the original image of s 1 s 2, Ĩ m denotes the recovered images obtained from various algorithms and F denotes the Frobenius norm. All the experiments are implemented under MATLAB7.11(R2010b) and conducted on a computer with Intel(R) core(tm) i5 CPU 750@ 2.67G Image super-resolution In the numerical simulation, the subsampling operator A is implemented by taking the average of every d d pixels and sampling the average, if a zoom-in ratio d is desired. The experiment is performed on the test image lena of size and the subsampling ratio is d = 4. White Gaussian noise of mean 0 and variance 1 is added to the observed low-resolution image of The regularization parameter μ is set as 0.1 for the best image quality. First we show the impacts of the parameters κ, γ and λ for the proposed algorithm in figure 1. The conditions for theoretical convergence are 0 <γ <2β, 0<λ 1/λ max (BB T ) and κ [0, 1) (see theorems 3.4 and 3.5). The constant β is given by 1/λ max (A T A), and the maximal eigenvalue of A T A is 1/16, so 0 <γ <32. It is well known in total variation application that λ max (BB T ) = 8forB being the usual gradient operator (see [16]), and then 0 < λ 1/8. Figures 1(a) and (b) show that for most cases κ = 0 achieves the fastest convergence compared to other κ (0, 1). Thus we choose κ = 0 for the following comparison. In figures 1(c) and (d), the parameter λ has relatively smaller impact on the performance of this algorithm. We compare the results for λ = 1/5, 1/6, 1/8, 1/16, 1/32. When λ = 1/6 > 1/8, the algorithm is convergent. While for λ = 1/5, the algorithm does not appear to converge, which shows that we cannot extend the range of λ to (0, 2/λ max (BB T )) generally, as given in [24] for denoising case (see algorithm 1). Hence, we only consider 0 <λ 1/λ max (BB T ) as indicated in theorem 3.5, for which the upper bound λ = 1/8 achieves the best performance. The parameter γ has relatively larger impact for the algorithm. We test γ = 8, 16, 24, 30, 32 for κ = 0, λ = 1/8. We observe that numerically larger γ leads to a faster convergence. 
For this reason, we can choose γ close to 2β.
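To connect the bounds above with the operator A of this experiment, the sketch below shows one way to realize the block-averaging subsampling described earlier (helper names are hypothetical and the random data serve only an adjoint check; this is an illustration, not the authors' code). For such an A one has A A^T = (1/d^2) I, so lambda_max(A^T A) = 1/d^2 = 1/16 for d = 4, which is exactly the constant behind beta = 16 and the condition 0 < gamma < 32 used above.

```python
import numpy as np

def downsample_avg(img, d):
    """Subsampling operator A: average every d x d block (zoom-out by a factor d)."""
    s1, s2 = img.shape
    return img.reshape(s1 // d, d, s2 // d, d).mean(axis=(1, 3))

def upsample_adj(low, d):
    """Adjoint A^T: spread each low-resolution value over its d x d block, scaled by 1/d^2.
    This adjoint is what the gradient of f2, namely A^T(Ax - b), needs in the iteration."""
    return np.kron(low, np.ones((d, d))) / d**2

# Adjoint check on hypothetical random data, and the constants behind 0 < gamma < 32:
d = 4
rng = np.random.default_rng(0)
x, y = rng.standard_normal((8, 8)), rng.standard_normal((2, 2))
assert np.isclose(np.vdot(downsample_avg(x, d), y), np.vdot(x, upsample_adj(y, d)))
# A A^T = (1/d^2) I, so lambda_max(A^T A) = 1/d^2 = 1/16 and beta = 1/lambda_max(A^T A) = 16.
```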

Figure 1. PSNR and energy versus iterations with different parameters. (a) and (b) are PSNR and energy versus iterations for κ = 0.5, 0.1, 0.01, 0.001, 0 (λ = 1/8, γ = 30). (c) and (d) are PSNR and energy versus iterations for λ = 1/5, 1/6, 1/8, 1/16, 1/32 (κ = 0, γ = 30). (e) and (f) are PSNR and energy versus iterations for γ = 8, 16, 24, 30, 32 (κ = 0, λ = 1/8).

As mentioned above, the optimal value E of the optimization model (1.2) for this example is obtained by taking the minimum over a large range of parameter settings on each method with 5000 iterations. The performance of each method with different parameter sets is listed in tables 3-6 for ε = 10^{-i}, i = 1, ..., 6. For a given ε, the first column gives the least (outer) iteration number k such that ε_k < ε, and the second column, in brackets, gives the corresponding running time in seconds. Empty entries indicate that the algorithm fails to drop the error below ε within the maximum of 5000 iterations.

Table 3 shows that the number of inner iteration steps N_I in CG affects the speed of ASB. We highlight the best performance for each given tolerance ε. The parameter ν also plays an important role in the performance of ASB. For this example, ν = 0.01 generally gives the smallest number of iterations and the least computation time for the different tolerance levels ε.

For the CP algorithm, we run the tests with different σ, θ = 1 and τ = 1/(8σ), according to the convergence condition in table 1. The second quadratic subproblem is solved with CG. After a simple analysis, we observe that (I + τ A^T A) has only two eigenvalues, (1 + τ/16) and 1; thus, this subproblem can be solved within two CG steps theoretically. Therefore we only list the results with N_I = 1, 2 in table 4 for comparison, and we can see that N_I = 1 has even better performance in terms of total iteration steps and computation time. For this example, σ has some impact on the performance of CP, and a single choice of σ gives the best performance across all the tolerance levels.

Similar results for the SIU algorithm are given in table 5; a larger δ yields better performance while respecting the convergence condition, together with a correspondingly best ν. For this example, δ = 24 and ν = 0.06 give the best performance among the tested parameter sets.

We also test various γ and λ and list the results for PDFP2O in terms of computation time and relative error to the optimal minimum in table 6. As we observed previously, γ and λ being close to the upper bounds 2β and 1/λ_max(BB^T) gives a nearly optimal convergence

20 Table 3. Performance evaluation for different choices of ν and N I in ASB for image superresolution. N I ν ε = 10 1 ε = 10 2 ε = 10 3 ε = 10 4 ε = 10 5 ε = (2.39) 165 (9.55) 486 (28.12) 1800 (104.53) (0.49) 45 (2.64) 144 (8.37) 469 (27.34) 1599 (93.29) (0.36) 41 (2.39) 153 (8.94) 464 (27.05) 1572 (93.14) 3633 (216.50) (0.52) 103 (7.12) 544 (32.81) 1776 (105.88) (0.94) 205 (13.21) 1062 (63.32) 3485 (208.60) (1.79) 101 (7.53) 459 (34.25) 1772 (132.31) (0.59) 31 (2.33) 110 (8.28) 397 (29.89) 1502 (112.96) (0.32) 21 (1.58) 74 (5.55) 254 (19.14) 908 (69.84) 2930 (225.19) (0.30) 45 (3.39) 203 (15.28) 801 (61.61) 2266 (173.80) (0.53) 89 (6.70) 399 (30.02) 1590 (120.95) 4474 (343.83) (1.60) 93 (11.60) 454 (56.57) 1770 (220.55) (0.61) 24 (2.99) 101 (12.60) 367 (45.79) 1446 (180.41) 4968 (620.12) (0.50) 19 (2.37) 71 (8.91) 248 (32.75) 884 (114.19) 2866 (369.36) (0.49) 45 (5.63) 181 (22.69) 769 (99.59) 2108 (271.02) (0.86) 88 (11.01) 356 (46.29) 1524 (195.54) 4156 (533.58) (2.69) 93 (19.27) 454 (94.18) 1770 (367.16) (1.03) 24 (4.96) 101 (20.87) 367 (78.11) 1446 (309.09) 4967 ( ) (0.84) 19 (3.99) 72 (15.04) 248 (53.50) 886 (189.75) 2871 (613.37) (0.83) 45 (9.59) 182 (38.08) 770 (163.58) 2105 (449.95) (1.42) 88 (18.21) 359 (76.37) 1530 (326.83) 4159 (887.33) Table 4. Performance evaluation for different choices of σ and N I in CP for image super-resolution. N I σ ε = 10 1 ε = 10 2 ε = 10 3 ε = 10 4 ε = 10 5 ε = (4.64) 335 (14.51) 953 (42.64) 3567 (157.65) (1.89) 171 (7.19) 487 (20.56) 1803 (77.75) (0.25) 46 (1.94) 150 (6.35) 481 (20.37) 1616 (69.77) (0.13) 57 (2.42) 217 (10.22) 650 (28.45) 2084 (90.45) 4186 (180.86) (0.25) 262 (11.09) 1049 (45.82) 3110 (134.20) (1.61) 185 (9.11) 909 (44.93) 3537 (178.11) (1.06) 99 (4.89) 458 (24.12) 1773 (90.94) (0.24) 45 (2.23) 149 (7.37) 480 (23.72) 1617 (81.60) (0.14) 56 (2.77) 216 (10.69) 648 (31.95) 2080 (104.27) 4181 (211.21) (0.28) 262 (12.90) 1048 (51.63) 3109 (156.60) Table 5. Performance evaluation for different choices of δ and ν in SIU for image super-resolution. The impacts of different ν for δ = 8, 16, 30 are similar to δ = 24; thus, we only list the cases with different ν for δ = 24. δ ν ε = 10 1 ε = 10 2 ε = 10 3 ε = 10 4 ε = 10 5 ε = (0.11) 81 (2.17) 327 (8.71) 975 (26.53) 3106 (83.28) (0.15) 46 (1.26) 173 (4.65) 519 (13.86) 1675 (44.64) 3542 (94.95) (0.20) 42 (1.14) 141 (3.78) 446 (12.45) 1472 (39.77) 4342 (116.73) (0.23) 47 (1.27) 153 (4.23) 490 (13.22) 1625 (43.96) (1.45) 191 (5.14) 507 (13.60) 1817 (48.52) (3.46) 371 (10.12) 980 (26.77) 3584 (96.88) (0.86) 109 (2.94) 323 (8.65) 1147 (30.73) 4496 (121.20) 20

21 Table 6. Performance evaluation for different choices of γ and λ in PDFP 2 O for image superresolution. The impacts of λ for γ = 8, 16 are similar with γ = 24, 30. γ λ ε = 10 1 ε = 10 2 ε = 10 3 ε = 10 4 ε = 10 5 ε = /8 3 (0.09) 81 (2.05) 328 (8.30) 977 (24.75) 3115 (79.08) 16 1/8 3 (0.09) 49 (1.24) 178 (4.50) 540 (13.64) 1750 (44.10) 3982 (100.71) 24 1/6 3 (0.09) 38 (0.97) 134 (3.41) 417 (10.56) 1373 (34.68) 3826 (96.96) 1/8 5 (0.14) 45 (1.15) 149 (3.80) 478 (12.14) 1587 (40.15) 4940 (125.14) 1/16 15 (0.38) 76 (1.95) 224 (5.74) 768 (19.93) 2811 (71.70) 1/32 41 (1.03) 144 (3.64) 397 (10.03) 1413 (36.15) 30 1/6 5 (0.14) 38 (0.97) 128 (3.25) 417 (10.59) 1412 (35.81) 4579 (116.08) 1/8 6 (0.18) 46 (1.20) 150 (3.83) 508 (13.26) 1788 (45.60) 1/16 17 (0.43) 82 (2.06) 252 (6.38) 897 (22.90) 3465 (88.06) 1/32 46 (1.15) 160 (4.00) 470 (11.88) 1733 (44.13) Table 7. Performance comparison among ASB, CP, SIU and PDFP 2 O for image super-resolution. For a given error tolerance ε, the first column in the bracket gives the first outer iteration number k such that ε k <ε, the second column in the bracket gives the corresponding run time in second and the third column in the bracket gives the corresponding PSNR. For ASB, N I = 2, ν = For CP, N I = 1, σ = For SIU, δ = 24, ν = For PDFP 2 O, γ = 30, λ = 1/6. ε = 10 2 ε = 10 3 ε = 10 4 ε = 10 5 ASB (21, 1.58, 29.34) (74, 5.55, 29.38) (254, 19.14, 29.37) (908, 69.84, 29.36) CP (46, 1.94, 28.97) (150, 6.35, 29.24) (481, 20.37, 29.32) (1616, 69.77, 29.35) SIU (42, 1.14, 28.91) (141, 3.78, 29.22) (446, 12.45, 29.31) (1472, 39.77, 29.35) PDFP 2 O (38, 0.97, 28.98) (128, 3.25, 29.25) (417, 10.59, 29.32) (1412, 35.81, 29.35) speed. Table 6 shows that γ = 24, λ = 1/8 has slightly better convergence speed, while γ = 30, λ = 1/8 can get slightly higher PSNR in the first steps, as shown in figure 1(e). Finally, we compare the four methods with their optimal parameter sets (averagely best for all the tolerance levels) in table 7. We also compare their corresponding values of PSNR to measure the recovered image quality. From table 7, we can see that PDFP 2 O is better than ASB and CP in terms of the computation time, especially for a higher accuracy level. The performance of the two explicit methods SIU and PDFP 2 O is similar. However, the choice of parameters for PDFP 2 O is relatively easier compared to SIU. Also, we point out that ASB can attain higher PSNR at the first few steps for some good choices of ν, which can be interesting in practice when a crude approximation is needed in a short time. Figure 2 shows the images recovered with the four methods for ε = 10 4 and the images look similar as expected CT reconstruction In a simplified parallel beam tomographic problem, an observed body slice is modeled as a two-dimensional function, and projections modeled by line integrals represent the total attenuation of a beam of x-rays when it traverses the object. The operator for this application can be represented by a discrete Radon transform, and the tomographic reconstruction problem is then to estimate a function from a finite number of measured line integrals (see [4]). The standard reconstruction algorithm in clinical applications is the so-called filtered back projection (FBP) algorithm. In the presence of noise, this problem becomes difficult since the inverse of Radon transform is unbounded and ill-posed. In the literature, the model (1.2) is often used for iterative reconstruction. Here, A is the Radon transform matrix and b is the 21

Figure 2. Super-resolution results by ASB, CP, SIU and PDFP2O corresponding to tolerance error ε = 10^{-4} with noise level 1. Panels: Original, Zooming, ASB (PSNR = 29.37), CP (PSNR = 29.32), SIU (PSNR = 29.31), PDFP2O (PSNR = 29.32).

measured projections vector. Generally, the size of A is huge and it is not easy to compute the inverse directly. We note that total variation regularization has become a standard tool in tomographic reconstruction. Recently, first-order methods have been applied for faster implementations; for example, SIU is applied in [34] for CT and the Chambolle-Pock algorithm is applied in [1] for PET and cone-beam CT reconstruction. Here, we use the same example tested in [35], i.e. 50 uniformly oriented projections are simulated for a Shepp-Logan phantom image and then white Gaussian noise of mean 0 and variance 1 is added to the data. For this example, we compute λ_max(AA^T) numerically, so we can set 0 < γ < 2/λ_max(AA^T).

Figure 3. PSNR and energy versus iterations for different parameters in CT reconstruction. (a) and (b) are PSNR and energy versus iterations for κ = 0.5, 0.1, 0.01, 0.001, 0 (λ = 1/8, γ = 1.3). (c) and (d) are PSNR and energy versus iterations for λ = 1/5, 1/6, 1/8, 1/16, 1/32 (κ = 0, γ = 1.3). (e) and (f) are PSNR and energy versus iterations for γ = 0.4, 0.7, 1, 1.2, 1.3 (κ = 0, λ = 1/8).

As in the previous example, we first test the impacts of the parameters κ, γ and λ. The impact of κ shows the same behavior as for the super-resolution example, i.e. κ = 0 is the best choice among κ in [0, 1) (see figures 3(a) and (b)). Similarly, the parameter λ has relatively small impact on the performance of the algorithm (see figures 3(c) and (d)). It seems that the algorithm still converges with λ = 1/5, but it cannot achieve high accuracy (table 11). As in the previous example, the parameter γ has a larger impact on the convergence rate of the algorithm (see figures 3(e) and (f)). Theoretically, it should satisfy 0 < γ < 2/λ_max(AA^T). Numerically, we test γ = 0.4, 0.7, 1, 1.2, 1.3 for κ = 0, λ = 1/8. Better performance with a larger γ is observed (see figures 3(e) and (f)), while for γ = 1.4 the algorithm diverges.

As in the previous application, we show the performance of ASB, CP, SIU and PDFP2O with different parameter sets in tables 8, 9, 10 and 11, respectively. For the algorithm ASB, table 8 shows that N_I = 5 is the best choice for ε = 10^{-1}, 10^{-2}, while N_I = 2 is the best one for ε = 10^{-i}, i = 3, 4, 5, 6. We choose N_I = 2 for an overall good performance. Similarly, ν = 0.01 is the best for ε = 10^{-i}, i = 1, 2, 3, 4 and ν = 0.05 is the best for ε = 10^{-5} and 10^{-6}. Thus, we use the two parameter sets for comparison (see table 12). Similar to ASB, the best

Table 8. Performance evaluation for different choices of ν and N_I in ASB for CT reconstruction.

ε = 10^-1 | ε = 10^-2 | ε = 10^-3 | ε = 10^-4 | ε = 10^-5 | ε = 10^-6
(3.24) | 551 (8.43) | 995 (14.92) | 2380 (35.15) | |
(2.11) | 362 (5.20) | 662 (11.13) | 1070 (17.24) | 2310 (35.08) |
(2.22) | 366 (5.27) | 657 (9.46) | 1002 (14.43) | 1564 (24.64) | 4893 (72.57)
(2.13) | 360 (5.24) | 646 (9.35) | 960 (13.86) | 1304 (18.81) | 1794 (25.86)
(2.12) | 367 (5.28) | 654 (9.40) | 967 (13.92) | 1298 (18.66) | 1692 (24.35)
(1.58) | 206 (4.32) | 422 (8.87) | 2085 (43.81) | |
(0.66) | 73 (1.52) | 149 (4.13) | 478 (11.83) | 2083 (45.54) |
(0.60) | 67 (1.42) | 129 (2.72) | 279 (5.87) | 1068 (24.49) | 4670 (102.17)
(0.99) | 124 (2.59) | 229 (4.79) | 345 (7.24) | 492 (10.33) | 1011 (21.23)
(1.42) | 208 (4.35) | 421 (8.82) | 653 (13.69) | 899 (18.86) | 1204 (25.28)
(0.74) | 59 (2.40) | 355 (14.56) | 1985 (83.22) | |
(0.38) | 21 (0.86) | 77 (3.16) | 407 (16.63) | 1897 (79.30) |
(0.36) | 26 (1.06) | 63 (3.33) | 219 (11.03) | 977 (41.97) | 4287 (181.05)
(1.38) | 115 (4.68) | 232 (9.44) | 356 (16.26) | 508 (22.99) | 982 (42.31)
(2.68) | 227 (9.25) | 457 (18.62) | 698 (30.16) | 952 (40.51) | 1261 (53.13)
(0.75) | 44 (3.25) | 352 (27.50) | 1974 (149.35) | |
(0.51) | 19 (1.40) | 76 (5.59) | 401 (29.54) | 1872 (142.26) |
(0.67) | 27 (1.99) | 63 (4.65) | 217 (16.01) | 963 (73.15) | 4236 (322.62)
(2.42) | 115 (8.47) | 231 (17.02) | 356 (26.24) | 508 (37.45) | 981 (74.43)
(5.95) | 228 (18.76) | 458 (35.73) | 699 (53.50) | 952 (74.23) | 1261 (97.04)

Table 9. Performance evaluation for different choices of σ and N_I in CP for CT reconstruction.

ε = 10^-1 | ε = 10^-2 | ε = 10^-3 | ε = 10^-4 | ε = 10^-5 | ε = 10^-6
(2.42) | 440 (6.18) | 847 (11.90) | 2314 (32.57) | |
(2.18) | 362 (7.23) | 652 (11.26) | 998 (16.05) | 1559 (23.82) | 4879 (71.22)
(2.10) | 364 (5.02) | 650 (8.97) | 972 (15.55) | 1375 (21.11) | 2628 (38.39)
(2.11) | 364 (5.00) | 647 (8.88) | 960 (13.17) | 1305 (17.90) | 1791 (24.57)
(2.27) | 387 (5.32) | 686 (9.41) | 1015 (13.92) | 1365 (19.41) | 1779 (26.15)
(1.57) | 175 (3.55) | 415 (8.40) | 2098 (42.52) | |
(0.81) | 94 (2.04) | 174 (4.76) | 326 (8.72) | 1083 (24.03) | 4671 (98.46)
(0.81) | 93 (1.89) | 167 (3.37) | 267 (5.40) | 588 (12.73) | 2401 (50.60)
(1.62) | 195 (3.88) | 346 (6.89) | 515 (10.26) | 714 (14.21) | 1156 (23.05)
(3.21) | 385 (7.68) | 684 (13.65) | 1013 (20.56) | 1363 (29.14) | 1777 (37.38)
(0.69) | 56 (2.18) | 354 (13.80) | 1985 (79.67) | |
(0.75) | 44 (1.72) | 87 (3.75) | 249 (11.54) | 1053 (42.90) | 4636 (186.29)
(1.34) | 80 (3.13) | 144 (5.63) | 236 (9.22) | 573 (24.49) | 2389 (97.59)
(3.13) | 193 (7.45) | 344 (13.29) | 513 (19.80) | 712 (27.50) | 1154 (46.67)
(6.20) | 384 (14.80) | 683 (27.40) | 1011 (40.80) | 1361 (54.30) | 1775 (70.24)
(1.44) | 46 (3.93) | 353 (25.53) | 1977 (143.57) | |
(1.36) | 44 (3.11) | 87 (6.14) | 249 (19.95) | 1054 (78.92) | 4637 (339.05)
(2.36) | 80 (5.59) | 144 (10.04) | 236 (16.46) | 573 (41.86) | 2388 (172.31)
(5.63) | 193 (13.45) | 344 (23.97) | 513 (37.65) | 712 (51.50) | 1154 (82.26)
(11.41) | 384 (26.90) | 683 (47.70) | 1011 (70.79) | 1361 (92.93) | 1775 (116.60)

Similar to ASB, the best parameter sets chosen in terms of computation time over the different tolerances are N_I = 2 and σ = 0.02 for CP (see table 9), δ = 1.3 and ν = 0.1 for SIU (see table 10), and γ = 1.3 and λ = 1/8 for PDFP²O (see table 11). All these results are compared in table 12. Figure 4 gives the corresponding images recovered for ε = 10^-4.

Table 10. Performance evaluation for different choices of δ and ν in SIU for CT reconstruction. The impacts of ν for δ = 0.4, 0.7, 1, 1.2 are similar to those for δ = 1.3, and we only list the results with different ν for δ = 1.3.

ε = 10^-1 | ε = 10^-2 | ε = 10^-3 | ε = 10^-4 | ε = 10^-5 | ε = 10^-6
(3.56) | 1196 (8.52) | 2127 (15.17) | 3146 (22.43) | 4210 (30.01) |
(2.00) | 684 (4.79) | 1216 (8.52) | 1798 (12.60) | 2408 (16.87) | 3071 (21.61)
(1.47) | 479 (3.52) | 852 (6.20) | 1261 (9.08) | 1695 (12.12) | 2195 (15.64)
(1.17) | 399 (2.80) | 710 (4.98) | 1051 (7.38) | 1415 (9.94) | 1842 (12.93)
(1.29) | | | | |
(1.08) | 368 (2.59) | 655 (4.61) | 971 (6.84) | 1307 (9.20) | 1707 (12.03)
(1.09) | 369 (2.59) | 656 (4.61) | 975 (6.85) | 1326 (9.32) | 1816 (12.77)
(1.10) | 373 (2.62) | 668 (4.70) | 1019 (7.16) | 1580 (11.10) | 4881 (34.56)
(1.14) | 380 (2.67) | 685 (4.81) | 1098 (7.72) | 2336 (16.58) |

Table 11. Performance evaluation for different choices of γ and λ in PDFP²O for CT reconstruction. The impacts of different λ for γ = 0.4, 0.7, 1, 1.2 are similar to those for γ = 1.3, and we only list the cases with different λ for γ = 1.3.

γ | λ | ε = 10^-1 | ε = 10^-2 | ε = 10^-3 | ε = 10^-4 | ε = 10^-5 | ε = 10^-6
0.4 | 1/8 | 501 (3.51) | 1196 (8.40) | 2127 (15.10) | 3145 (22.24) | 4208 (29.66) |
0.7 | 1/8 | 286 (2.01) | 684 (4.79) | 1216 (8.53) | 1799 (12.61) | 2409 (16.89) | 3074 (21.56)
1 | 1/8 | 201 (1.41) | 479 (3.34) | 851 (5.93) | 1260 (8.77) | 1692 (11.79) | 2178 (15.16)
1.2 | 1/8 | 167 (1.16) | 399 (2.78) | 710 (4.94) | 1051 (7.33) | 1414 (9.85) | 1838 (12.81)
1.3 | 1/5 | 157 (1.10) | 590 (4.11) | | | |
 | 1/6 | 154 (1.07) | 368 (2.57) | 655 (4.57) | 970 (6.76) | 1303 (9.16) | 1687 (11.94)
 | 1/8 | 154 (1.11) | 368 (2.61) | 655 (4.61) | 971 (6.81) | 1307 (9.15) | 1709 (11.95)
 | 1/16 | (1.09) | 369 (2.57) | 656 (4.57) | 975 (6.80) | 1327 (9.26) | 1825 (12.72)
 | 1/32 | (1.08) | 370 (2.58) | 659 (4.59) | 984 (6.85) | 1371 (9.56) | 2320 (16.16)

Table 12. Performance comparison among ASB, CP, SIU and PDFP²O for CT reconstruction. For a given error tolerance ε, the first entry in the bracket gives the first outer iteration number k such that ε_k < ε, the second gives the corresponding run time in seconds and the third gives the corresponding PSNR. For ASB 1, N_I = 2, ν = 0.01; for ASB 2, N_I = 2, ν = 0.05. For CP, N_I = 2, σ = 0.02. For SIU, δ = 1.3, ν = 0.1. For PDFP²O, γ = 1.3, λ = 1/8.

Method | ε = 10^-2 | ε = 10^-3 | ε = 10^-4 | ε = 10^-5
ASB 1 | (67, 1.42, 30.29) | (129, 2.72, 31.97) | (279, 5.87, 32.43) | (1068, 24.49, 32.45)
ASB 2 | (124, 2.59, 30.06) | (229, 4.79, 31.75) | (345, 7.24, 32.26) | (492, 10.33, 32.41)
CP | (44, 1.72, 30.48) | (167, 3.37, 31.80) | (267, 5.40, 32.32) | (588, 12.73, 32.45)
SIU | (368, 2.59, 30.03) | (655, 4.61, 31.70) | (971, 6.84, 32.23) | (1307, 9.20, 32.38)
PDFP²O | (368, 2.61, 30.03) | (655, 4.61, 31.70) | (971, 6.81, 32.23) | (1307, 9.15, 32.38)

From table 12, we can observe that the evolution of PSNR and energy is very close for PDFP²O and SIU, although the iterative schemes of the two algorithms are different. We note that both ASB and CP converge better than PDFP²O and SIU in the early steps, while PDFP²O and SIU are slightly better if higher accuracy is required. One explanation for this behavior is that, when the condition number of A^T A is large, explicit methods such as PDFP²O and SIU approximate the inverse of A^T A slowly. For ASB and CP, the regularized inverses (A^T A + ν B^T B)^{-1} and (I + τ A^T A)^{-1}, respectively, act as preconditioners that partially avoid the bad conditioning of A^T A in the early steps, while they take longer later on due to unnecessary inner iterations.
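To illustrate the kind of inner iteration such preconditioning involves, the regularized system (I + τ A^T A) x = y appearing in CP can be solved approximately by a few conjugate-gradient steps. This is a hedged sketch, not the authors' implementation; A and At are hypothetical callables applying A and A^T to vectorized images, and CG is only one possible choice of inner solver.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def regularized_solve(y, A, At, tau, inner_iters=10):
    """Approximately solve (I + tau * A^T A) x = y with a few CG iterations."""
    n = y.size
    def matvec(v):
        return v + tau * At(A(v))                    # apply (I + tau A^T A)
    op = LinearOperator((n, n), matvec=matvec, dtype=y.dtype)
    x, info = cg(op, y, maxiter=inner_iters)         # inexact inner solve
    return x
```

Truncating the inner loop keeps each outer step cheap, at the price of the extra inner iterations discussed above.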

Figure 4. A tomographic reconstruction example with 50 projections, corresponding to the tolerance error ε = 10^-4 with noise level 1. Panels: Original, FBP, ASB (PSNR = 32.43), CP (PSNR = 32.32), SIU (PSNR = 32.23), PDFP²O (PSNR = 32.23).

For the badly conditioned case, we can consider an efficient preconditioning technique for PDFP²O, which will be presented in a forthcoming work.

5.3. Parallel MRI

Magnetic resonance imaging (MRI) is a medical imaging technique largely used in clinical radiology to visualize the internal structure and function of the body by noninvasive and nonionizing means. It provides better contrast between the different soft tissues than other modalities such as CT and PET. MRI images are obtained through an inversion of the Fourier data acquired by the receiver coils. Parallel MRI (pMRI) is a recent technique for accelerating the sampling speed of conventional MRI. Instead of relying on increased gradient performance for increased imaging speed, pMRI extracts extra spatial information from an array of surface coils surrounding the scanned object, with multiple receivers collecting in parallel part of the Fourier components at each receiver, resulting in an accelerated image acquisition. There are two general approaches for removing the aliasing artifacts due to Fourier-space subsampling: image-domain methods and k-space-based methods (see [5]). On the other hand, total variation regularization has also been considered in the literature in order to obtain better image quality, as in [21, 22, 11]. In this paper, we employ image-domain methods and coil sensitivity maps to reconstruct the underlying image. Sensitivity encoding (SENSE) is the most common image-domain-based parallel imaging method. It is based on the following model, which relates the partial k-space data b_j acquired by the jth receiver to the unknown image x:

b_j = D F S_j x + n,

where b_j is the vector of measured Fourier coefficients at receiver j, D is a diagonal downsampling operator, F is the Fourier transform, S_j corresponds to the diagonal coil sensitivity mapping for receiver j, and n is Gaussian noise.
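For concreteness, here is a minimal numpy sketch of the SENSE encoding operators (an illustration under stated assumptions, not the authors' or PULSAR's code): sens is an array holding the coil sensitivity maps S_j, mask is the binary Cartesian undersampling pattern playing the role of D, and an orthonormal FFT is used so that its adjoint is the inverse FFT. The last routine is the gradient of the least-squares data-fidelity term, Σ_j A_j^*(A_j x − b_j), i.e. the smooth part that the compared algorithms handle explicitly.

```python
import numpy as np

def A_forward(x, sens, mask):
    """Apply A_j x = D F S_j x for every coil; returns the stacked k-space data."""
    return np.stack([mask * np.fft.fft2(s * x, norm='ortho') for s in sens])

def A_adjoint(y, sens, mask):
    """Apply sum_j A_j^* y_j = sum_j S_j^* F^{-1} D^T y_j (zero-filling for D^T)."""
    return sum(np.conj(s) * np.fft.ifft2(mask * yj, norm='ortho')
               for s, yj in zip(sens, y))

def data_grad(x, b, sens, mask):
    """Gradient of 0.5 * sum_j ||A_j x - b_j||_2^2, i.e. sum_j A_j^*(A_j x - b_j)."""
    return A_adjoint(A_forward(x, sens, mask) - b, sens, mask)
```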

Figure 5. In vivo MR images acquired from (a) the four-channel spine data (Coil 1 to Coil 4) and (b) the eight-channel head data (Coil 1 to Coil 8).

In practice, S_j can often be estimated in advance. Let A_j = DFS_j; then we can recover the image x by solving the least-squares problem with total variation regularization

x^* = \arg\min_{x \in \mathbb{R}^n} \; \mu \|Bx\|_1 + \frac{1}{2} \sum_{j=1}^{N} \|A_j x - b_j\|_2^2,    (5.2)

where B is the discrete gradient matrix and N is the total number of receivers. Conventionally, the downsampling operator D is implemented with a sampling ratio R = 2, 4 along one dimension (corresponding to phase encoding). In our experiments, we use the test data provided by the online MATLAB toolbox PULSAR [20]. The toolbox contains two sets of real data acquired by MR systems with a coil array. The first is a spine data set acquired on a 3 Tesla whole-body GE scanner using a four-channel CTL spine array, and the second is a brain data set acquired using an eight-channel head array. For the details of the machine configuration, see [20]. Figure 5 shows the images of the multichannel data, four coils for the spine and eight coils for the brain. We use the sensitivity maps S_j estimated by the built-in function in PULSAR. The square root of the sum of squares (SOS) image of the N fully acquired coil data sets (without the downsampling D) is used as a reference image, for which pixel (i, j) is given by

\mathrm{SOS}(i,j) = \Big( \sum_{k=1}^{N} |(F^{-1} b_k)(i,j)|^2 \Big)^{1/2}.
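A short sketch of this SOS reference image (illustrative; the variable names are assumptions, not PULSAR's): b_full is an N x ny x nx array containing the fully sampled k-space data of the N coils.

```python
import numpy as np

def sos_image(b_full):
    """Square root of the sum of squares of the coil images F^{-1} b_k."""
    coil_images = np.fft.ifft2(b_full, axes=(-2, -1))         # per-coil inverse Fourier transform
    return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=0))  # pixelwise SOS combination
```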

Given a region of interest (ROI), a measure of the artifact power (AP) of a reconstructed image I_rec with respect to a reference image I_ref is defined as

\mathrm{AP} = \frac{\sum_{(i,j) \in \mathrm{ROI}} |I_{\mathrm{ref}}(i,j) - c\, I_{\mathrm{rec}}(i,j)|^2}{\sum_{(i,j) \in \mathrm{ROI}} |I_{\mathrm{ref}}(i,j)|^2},
\qquad c = \frac{\sum_{(i,j) \in \mathrm{ROI}} |I_{\mathrm{ref}}(i,j)|^2}{\sum_{(i,j) \in \mathrm{ROI}} |I_{\mathrm{rec}}(i,j)|^2}.

The factor c is used to minimize the scaling effect that might be introduced during the reconstruction process. Another useful index for evaluating image quality when a reference image is not available is the two-region SNR, calculated from a ROS (region of signal) and a RON (region of noise) by

\mathrm{SNR} = 20 \log_{10} \frac{\text{Mean of ROS}}{\text{Standard Deviation of RON}}.

This SNR measure strongly depends on the locations chosen for the ROS and RON. The RON can usually be selected from background areas where no object features are present. The ROS and ROI used for the SNR evaluation are shown on the first (SOS) image in figures 6 and 7. In the experiments, we fix the regularization parameter μ and use zero as the initial value for the reconstruction algorithms ASB, CP, SIU and PDFP²O. Now we estimate the maximum eigenvalue of the system matrix A^*A = \sum_{j=1}^{N} A_j^* A_j, where A^* is the conjugate transpose of A. Since A_j = DFS_j, we have

A^*A = \sum_{j=1}^{N} S_j^* (F^* D^T D F) S_j = S^* (F^* D^T D F) S, \qquad S = \begin{pmatrix} S_1 \\ \vdots \\ S_N \end{pmatrix},

where S_j^* denotes the conjugate of the sensitivity mapping and F^* = F^{-1} is the inverse Fourier transform. Since D is a downsampling operator with diagonal elements 1 and 0, we have λ_max(F^{-1} D^T D F) = 1. Thus

\lambda_{\max}(A^*A) = \lambda_{\max}(S^* F^* D^T D F S) = \|DFS\|_2^2 \le \|DF\|_2^2 \, \|S\|_2^2 = \lambda_{\max}(S^*S) = \lambda_{\max}\Big( \sum_{i=1}^{N} S_i^* S_i \Big).

The matrix \sum_{i=1}^{N} S_i^* S_i is diagonal and its maximum diagonal element is approximately 1. Based on this simple calculation, we can set γ = 2, λ = 1/8 by using our nearly optimal rule for applying PDFP²O. We also observe that a slightly larger γ may yield slightly better performance for R = 4 on the two data sets, but we still set the universal parameter γ = 2 according to this simple rule for the different test data and sampling ratios. For the other three methods, we run a large number of parameter sets and iteration numbers and choose the optimal ones according to the AP and SNR criteria. Figures 6 and 7 show the images recovered with the different methods for the two test data sets. According to the numerical results in [20], we choose the best three methods, SENSE, SPACE-RIP and GRAPPA, for comparison. The reconstruction results of SENSE, SPACE-RIP and GRAPPA are obtained using the PULSAR toolbox. In figure 6, on the spine data with sampling ratio R = 2, all four methods based on the TV model (5.2) outperform SENSE, SPACE-RIP and GRAPPA in terms of both computation time and SNR. Among the four TV methods, SIU and PDFP²O perform closely, as expected, and PDFP²O takes slightly less time than ASB and CP to attain similar AP and SNR. For R = 4 on the spine data, SENSE fails to reconstruct a clean image, and GRAPPA has the best AP value. Among the four TV methods, ASB uses the least amount of time with the optimal parameter sets. Similar results are obtained for the brain data (figure 7). The four TV-based methods (5.2) outperform SENSE,

Figure 6. Recovery results from the four-channel in vivo spine data with the subsampling ratio R = 2, 4 (panels: SOS with the ROI, ROS and RON marked, SENSE, SPACE-RIP, GRAPPA, ASB, CP, SIU and PDFP²O for each R). The AP, SNR and run time are shown under each image. ROI: [15, 220] × [35, 165]; ROS: [50, 120] × [130, 160]; RON: [190, 250] × [185, 215]. N_O denotes the outer iteration number and N_I the inner iteration number. For R = 2: ν = 0.1, N_I = 2, N_O = 4 for ASB; θ = 1, τ = 1/(8σ), σ = 0.05, N_I = 1, N_O = 8 for CP; δ = 1.5, ν = 0.05, N_O = 8 for SIU; γ = 2, λ = 1/8, N_O = 8 for PDFP²O. For R = 4: ν = 0.01, N_I = 10, N_O = 20 for ASB; θ = 1, τ = 1/(8σ), σ = 0.005, N_I = 5, N_O = 100 for CP; δ = 2.4, N_O = 500 for SIU; γ = 2, λ = 1/8, N_O = 500 for PDFP²O.

Figure 7. Recovery results from the eight-channel in vivo brain data with the subsampling ratio R = 2, 4 (panels: SOS with the ROI, ROS and RON marked, SENSE, SPACE-RIP, GRAPPA, ASB, CP, SIU and PDFP²O for each R). The AP, SNR and run time are shown for each image. ROI: [65, 190] × [15, 220]; ROS: [70, 120] × [110, 180]; RON: [220, 250] × [200, 250]. N_O denotes the outer iteration number and N_I the inner iteration number. For R = 2: ν = 0.1, N_I = 5, N_O = 4 for ASB; θ = 1, τ = 1/(8σ), σ = 0.05, N_I = 1, N_O = 25 for CP; δ = 1.5, ν = 0.05, N_O = 25 for SIU; γ = 2, λ = 1/8, N_O = 25 for PDFP²O. For R = 4: ν = 0.05, N_I = 10, N_O = 10 for ASB; θ = 1, τ = 1/(8σ), σ = 0.01, N_I = 2, N_O = 50 for CP; δ = 2.2, ν = 0.008, N_O = 150 for SIU; γ = 2, λ = 1/8, N_O = 160 for PDFP²O.
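The AP and two-region SNR quoted in these captions can be computed directly from the reconstructed and reference images. Below is a short numpy sketch (illustrative, not the PULSAR implementation); roi, ros and ron are boolean masks, built for instance from the index ranges listed in the captions of figures 6 and 7.

```python
import numpy as np

def artifact_power(I_rec, I_ref, roi):
    """AP with the scaling factor c defined in the previous subsection."""
    c = np.sum(np.abs(I_ref[roi]) ** 2) / np.sum(np.abs(I_rec[roi]) ** 2)
    num = np.sum(np.abs(I_ref[roi] - c * I_rec[roi]) ** 2)
    den = np.sum(np.abs(I_ref[roi]) ** 2)
    return num / den

def two_region_snr(I_rec, ros, ron):
    """Two-region SNR: 20*log10(mean of ROS / standard deviation of RON)."""
    return 20.0 * np.log10(np.mean(np.abs(I_rec[ros])) / np.std(np.abs(I_rec[ron])))
```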
