Relaxed Linearized Algorithms for Faster X-Ray CT Image Reconstruction

Hung Nien, Member, IEEE, and Jeffrey A. Fessler, Fellow, IEEE

arXiv:.464v [math.OC] 4 Dec

This work is supported in part by National Institutes of Health (NIH) grant U-EB-873 and by equipment donations from Intel Corporation. Hung Nien and Jeffrey A. Fessler are with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 489, USA ({hungnien,fessler}@umich.edu).

Abstract—Statistical image reconstruction (SIR) methods are studied extensively for X-ray computed tomography (CT) due to the potential of acquiring CT scans with reduced X-ray dose while maintaining image quality. However, the longer reconstruction time of SIR methods hinders their use in X-ray CT in practice. To accelerate statistical methods, many optimization techniques have been investigated. Over-relaxation is a common technique to speed up convergence of iterative algorithms. For instance, using a relaxation parameter that is close to two in the alternating direction method of multipliers (ADMM) has been shown to speed up convergence significantly. This paper proposes a relaxed linearized augmented Lagrangian (AL) method that exhibits a theoretically faster convergence rate with over-relaxation and applies the proposed relaxed linearized AL method to X-ray CT image reconstruction problems. Experimental results with both simulated and real CT scan data show that the proposed relaxed algorithm (with ordered-subsets [OS] acceleration) is about twice as fast as the existing unrelaxed fast algorithms, with negligible computation and memory overhead.

Index Terms—Statistical image reconstruction, computed tomography, ordered subsets, augmented Lagrangian, relaxation.

I. INTRODUCTION

Statistical image reconstruction (SIR) methods [1, 2] have been studied extensively and used widely in medical imaging. In SIR methods, one models the physics of the imaging system, the statistics of the noisy measurements, and the prior information about the object to be imaged, and then finds the best-fitting estimate by minimizing a cost function using iterative algorithms. By considering noise statistics when reconstructing images, SIR methods have better bias-variance performance and noise robustness. However, the iterative nature of the algorithms in SIR methods also increases the reconstruction time, hindering their ubiquitous use in X-ray CT in practice.

Penalized weighted least-squares (PWLS) cost functions with a statistically weighted quadratic data-fidelity term are commonly used in SIR methods for X-ray CT [3]. Conventional SIR methods include the preconditioned conjugate gradient (PCG) method [4] and the separable quadratic surrogate (SQS) method with ordered-subsets (OS) acceleration []. These first-order methods update the image based on the gradient of the cost function at the current estimate. Due to the time-consuming forward/back-projection operations in X-ray CT when computing gradients, conventional first-order methods are typically very slow. The efficiency of PCG relies on choosing an appropriate preconditioner of the highly shift-variant Hessian caused by the huge dynamic range of the statistical weighting. In 2-D CT, one can introduce an auxiliary variable that separates the shift-variant and approximately shift-invariant components of the weighted quadratic data-fidelity term using a variable splitting technique [6], leading to better conditioned inner least-squares problems.
However, this method has not worked well in 3-D CT, probably due to the 3-D cone-beam geometry and helical trajectory. OS-SQS accelerates convergence using more frequent image updates by incremental gradients, i.e., computing image gradients with only a subset of data. This method usually exhibits fast convergence behavior in early iterations and becomes faster by using more subsets. However, it is not convergent in general [7, 8]. When more subsets are used, larger limit cycles can be observed. Unlike methods that update all voxels simultaneously, the iterative coordinate descent (ICD) method [9] updates one voxel at a time. Experimental results show that ICD approximately minimizes the PWLS cost function in several passes of the image volume if initialized appropriately; however, the sequential nature of ICD makes it difficult to parallelize and restrains the use of modern parallel computing architectures like GPU for speed-up. OS-mom [] and OS-LALM [] are two recently proposed iterative algorithms that demonstrate promising fast convergence speed when solving 3-D X-ray CT image reconstruction problems. In short, OS-mom combines Nesterov s momentum techniques [, 3] with the conventional OS-SQS algorithm, greatly accelerating convergence in early iterations. OS-LALM, on the other hand, is a linearized augmented Lagrangian (AL) method [4] that does not require inverting an enormous Hessian matrix involving the forward projection matrix when updating images, unlike typical splitting-based algorithms [6], but still enjoys the empirical fast convergence speed and error tolerance of AL methods such as the alternating direction method of multipliers (ADMM) [ 7]. Further acceleration from an algorithmic perspective is possible but seems to be more challenging. Kim et al. [8, 9] proposed two optimal gradient methods (OGM s) that use a new momentum term and showed a -times speed-up for minimizing smooth convex functions, comparing to existing fast gradient methods (FGM s) [, 3,, ]. Over-relaxation is a common technique to speed up convergence of iterative algorithms. For example, it is very effective for accelerating ADMM [6, 7]. The same relaxation technique was also applied to linearized ADMM very recently [], but the speed-up was less significant than expected.
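To make the role of the relaxation parameter concrete before the formal development in Section II, the following display is a schematic sketch of ADMM with over-relaxation for the generic split problem min_{x,u} g_y(u) + h(x) subject to u = Ax; it is only a restatement of the relaxed AL method recalled in Section II-A, with 0 < α < 2 and α > 1 corresponding to over-relaxation:

\[
\begin{aligned}
x^{(k+1)} &\in \arg\min_x \; h(x) - \langle \mu^{(k)}, Ax\rangle + \tfrac{\rho}{2}\,\|Ax - u^{(k)}\|_2^2,\\
r_{u,\alpha}^{(k+1)} &= \alpha\,Ax^{(k+1)} + (1-\alpha)\,u^{(k)},\\
u^{(k+1)} &\in \arg\min_u \; g_y(u) + \langle \mu^{(k)}, u\rangle + \tfrac{\rho}{2}\,\|r_{u,\alpha}^{(k+1)} - u\|_2^2,\\
\mu^{(k+1)} &= \mu^{(k)} - \rho\,\bigl(r_{u,\alpha}^{(k+1)} - u^{(k+1)}\bigr).
\end{aligned}
\]

Replacing Ax^{(k+1)} by the interpolated/extrapolated point r_{u,α}^{(k+1)} in the u- and µ-updates is the only change relative to standard ADMM, which is why over-relaxation adds essentially no per-iteration cost.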

2 Chambolle et al. proposed a relaxed primal-dual algorithm (whose unrelaxed variant happens to be a linearized ADMM [3, Section 4.3]) and showed the first theoretical justification for speeding up convergence with over-relaxation [4, Theorem ]. However, their theorem also pointed out that when the smooth explicit term (majorization of the Lipschitz part in the cost function mentioned later) is not zero, one must use smaller primal step size to ensure convergence with over-relaxation, precluding the use of larger relaxation parameter (close to two) for more acceleration. This paper proposes a non-trivial relaxed variant of linearized AL methods that improves the convergence rate by using larger relaxation parameter values (close to two) but does not require the step-size adjustment in [4]. We apply the proposed relaxed linearized algorithm to X-ray CT image reconstruction problems, and experimental results show that our proposed relaxation works much better than the simple relaxation [] and significantly accelerates X- ray CT image reconstruction, even with ordered-subsets (OS) acceleration. This paper is organized as follows. Section II shows the convergence rate of a linearized AL method (LALM) with simple relaxation and proposes a novel relaxed LALM whose convergence rate scales better with the relaxation parameter. Section III applies the proposed relaxed LALM to X-ray CT image reconstruction and uses a second-order recursive system analysis to derive a continuation sequence that speeds up the proposed algorithm. Section IV reports the experimental results of X-ray CT image reconstruction using the proposed algorithm. Finally, we draw conclusions in Section V. Online supplementary material contains many additional results and derivation details. II. RELAXED LINEARIZED AL METHODS We begin by discussing a more general constrained minimization problem for which X-ray CT image reconstruction is a special case considered in Section III. Consider an equalityconstrained minimization problem: (ˆx,û) arg min x,u { gy (u)+h(x) } s.t. u = Ax, () where g y and h are closed and proper convex functions. In particular, g y is a loss function that measures the discrepancy between the linear modelax and noisy measurementy, andh is a regularization term that introduces prior knowledge ofxto the reconstruction. We assume that the regularizer h φ+ψ is the sum of two convex components φ and ψ, where φ has inexpensive proximal mapping (prox-operator) defined as prox φ (x) arg min z {φ(z)+ z x }, () e.g., soft-shrinkage for the l -norm and truncating zeros for non-negativity constraints, and where ψ is continuously differentiable with L ψ -Lipschitz gradients [, p. 48], i.e., ψ(x ) ψ(x ) L ψ x x (3) for any x and x in the domain of ψ. The Lipschitz condition of ψ implies the (quadratic) majorization condition of ψ: ψ(x ) ψ(x )+ ψ(x ),x x + L ψ x x. (4) More generally, one can replace the Lipschitz constant L ψ by a diagonal majorizing matrix D ψ based on the maximum curvature [6] or Huber s optimal curvature [7, p. 84] of ψ while still guaranteeing the majorization condition: ψ(x ) ψ(x )+ ψ(x ),x x + x x D ψ. () We show later that decomposing h into the proximal part φ and the Lipschitz part ψ is useful when solving minimization problems with composite regularization. For example, Section III writes iterative X-ray CT image reconstruction as a special case of (), where g y is a weighted quadratic function, and h is an edge-preserving regularizer with a non-negativity constraint on the reconstructed image. A. 
Preliminaries Solving the equality-constrained minimization problem () is equivalent to finding a saddle-point of the Lagrangian: L(x,u,µ) f(x,u) µ,ax u, (6) where f(x,u) g y (u) + h(x), and µ is the Lagrange multiplier of the equality constraint [8, p. 37]. In other words, (ˆx, û, ˆµ) solves the minimax problem: (ˆx,û, ˆµ) arg min max x,u µ L(x,u,µ). (7) Moreover, since (ˆx, û, ˆµ) is a saddle-point of L, the following inequalities hold for any x, u, and µ: L(x,u, ˆµ) L(ˆx,û, ˆµ) L(ˆx,û,µ). (8) The non-negative duality gap function: G(x,u,µ;ˆx,û, ˆµ) L(x,u, ˆµ) L(ˆx,û,µ) = [ f(x,u) f(ˆx,û) ] ˆµ,Ax u (9) characterizes the accuracy of an approximate solution (x,u,µ) to the saddle-point problem (7). Note that û = Aˆx due to the equality constraint. Besides solving the classic Lagrangian minimax problem (7), (ˆx, û, ˆµ) also solves a family of minimax problems: (ˆx,û, ˆµ) arg min x,u max µ L AL(x,u,µ), () where the augmented Lagrangian (AL) [, p. 97] is L AL (x,u,µ) L(x,u,µ)+ ρ Ax u. () The augmented quadratic penalty term penalizes the feasibility violation of the equality constraint, and the AL penalty parameter ρ > controls the curvature of L AL but does not change the solution, sometimes leading to better conditioned minimax problems. One popular iterative algorithm for solving equalityconstrained minimization problems based on the AL theory is ADMM, which solves the AL minimax problem (), and thus the equality-constrained minimization problem (), in an alternating direction manner. More precisely, ADMM minimizes AL () with respect to x and u alternatingly, followed by a gradient ascent of µ with step size ρ. One can also interpolate or extrapolate variables in subproblems,

3 3 leading to a relaxed AL method [6, Theorem 8]: { x (k+) arg min h(x) µ (k),ax + ρ } x Ax u (k) { u (k+) arg min g y (u)+ µ (k),u + ρ (k+) u r u,α u } µ (k+) = µ (k) ρ ( r (k+) u,α u (k+)), () where the relaxation variable of u is: r (k+) u,α αax (k+) +( α)u (k), (3) and < α < is the relaxation parameter. It is called overrelaxation when α > and under-relaxation when α <. When α is unity, () reverts to the standard (alternating direction) AL method []. Experimental results suggest that over-relaxation with α [.,.8] can accelerate convergence [7]. Although () is used widely in applications, two concerns about the relaxed AL method () arise in practice. First, the cost function of the x-subproblem in () contains the augmented quadratic penalty of AL that involves A, deeply coupling elements of x and often leading to an expensive iterative x-update, especially when A is large and unstructured, e.g., in X-ray CT. This motivates alternative methods like LALM [, 4]. Second, even though LALM removes the x-coupling due to the augmented quadratic penalty, the regularization term h might not have inexpensive proximal mapping and still require an iterative x-upate (albeit without using A). This consideration inspires the decomposition h φ +ψ used in the algorithms discussed next. B. Linearized AL methods with simple relaxation In LALM, one adds an iteration-dependent proximity term: x x (k) (4) P to thex-update in () withα =, wherepis a positive semidefinite matrix. Choosing P = ρg, where G L A I A A, and L A denotes the maximum eigenvalue of A A, the nonseparable Hessian of the augmented quadratic penalty of AL is cancelled, and the Hessian of ρ Ax u (k) + ρ x x (k) () G becomes a diagonal matrix ρl A I, decoupling x in the x- update except for the effect of h. This technique is known as linearization (more precisely, majorization) because it majorizes a non-separable quadratic term by its linear component plus some separable qradratic proximity term. In general, one can also use G D A A A, (6) where D A A A is a diagonal majorizing matrix of A A, e.g., D A = diag{ A A } A A [], and still guarantee the positive semi-definiteness of P. This trick can be applied to () when α, too. To remove the possible coupling due to the regularization term h, we replace the Lipschitz part of h φ + ψ in the Because () is quadratic, not linear, a more apt term would be majorized rather than linearized. We stick with the term linearized for consistency with the literature on LALM. x-update of () with its separable quadratic surrogate (SQS): ( Q ψ x;x (k) ) ψ ( x (k)) + ψ ( x (k)),x x (k) + x x (k) (7) D ψ shown in (4) and (). Note that (4) is just a special case of () when D ψ = L ψ I. Incorporating all techniques mentioned above, the x-update becomes simply a proximal mapping of φ, which by assumption is inexpensive. The resulting LALM with simple relaxation algorithm is: { ( φ(x)+qψ x;x (k) ) } µ (k),ax x (k+) arg min x + ρ Ax u (k) + ρ x x (k) { G u (k+) arg min g y (u)+ µ (k),u + ρ u r (k+) u,α u } µ (k+) = µ (k) ρ ( r (k+) u,α u (k+)). (8) When ψ =, (8) reverts to the L-GADMM algorithm proposed in []. In [], the authors analyzed the convergence rate of L-GADMM (for solving an equivalent variational inequality problem; however, there is no analysis on how relaxation parameter α affects the convergence rate) and investigated solving problems in statistical learning using L-GADMM. 
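Because the x-update in (8) reduces to a proximal mapping of φ, it may help to see the two prox-operators mentioned after (2) written out explicitly. A minimal NumPy sketch (the function names are ours, not the paper's):

```python
import numpy as np

def prox_l1(x, t):
    # Soft-shrinkage: the proximal mapping of t * ||.||_1 evaluated at x.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_nonneg(x):
    # Projection onto the non-negativity set, i.e., the prox of its indicator function.
    return np.maximum(x, 0.0)

x = np.array([-1.5, -0.2, 0.3, 2.0])
print(prox_l1(x, 0.5))    # [-1. -0.  0.  1.5]
print(prox_nonneg(x))     # [0.  0.  0.3 2. ]
```

Both evaluations cost O(n) element-wise work, which is what makes the decomposition h = φ + ψ attractive: the expensive structure (the system matrix and the smooth regularizer) is handled by the linearization and the SQS majorization, while φ only ever enters through such cheap operations.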
The speed-up resulting from over-relaxation was less significant than expected (e.g., when solving an X-ray CT image reconstruction problem discussed later). To explain the small speed-up, the following theorem shows that the duality gap (9) of the time-averaged approximate solution w K = (x K,u K,µ K ) generated by (8) vanishes at rate O(/K), where K is the number of iterations, and c K K K k= c(k) (9) denotes the time-average of some iterate c (k) for k = to K. Theorem. Let w K = (x K,u K,µ K ) be the time-averages of the iterates of LALM with simple relaxation in (8), where ρ > and < α <. We have ( ) G(w K ;ŵ) K ADψ +B ρ,da +C α,ρ, () where the first two constants A Dψ x () ˆx () D ψ B ρ,da ρ x () ˆx () D A A A depend on how far the initial guess is from a minimizer, and the last constant depends on the relaxation parameter [ ρ C α,ρ u () α û + µ () ρ ˆµ ]. (3) Proof. The proof is in the supplementary material. Theorem shows that (8) converges at rate O(/K), and the constant multiplying /K consists of three terms: A Dψ, B ρ,da, and C α,ρ. The first term A Dψ comes from the majorization of ψ, and it is large when ψ has large curvature. The second term B ρ,da comes from the linearization trick in (). One can always decrease its value by decreasing ρ. The third term C α,ρ is the only α-dependent component. The trend of C α,ρ when varying ρ depends on the norms of u () û and µ () ˆµ, i.e., how one initializes the algorithm. Finally, the

4 4 convergence rate of (8) scales well with α iff C α,ρ A Dψ and C α,ρ B ρ,da. When ψ has large curvature or D A is a loose majorizing matrix of A A (like in X-ray CT), the above inequalities do not hold, leading to poor scalability of convergence rate with the relaxation parameter α. C. Linearized AL methods with proposed relaxation To better scale the convergence rate of relaxed LALM with α, we want to design an algorithm that replaces the α- independent components by α-dependent ones in the constant multiplying /K in (). This can be (partially) done by linearizing (more precisely, majorizing) the non-separable AL penalty term in () implicitly. Instead of explicitly adding a G-weighted proximity term, where G is defined in (6), to the x-update like (8), we consider solving an equalityconstrained minimization problem equivalent to () with an additional redundant equality constraint v = G / x, i.e., { (ˆx,û,ˆv) arg min gy (u)+h(x) } x,u,v s.t. u = Ax and v = G / x, (4) using the relaxed AL method () as follows: ( φ(x)+q ψ x;x (k) ) µ (k),ax x (k+) arg min ν (k),g / x + ρ x Ax u (k) + ρ G / x v (k) { u (k+) arg min g y (u)+ µ (k),u + ρ u r (k+) u,α u } µ (k+) = µ (k) ρ ( r (k+) u,α u (k+)) v (k+) = r v,α (k+) ρ ν (k) ν (k+) = ν (k) ρ ( r (k+) v,α v (k+)), () where the relaxation variable of v is: r (k+) v,α αg / x (k+) +( α)v (k), (6) and ν is the Lagrange multiplier of the redundant equality constraint. One can easily verify thatν (k) = fork =,,... if we initialize ν as ν () =. The additional equality constraint introduces an additional inner-product term and a quadratic penalty term to the x- update. The latter can be used to cancel the non-separable Hessian of the AL penalty term as in explicit linearization. By choosing the same AL penalty parameter ρ > for the additional constraint, the Hessian matrix of the quadratic penalty term in the x-update of () is ρa A+ρG = ρd A. In other words, by choosing G in (6), the quadratic penalty term in the x-update of () becomes separable, and the x- update becomes an efficient proximal mapping of φ, as seen in () below. Next we analyze the convergence rate of the proposed relaxed LALM method (). With the additional redundant equality constraint, the Lagrangian becomes L (x,u,µ,v,ν) L(x,u,µ) ν,g / x v. (7) Setting gradients of L with respect to x, u, µ, v, and ν to be zero yields a necessary condition for a saddle-point ŵ = (ˆx,û, ˆµ,ˆv,ˆν) of L. It follows that ˆν = v L (ŵ) =. Therefore, setting ν () = is indeed a natural choice for initializing ν. Moreover, since ˆν =, the gap function G of the new problem (4) coincides with (9), and we can compare the convergence rate of the simple and proposed relaxed algorithms directly. Theorem. Let w K = (x K,u K,v K,µ K,ν K ) be the timeaverages of the iterates of LALM with proposed relaxation in (), where ρ > and < α <. When initializing v and ν as v () = G / x () and ν () =, respectively, we have G (w K ;ŵ) K ( ADψ +B α,ρ,da +C α,ρ ), (8) where A Dψ and C α,ρ were defined in () and (3), and B α,ρ,da ρ v () ˆv = ρ x () ˆx D A A A. (9) α Proof. The proof is in the supplementary material. Theorem shows the O(/K) convergence rate of (). Due to the different variable splitting scheme, the term introduced by the implicit linearization trick in () (i.e., B α,ρ,da ) also depends on the relaxation parameter α, improving convergence rate scalibility with α in () over (8). This theorem provides a theoretical explanation why () converges faster than (8) in the experiments shown later. 
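For readability, here is a side-by-side summary of the bounds in the two theorems above, written under our reading of the extraction-damaged displays (in particular, the 1/2 and 1/α factors are our reconstruction). For the simple relaxation (8),

\[
G(w_K;\hat{w}) \le \frac{1}{K}\bigl(A_{D_\psi} + B_{\rho,D_A} + C_{\alpha,\rho}\bigr),\quad
A_{D_\psi} = \tfrac12\|x^{(0)}-\hat{x}\|^2_{D_\psi},\quad
B_{\rho,D_A} = \tfrac{\rho}{2}\|x^{(0)}-\hat{x}\|^2_{D_A-A'A},\quad
C_{\alpha,\rho} = \tfrac{1}{\alpha}\Bigl(\tfrac{\rho}{2}\|u^{(0)}-\hat{u}\|^2_2 + \tfrac{1}{2\rho}\|\mu^{(0)}-\hat{\mu}\|^2_2\Bigr),
\]

while for the proposed relaxation the same bound holds with B_{ρ,D_A} replaced by

\[
B_{\alpha,\rho,D_A} = \tfrac{\rho}{2\alpha}\,\|x^{(0)}-\hat{x}\|^2_{D_A-A'A}.
\]

Only A_{D_ψ} is independent of α. In X-ray CT the data-fidelity term dominates the cost function and D_A is a loose majorizer of A'A, so the D_A term dominates the bound; dividing it by α with α close to two roughly halves the constant, which is the theoretical basis for the reported two-fold speed-up.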
For practical implementation, the remaining concern is multiplications by G / in (). There is no efficient way to compute the square root of G for any A in general, especially when A is large and unstructured like in X-ray CT. To solve this problem, let h G / v+a y. We rewrite () so that no explicit multiplication by G / is needed (the derivation is in the supplementary material), leading to the following LALM with proposed relaxation algorithm: { ( φ(x)+qψ x;x (k) ) } x (k+) arg min x + x (ρda ) γ (k+) { ρd A u (k+) arg min g y (u)+ µ (k),u + ρ (k+) u r u,α u } µ (k+) = µ (k) ρ ( r (k+) u,α u (k+)) h (k+) = αη (k+) +( α)h (k), () where and α γ (k+) ρa ( u (k) y+ρ µ (k)) +ρh (k), (3) η (k+) D A x (k+) A ( Ax (k+) y ). (3) When g y is a quadratic loss, i.e., g y (z) = (/) z y, we further simplify the proposed relaxed LALM by manipu- When ψ has large curvature (thus, α-dependent terms do not dominate the constant multiplying /K), we can use techniques as in [9, ] to reduce the ψ-dependent constant. In X-ray CT, the data-fidelity term often dominates the cost function, so A Dψ B α,ρ,da.

5 lations like those in [] (omitted here for brevity) as: γ (k+) = (ρ )g (k) +ρh { (k) ( φ(x)+qψ x;x (k) ) } x (k+) arg min x + x (ρd A ) γ (k+) ρd A ζ (k+) L ( x (k+)) = A ( Ax (k+) y ) g (k+) = ρ ( ρ+ αζ (k+) +( α)g (k)) + ρ+ g(k) h (k+) = α ( D A x (k+) ζ (k+)) +( α)h (k), (33) where L(x) g y (Ax) is the quadratic data-fidelity term, and g A (u y) []. For initialization, we suggest using g () = ζ () and h () = D A x () ζ () (Theorem ). The algorithm (33) computes multiplications by A and A only once per iteration and does not have to invert A A, unlike standard relaxed AL methods (). This property is especially useful when A A is large and unstructured. When α =, (33) reverts to the unrelaxed LALM in []. Lastly, we contrast our proposed relaxed LALM () with Chambolle s relaxed primal-dual algorithm [4, Algorithm ]. Both algorithms exhibit O(/K) ergodic (i.e., with respect to the time-averaged iterates) convergence rate and α-times speed-up when ψ =. Using () would require one more multiplication by A per iteration than in Chambolle s relaxed algorithm; however, the additional A is not required with quadratic loss in (33). When ψ, unlike Chambolle s relaxed algorithm in which one has to adjust the primal step size according to the value of α (effectively, one scales D ψ by /( α)) [4, Remark 6], the proposed relaxed LALM () does not require such step-size adjustment, which is especially useful when using α that is close to two. III. X-RAY CT IMAGE RECONSTRUCTION Consider the X-ray CT image reconstruction problem [3]: { } ˆx arg min x Ω y Ax W +R(x), (34) where A is the forward projection matrix of a CT scan [3], y is the noisy sinogram, W is the statistical diagonal weighting matrix, R denotes an edge-preserving regularizer, and Ω denotes a box-constraint on the image x. We focus on the edge-preserving regularizer R defined as: R(x) β i κ n κ n+si ϕ i ([C i x] n ), (3) i n where β i, s i, ϕ i, and C i denote the regularization parameter, spatial offset, potential function, and finite difference matrix in the ith direction, respectively, and κ n is a voxel-dependent weight for improving resolution uniformity [3, 33]. In our experiments, we used 3 directions to include all 6 neighbors in 3-D CT. A. Relaxed OS-LALM for faster CT reconstruction To solve X-ray CT image reconstruction (34) using the proposed relaxed LALM (33), we apply the following substitution: { A W / A y W / (36) y, and we set φ = ι Ω and ψ = R, where ι Ω (x) = if x Ω, and ι Ω (x) = + otherwise. The proximal mapping of ι Ω simply projects the input vector to the convex set Ω, e.g., clipping negative values of x to zero for a non-negativity constraint. Theorems developed in Section II considered the ergodic convergence rate of the non-negative duality gap, which is not a common convergence metric for X-ray CT image reconstruction. However, the ergodic convergence rate analysis suggests how factors like α, ρ, D A, and D ψ affect convergence speed (a LASSO regression example can be found in the supplementary material) and motivates our more practical (over-)relaxed OS-LALM summarized below. Algorithm : Proposed (over-)relaxed OS-LALM for (34). Input: M, α <, and an initial (FBP) image x. set ρ =, ζ = g = M L M (x), h = D L x ζ for k =,,... 
do
  for m = 1, 2, ..., M do
    s = ρ (D_L x − h) + (1 − ρ) g
    x^+ = [ x − (ρ D_L + D_R)^{−1} (s + ∇R(x)) ]_Ω
    ζ = M ∇L_m(x^+)
    g^+ = ρ/(ρ+1) (α ζ + (1 − α) g) + 1/(ρ+1) g
    h^+ = α (D_L x^+ − ζ) + (1 − α) h
    decrease ρ using (37)
  end
end

Algorithm 1 describes the proposed relaxed algorithm for solving the X-ray CT image reconstruction problem (34), where L_m denotes the data-fidelity term of the mth subset, and [·]_Ω is an operator that projects the input vector onto the convex set Ω, e.g., clipping negative values to zero for Ω ≜ {x : x_i ≥ 0 for all i}. All variables are updated in-place, and we use the superscript (·)^+ to denote the new values that replace the old values. We also use the substitution s ≜ ρ D_L x − γ^+ in the proposed method, so that Algorithm 1 has a form comparable to the unrelaxed OS-LALM []; however, this substitution is not necessary. As seen in Algorithm 1, the proposed relaxed OS-LALM has the form of (33) but uses some modifications that violate assumptions in our theorems yet speed up convergence in practice. First, although our theorems assume a constant majorizing matrix D_R for the Lipschitz term R (e.g., the maximum curvature of R), we use the iteration-dependent Huber's curvature of R [6] for faster convergence (the same is done in the other algorithms used for comparison). Second, since the updates in (33) depend only on the gradients of L, we can further accelerate the gradient computation by using partial projection data, i.e., ordered subsets. Lastly, we incorporate a continuation technique (i.e., decreasing the AL penalty parameter ρ every iteration) in the proposed algorithm, as described in the next subsection. To select the number of subsets, we used the rule suggested in [, Eqn. and 7]. However, since over-relaxation provides two-times acceleration, we used 50% of the suggested number of subsets (for the unrelaxed OS-LALM) yet achieved similar convergence speed (faster in runtime since fewer regularizer gradient evaluations are performed) and more stable reconstruction.
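For concreteness, the following is a minimal NumPy sketch of the Algorithm 1 recursion on a small dense test problem. It is an illustration of the update structure, not a CT implementation: A stands in for the system matrix, the "subsets" are plain row blocks rather than interleaved projection views, and grad_R / D_R are user-supplied placeholders for the regularizer gradient and a diagonal majorizer of its curvature.

```python
import numpy as np

def rho_k(k, alpha):
    # alpha-dependent continuation sequence (37), as reconstructed in Section III-B:
    # rho = 1 at subiteration k = 0, then decreasing.
    if k == 0:
        return 1.0
    t = np.pi / (alpha * (k + 1))
    return t * np.sqrt(max(1.0 - (t / 2.0) ** 2, 0.0))

def relaxed_os_lalm(A, w, y, grad_R, D_R, x0, M=4, alpha=1.999, n_iter=10):
    """Sketch of the Algorithm 1 recursion for the PWLS problem (34) with a
    non-negativity constraint. A is a small dense stand-in for the CT system
    matrix and w holds the diagonal of the statistical weighting W."""
    absA = np.abs(A)
    D_L = absA.T @ (w * absA.sum(axis=1))           # diagonal majorizer of A' W A
    subsets = np.array_split(np.arange(A.shape[0]), M)

    def grad_L_subset(x, m):                        # M times the m-th subset gradient
        idx = subsets[m]
        Am = A[idx]
        return M * (Am.T @ (w[idx] * (Am @ x - y[idx])))

    x = x0.copy()
    zeta = g = grad_L_subset(x, M - 1)              # zeta = g = M * grad L_M(x)
    h = D_L * x - zeta                              # h = D_L x - zeta
    rho, k = 1.0, 0
    for _ in range(n_iter):
        for m in range(M):                          # one iteration = M subiterations
            s = rho * (D_L * x - h) + (1.0 - rho) * g
            x = np.maximum(x - (s + grad_R(x)) / (rho * D_L + D_R), 0.0)   # [.]_Omega
            zeta = grad_L_subset(x, m)
            g = rho / (rho + 1.0) * (alpha * zeta + (1.0 - alpha) * g) + g / (rho + 1.0)
            h = alpha * (D_L * x - zeta) + (1.0 - alpha) * h
            k += 1
            rho = rho_k(k, alpha)                   # decrease rho every subiteration
    return x
```

For a quick experiment one can pass a simple quadratic regularizer, e.g., grad_R = lambda x: 2*beta*x with D_R = 2*beta; since Algorithm 1 reverts to the unrelaxed OS-LALM when α = 1, running the same problem with α = 1 and α = 1.999 mirrors the comparisons reported in Section IV.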

For the implicit linearization, we use the diagonal majorizing matrix diag{A'WA1} for A'WA [], i.e., the same diagonal majorizing matrix D_L of the quadratic loss function that is used in OS algorithms. Furthermore, Algorithm 2 depicts the OS version of the simple relaxed algorithm (8) for solving (34) (the derivation is omitted here). The main difference between Algorithm 1 and Algorithm 2 is the extra recursion of the variable h. When α = 1, both algorithms revert to the unrelaxed OS-LALM [].

Algorithm 2: Simple (over-)relaxed OS-LALM for (34).
Input: M ≥ 1, 1 ≤ α < 2, and an initial (FBP) image x.
set ρ = 1, ζ = g = M ∇L_M(x)
for k = 0, 1, 2, ... do
  for m = 1, 2, ..., M do
    s = ρ ζ + (1 − ρ) g
    x^+ = [ x − (ρ D_L + D_R)^{−1} (s + ∇R(x)) ]_Ω
    ζ^+ = M ∇L_m(x^+)
    g^+ = ρ/(ρ+1) (α ζ^+ + (1 − α) g) + 1/(ρ+1) g
    decrease ρ using (37)
  end
end

B. Further speed-up with continuation

We also use a continuation technique [] to speed up convergence; that is, we decrease ρ gradually with iteration. Note that ρ D_L + D_R is the inverse of the voxel-dependent step size of the image updates; decreasing ρ increases the step sizes gradually as iterations progress. Due to the extra relaxation parameter α, a good decreasing continuation sequence differs from that in []. We use the following α-dependent continuation sequence for the proposed relaxed LALM (1 ≤ α < 2):

ρ_k(α) = 1 if k = 0, and ρ_k(α) = (π / (α(k+1))) √(1 − (π / (2α(k+1)))²) otherwise.   (37)

The supplementary material describes the rationale for this continuation sequence. When using OS, ρ decreases every subiteration, and the counter k in (37) denotes the number of subiterations, not the number of iterations.

IV. EXPERIMENTAL RESULTS

This section reports numerical results for 3-D X-ray CT image reconstruction using one conventional algorithm (OS-SQS []) and four contemporary algorithms:
- OS-FGM: the OS variant of the standard fast gradient method proposed in [, 9],
- OS-LALM: the OS variant of the unrelaxed linearized AL method proposed in [],
- OS-OGM: the OS variant of the optimal fast gradient method proposed in [9], and
- Relaxed OS-LALM: the OS variants of the proposed relaxed linearized AL methods given in Algorithm 1 (proposed) and Algorithm 2 (simple) above (α = 1.999 unless otherwise specified).

A. XCAT phantom

We simulated an axial CT scan using an XCAT phantom [34] for a mm transaxial field-of-view (FOV), where ∆x = ∆y = 0.4883 mm and ∆z = 0.6 mm. An ([detector columns] × [detector rows] × [projection views]) noisy (with Poisson noise) sinogram is numerically generated with GE LightSpeed fan-beam geometry corresponding to a monoenergetic source at 7 keV with incident photons per ray and no scatter. We reconstructed a 9 image volume with a coarser grid, where ∆x = ∆y = 0.9776 mm and ∆z = 0.6 mm. The statistical weighting matrix W is a diagonal matrix with diagonal entries w_j = exp(−y_j), and an edge-preserving regularizer is used with ϕ_i(t) ≜ δ² (|t/δ| − log(1 + |t/δ|)) (δ = HU) and parameters β_i set to achieve a reasonable noise-resolution trade-off. We used subsets for the relaxed OS-LALM, while [, Eqn. ] suggests using about 4 subsets for the unrelaxed OS-LALM. Figure 1 shows the cropped images (displayed from 8 to HU [modified so that air is 0]) from the central transaxial plane of the initial FBP image x^(0), the reference reconstruction (generated by running thousands of iterations of the convergent FGM with adaptive restart [3]), and the reconstructed image obtained with the proposed algorithm (relaxed OS-LALM with subsets) after iterations. There is no visible difference between the reference reconstruction and our reconstruction.
To analyze the proposed algorithm quantitatively, Figure shows the RMS differences between the reference reconstruction x and the reconstructed image x (k) using different algorithms as a function of iteration 3 with and 4 subsets. As seen in Figure, the proposed algorithm (cyan curves) is approximately twice as fast as the unrelaxed OS-LALM (green curves) at least in early iterations. Furthermore, comparing with OS-FGM and OS-OGM, the proposed algorithm converges faster and is more stable when using more subsets for acceleration. Difference images using different algorithms and additional experimental results are shown in the supplementary material. To illustrate the improved speed-up of the proposed relaxation (Algorithm ) over the simple one (Algorithm ), Figure 3 shows convergence rate curves of different relaxed algorithms ( subsets and α =.999) with (a) a fixed AL penalty parameter ρ =. and (b) the decreasing sequence ρ k in (37). As seen in Figure 3(a), the simple relaxation does not provide much acceleration, especially after iterations. In contrast, the proposed relaxation accelerates convergence about twice (i.e., α-times), as predicted by Theorem. When the decreasing sequence of ρ k is used, as seen in Figure 3(b), the simple relaxation seems to provide somewhat more acceleration than before; however, the proposed relaxation still outperforms the simple one, illustrating approximately twofold speed-up over the unrelaxed counterpart. 3 All algorithms listed above require one forward/back-projection pair and M (the number of subsets) regularizer gradient evaluations (plus some negligible overhead) per iteration, so comparing the convergence rate as a function of iteration is fair.

Fig. 1: XCAT: Cropped images (displayed from 8 to HU) from the central transaxial plane of the initial FBP image x^(0) (left), the reference reconstruction (center), and the reconstructed image obtained with the proposed algorithm (relaxed OS-LALM with subsets) after iterations (right).

Fig. 2: XCAT: Convergence rate curves of different OS algorithms with (a) subsets and (b) 4 subsets. The proposed relaxed OS-LALM with subsets exhibits a convergence rate similar to that of the unrelaxed OS-LALM with 4 subsets.

Fig. 3: XCAT: Convergence rate curves of different relaxed algorithms ( subsets and α = 1.999) with (a) a fixed AL penalty parameter ρ and (b) the decreasing sequence ρ_k in (37).

Fig. 4: Chest: Cropped images (displayed from 8 to HU) from the central transaxial plane of the initial FBP image x^(0) (left), the reference reconstruction (center), and the reconstructed image obtained with the proposed algorithm (relaxed OS-LALM with subsets) after iterations (right).

Fig. 5: Chest: Convergence rate curves of different OS algorithms with (a) a smaller and (b) a larger number of subsets. The proposed relaxed OS-LALM with the smaller number of subsets exhibits a convergence rate similar to that of the unrelaxed OS-LALM with the larger number of subsets.

B. Chest scan

We reconstructed a 6 6 image volume, where ∆x = ∆y = 0.667 mm and ∆z = 0.6 mm, from a chest-region helical CT scan. The sinogram size and pitch correspond to about 3.7 rotations with a rotation time of 0.4 seconds. The tube current and tube voltage of the X-ray source are 7 mA and kVp, respectively. We started from a smoothed FBP image x^(0) and tuned the statistical weights [36] and the q-generalized Gaussian MRF regularization parameters [33] to emulate the MBIR method [3, 37]. We used subsets for the relaxed OS-LALM, while [, Eqn. 7] suggests using about subsets for the unrelaxed OS-LALM. Figure 4 shows the cropped images from the central transaxial plane of the initial FBP image x^(0), the reference reconstruction, and the reconstructed image obtained with the proposed algorithm (relaxed OS-LALM with subsets) after iterations. Figure 5 shows the RMS differences between the reference reconstruction and the reconstructed image x^(k) using different algorithms as a function of iteration for the two subset counts. The proposed relaxed OS-LALM shows an approximately two-times faster convergence rate, compared to its unrelaxed counterpart, with a moderate number of subsets. The speed-up diminishes as the iterate approaches the solution. Furthermore, the faster relaxed OS-LALM seems to be more sensitive to gradient approximation errors and exhibits ripples in its convergence rate curves when too many subsets are used for acceleration. In contrast, the slower unrelaxed OS-LALM is less sensitive to gradient error when using more subsets and does not exhibit such ripples in its convergence rate curves. Compared with OS-FGM and OS-OGM, the proposed relaxed OS-LALM has smaller limit cycles and might be more stable for practical use.

V. DISCUSSION AND CONCLUSIONS

In this paper, we proposed a non-trivial relaxed variant of LALM and applied it to X-ray CT image reconstruction. Experimental results with simulated and real CT scan data showed that our proposed relaxed algorithm converges about twice as fast as its unrelaxed counterpart, outperforming state-of-the-art fast iterative algorithms that use momentum [, 9].

9 9 This speed-up means that one needs fewer subsets to reach an RMS difference criteria like HU in a given number of iterations. For instance, we used % of the number of subsets suggested by [] (for the unrelaxed OS-LALM) in our experiment but found similar convergence speed with overrelaxation. Moreover, using fewer subsets can be beneficial for distributed computing [38], reducing communication overhead required after every update. ACKNOWLEDGMENT The authors thank GE Healthcare for providing sinogram data in our experiments. The authors would also like to thank the anonymous reviewers for their comments and suggestions. REFERENCES [] J. A. Fessler, Penalized weighted least-squares image reconstruction for positron emission tomography, IEEE Trans. Med. Imag., vol. 3, pp. 9, June 994. [] J. Nuyts, B. De Man, J. A. Fessler, W. Zbijewski, and F. J. Beekman, Modelling the physics in iterative reconstruction for transmission computed tomography, Phys. Med. Biol., vol. 8, pp. R63 96, June 3. [3] J.-B. Thibault, K. Sauer, C. Bouman, and J. Hsieh, A three-dimensional statistical approach to improved image quality for multi-slice helical CT, Med. Phys., vol. 34, pp , Nov. 7. [4] J. A. Fessler and S. D. Booth, Conjugate-gradient preconditioning methods for shift-variant PET image reconstruction, IEEE Trans. Im. Proc., vol. 8, pp , May 999. [] H. Erdoğan and J. A. Fessler, Ordered subsets algorithms for transmission tomography, Phys. Med. Biol., vol. 44, pp. 83, Nov [6] S. Ramani and J. A. Fessler, A splitting-based iterative algorithm for accelerated statistical X-ray CT reconstruction, IEEE Trans. Med. Imag., vol. 3, pp , Mar.. [7] S. Ahn and J. A. Fessler, Globally convergent image reconstruction for emission tomography using relaxed ordered subsets algorithms, IEEE Trans. Med. Imag., vol., pp. 63 6, May 3. [8] S. Ahn, J. A. Fessler, D. Blatt, and A. O. Hero, Convergent incremental optimization transfer algorithms: Application to tomography, IEEE Trans. Med. Imag., vol., pp , Mar. 6. [9] Z. Yu, J.-B. Thibault, C. A. Bouman, K. D. Sauer, and J. Hsieh, Fast model-based X-ray CT reconstruction using spatially non-homogeneous ICD optimization, IEEE Trans. Im. Proc., vol., pp. 6 7, Jan.. [] D. Kim, S. Ramani, and J. A. Fessler, Combining ordered subsets and momentum for accelerated X-ray CT image reconstruction, IEEE Trans. Med. Imag., vol. 34, pp , Jan.. [] H. Nien and J. A. Fessler, Fast X-ray CT image reconstruction using a linearized augmented Lagrangian method with ordered subsets, IEEE Trans. Med. Imag., vol. 34, pp , Feb.. [] Y. Nesterov, A method for unconstrained convex minimization problem with the rate of convergence O(/k ), Dokl. Akad. Nauk. USSR, vol. 69, no. 3, pp. 43 7, 983. [3] Y. Nesterov, Smooth minimization of non-smooth functions, Mathematical Programming, vol. 3, pp. 7, May. [4] X. Zhang, M. Burger, and S. Osher, A unified primal-dual algorithm framework based on Bregman iteration, Journal of Scientific Computing, vol. 46, no., pp. 46,. [] D. Gabay and B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite-element approximations, Comput. Math. Appl., vol., no., pp. 7, 976. [6] J. Eckstein and D. P. Bertsekas, On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators, Mathematical Programming, vol., pp , Apr. 99. [7] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. 
& Trends in Machine Learning, vol. 3, no., pp.,. [8] D. Kim and J. A. Fessler, Optimized momentum steps for accelerating X-ray CT ordered subsets image reconstruction, in Proc. 3rd Intl. Mtg. on image formation in X-ray CT, pp. 3 6, 4. [9] D. Kim and J. A. Fessler, Optimized first-order methods for smooth convex minimization, Mathematical Programming, 6. [] Y. Nesterov, On an approach to the construction of optimal methods of minimization of smooth convex functions, Ekonomika i Mateaticheskie Metody, vol. 4, pp. 9 7, 988. In Russian. [] A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci., vol., no., pp. 83, 9. [] E. X. Fang, B. He, H. Liu, and X. Yuan, Generalized alternating direction method of multipliers: New theoretical insight and application, Math. Prog. Comp., vol. 7, pp , June. [3] A. Chambolle and T. Pock, A first-order primal-dual algorithm for convex problems with applications to imaging, J. Math. Im. Vision, vol., no., pp. 4,. [4] A. Chambolle and T. Pock, On the ergodic convergence rates of a first-order primal-dual algorithm, Mathematical Programming, 6. [] D. P. Bertsekas, Nonlinear programming. Belmont: Athena Scientific, ed., 999. [6] H. Erdoğan and J. A. Fessler, Monotonic algorithms for transmission tomography, IEEE Trans. Med. Imag., vol. 8, pp. 8 4, Sept [7] P. J. Huber, Robust statistics. New York: Wiley, 98. [8] S. Boyd and L. Vandenberghe, Convex optimization. UK: Cambridge, 4. [9] S. Azadi and S. Sra, Towards an optimal stochastic alternating direction method of multipliers, in Proc. Intl. Conf. on Mach. Learning, pp. 6 8, 4. [] Y. Ouyang, Y. Chen, G. Lan, and E. Pasiliao Jr., An accelerated linearized alternating direction method of multipliers, SIAM J. Imaging Sci., vol. 8, no., pp ,. [3] Y. Long, J. A. Fessler, and J. M. Balter, 3D forward and back-projection for X-ray CT using separable footprints, IEEE Trans. Med. Imag., vol. 9, pp. 839, Nov.. [3] J. A. Fessler and W. L. Rogers, Spatial resolution properties of penalized-likelihood image reconstruction methods: Space-invariant tomographs, IEEE Trans. Im. Proc., vol., pp , Sept [33] J. H. Cho and J. A. Fessler, Regularization designs for uniform spatial resolution and noise properties in statistical image reconstruction for 3D X-ray CT, IEEE Trans. Med. Imag., vol. 34, pp , Feb.. [34] W. P. Segars, M. Mahesh, T. J. Beck, E. C. Frey, and B. M. W. Tsui, Realistic CT simulation using the 4D XCAT phantom, Med. Phys., vol. 3, pp. 38 8, Aug. 8. [3] B. O Donoghue and E. Candès, Adaptive restart for accelerated gradient schemes, Found. Comp. Math., vol., pp. 7 3, June. [36] Z. Chang, R. Zhang, J.-B. Thibault, K. Sauer, and C. Bouman, Statistical x-ray computed tomography from photon-starved measurements, in Proc. SPIE 9 Computational Imaging XII, p. 9G, 4. [37] W. P. Shuman, D. E. Green, J. M. Busey, O. Kolokythas, L. M. Mitsumori, K. M. Koprowicz, J.-B. Thibault, J. Hsieh, A. M. Alessio, E. Choi, and P. E. Kinahan, Model-based iterative reconstruction versus adaptive statistical iterative reconstruction and filtered back projection in 64-MDCT: Focal lesion detection, lesion conspicuity, and image noise, Am. J. Roentgenol., vol., pp. 7 6, May 3. [38] J. M. Rosen, J. Wu, T. F. Wenisch, and J. A. Fessler, Iterative helical CT reconstruction in the cloud for ten dollars in five minutes, in Proc. Intl. Mtg. on Fully 3D Image Recon. in Rad. and Nuc. Med, pp. 4 4, 3.

10 Relaxed Linearized Algorithms for Faster X-Ray CT Image Reconstruction: Supplementary Material Hung Nien, Member, IEEE, and Jeffrey A. Fessler, Fellow, IEEE arxiv:.464v [math.oc] 4 Dec This supplementary material for [] has three parts. The first part analyzes the convergence rate of the simple and proposed relaxed linearized augmented Lagrangian (AL) methods (LALM s) in [] for solving an equality-constrained composite convex optimization problem. We demonstrate the convergence rate bound and the effect of relaxation with a numerical example (LASSO regression). The second part derives the continuation sequence we used in []. The third part shows additional experimental results of applying the proposed relaxed LALM with ordered subsets (OS) for solving model-based X-ray computed tomography (CT) image reconstruction problems. The additional experimental results are consistent with the results we showed in [], illustrating the efficiency and stability of the proposed relaxed OS-LALM over existing methods. I. CONVERGENCE RATE ANALYSES OF THE SIMPLE AND PROPOSED LALM S We begin by considering a more general equality-constrained composite convex optimization problem (for which the equalityconstrained minimization problem considered in [] is a special case): { } (ˆx,û) arg min f(x,u) g(u)+h(x) s.t. Kx+Bu = b, () x,u where both g and h are closed and proper convex functions. We further decompose h φ+ψ into two convex functions φ and ψ, where φ is simple in the sense that it has an efficient proximal mapping, e.g., soft-shrinkage for the l -norm, and ψ is continuously differentiable with D ψ -Lipschitz gradients (defined in []). One example of h is the edge-preserving regularizer with a non-negativity constraint (e.g., sum of a corner-rounded total-variation [TV] regularizer and the characteristic function of the non-negativity set) used in statistical image reconstruction methods [, ]. As mentioned in [], solving a composite convex optimization problem with equality constraints like () is equivalent to finding a saddle-point of the Lagrangian: L(x,u,µ) f(x,u) µ,kx+bu b, () where µ is the Lagrange multiplier of the equality constraint [3, p. 37]. In other words, (ˆx, û, ˆµ) solves the minimax problem: (ˆx,û, ˆµ) arg min max x,u µ Moreover, since (ˆx, û, ˆµ) is a saddle-point of L, the following inequalities hold for any x, u, and µ, and the duality gap function: L(x,u,µ). (3) L(x,u, ˆµ) L(ˆx,û, ˆµ) L(ˆx,û,µ) (4) G(x,u,µ;ˆx,û, ˆµ) L(x,u, ˆµ) L(ˆx,û,µ) = [ f(x,u) f(ˆx,û) ] ˆµ,Kx+Bu b () characterizes the accuracy of an approximate solution (x,u,µ) to the saddle-point problem (3). Note that Kˆx+Bû b = due to the equality constraint. We consider the following (generalized alternating direction method of multipliers [ADMM]) iteration: { x (k+) arg min φ(x)+ ψ ( x (k)),x + x { u (k+) arg min g(u) µ (k),bu + ρ u x x (k) µ (k),kx + ρ D ψ αkx (k+) +( α) ( b Bu (k)) +Bu b Kx+Bu (k) b + } x x (k) } P µ (k+) = µ (k) ρ ( αkx (k+) +( α) ( b Bu (k)) +Bu (k+) b ) (6) and show that the duality gap of the time-averaged solution w K = (x K,u K,µ K ) it generates converges to zero at rate O(/K), where K is the number of iterations, c K K K k= c(k) (7) denotes the time-average of some iterate c (k) for k =,...,K, ρ > is the corresponding AL penalty parameter, P is a positive semi-definite weighting matrix, and < α < is the relaxation parameter. This work is supported in part by National Institutes of Health (NIH) grants U-EB-873 and by equipment donations from Intel Corporation. Hung Nien and Jeffrey A. 
Fessler are with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 489, USA ( {hungnien,fessler}@umich.edu). Date of current version: December,.
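For orientation, the simple and proposed relaxed LALMs of the main text are recovered from the generalized formulation above by particular choices of K, B, b, and the proximity weighting P; the following summary is our reading of the substitutions stated in Theorems 2 and 3 below (the minus signs and zero vectors, which the transcription drops, are assumptions on our part):

\[
(\hat{x},\hat{u}) \in \arg\min_{x,u}\; \bigl\{ g(u) + h(x) \bigr\} \quad \text{s.t.}\quad Kx + Bu = b,
\]
\[
\text{simple relaxation: } K = A,\; B = -I_m,\; b = 0_m,\; P = \rho G; \qquad
\text{proposed relaxation: } K = \begin{bmatrix} A \\ G^{1/2} \end{bmatrix},\; B = -I_{m+n},\; b = 0_{m+n},\; P = 0,
\]

where G ≜ D_A − A'A and D_A is a diagonal majorizing matrix of A'A.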

11 A. Preliminaries The convergence rate analysis of the iteration (6) is inspired by previous work [4 9]. For simplicity, we use the following notations: w x u, w x u, λ (k+) µ (k) ρ ( Kx (k+) +Bu (k) b ) K λ, and F(w) B λ. (8) µ λ Kx+Bu b We also introduce three matrices: D ψ +P H ρ α B B α α α B α B αρ I, M I D ψ +P I, and Q HM = ρb B ( α)b. (9) ρb αi B ρ I The following lemmas show the properties of vectors and matrices defined in (8) and (9) and an identity used in our derivation. Lemma. The matrix H defined in (9) is positive semi-definite for any < α < and ρ >. Proof. For any w, completing the square yields w Hw = x (D ψ +P)x+ ρ α u B Bu+ ( α) α u B µ+ αρ µ µ = x D ψ +P + ( α ρbu + sgn( α) α ( ρbu) ( ρ µ ) + ρ µ ) = x D ψ +P + α( α ρbu+sgn( α) ρ µ +( α )( ρbu + ρ µ )). () All terms in () are non-negative for any < α < and ρ >. Thus under such conditions, w Hw for any w, and H is positive semi-definite. Lemma. For any k, we have w (k) w (k+) = M ( w (k) w (k+)). Proof. Since two stacked vectors (x and u) of w and w are the same, we need only show that µ (k) µ (k+) is equal to α ( µ (k) λ (k+)) ρb ( u (k) u (k+)) for any k. By the definition of λ (k+) in (8), we have Then, by the definition of the µ-update in (6), we get Thus the lemma holds. µ (k) λ (k+) = ρ ( Kx (k+) +Bu (k) b ). () µ (k) µ (k+) = ρ ( αkx (k+) +( α) ( b Bu (k)) +Bu (k+) b ) = ρ ( α ( Kx (k+) +Bu (k) b ) +B ( u (k+) u (k))) = α ( ρ ( Kx (k+) +Bu (k) b )) ρb ( u (k) u (k+)) = α ( µ (k) λ (k+)) ρb ( u (k) u (k+)). () Lemma 3. For any positive semi-definite matrix M and vectors x, x, x 3, and x 4, we have (x x ) M(x 3 x 4 ) = x x 4 M x x 3 M + x x 3 M x x 4 M. (3) Proof. The proof is omitted here. It can be verified by expanding out all the inner product and norms on both sides. B. Main results In the following theorem, we show that the duality gap defined in () of the time-averaged iterates w K = (x K,u K,µ K ) in (6) converges at rate O(/K), where K denotes the number of iterations. Theorem. Let w K = (x K,u K,µ K ) be the time-averages of iterates in (6) where ρ >, < α <, and P is positive semi-definite. We have G ( w K ;ŵ ) = [ f ( ) ] x K,u K f(ˆx,û) ˆµ,KxK +Bu K b { K x () ˆx + D ψ x () ˆx P + α [ ρ B ( u () û ) + ρ µ () ˆµ ] }. (4)

12 3 Proof. We first focus on the x-update in (6). By the convexity of ψ, we have ψ ( x (k+)) ψ ( x (k)) + ψ ( x (k)),x (k+) x (k) + x (k+) x (k) D ψ = ψ ( x (k)) + ψ ( x (k)),x x (k) + ψ ( x (k)),x (k+) x + x (k+) x (k) D ψ ψ(x)+ ψ ( x (k)),x (k+) x + x (k+) x (k) () D ψ for any x. Moving ψ(x) to the left-hand side leads to ψ ( x (k+)) ψ(x) ψ ( x (k)),x (k+) x + x (k+) x (k). (6) D ψ Moreover, by the optimality condition of the x-update in (6), we have so φ ( x (k+)) + ψ ( x (k)) +D ψ ( x (k+) x (k)) K ( µ (k) ρ ( Kx (k+) +Bu (k) b )) +P ( x (k+) x (k)), (7) φ ( x (k+)) ψ ( x (k)) D ψ ( x (k+) x (k)) +K λ (k+) P ( x (k+) x (k)). (8) By the definition of subgradient for the convex function φ, it follows that φ(x) φ ( x (k+)) + φ ( x (k+)),x x (k+) = φ ( x (k+)) + x (k+) x, K λ (k+) + ψ ( x (k)),x (k+) x + x (k+) x,(d ψ +P) ( x (k+) x (k)) (9) for all x. Rearranging (9) leads to [ φ ( x (k+) ) φ(x) ] + x (k+) x, K λ (k+) ψ ( x (k)),x (k+) x + x (k+) x,(d ψ +P) ( x (k) x (k+)). () Summing (6) and (), we get the first inequality: [ h ( x (k+) ) h(x) ] + x (k+) x, K λ (k+) x (k+) x,(d ψ +P) ( x (k) x (k+)) + x (k+) x (k) D ψ. () Following the same procedure, by the optimality condition of the u-update in (6), we have g(u) g ( u (k+)) + g ( u (k+)),u u (k+) = g ( u (k+)) + u (k+) u, B µ (k+) () for any u. To substitute µ (k+) in (), subtracting and adding λ (k+) on the left-hand side of () and rearranging it yield µ (k+) = λ (k+) +( α) ( µ (k) λ (k+)) +ρb ( u (k) u (k+)). (3) Substituting (3) into () and rearranging it, we get the second inequality: [ g ( u (k+) ) g(u) ] + u (k+) u, B λ (k+) u (k+) u,ρb B ( u (k) u (k+)) +( α)b ( µ (k) λ (k+)). (4) The third step differes a bit from the previous ones because the µ-update in (6) is not a minimization problem. By (), we have Kx (k+) +Bu (k+) b = B ( u (k) u (k+)) + ρ( µ (k) λ (k+)). () This gives the third equality: λ (k+) µ,kx (k+) +Bu (k+) b = λ (k+) µ, B ( u (k) u (k+)) + ρ( µ (k) λ (k+)) (6) for any µ. Summing (), (4), and (6), we can write it compactly as [ f ( x (k+),u (k+)) f(x,u) ] + w (k+) w,f ( w (k+)) w (k+) w,q ( w (k) w (k+)) + x (k+) x (k) D ψ. (7) By Lemma (note that Q = HM) and Lemma 3, the first term on the right-hand side of (7) can be expressed as w (k+) w,h ( w (k) w (k+)) = w (k+) w (k+) H w (k+) w (k) H + w (k) w H w (k+) w H. (8) Moreover, the first term on the right-hand side of (8) is λ (k+) µ (k+) = ( ρb u (k+) u (k)) +( α) ( λ (k+) µ (k)) (9) αρ αρ

13 4 by (3), and the second term on the right-hand side of (8) is x (k+) x (k) D ψ +P + αρ ρb ( u (k+) u (k)) + ( α) αρ ρb ( u (k+) u (k)),λ (k+) µ (k) + αρ λ (k+) µ (k) = x (k+) x (k) D ψ +P + ( αρ ρb u (k+) u (k)) +( α) ( λ (k+) µ (k)) + α ρ λ (k+) µ (k). () Substituting (9) and () into (8), we can upper bound the inequality (7) by [ ( f x (k+),u (k+)) f(x,u) ] + w (k+) w,f ( w (k+)) w (k) w H w (k+) w H x (k+) x (k) α D ψ +P ρ λ (k+) µ (k) + x (k+) x (k) D ψ w (k) w H w (k+) w H x (k+) x (k) P α ρ λ (k+) µ (k) w (k) w H w (k+) w (3) H because P is positive semi-definite and α > for α (,). To show the convergence rate of (6), let (x,u,µ) = ŵ (ˆx,û, ˆµ). The last term on the left-hand side of (3) can be represented as w (k+) ŵ,f ( w (k+)) = x (k+) ˆx, K λ (k+) + u (k+) û, B λ (k+) + λ (k+) ˆµ,Kx (k+) +Bu (k+) b = λ (k+),kˆx Kx (k+) +Bû Bu (k+) +Kx (k+) +Bu (k+) b ˆµ,Kx (k+) +Bu (k+) b = λ (k+),kˆx+bû b ˆµ,Kx (k+) +Bu (k+) b = ˆµ,Kx (k+) +Bu (k+) b. (3) Note that Kˆx+Bû b = due to the equality constraint. Using (3) yields G ( w (k+) ;ŵ ) = [ f ( x (k+),u (k+)) f(ˆx,û) ] + w (k+) ŵ,f ( w (k+)) w (k) ŵ H w (k+) ŵ H. (33) Summing (33) from k =,...,K, dividing both sides by K, and applying Jensen s inequality to the convex function f, we have G ( w K ;ŵ ) = [ f ( ) ] x K,u K f(ˆx,û) ˆµ,KxK +Bu K b K( w () ŵ H w (K) ŵ H) K w () ŵ (34) H since H is positive semi-definite for any α (,) and ρ > (Lemma ). To finish the analysis, the remaining task is to upper bound w () ŵ. Note that H w () ŵ can be expressed as H x () ˆx + D ψ x () ˆx [ ] u P + () [ û ρb B ( α)b ][ ] u () û α µ () ˆµ ( α)b ρ I µ (). (3) ˆµ The last term in (3) can be further expressed as and upper bounded by [ α ρ ( B u () û ) +( α) B( u () û ),µ () ˆµ + ρ µ () ˆµ ] [ α ρ B ( u () û ) + α B ( u () û ) µ () ˆµ + ρ µ () ˆµ ] [ α ρ ( B u () û ) + ( B u () û ) µ () ˆµ + ρ µ () ˆµ ] [ ρ ( B u () û ) + µ () ρ ˆµ ] (36) = α due to the fact that < α <. Combining (34), (3), and (36), we get our final convergence rate bound: G ( w K ;ŵ ) = [ f ( ) ] x K,u K f(ˆx,û) ˆµ,KxK +Bu K b { K x () ˆx + D ψ x () ˆx P + α [ ρ B ( u () û ) + ρ µ () ˆµ ] }. (37) Theorem can be used to show the convergence rates of other AL-based algorithms. The following theorems show the convergence rates of the simple and the proposed relaxed LALM s in []. From now on, suppose A is an m n matrix, and let G D A A A, where D A is a diagonal majorizing matrix of A A.
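The phrase "diagonal majorizing matrix of A'A" can be made concrete with the standard choice D_A = diag{|A|'|A|1} mentioned in the main text. The following quick NumPy check (a sketch for a random dense A, not part of the analysis) confirms that G = D_A − A'A is then positive semi-definite:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 12))

absA = np.abs(A)
d_A = absA.T @ absA.sum(axis=1)                # diagonal of D_A = diag{|A|'|A| 1}
G = np.diag(d_A) - A.T @ A                     # G = D_A - A'A

print(np.linalg.eigvalsh(G).min() >= -1e-10)   # True: D_A majorizes A'A, so G is PSD
```

This inequality is what makes the x-update separable after the implicit linearization: the Hessian ρA'A + ρG of the quadratic penalty collapses to the diagonal matrix ρD_A.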

14 Theorem ([, Theorem ]). Let K = A, B = I m, b = m, and P = ρg. The iteration (6) with ρ > and < α < reduces to the simple relaxed LALM that achieves a convergence rate ( ) G(w K ;ŵ) K ADψ +B ρ,da +C α,ρ, (38) where the first two constants depend on how far the initial guess is from a minimizer, and the last constant depends on the relaxation parameter. C α,ρ α A Dψ x () ˆx (39) D ψ B ρ,da ρ x () ˆx D A A () A [ ρ u () û + ρ µ () ˆµ ] (4) Proof. One just uses the substitutions K = A, B = I m, b = m, and P = ρg in Theorem to prove the theorem. As seen in Theorem, the convergence rate of the simple relaxed LALM scales well with the relaxation parameter α iff C α,ρ A Dψ and C α,ρ B ρ,da. When ψ has large curvature or D A is a loose majorizing matrix of A A (like in X-ray CT), the above inequalities do not hold, leading to worse scalability of convergence rate with the relaxation parameter α. This motivated the proposed relaxed LALM [] whose convergence rate analysis is shown below. Theorem 3 ([, Theorem ]). Let K = [ A G /], B = Im+n, b = m+n, and P =. The iteration (6) with ρ > and < α < reduces to the proposed relaxed LALM [] that achieves a convergence rate G ( ) (w K ;ŵ) K ADψ +B α,ρ,da +C α,ρ, (4) where A Dψ and C α,ρ were defined in (39) and (4), and B α,ρ,da ρ v () ˆv = ρ x () ˆx D A A (43) A when initializing v and ν as v () = G / x () and ν () = n, respectively. α Proof. Applying the substitutions K = [ A G /], B = Im+n, b = m+n, and P = to Theorem, except for the upper bounding (36), yields G ( w K ;ŵ ) K( ADψ +D α,ρ ), (44) where α u () û ρi m ( α)i m u () û D α,ρ v () ˆv ρi n ( α)i n α µ () ˆµ ( α)i m ρ I v () ˆv m µ () ˆµ, (4) ν () ˆν ( α)i n ρ I n ν () ˆν and v and ν are the auxiliary variable and Lagrange multiplier of the additional redundant equality constraint v = G / x in [], respectively. Note that ν (k) = n for k =,,... if we initialize ν as ν () = n, and ˆν = n []. We have ν () ˆν = n. Hence, (4) is further upper bounded by Let u () û ρi m ( α)i m u () û v () ˆv ρi n v () ˆv µ () ˆµ ( α)i m ρ I m µ () ˆµ ( = α ρ u () û ( α) u() û,µ () ˆµ + ρ µ () ˆµ ) + ρ α v () ˆv [ ρ α u () û + ρ µ () ˆµ ] + ρ α G / x () G /ˆx = C α,ρ + ρ x () ˆx D A A A. (46) D α,ρ = α α Thus, the convergence rate of the proposed relaxed LALM [] is upper bounded by B α,ρ,da ρ x () ˆx D A A A. (47) α G ( w K ;ŵ ) K( ADψ +B α,ρ,da +C α,ρ ). (48)
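Before moving on to the practical implementation, here is a quick numerical sanity check of Lemma 3, whose proof was omitted above. Reading the garbled display as the standard polarization-type identity with 1/2 factors (an assumption on our part),

\[
\langle x_1 - x_2,\, M(x_3 - x_4)\rangle
= \tfrac12\bigl(\|x_1 - x_4\|_M^2 - \|x_1 - x_3\|_M^2\bigr)
+ \tfrac12\bigl(\|x_2 - x_3\|_M^2 - \|x_2 - x_4\|_M^2\bigr),
\]

a minimal NumPy check is:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
S = rng.standard_normal((n, n))
M = S.T @ S                              # a random positive semi-definite matrix
x1, x2, x3, x4 = rng.standard_normal((4, n))

def sqnorm(v):
    return v @ M @ v                     # ||v||_M^2

lhs = (x1 - x2) @ M @ (x3 - x4)
rhs = 0.5 * (sqnorm(x1 - x4) - sqnorm(x1 - x3)) \
    + 0.5 * (sqnorm(x2 - x3) - sqnorm(x2 - x4))
print(abs(lhs - rhs))                    # ~1e-15, i.e., the identity holds
```

Expanding the squared M-norms on the right-hand side and cancelling the quadratic terms recovers the left-hand side directly, which is presumably the omitted proof.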

15 6 C. Practical implementation of the proposed relaxed LALM Although the proposed relaxed LALM shows better scalability of the convergence rate with the relaxation parameter α, a straightforward implementation with substitutions in Theorem 3 is not recommended because there is no efficient way to compute the square root of G for any A in general. For practical implementation, we must avoid using multiplication by G / in both the x- and v-updates. To derive the practical implementation, we first substitue K = [ A G /], B = I, b =, and P = in (6). This leads to the following iterates (i.e., [, Eqn. ]): { ( x (k+) arg min φ(x)+q ) ψ x;x (k) µ (k),ax ν (k),g / x + ρ x Ax u (k) + ρ G / x v (k) } { u (k+) arg min g(u)+ µ (k),u + ρ (k+) u r u,α u } µ (k+) = µ (k) ρ ( r (k+) u,α u (k+)) (49) v (k+) = r v,α (k+) ρ ν (k) ν (k+) = ν (k) ρ ( r v,α (k+) v (k+)), where Q ψ is a separable quadratic surrogate (SQS) of ψ at x (k) [, Eqn. 7], r u,α is the relaxation variable of u, and r v,α is the relaxation variable of v. Suppose ν () =. Then ν (k) = for k =,,..., and (49) can be further simplified as { ( x (k+) arg min φ(x)+q ) ψ x;x (k) µ (k),ax + ρ x Ax u (k) + ρ G / x v (k) } { u (k+) arg min g(u)+ µ (k),u + ρ (k+) u r u,α u } µ (k+) = µ (k) ρ ( r u,α (k+) u (k+)) () v (k+) = αg / x (k+) +( α)v (k). Let h G / v+a y. By the v-update in (), we have h (k+) = G / v (k+) +A y = G /( αg / x (k+) +( α)v (k)) +A y = α ( Gx (k+) +A y ) +( α) ( G / v (k) +A y ) = α ( D A x (k+) A ( Ax (k+) y )) +( α)h (k). () To avoid multiplication by G / in the x-update in (), we rewrite the last three terms in the x-update cost function using Taylor s expansion around x (k). That is, µ (k),ax + ρ Ax u (k) + ρ G / x v (k) ρ Ax u (k) ρ µ (k) + ρ G / x v (k) ( x x (k)) ( ρa ( Ax (k) u (k) ρ µ (k)) +ρ ( Gx (k) G / v (k))) + x x (k) ρa A+ρG = ( x x (k)) ( ( ρ A A+G ) x (k) ρa ( u (k) +ρ µ (k)) ρg / v (k)) + x x (k) ρ(a A+G) = ( x x (k)) ( ρda x (k) ρa ( u (k) y+ρ µ (k)) ρh (k)) + x x (k) ρd A x x (k) +(ρd A ) ( ρd A x (k) ρa ( u (k) y+ρ µ (k)) ρh (k)) ρd A x (ρda ) ( ρa ( u (k) y+ρ µ (k)) +ρh (k)) ρd A. () = The practical relaxed LALM without multiplication by G / becomes { ( x (k+) arg min φ(x)+q ) ψ x;x (k) + x x (ρda ) ( ρa ( u (k) y+ρ µ (k)) +ρh (k)) } { ρd A u (k+) arg min g(u)+ µ (k),u + ρ (k+) u r u,α u } µ (k+) = µ (k) ρ ( r (k+) u,α u (k+)) h (k+) = α ( D A x (k+) A ( Ax (k+) y )) +( α)h (k). (3) D. Numerical example: LASSO regression Here we describe a numerical example that demonstrates the convergence rate bound and the effect of relaxation. Consider the following l -regularized linear regression problem: { } ˆx arg min x y Ax +λ x, (4)

Fig. 1: LASSO regression: duality gap curves of relaxed LALM with different relaxation parameters and AL penalty parameters. (a) Bound (42) vs. ergodic gap (E-gap); (b) ergodic gap (E-gap) vs. non-ergodic gap (NE-gap).

This is a widely studied problem in statistics (where it is known as LASSO regression) and in compressed sensing, used for seeking a sparse solution of a linear system with small measurement errors. To solve this problem using the proposed relaxed LALM (53), we focused on the following equivalent equality-constrained minimization problem:

(x̂, û) ∈ arg min_{x,u} { (1/2)‖y − u‖²₂ + λ‖x‖₁ } s.t. u = Ax   (55)

with φ = λ‖·‖₁, ψ = 0, D_A = λ_max(A'A)I, and g = (1/2)‖y − ·‖²₂. We set x^(0) = A'y, u^(0) = Ax^(0), µ^(0) = y − u^(0), and h^(0) = D_A x^(0) − A'(Ax^(0) − y).

Data for the numerical instance were generated as follows. The entries of the system matrix A were sampled from an iid standard normal distribution. The hidden vector x_s was a randomly generated sparse vector, and the noisy measurement was y = Ax_s + n, where n was sampled from an iid zero-mean Gaussian distribution with small variance. The regularization parameter λ was set to unity in our experiment.

Figure 1 shows the duality gap curves of relaxed LALM with different relaxation parameters (α = 1 and α = 1.999) and different AL penalty parameters ρ. As seen in Figure 1(a), the ergodic duality gap G(w̄_K; ŵ) converges at rate O(1/K), and the bound derived in Theorem 3 becomes a tighter upper bound as the number of iterations grows. Furthermore, as seen in Figure 1(b), the non-ergodic duality gap G(w^(K); ŵ) converges much faster than the ergodic one, and empirically we achieve about a two-times speed-up by using α close to two.

II. CONTINUATION WITH OVER-RELAXATION

This section describes the rationale for the continuation sequence in []. Consider solving a simple quadratic problem:

x̂ ∈ arg min_x (1/2)‖Ax‖²₂,   (56)

using [, Eqn. 33] with h^(0) = 0 and y = 0. If A'A is positive definite (for this analysis only), then (56) has the unique solution x̂ = 0. Let VΛV' be the eigenvalue decomposition of A'A, where Λ ≜ diag{λ_i} with 0 < λ₁ ≤ λ₂ ≤ ··· ≤ λ_n = L_A. Updates generated by [, Eqn. 33] (with D_A = L_A I) simplify as follows:

x^(k+1) = (ρL_A)⁻¹((ρ − 1)g^(k) + ρh^(k))
g^(k+1) = (ρ/(ρ + 1))(αA'Ax^(k+1) + (1 − α)g^(k)) + (1/(ρ + 1))g^(k)
h^(k+1) = α(L_A x^(k+1) − A'Ax^(k+1)) + (1 − α)h^(k).   (57)

Let x̄ ≜ V'x, ḡ ≜ V'g, and h̄ ≜ V'h. The linear system (57) can be diagonalized, and the ith components of x̄, ḡ, and h̄ evolve as follows:

x̄_i^(k+1) = (ρL_A)⁻¹((ρ − 1)ḡ_i^(k) + ρh̄_i^(k))   (58)

and

ḡ_i^(k+1) = (ρ/(ρ + 1))(αλ_i x̄_i^(k+1) + (1 − α)ḡ_i^(k)) + (1/(ρ + 1))ḡ_i^(k)
h̄_i^(k+1) = α(L_A − λ_i)x̄_i^(k+1) + (1 − α)h̄_i^(k).   (59)
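Before analyzing this recursion in closed form below, a quick numerical sketch of (58)-(59) illustrates the damped, oscillatory decay of the slowest eigencomponent and how it depends on both ρ and α. The spectrum values, initial values, and parameter grid in the sketch are assumptions chosen only for illustration.

```python
import numpy as np

def simulate_mode(lam_i, L_A, rho, alpha, n_iter=300, g0=1.0, h0=0.0):
    """Iterate the scalar recursion (58)-(59) for one eigencomponent and
    return |x_i| over the iterations; the initial values are assumptions."""
    g, h = g0, h0
    traj = []
    for _ in range(n_iter):
        x = ((rho - 1.0) * g + rho * h) / (rho * L_A)                         # (58)
        g = (rho / (rho + 1.0)) * (alpha * lam_i * x + (1.0 - alpha) * g) \
            + g / (rho + 1.0)                                                 # (59), g-update
        h = alpha * (L_A - lam_i) * x + (1.0 - alpha) * h                     # (59), h-update
        traj.append(abs(x))
    return np.array(traj)

L_A, lam1 = 1.0, 1e-3   # assumed spectrum: largest and smallest eigenvalues of A'A
for rho in (0.5, 0.1, 0.02):
    for alpha in (1.0, 1.999):
        traj = simulate_mode(lam1, L_A, rho, alpha)
        # estimate the per-iteration decay factor from window maxima of the damped oscillation
        a, b = traj[100:150].max(), traj[250:300].max()
        print(f"rho={rho:5.2f}  alpha={alpha:5.3f}  est. decay factor ~ {(b / a) ** (1 / 150):.4f}")
```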

Plugging (58) into (59) leads to a second-order recursion (in ḡ_i and h̄_i) with transition matrix

T_i ≜ [ αλ_i(ρ − 1)/((ρ + 1)L_A) + ((1 − α)ρ + 1)/(ρ + 1) ,  αλ_iρ/((ρ + 1)L_A) ;
        α(L_A − λ_i)(ρ − 1)/(ρL_A) ,  α(L_A − λ_i)/L_A + (1 − α) ],   (60)

and x̄_i^(k+1) is just a linear combination of ḡ_i^(k) and h̄_i^(k). The eigenvalues of the transition matrix T_i defined in (60) determine the convergence rate of the second-order recursion, and we can analyze the second-order recursive system by studying its characteristic polynomial:

r_i² − ([T_i]₁₁ + [T_i]₂₂)r_i + ([T_i]₁₁[T_i]₂₂ − [T_i]₁₂[T_i]₂₁).   (61)

The proposed α-dependent continuation sequence is based on the critical value ρ_c and the damping frequency ω₀ (as ρ → 0) of the eigencomponent corresponding to the smallest eigenvalue λ₁ []. The critical value ρ_c solves

([T₁]₁₁ + [T₁]₂₂)² − 4([T₁]₁₁[T₁]₂₂ − [T₁]₁₂[T₁]₂₁) = 0,   (62)

and the damping frequency ω₀ satisfies [, p. 8]

cos ω₀ = ([T₁]₁₁ + [T₁]₂₂) / √(4([T₁]₁₁[T₁]₂₂ − [T₁]₁₂[T₁]₂₁)).   (63)

We solve (62) and (63) using MATLAB's symbolic toolbox. For (62), we found that

ρ_c = 2√(λ₁(L_A − λ₁)) / L_A   (64)

is independent of α. Hence, the optimal AL penalty parameter ρ ≈ ρ_c depends only on the geometry of A'A and does not change for different values of the relaxation parameter α. For (63), we found that

cos ω₀ ≈ (1 − αλ₁/L_A) / (1 − (α − α²/2)λ₁/L_A)   (65)

for ρ → 0. When α = 1, cos ω₀ ≈ 1 − λ₁/(2L_A), and thus ω₀ ≈ √(λ₁/L_A) due to the small angle approximation

cos θ ≈ 1 − θ²/2 for small |θ|.   (66)

When α ≈ 2, cos ω₀ ≈ 1 − 2λ₁/L_A, and ω₀ ≈ 2√(λ₁/L_A), also due to (66). For general 0 < α < 2, we can approximate cos ω₀ in (65) using a Taylor series as

cos ω₀ ≈ (1 − αλ₁/L_A)(1 + (α − α²/2)λ₁/L_A) + [higher-order terms] = 1 − (α²/2)λ₁/L_A + [higher-order terms].   (67)

We ignore the higher-order terms in (67) since λ₁/L_A is usually very small in practice. Hence, cos ω₀ ≈ 1 − (α√(λ₁/L_A))²/2, and ω₀ ≈ α√(λ₁/L_A) due to the small angle approximation (66). This expression covers both the previous unrelaxed (α = 1) and the proposed relaxed (α ≈ 2) cases. Suppose we use the same restart condition as in []; that is, restarts occur about every (1/2)(π/ω₀) iterations. If we restart at the kth iteration, we have the approximation √(λ₁/L_A) ≈ π/(2αk), and the ideal AL penalty parameter at the kth iteration is

2(π/(2αk))√(1 − (π/(2αk))²) = (π/(αk))√(1 − (π/(2αk))²).   (68)

That is, the values of ρ_k(α) are scaled by 1/α relative to the unrelaxed schedule.

To demonstrate the speed-up resulting from combining continuation with over-relaxation, Figure 2 shows the convergence rate curves of the proposed relaxed OS-LALM using different values of the over-relaxation parameter α when reconstructing the simulated XCAT dataset. For comparison, convergence rate curves that do not use continuation (i.e., with a fixed AL penalty parameter ρ) are also shown. As seen in Figure 2(b), the RMS difference of the green curve (relaxed OS-LALM with α = 1.5) after a given number of iterations is about the same as that of the blue curve (unrelaxed OS-LALM) after roughly 1.5 times as many iterations, exhibiting an approximately 1.5-times speed-up. Using larger α (up to two) can further accelerate convergence; however, the speed-up can be somewhat less than α-times due to the dominance of the constant B in [] and the accumulation of gradient errors with ordered subsets. For instance, the red curve (relaxed OS-LALM with α = 1.999) falls slightly short of a full two-times speed-up over the blue curve (unrelaxed OS-LALM).
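For reference, the α-dependent schedule in (68) is straightforward to implement. The sketch below is one possible reading of it; the ρ₀ = 1 convention, the k → k+1 shift that keeps the square root real from the first iteration, and the small positive floor are assumptions, not taken from the text above.

```python
import numpy as np

def rho_k(k, alpha=1.999):
    """AL penalty schedule suggested by (68):
    rho_k ~ (pi/(alpha*k)) * sqrt(1 - (pi/(2*alpha*k))^2).
    The rho_0 = 1 convention, the k -> k+1 shift, and the positive floor are assumptions."""
    if k == 0:
        return 1.0
    t = np.pi / (alpha * (k + 1))
    return max(t * np.sqrt(max(1.0 - (t / 2.0) ** 2, 0.0)), 1e-4)

for alpha in (1.0, 1.999):
    print(f"alpha = {alpha}:", [round(rho_k(k, alpha), 4) for k in range(8)])
```

For moderate k the schedule behaves like π/(αk), so the relaxed schedule is roughly the unrelaxed one divided by α, consistent with the remark after (68).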

Fig. 2: XCAT: Convergence rate curves of relaxed OS-LALM using different values of the over-relaxation parameter α with (a) a fixed AL penalty parameter ρ and (b) the proposed decreasing sequence ρ_k. (Curves: unrelaxed OS-LALM (α = 1) and relaxed OS-LALM with α = 1.5 and α = 1.999.)

III. ADDITIONAL EXPERIMENTAL RESULTS

A. XCAT phantom

Additional experimental results for the simulated XCAT phantom dataset shown in [] are reported here. Figure 3 shows the difference images (in the central transaxial plane) of FBP (i.e., x^(0) − x^⋆) and of the OS algorithms after a fixed number of iterations (i.e., x^(k) − x^⋆). As seen in Figure 3, low-frequency components converge faster than high- and mid-frequency components such as streaks and edges for all algorithms. This is common for gradient-based algorithms when the Hessian matrix of the cost function is low-pass/band-suppressing, as in X-ray CT. The difference image of the proposed relaxed OS-LALM shows less residual edge structure and looks more uniform in flat regions. Figure 4 shows the difference images after more iterations. We can see that the proposed relaxed OS-LALM produces very uniform difference images, while subtle noise-like artifacts remain with OS-OGM.

To demonstrate the improvement of our modified relaxed LALM (i.e., with ordered subsets and continuation) for X-ray CT image reconstruction problems, Figure 5 shows convergence rate curves of unrelaxed/relaxed (OS-)LALM using different parameter settings with (a) one subset and (b) M subsets. All algorithms run the same number of subiterations; however, those with OS should be faster in runtime because they perform fewer forward/back-projections. As seen in Figure 5, the convergence rate curves of the OS algorithms scale almost perfectly (along the horizontal axis) when using a modest number of subsets. However, the scalability might degrade when using more subsets (more severe gradient error accumulation) or on other datasets. Moreover, the solid lines (relaxed algorithms) consistently show about two-times faster convergence than the dashed lines (unrelaxed algorithms), both without and with continuation. Note that the solid blue line (relaxed LALM, ρ = 1/6) and the dashed green line (unrelaxed LALM, ρ = 1/12) overlap in both cases after the early subiterations, implying that halving the AL penalty parameter ρ and setting the relaxation parameter α close to two have a similar effect on convergence speed in this CT problem (where the data-fidelity term dominates the cost function). Note that when the data-fidelity term dominates the cost function, the constant B dominates the constant multiplying 1/K in [], leading to the better speed-up with α.

We also investigated the effect of majorization (for both the data-fidelity term and the regularization term) on convergence speed. Figure 6 shows the convergence rate curves of the proposed relaxed OS-LALM with different (a) data-fidelity term majorizations and (b) regularization term majorizations. As seen in Figure 6(a), the proposed algorithm diverges when D_L is too small, violating the majorization condition. Larger D_L slows down the algorithm. However, multiplying D_L by a factor κ does not necessarily slow down the algorithm by κ-times, since the weighting matrix of B_{α,ρ,D_L} is D_L − A'WA. Besides, larger D_L helps reduce the gradient error accumulation in fast OS algorithms [].
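The majorization condition that Figure 6(a) probes is easy to check numerically. The sketch below uses the standard SQS row-sum majorizer diag{A'WA1} for a nonnegative toy system matrix; whether this matches the exact D(loss) construction scaled in Figure 6(a) is an assumption, and the scaling values are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 30
A = rng.random((m, n))                 # nonnegative entries, as for a CT system matrix
w = rng.random(m)                      # nonnegative statistical weights (diagonal of W)
AWA = A.T @ (w[:, None] * A)           # A'WA

# standard SQS row-sum majorizer D = diag{A'WA 1}; it satisfies D >= A'WA for nonnegative A, W
D = np.diag(AWA @ np.ones(n))

for kappa in (0.8, 1.0, 1.5, 2.0):     # illustrative scalings only
    eigmin = np.linalg.eigvalsh(kappa * D - AWA).min()
    print(f"kappa = {kappa:3.1f}   min eig of kappa*D - A'WA = {eigmin:+.3e}")
```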
Figure 6(b) shows the convergence rate curves of the proposed relaxed OS-LALM with regularizer majorization using the maximum curvature and Huber's curvature, respectively. We can see that the speed-up from using Huber's curvature is very significant. Note that ρD_L + D_R determines the step sizes of the image updates of the proposed relaxed OS-LALM. Better majorization of R (i.e., smaller [D_R]_ii for those voxels that are still far from the optimum) leads to larger image update step sizes, especially when ρ is small.

B. Chest scan

Additional experimental results for the chest scan dataset shown in [] are reported here. Figure 7 shows convergence rate curves of different relaxed algorithms (with ordered subsets and α = 1.999) with (a) a fixed AL penalty parameter ρ and (b) the decreasing sequence ρ_k proposed in [].

Fig. 3: XCAT: Cropped difference images (in HU) from the central transaxial plane of the initial FBP image (x^(0) − x^⋆) and of the images reconstructed by the OS algorithms (x^(k) − x^⋆) after fewer iterations. Panels: FBP, OS-SQS, OS-FGM, OS-LALM, OS-OGM, relaxed OS-LALM.

Fig. 4: XCAT: Cropped difference images (in HU) from the central transaxial plane of the initial FBP image (x^(0) − x^⋆) and of the images reconstructed by the OS algorithms (x^(k) − x^⋆) after more iterations. Panels: FBP, OS-SQS, OS-FGM, OS-LALM, OS-OGM, relaxed OS-LALM.

Like the experimental results with the simulated CT scan shown in [], the simple relaxation does not provide much acceleration with a fixed AL penalty parameter, but it works somewhat better when using the decreasing ρ_k. Figure 8 and Figure 9 show the difference images (in the central transaxial plane) of FBP and of the OS algorithms after fewer and more iterations, respectively. The difference images of the proposed relaxed OS-LALM show the fewest structured artifacts among all algorithms compared.

C. Shoulder scan

We reconstructed an image volume from a helical CT scan of the shoulder region; the sinogram covers about 7.3 rotations of the helix. The initial FBP image x^(0) has many streak artifacts due to the low signal-to-noise ratio (SNR), and we tuned the statistical weights and regularization parameters using [, 3] to emulate [4, ]. We used fewer subsets for the relaxed OS-LALM than the number that [, Eqn. 7] suggests for the unrelaxed OS-LALM. Figure 10 shows the cropped images from the central transaxial plane of the initial FBP image x^(0), the reference reconstruction x^⋆, and the image reconstructed by the proposed algorithm (relaxed OS-LALM) after a fixed number of iterations. Figure 11 shows the RMS differences between the reference reconstruction x^⋆ and the reconstructed image x^(k) using different OS algorithms as a function of iteration, for two different numbers of subsets.

Fig. 5: XCAT: Convergence rate curves of unrelaxed/relaxed (OS-)LALM using different parameter settings with (a) one subset and (b) M subsets. All algorithms run the same number of subiterations. (Curves: (OS-)SQS; (OS-)LALM and relaxed (OS-)LALM with ρ = 1/6, with ρ = 1/12, and with the continuation sequence.)

Fig. 6: XCAT: Convergence rate curves of the proposed relaxed OS-LALM with different (a) data-fidelity term majorizations and (b) regularization term majorizations. (Curves in (a): D(loss) set to several scalings of diag{A'WA}; curves in (b): D(reg.) with the maximum curvature and with Huber's curvature.)

As seen in Figure 11, the proposed relaxed OS-LALM shows a faster convergence rate with a moderate number of subsets, but the speed-up diminishes as the iterates approach the solution. Figure 12 and Figure 13 show the difference images (in the central transaxial plane) of FBP and of the OS algorithms after fewer and more iterations, respectively. The proposed relaxed OS-LALM removes more streak artifacts than the other OS algorithms.

Fig. 7: Chest: Convergence rate curves of different relaxed algorithms (with ordered subsets and α = 1.999) with (a) a fixed AL penalty parameter ρ and (b) the decreasing sequence ρ_k proposed in []. (Curves: OS-SQS, OS-LALM, and two relaxed OS-LALM variants.)

Fig. 8: Chest: Cropped difference images (in HU) from the central transaxial plane of the initial FBP image (x^(0) − x^⋆) and of the images reconstructed by the OS algorithms (x^(k) − x^⋆) after fewer iterations. Panels: FBP, OS-SQS, OS-FGM, OS-LALM, OS-OGM, relaxed OS-LALM.

Fig. 9: Chest: Cropped difference images (in HU) from the central transaxial plane of the initial FBP image (x^(0) − x^⋆) and of the images reconstructed by the OS algorithms (x^(k) − x^⋆) after more iterations. Panels: FBP, OS-SQS, OS-FGM, OS-LALM, OS-OGM, relaxed OS-LALM.

Fig. 10: Shoulder: Cropped images from the central transaxial plane of the initial FBP image x^(0) (left), the reference reconstruction x^⋆ (center), and the image reconstructed by the proposed algorithm (relaxed OS-LALM) after a fixed number of iterations (right).

Fig. 11: Shoulder: Convergence rate curves of different OS algorithms with (a) a smaller and (b) a larger number of subsets. The proposed relaxed OS-LALM with the smaller number of subsets exhibits a convergence rate similar to that of the unrelaxed OS-LALM with the larger number of subsets. (Curves: OS-SQS, OS-FGM, OS-LALM, OS-OGM, relaxed OS-LALM.)

Fig. 12: Shoulder: Cropped difference images (in HU) from the central transaxial plane of the initial FBP image (x^(0) − x^⋆) and of the images reconstructed by the OS algorithms (x^(k) − x^⋆) after fewer iterations. Panels: FBP, OS-SQS, OS-FGM, OS-LALM, OS-OGM, relaxed OS-LALM.

Fig. 13: Shoulder: Cropped difference images (in HU) from the central transaxial plane of the initial FBP image (x^(0) − x^⋆) and of the images reconstructed by the OS algorithms (x^(k) − x^⋆) after more iterations. Panels: FBP, OS-SQS, OS-FGM, OS-LALM, OS-OGM, relaxed OS-LALM.
