Relaxed Linearized Algorithms for Faster X-Ray CT Image Reconstruction

Hung Nien, Member, IEEE, and Jeffrey A. Fessler, Fellow, IEEE

arXiv:.464v [math.OC] 4 Dec

This work is supported in part by National Institutes of Health (NIH) grant U-EB-873 and by equipment donations from Intel Corporation. Hung Nien and Jeffrey A. Fessler are with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 489, USA ({hungnien,fessler}@umich.edu).

Abstract—Statistical image reconstruction (SIR) methods are studied extensively for X-ray computed tomography (CT) due to the potential of acquiring CT scans with reduced X-ray dose while maintaining image quality. However, the longer reconstruction time of SIR methods hinders their use in X-ray CT in practice. To accelerate statistical methods, many optimization techniques have been investigated. Over-relaxation is a common technique to speed up convergence of iterative algorithms. For instance, using a relaxation parameter that is close to two in the alternating direction method of multipliers (ADMM) has been shown to speed up convergence significantly. This paper proposes a relaxed linearized augmented Lagrangian (AL) method that exhibits a theoretically faster convergence rate with over-relaxation and applies the proposed relaxed linearized AL method to X-ray CT image reconstruction problems. Experimental results with both simulated and real CT scan data show that the proposed relaxed algorithm (with ordered-subsets [OS] acceleration) is about twice as fast as the existing unrelaxed fast algorithms, with negligible computation and memory overhead.

Index Terms—Statistical image reconstruction, computed tomography, ordered subsets, augmented Lagrangian, relaxation.

I. INTRODUCTION

Statistical image reconstruction (SIR) methods [1, 2] have been studied extensively and used widely in medical imaging. In SIR methods, one models the physics of the imaging system, the statistics of the noisy measurements, and the prior information about the object to be imaged, and then finds the best-fitting estimate by minimizing a cost function using iterative algorithms. By considering noise statistics when reconstructing images, SIR methods have better bias-variance performance and noise robustness. However, the iterative nature of the algorithms in SIR methods also increases the reconstruction time, hindering their ubiquitous use in X-ray CT in practice.

Penalized weighted least-squares (PWLS) cost functions with a statistically weighted quadratic data-fidelity term are commonly used in SIR methods for X-ray CT [3]. Conventional SIR methods include the preconditioned conjugate gradient (PCG) method [4] and the separable quadratic surrogate (SQS) method with ordered-subsets (OS) acceleration []. These first-order methods update the image based on the gradient of the cost function at the current estimate. Due to the time-consuming forward/back-projection operations in X-ray CT when computing gradients, conventional first-order methods are typically very slow. The efficiency of PCG relies on choosing an appropriate preconditioner of the highly shift-variant Hessian caused by the huge dynamic range of the statistical weighting. In 2-D CT, one can introduce an auxiliary variable that separates the shift-variant and approximately shift-invariant components of the weighted quadratic data-fidelity term using a variable splitting technique [6], leading to better conditioned inner least-squares problems.
However, this method has not worked well in 3-D CT, probably due to the 3-D cone-beam geometry and helical trajectory. OS-SQS accelerates convergence using more frequent image updates by incremental gradients, i.e., computing image gradients with only a subset of data. This method usually exhibits fast convergence behavior in early iterations and becomes faster by using more subsets. However, it is not convergent in general [7, 8]. When more subsets are used, larger limit cycles can be observed. Unlike methods that update all voxels simultaneously, the iterative coordinate descent (ICD) method [9] updates one voxel at a time. Experimental results show that ICD approximately minimizes the PWLS cost function in several passes of the image volume if initialized appropriately; however, the sequential nature of ICD makes it difficult to parallelize and restrains the use of modern parallel computing architectures like GPU for speed-up. OS-mom [] and OS-LALM [] are two recently proposed iterative algorithms that demonstrate promising fast convergence speed when solving 3-D X-ray CT image reconstruction problems. In short, OS-mom combines Nesterov s momentum techniques [, 3] with the conventional OS-SQS algorithm, greatly accelerating convergence in early iterations. OS-LALM, on the other hand, is a linearized augmented Lagrangian (AL) method [4] that does not require inverting an enormous Hessian matrix involving the forward projection matrix when updating images, unlike typical splitting-based algorithms [6], but still enjoys the empirical fast convergence speed and error tolerance of AL methods such as the alternating direction method of multipliers (ADMM) [ 7]. Further acceleration from an algorithmic perspective is possible but seems to be more challenging. Kim et al. [8, 9] proposed two optimal gradient methods (OGM s) that use a new momentum term and showed a -times speed-up for minimizing smooth convex functions, comparing to existing fast gradient methods (FGM s) [, 3,, ]. Over-relaxation is a common technique to speed up convergence of iterative algorithms. For example, it is very effective for accelerating ADMM [6, 7]. The same relaxation technique was also applied to linearized ADMM very recently [], but the speed-up was less significant than expected.
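To make the role of the relaxation parameter concrete before the formal development in Section II, the following display is a schematic sketch of ADMM with over-relaxation for the generic split problem min_{x,u} g_y(u) + h(x) subject to u = Ax; it is only a restatement of the relaxed AL method recalled in Section II-A, with 0 < α < 2 and α > 1 corresponding to over-relaxation:

\[
\begin{aligned}
x^{(k+1)} &\in \arg\min_x \; h(x) - \langle \mu^{(k)}, Ax\rangle + \tfrac{\rho}{2}\,\|Ax - u^{(k)}\|_2^2,\\
r_{u,\alpha}^{(k+1)} &= \alpha\,Ax^{(k+1)} + (1-\alpha)\,u^{(k)},\\
u^{(k+1)} &\in \arg\min_u \; g_y(u) + \langle \mu^{(k)}, u\rangle + \tfrac{\rho}{2}\,\|r_{u,\alpha}^{(k+1)} - u\|_2^2,\\
\mu^{(k+1)} &= \mu^{(k)} - \rho\,\bigl(r_{u,\alpha}^{(k+1)} - u^{(k+1)}\bigr).
\end{aligned}
\]

Replacing Ax^{(k+1)} by the interpolated/extrapolated point r_{u,α}^{(k+1)} in the u- and µ-updates is the only change relative to standard ADMM, which is why over-relaxation adds essentially no per-iteration cost.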

2 Chambolle et al. proposed a relaxed primal-dual algorithm (whose unrelaxed variant happens to be a linearized ADMM [3, Section 4.3]) and showed the first theoretical justification for speeding up convergence with over-relaxation [4, Theorem ]. However, their theorem also pointed out that when the smooth explicit term (majorization of the Lipschitz part in the cost function mentioned later) is not zero, one must use smaller primal step size to ensure convergence with over-relaxation, precluding the use of larger relaxation parameter (close to two) for more acceleration. This paper proposes a non-trivial relaxed variant of linearized AL methods that improves the convergence rate by using larger relaxation parameter values (close to two) but does not require the step-size adjustment in [4]. We apply the proposed relaxed linearized algorithm to X-ray CT image reconstruction problems, and experimental results show that our proposed relaxation works much better than the simple relaxation [] and significantly accelerates X- ray CT image reconstruction, even with ordered-subsets (OS) acceleration. This paper is organized as follows. Section II shows the convergence rate of a linearized AL method (LALM) with simple relaxation and proposes a novel relaxed LALM whose convergence rate scales better with the relaxation parameter. Section III applies the proposed relaxed LALM to X-ray CT image reconstruction and uses a second-order recursive system analysis to derive a continuation sequence that speeds up the proposed algorithm. Section IV reports the experimental results of X-ray CT image reconstruction using the proposed algorithm. Finally, we draw conclusions in Section V. Online supplementary material contains many additional results and derivation details. II. RELAXED LINEARIZED AL METHODS We begin by discussing a more general constrained minimization problem for which X-ray CT image reconstruction is a special case considered in Section III. Consider an equalityconstrained minimization problem: (ˆx,û) arg min x,u { gy (u)+h(x) } s.t. u = Ax, () where g y and h are closed and proper convex functions. In particular, g y is a loss function that measures the discrepancy between the linear modelax and noisy measurementy, andh is a regularization term that introduces prior knowledge ofxto the reconstruction. We assume that the regularizer h φ+ψ is the sum of two convex components φ and ψ, where φ has inexpensive proximal mapping (prox-operator) defined as prox φ (x) arg min z {φ(z)+ z x }, () e.g., soft-shrinkage for the l -norm and truncating zeros for non-negativity constraints, and where ψ is continuously differentiable with L ψ -Lipschitz gradients [, p. 48], i.e., ψ(x ) ψ(x ) L ψ x x (3) for any x and x in the domain of ψ. The Lipschitz condition of ψ implies the (quadratic) majorization condition of ψ: ψ(x ) ψ(x )+ ψ(x ),x x + L ψ x x. (4) More generally, one can replace the Lipschitz constant L ψ by a diagonal majorizing matrix D ψ based on the maximum curvature [6] or Huber s optimal curvature [7, p. 84] of ψ while still guaranteeing the majorization condition: ψ(x ) ψ(x )+ ψ(x ),x x + x x D ψ. () We show later that decomposing h into the proximal part φ and the Lipschitz part ψ is useful when solving minimization problems with composite regularization. For example, Section III writes iterative X-ray CT image reconstruction as a special case of (), where g y is a weighted quadratic function, and h is an edge-preserving regularizer with a non-negativity constraint on the reconstructed image. A. 
Preliminaries Solving the equality-constrained minimization problem () is equivalent to finding a saddle-point of the Lagrangian: L(x,u,µ) f(x,u) µ,ax u, (6) where f(x,u) g y (u) + h(x), and µ is the Lagrange multiplier of the equality constraint [8, p. 37]. In other words, (ˆx, û, ˆµ) solves the minimax problem: (ˆx,û, ˆµ) arg min max x,u µ L(x,u,µ). (7) Moreover, since (ˆx, û, ˆµ) is a saddle-point of L, the following inequalities hold for any x, u, and µ: L(x,u, ˆµ) L(ˆx,û, ˆµ) L(ˆx,û,µ). (8) The non-negative duality gap function: G(x,u,µ;ˆx,û, ˆµ) L(x,u, ˆµ) L(ˆx,û,µ) = [ f(x,u) f(ˆx,û) ] ˆµ,Ax u (9) characterizes the accuracy of an approximate solution (x,u,µ) to the saddle-point problem (7). Note that û = Aˆx due to the equality constraint. Besides solving the classic Lagrangian minimax problem (7), (ˆx, û, ˆµ) also solves a family of minimax problems: (ˆx,û, ˆµ) arg min x,u max µ L AL(x,u,µ), () where the augmented Lagrangian (AL) [, p. 97] is L AL (x,u,µ) L(x,u,µ)+ ρ Ax u. () The augmented quadratic penalty term penalizes the feasibility violation of the equality constraint, and the AL penalty parameter ρ > controls the curvature of L AL but does not change the solution, sometimes leading to better conditioned minimax problems. One popular iterative algorithm for solving equalityconstrained minimization problems based on the AL theory is ADMM, which solves the AL minimax problem (), and thus the equality-constrained minimization problem (), in an alternating direction manner. More precisely, ADMM minimizes AL () with respect to x and u alternatingly, followed by a gradient ascent of µ with step size ρ. One can also interpolate or extrapolate variables in subproblems,

3 3 leading to a relaxed AL method [6, Theorem 8]: { x (k+) arg min h(x) µ (k),ax + ρ } x Ax u (k) { u (k+) arg min g y (u)+ µ (k),u + ρ (k+) u r u,α u } µ (k+) = µ (k) ρ ( r (k+) u,α u (k+)), () where the relaxation variable of u is: r (k+) u,α αax (k+) +( α)u (k), (3) and < α < is the relaxation parameter. It is called overrelaxation when α > and under-relaxation when α <. When α is unity, () reverts to the standard (alternating direction) AL method []. Experimental results suggest that over-relaxation with α [.,.8] can accelerate convergence [7]. Although () is used widely in applications, two concerns about the relaxed AL method () arise in practice. First, the cost function of the x-subproblem in () contains the augmented quadratic penalty of AL that involves A, deeply coupling elements of x and often leading to an expensive iterative x-update, especially when A is large and unstructured, e.g., in X-ray CT. This motivates alternative methods like LALM [, 4]. Second, even though LALM removes the x-coupling due to the augmented quadratic penalty, the regularization term h might not have inexpensive proximal mapping and still require an iterative x-upate (albeit without using A). This consideration inspires the decomposition h φ +ψ used in the algorithms discussed next. B. Linearized AL methods with simple relaxation In LALM, one adds an iteration-dependent proximity term: x x (k) (4) P to thex-update in () withα =, wherepis a positive semidefinite matrix. Choosing P = ρg, where G L A I A A, and L A denotes the maximum eigenvalue of A A, the nonseparable Hessian of the augmented quadratic penalty of AL is cancelled, and the Hessian of ρ Ax u (k) + ρ x x (k) () G becomes a diagonal matrix ρl A I, decoupling x in the x- update except for the effect of h. This technique is known as linearization (more precisely, majorization) because it majorizes a non-separable quadratic term by its linear component plus some separable qradratic proximity term. In general, one can also use G D A A A, (6) where D A A A is a diagonal majorizing matrix of A A, e.g., D A = diag{ A A } A A [], and still guarantee the positive semi-definiteness of P. This trick can be applied to () when α, too. To remove the possible coupling due to the regularization term h, we replace the Lipschitz part of h φ + ψ in the Because () is quadratic, not linear, a more apt term would be majorized rather than linearized. We stick with the term linearized for consistency with the literature on LALM. x-update of () with its separable quadratic surrogate (SQS): ( Q ψ x;x (k) ) ψ ( x (k)) + ψ ( x (k)),x x (k) + x x (k) (7) D ψ shown in (4) and (). Note that (4) is just a special case of () when D ψ = L ψ I. Incorporating all techniques mentioned above, the x-update becomes simply a proximal mapping of φ, which by assumption is inexpensive. The resulting LALM with simple relaxation algorithm is: { ( φ(x)+qψ x;x (k) ) } µ (k),ax x (k+) arg min x + ρ Ax u (k) + ρ x x (k) { G u (k+) arg min g y (u)+ µ (k),u + ρ u r (k+) u,α u } µ (k+) = µ (k) ρ ( r (k+) u,α u (k+)). (8) When ψ =, (8) reverts to the L-GADMM algorithm proposed in []. In [], the authors analyzed the convergence rate of L-GADMM (for solving an equivalent variational inequality problem; however, there is no analysis on how relaxation parameter α affects the convergence rate) and investigated solving problems in statistical learning using L-GADMM. 
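Because the x-update in (8) reduces to a proximal mapping of φ, it may help to see the two prox-operators mentioned after (2) written out explicitly. A minimal NumPy sketch (the function names are ours, not the paper's):

```python
import numpy as np

def prox_l1(x, t):
    # Soft-shrinkage: the proximal mapping of t * ||.||_1 evaluated at x.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_nonneg(x):
    # Projection onto the non-negativity set, i.e., the prox of its indicator function.
    return np.maximum(x, 0.0)

x = np.array([-1.5, -0.2, 0.3, 2.0])
print(prox_l1(x, 0.5))    # [-1. -0.  0.  1.5]
print(prox_nonneg(x))     # [0.  0.  0.3 2. ]
```

Both evaluations cost O(n) element-wise work, which is what makes the decomposition h = φ + ψ attractive: the expensive structure (the system matrix and the smooth regularizer) is handled by the linearization and the SQS majorization, while φ only ever enters through such cheap operations.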
The speed-up resulting from over-relaxation was less significant than expected (e.g., when solving an X-ray CT image reconstruction problem discussed later). To explain the small speed-up, the following theorem shows that the duality gap (9) of the time-averaged approximate solution w K = (x K,u K,µ K ) generated by (8) vanishes at rate O(/K), where K is the number of iterations, and c K K K k= c(k) (9) denotes the time-average of some iterate c (k) for k = to K. Theorem. Let w K = (x K,u K,µ K ) be the time-averages of the iterates of LALM with simple relaxation in (8), where ρ > and < α <. We have ( ) G(w K ;ŵ) K ADψ +B ρ,da +C α,ρ, () where the first two constants A Dψ x () ˆx () D ψ B ρ,da ρ x () ˆx () D A A A depend on how far the initial guess is from a minimizer, and the last constant depends on the relaxation parameter [ ρ C α,ρ u () α û + µ () ρ ˆµ ]. (3) Proof. The proof is in the supplementary material. Theorem shows that (8) converges at rate O(/K), and the constant multiplying /K consists of three terms: A Dψ, B ρ,da, and C α,ρ. The first term A Dψ comes from the majorization of ψ, and it is large when ψ has large curvature. The second term B ρ,da comes from the linearization trick in (). One can always decrease its value by decreasing ρ. The third term C α,ρ is the only α-dependent component. The trend of C α,ρ when varying ρ depends on the norms of u () û and µ () ˆµ, i.e., how one initializes the algorithm. Finally, the

4 4 convergence rate of (8) scales well with α iff C α,ρ A Dψ and C α,ρ B ρ,da. When ψ has large curvature or D A is a loose majorizing matrix of A A (like in X-ray CT), the above inequalities do not hold, leading to poor scalability of convergence rate with the relaxation parameter α. C. Linearized AL methods with proposed relaxation To better scale the convergence rate of relaxed LALM with α, we want to design an algorithm that replaces the α- independent components by α-dependent ones in the constant multiplying /K in (). This can be (partially) done by linearizing (more precisely, majorizing) the non-separable AL penalty term in () implicitly. Instead of explicitly adding a G-weighted proximity term, where G is defined in (6), to the x-update like (8), we consider solving an equalityconstrained minimization problem equivalent to () with an additional redundant equality constraint v = G / x, i.e., { (ˆx,û,ˆv) arg min gy (u)+h(x) } x,u,v s.t. u = Ax and v = G / x, (4) using the relaxed AL method () as follows: ( φ(x)+q ψ x;x (k) ) µ (k),ax x (k+) arg min ν (k),g / x + ρ x Ax u (k) + ρ G / x v (k) { u (k+) arg min g y (u)+ µ (k),u + ρ u r (k+) u,α u } µ (k+) = µ (k) ρ ( r (k+) u,α u (k+)) v (k+) = r v,α (k+) ρ ν (k) ν (k+) = ν (k) ρ ( r (k+) v,α v (k+)), () where the relaxation variable of v is: r (k+) v,α αg / x (k+) +( α)v (k), (6) and ν is the Lagrange multiplier of the redundant equality constraint. One can easily verify thatν (k) = fork =,,... if we initialize ν as ν () =. The additional equality constraint introduces an additional inner-product term and a quadratic penalty term to the x- update. The latter can be used to cancel the non-separable Hessian of the AL penalty term as in explicit linearization. By choosing the same AL penalty parameter ρ > for the additional constraint, the Hessian matrix of the quadratic penalty term in the x-update of () is ρa A+ρG = ρd A. In other words, by choosing G in (6), the quadratic penalty term in the x-update of () becomes separable, and the x- update becomes an efficient proximal mapping of φ, as seen in () below. Next we analyze the convergence rate of the proposed relaxed LALM method (). With the additional redundant equality constraint, the Lagrangian becomes L (x,u,µ,v,ν) L(x,u,µ) ν,g / x v. (7) Setting gradients of L with respect to x, u, µ, v, and ν to be zero yields a necessary condition for a saddle-point ŵ = (ˆx,û, ˆµ,ˆv,ˆν) of L. It follows that ˆν = v L (ŵ) =. Therefore, setting ν () = is indeed a natural choice for initializing ν. Moreover, since ˆν =, the gap function G of the new problem (4) coincides with (9), and we can compare the convergence rate of the simple and proposed relaxed algorithms directly. Theorem. Let w K = (x K,u K,v K,µ K,ν K ) be the timeaverages of the iterates of LALM with proposed relaxation in (), where ρ > and < α <. When initializing v and ν as v () = G / x () and ν () =, respectively, we have G (w K ;ŵ) K ( ADψ +B α,ρ,da +C α,ρ ), (8) where A Dψ and C α,ρ were defined in () and (3), and B α,ρ,da ρ v () ˆv = ρ x () ˆx D A A A. (9) α Proof. The proof is in the supplementary material. Theorem shows the O(/K) convergence rate of (). Due to the different variable splitting scheme, the term introduced by the implicit linearization trick in () (i.e., B α,ρ,da ) also depends on the relaxation parameter α, improving convergence rate scalibility with α in () over (8). This theorem provides a theoretical explanation why () converges faster than (8) in the experiments shown later. 
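For readability, here is a side-by-side summary of the bounds in the two theorems above, written under our reading of the extraction-damaged displays (in particular, the 1/2 and 1/α factors are our reconstruction). For the simple relaxation (8),

\[
G(w_K;\hat{w}) \le \frac{1}{K}\bigl(A_{D_\psi} + B_{\rho,D_A} + C_{\alpha,\rho}\bigr),\quad
A_{D_\psi} = \tfrac12\|x^{(0)}-\hat{x}\|^2_{D_\psi},\quad
B_{\rho,D_A} = \tfrac{\rho}{2}\|x^{(0)}-\hat{x}\|^2_{D_A-A'A},\quad
C_{\alpha,\rho} = \tfrac{1}{\alpha}\Bigl(\tfrac{\rho}{2}\|u^{(0)}-\hat{u}\|^2_2 + \tfrac{1}{2\rho}\|\mu^{(0)}-\hat{\mu}\|^2_2\Bigr),
\]

while for the proposed relaxation the same bound holds with B_{ρ,D_A} replaced by

\[
B_{\alpha,\rho,D_A} = \tfrac{\rho}{2\alpha}\,\|x^{(0)}-\hat{x}\|^2_{D_A-A'A}.
\]

Only A_{D_ψ} is independent of α. In X-ray CT the data-fidelity term dominates the cost function and D_A is a loose majorizer of A'A, so the D_A term dominates the bound; dividing it by α with α close to two roughly halves the constant, which is the theoretical basis for the reported two-fold speed-up.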
For practical implementation, the remaining concern is multiplications by G / in (). There is no efficient way to compute the square root of G for any A in general, especially when A is large and unstructured like in X-ray CT. To solve this problem, let h G / v+a y. We rewrite () so that no explicit multiplication by G / is needed (the derivation is in the supplementary material), leading to the following LALM with proposed relaxation algorithm: { ( φ(x)+qψ x;x (k) ) } x (k+) arg min x + x (ρda ) γ (k+) { ρd A u (k+) arg min g y (u)+ µ (k),u + ρ (k+) u r u,α u } µ (k+) = µ (k) ρ ( r (k+) u,α u (k+)) h (k+) = αη (k+) +( α)h (k), () where and α γ (k+) ρa ( u (k) y+ρ µ (k)) +ρh (k), (3) η (k+) D A x (k+) A ( Ax (k+) y ). (3) When g y is a quadratic loss, i.e., g y (z) = (/) z y, we further simplify the proposed relaxed LALM by manipu- When ψ has large curvature (thus, α-dependent terms do not dominate the constant multiplying /K), we can use techniques as in [9, ] to reduce the ψ-dependent constant. In X-ray CT, the data-fidelity term often dominates the cost function, so A Dψ B α,ρ,da.

5 lations like those in [] (omitted here for brevity) as: γ (k+) = (ρ )g (k) +ρh { (k) ( φ(x)+qψ x;x (k) ) } x (k+) arg min x + x (ρd A ) γ (k+) ρd A ζ (k+) L ( x (k+)) = A ( Ax (k+) y ) g (k+) = ρ ( ρ+ αζ (k+) +( α)g (k)) + ρ+ g(k) h (k+) = α ( D A x (k+) ζ (k+)) +( α)h (k), (33) where L(x) g y (Ax) is the quadratic data-fidelity term, and g A (u y) []. For initialization, we suggest using g () = ζ () and h () = D A x () ζ () (Theorem ). The algorithm (33) computes multiplications by A and A only once per iteration and does not have to invert A A, unlike standard relaxed AL methods (). This property is especially useful when A A is large and unstructured. When α =, (33) reverts to the unrelaxed LALM in []. Lastly, we contrast our proposed relaxed LALM () with Chambolle s relaxed primal-dual algorithm [4, Algorithm ]. Both algorithms exhibit O(/K) ergodic (i.e., with respect to the time-averaged iterates) convergence rate and α-times speed-up when ψ =. Using () would require one more multiplication by A per iteration than in Chambolle s relaxed algorithm; however, the additional A is not required with quadratic loss in (33). When ψ, unlike Chambolle s relaxed algorithm in which one has to adjust the primal step size according to the value of α (effectively, one scales D ψ by /( α)) [4, Remark 6], the proposed relaxed LALM () does not require such step-size adjustment, which is especially useful when using α that is close to two. III. X-RAY CT IMAGE RECONSTRUCTION Consider the X-ray CT image reconstruction problem [3]: { } ˆx arg min x Ω y Ax W +R(x), (34) where A is the forward projection matrix of a CT scan [3], y is the noisy sinogram, W is the statistical diagonal weighting matrix, R denotes an edge-preserving regularizer, and Ω denotes a box-constraint on the image x. We focus on the edge-preserving regularizer R defined as: R(x) β i κ n κ n+si ϕ i ([C i x] n ), (3) i n where β i, s i, ϕ i, and C i denote the regularization parameter, spatial offset, potential function, and finite difference matrix in the ith direction, respectively, and κ n is a voxel-dependent weight for improving resolution uniformity [3, 33]. In our experiments, we used 3 directions to include all 6 neighbors in 3-D CT. A. Relaxed OS-LALM for faster CT reconstruction To solve X-ray CT image reconstruction (34) using the proposed relaxed LALM (33), we apply the following substitution: { A W / A y W / (36) y, and we set φ = ι Ω and ψ = R, where ι Ω (x) = if x Ω, and ι Ω (x) = + otherwise. The proximal mapping of ι Ω simply projects the input vector to the convex set Ω, e.g., clipping negative values of x to zero for a non-negativity constraint. Theorems developed in Section II considered the ergodic convergence rate of the non-negative duality gap, which is not a common convergence metric for X-ray CT image reconstruction. However, the ergodic convergence rate analysis suggests how factors like α, ρ, D A, and D ψ affect convergence speed (a LASSO regression example can be found in the supplementary material) and motivates our more practical (over-)relaxed OS-LALM summarized below. Algorithm : Proposed (over-)relaxed OS-LALM for (34). Input: M, α <, and an initial (FBP) image x. set ρ =, ζ = g = M L M (x), h = D L x ζ for k =,,... 
do
  for m = 1, 2, ..., M do
    s = ρ (D_L x − h) + (1 − ρ) g
    x^+ = [ x − (ρ D_L + D_R)^{−1} (s + ∇R(x)) ]_Ω
    ζ = M ∇L_m(x^+)
    g^+ = ρ/(ρ+1) (α ζ + (1 − α) g) + 1/(ρ+1) g
    h^+ = α (D_L x^+ − ζ) + (1 − α) h
    decrease ρ using (37)
  end
end

Algorithm 1 describes the proposed relaxed algorithm for solving the X-ray CT image reconstruction problem (34), where L_m denotes the data-fidelity term of the mth subset, and [·]_Ω is an operator that projects the input vector onto the convex set Ω, e.g., clipping negative values to zero for Ω ≜ {x : x_i ≥ 0 for all i}. All variables are updated in-place, and we use the superscript (·)^+ to denote the new values that replace the old values. We also use the substitution s ≜ ρ D_L x − γ^+ in the proposed method, so that Algorithm 1 has a form comparable to the unrelaxed OS-LALM []; however, this substitution is not necessary. As seen in Algorithm 1, the proposed relaxed OS-LALM has the form of (33) but uses some modifications that violate assumptions in our theorems yet speed up convergence in practice. First, although our theorems assume a constant majorizing matrix D_R for the Lipschitz term R (e.g., the maximum curvature of R), we use the iteration-dependent Huber's curvature of R [6] for faster convergence (the same is done in the other algorithms used for comparison). Second, since the updates in (33) depend only on the gradients of L, we can further accelerate the gradient computation by using partial projection data, i.e., ordered subsets. Lastly, we incorporate a continuation technique (i.e., decreasing the AL penalty parameter ρ every iteration) in the proposed algorithm, as described in the next subsection. To select the number of subsets, we used the rule suggested in [, Eqn. and 7]. However, since over-relaxation provides two-times acceleration, we used 50% of the suggested number of subsets (for the unrelaxed OS-LALM) yet achieved similar convergence speed (faster in runtime since fewer regularizer gradient evaluations are performed) and more stable reconstruction.
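For concreteness, the following is a minimal NumPy sketch of the Algorithm 1 recursion on a small dense test problem. It is an illustration of the update structure, not a CT implementation: A stands in for the system matrix, the "subsets" are plain row blocks rather than interleaved projection views, and grad_R / D_R are user-supplied placeholders for the regularizer gradient and a diagonal majorizer of its curvature.

```python
import numpy as np

def rho_k(k, alpha):
    # alpha-dependent continuation sequence (37), as reconstructed in Section III-B:
    # rho = 1 at subiteration k = 0, then decreasing.
    if k == 0:
        return 1.0
    t = np.pi / (alpha * (k + 1))
    return t * np.sqrt(max(1.0 - (t / 2.0) ** 2, 0.0))

def relaxed_os_lalm(A, w, y, grad_R, D_R, x0, M=4, alpha=1.999, n_iter=10):
    """Sketch of the Algorithm 1 recursion for the PWLS problem (34) with a
    non-negativity constraint. A is a small dense stand-in for the CT system
    matrix and w holds the diagonal of the statistical weighting W."""
    absA = np.abs(A)
    D_L = absA.T @ (w * absA.sum(axis=1))           # diagonal majorizer of A' W A
    subsets = np.array_split(np.arange(A.shape[0]), M)

    def grad_L_subset(x, m):                        # M times the m-th subset gradient
        idx = subsets[m]
        Am = A[idx]
        return M * (Am.T @ (w[idx] * (Am @ x - y[idx])))

    x = x0.copy()
    zeta = g = grad_L_subset(x, M - 1)              # zeta = g = M * grad L_M(x)
    h = D_L * x - zeta                              # h = D_L x - zeta
    rho, k = 1.0, 0
    for _ in range(n_iter):
        for m in range(M):                          # one iteration = M subiterations
            s = rho * (D_L * x - h) + (1.0 - rho) * g
            x = np.maximum(x - (s + grad_R(x)) / (rho * D_L + D_R), 0.0)   # [.]_Omega
            zeta = grad_L_subset(x, m)
            g = rho / (rho + 1.0) * (alpha * zeta + (1.0 - alpha) * g) + g / (rho + 1.0)
            h = alpha * (D_L * x - zeta) + (1.0 - alpha) * h
            k += 1
            rho = rho_k(k, alpha)                   # decrease rho every subiteration
    return x
```

For a quick experiment one can pass a simple quadratic regularizer, e.g., grad_R = lambda x: 2*beta*x with D_R = 2*beta; since Algorithm 1 reverts to the unrelaxed OS-LALM when α = 1, running the same problem with α = 1 and α = 1.999 mirrors the comparisons reported in Section IV.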

For the implicit linearization, we use the diagonal majorizing matrix diag{A'WA1} for A'WA [], i.e., the same diagonal majorizing matrix D_L of the quadratic loss function that is used in OS algorithms. Furthermore, Algorithm 2 depicts the OS version of the simple relaxed algorithm (8) for solving (34) (the derivation is omitted here). The main difference between Algorithm 1 and Algorithm 2 is the extra recursion of the variable h. When α = 1, both algorithms revert to the unrelaxed OS-LALM [].

Algorithm 2: Simple (over-)relaxed OS-LALM for (34).
Input: M ≥ 1, 1 ≤ α < 2, and an initial (FBP) image x.
set ρ = 1, ζ = g = M ∇L_M(x)
for k = 0, 1, 2, ... do
  for m = 1, 2, ..., M do
    s = ρ ζ + (1 − ρ) g
    x^+ = [ x − (ρ D_L + D_R)^{−1} (s + ∇R(x)) ]_Ω
    ζ^+ = M ∇L_m(x^+)
    g^+ = ρ/(ρ+1) (α ζ^+ + (1 − α) g) + 1/(ρ+1) g
    decrease ρ using (37)
  end
end

B. Further speed-up with continuation

We also use a continuation technique [] to speed up convergence; that is, we decrease ρ gradually with iteration. Note that ρ D_L + D_R is the inverse of the voxel-dependent step size of the image updates; decreasing ρ increases the step sizes gradually as iterations progress. Due to the extra relaxation parameter α, a good decreasing continuation sequence differs from that in []. We use the following α-dependent continuation sequence for the proposed relaxed LALM (1 ≤ α < 2):

ρ_k(α) = 1 if k = 0, and ρ_k(α) = (π / (α(k+1))) √(1 − (π / (2α(k+1)))²) otherwise.   (37)

The supplementary material describes the rationale for this continuation sequence. When using OS, ρ decreases every subiteration, and the counter k in (37) denotes the number of subiterations, not the number of iterations.

IV. EXPERIMENTAL RESULTS

This section reports numerical results for 3-D X-ray CT image reconstruction using one conventional algorithm (OS-SQS []) and four contemporary algorithms:
- OS-FGM: the OS variant of the standard fast gradient method proposed in [, 9],
- OS-LALM: the OS variant of the unrelaxed linearized AL method proposed in [],
- OS-OGM: the OS variant of the optimal fast gradient method proposed in [9], and
- Relaxed OS-LALM: the OS variants of the proposed relaxed linearized AL methods given in Algorithm 1 (proposed) and Algorithm 2 (simple) above (α = 1.999 unless otherwise specified).

A. XCAT phantom

We simulated an axial CT scan using an XCAT phantom [34] for a mm transaxial field-of-view (FOV), where ∆x = ∆y = 0.4883 mm and ∆z = 0.6 mm. An ([detector columns] × [detector rows] × [projection views]) noisy (with Poisson noise) sinogram is numerically generated with GE LightSpeed fan-beam geometry corresponding to a monoenergetic source at 7 keV with incident photons per ray and no scatter. We reconstructed a 9 image volume with a coarser grid, where ∆x = ∆y = 0.9776 mm and ∆z = 0.6 mm. The statistical weighting matrix W is a diagonal matrix with diagonal entries w_j = exp(−y_j), and an edge-preserving regularizer is used with ϕ_i(t) ≜ δ² (|t/δ| − log(1 + |t/δ|)) (δ = HU) and parameters β_i set to achieve a reasonable noise-resolution trade-off. We used subsets for the relaxed OS-LALM, while [, Eqn. ] suggests using about 4 subsets for the unrelaxed OS-LALM. Figure 1 shows the cropped images (displayed from 8 to HU [modified so that air is 0]) from the central transaxial plane of the initial FBP image x^(0), the reference reconstruction (generated by running thousands of iterations of the convergent FGM with adaptive restart [3]), and the reconstructed image obtained with the proposed algorithm (relaxed OS-LALM with subsets) after iterations. There is no visible difference between the reference reconstruction and our reconstruction.
To analyze the proposed algorithm quantitatively, Figure shows the RMS differences between the reference reconstruction x and the reconstructed image x (k) using different algorithms as a function of iteration 3 with and 4 subsets. As seen in Figure, the proposed algorithm (cyan curves) is approximately twice as fast as the unrelaxed OS-LALM (green curves) at least in early iterations. Furthermore, comparing with OS-FGM and OS-OGM, the proposed algorithm converges faster and is more stable when using more subsets for acceleration. Difference images using different algorithms and additional experimental results are shown in the supplementary material. To illustrate the improved speed-up of the proposed relaxation (Algorithm ) over the simple one (Algorithm ), Figure 3 shows convergence rate curves of different relaxed algorithms ( subsets and α =.999) with (a) a fixed AL penalty parameter ρ =. and (b) the decreasing sequence ρ k in (37). As seen in Figure 3(a), the simple relaxation does not provide much acceleration, especially after iterations. In contrast, the proposed relaxation accelerates convergence about twice (i.e., α-times), as predicted by Theorem. When the decreasing sequence of ρ k is used, as seen in Figure 3(b), the simple relaxation seems to provide somewhat more acceleration than before; however, the proposed relaxation still outperforms the simple one, illustrating approximately twofold speed-up over the unrelaxed counterpart. 3 All algorithms listed above require one forward/back-projection pair and M (the number of subsets) regularizer gradient evaluations (plus some negligible overhead) per iteration, so comparing the convergence rate as a function of iteration is fair.

Fig. 1: XCAT: Cropped images (displayed from 8 to HU) from the central transaxial plane of the initial FBP image x^(0) (left), the reference reconstruction (center), and the reconstructed image obtained with the proposed algorithm (relaxed OS-LALM with subsets) after iterations (right).

Fig. 2: XCAT: Convergence rate curves of different OS algorithms with (a) subsets and (b) 4 subsets. The proposed relaxed OS-LALM with subsets exhibits a convergence rate similar to that of the unrelaxed OS-LALM with 4 subsets.

Fig. 3: XCAT: Convergence rate curves of different relaxed algorithms ( subsets and α = 1.999) with (a) a fixed AL penalty parameter ρ and (b) the decreasing sequence ρ_k in (37).

Fig. 4: Chest: Cropped images (displayed from 8 to HU) from the central transaxial plane of the initial FBP image x^(0) (left), the reference reconstruction (center), and the reconstructed image obtained with the proposed algorithm (relaxed OS-LALM with subsets) after iterations (right).

Fig. 5: Chest: Convergence rate curves of different OS algorithms with (a) a smaller and (b) a larger number of subsets. The proposed relaxed OS-LALM with the smaller number of subsets exhibits a convergence rate similar to that of the unrelaxed OS-LALM with the larger number of subsets.

B. Chest scan

We reconstructed a 6 6 image volume, where ∆x = ∆y = 0.667 mm and ∆z = 0.6 mm, from a chest-region helical CT scan. The sinogram size and pitch correspond to about 3.7 rotations with a rotation time of 0.4 seconds. The tube current and tube voltage of the X-ray source are 7 mA and kVp, respectively. We started from a smoothed FBP image x^(0) and tuned the statistical weights [36] and the q-generalized Gaussian MRF regularization parameters [33] to emulate the MBIR method [3, 37]. We used subsets for the relaxed OS-LALM, while [, Eqn. 7] suggests using about subsets for the unrelaxed OS-LALM. Figure 4 shows the cropped images from the central transaxial plane of the initial FBP image x^(0), the reference reconstruction, and the reconstructed image obtained with the proposed algorithm (relaxed OS-LALM with subsets) after iterations. Figure 5 shows the RMS differences between the reference reconstruction and the reconstructed image x^(k) using different algorithms as a function of iteration for the two subset counts. The proposed relaxed OS-LALM shows an approximately two-times faster convergence rate, compared to its unrelaxed counterpart, with a moderate number of subsets. The speed-up diminishes as the iterate approaches the solution. Furthermore, the faster relaxed OS-LALM seems to be more sensitive to gradient approximation errors and exhibits ripples in its convergence rate curves when too many subsets are used for acceleration. In contrast, the slower unrelaxed OS-LALM is less sensitive to gradient error when using more subsets and does not exhibit such ripples in its convergence rate curves. Compared with OS-FGM and OS-OGM, the proposed relaxed OS-LALM has smaller limit cycles and might be more stable for practical use.

V. DISCUSSION AND CONCLUSIONS

In this paper, we proposed a non-trivial relaxed variant of LALM and applied it to X-ray CT image reconstruction. Experimental results with simulated and real CT scan data showed that our proposed relaxed algorithm converges about twice as fast as its unrelaxed counterpart, outperforming state-of-the-art fast iterative algorithms that use momentum [, 9].

9 9 This speed-up means that one needs fewer subsets to reach an RMS difference criteria like HU in a given number of iterations. For instance, we used % of the number of subsets suggested by [] (for the unrelaxed OS-LALM) in our experiment but found similar convergence speed with overrelaxation. Moreover, using fewer subsets can be beneficial for distributed computing [38], reducing communication overhead required after every update. ACKNOWLEDGMENT The authors thank GE Healthcare for providing sinogram data in our experiments. The authors would also like to thank the anonymous reviewers for their comments and suggestions. REFERENCES [] J. A. Fessler, Penalized weighted least-squares image reconstruction for positron emission tomography, IEEE Trans. Med. Imag., vol. 3, pp. 9, June 994. [] J. Nuyts, B. De Man, J. A. Fessler, W. Zbijewski, and F. J. Beekman, Modelling the physics in iterative reconstruction for transmission computed tomography, Phys. Med. Biol., vol. 8, pp. R63 96, June 3. [3] J.-B. Thibault, K. Sauer, C. Bouman, and J. Hsieh, A three-dimensional statistical approach to improved image quality for multi-slice helical CT, Med. Phys., vol. 34, pp , Nov. 7. [4] J. A. Fessler and S. D. Booth, Conjugate-gradient preconditioning methods for shift-variant PET image reconstruction, IEEE Trans. Im. Proc., vol. 8, pp , May 999. [] H. Erdoğan and J. A. Fessler, Ordered subsets algorithms for transmission tomography, Phys. Med. Biol., vol. 44, pp. 83, Nov [6] S. Ramani and J. A. Fessler, A splitting-based iterative algorithm for accelerated statistical X-ray CT reconstruction, IEEE Trans. Med. Imag., vol. 3, pp , Mar.. [7] S. Ahn and J. A. Fessler, Globally convergent image reconstruction for emission tomography using relaxed ordered subsets algorithms, IEEE Trans. Med. Imag., vol., pp. 63 6, May 3. [8] S. Ahn, J. A. Fessler, D. Blatt, and A. O. Hero, Convergent incremental optimization transfer algorithms: Application to tomography, IEEE Trans. Med. Imag., vol., pp , Mar. 6. [9] Z. Yu, J.-B. Thibault, C. A. Bouman, K. D. Sauer, and J. Hsieh, Fast model-based X-ray CT reconstruction using spatially non-homogeneous ICD optimization, IEEE Trans. Im. Proc., vol., pp. 6 7, Jan.. [] D. Kim, S. Ramani, and J. A. Fessler, Combining ordered subsets and momentum for accelerated X-ray CT image reconstruction, IEEE Trans. Med. Imag., vol. 34, pp , Jan.. [] H. Nien and J. A. Fessler, Fast X-ray CT image reconstruction using a linearized augmented Lagrangian method with ordered subsets, IEEE Trans. Med. Imag., vol. 34, pp , Feb.. [] Y. Nesterov, A method for unconstrained convex minimization problem with the rate of convergence O(/k ), Dokl. Akad. Nauk. USSR, vol. 69, no. 3, pp. 43 7, 983. [3] Y. Nesterov, Smooth minimization of non-smooth functions, Mathematical Programming, vol. 3, pp. 7, May. [4] X. Zhang, M. Burger, and S. Osher, A unified primal-dual algorithm framework based on Bregman iteration, Journal of Scientific Computing, vol. 46, no., pp. 46,. [] D. Gabay and B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite-element approximations, Comput. Math. Appl., vol., no., pp. 7, 976. [6] J. Eckstein and D. P. Bertsekas, On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators, Mathematical Programming, vol., pp , Apr. 99. [7] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. 
& Trends in Machine Learning, vol. 3, no., pp.,. [8] D. Kim and J. A. Fessler, Optimized momentum steps for accelerating X-ray CT ordered subsets image reconstruction, in Proc. 3rd Intl. Mtg. on image formation in X-ray CT, pp. 3 6, 4. [9] D. Kim and J. A. Fessler, Optimized first-order methods for smooth convex minimization, Mathematical Programming, 6. [] Y. Nesterov, On an approach to the construction of optimal methods of minimization of smooth convex functions, Ekonomika i Mateaticheskie Metody, vol. 4, pp. 9 7, 988. In Russian. [] A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci., vol., no., pp. 83, 9. [] E. X. Fang, B. He, H. Liu, and X. Yuan, Generalized alternating direction method of multipliers: New theoretical insight and application, Math. Prog. Comp., vol. 7, pp , June. [3] A. Chambolle and T. Pock, A first-order primal-dual algorithm for convex problems with applications to imaging, J. Math. Im. Vision, vol., no., pp. 4,. [4] A. Chambolle and T. Pock, On the ergodic convergence rates of a first-order primal-dual algorithm, Mathematical Programming, 6. [] D. P. Bertsekas, Nonlinear programming. Belmont: Athena Scientific, ed., 999. [6] H. Erdoğan and J. A. Fessler, Monotonic algorithms for transmission tomography, IEEE Trans. Med. Imag., vol. 8, pp. 8 4, Sept [7] P. J. Huber, Robust statistics. New York: Wiley, 98. [8] S. Boyd and L. Vandenberghe, Convex optimization. UK: Cambridge, 4. [9] S. Azadi and S. Sra, Towards an optimal stochastic alternating direction method of multipliers, in Proc. Intl. Conf. on Mach. Learning, pp. 6 8, 4. [] Y. Ouyang, Y. Chen, G. Lan, and E. Pasiliao Jr., An accelerated linearized alternating direction method of multipliers, SIAM J. Imaging Sci., vol. 8, no., pp ,. [3] Y. Long, J. A. Fessler, and J. M. Balter, 3D forward and back-projection for X-ray CT using separable footprints, IEEE Trans. Med. Imag., vol. 9, pp. 839, Nov.. [3] J. A. Fessler and W. L. Rogers, Spatial resolution properties of penalized-likelihood image reconstruction methods: Space-invariant tomographs, IEEE Trans. Im. Proc., vol., pp , Sept [33] J. H. Cho and J. A. Fessler, Regularization designs for uniform spatial resolution and noise properties in statistical image reconstruction for 3D X-ray CT, IEEE Trans. Med. Imag., vol. 34, pp , Feb.. [34] W. P. Segars, M. Mahesh, T. J. Beck, E. C. Frey, and B. M. W. Tsui, Realistic CT simulation using the 4D XCAT phantom, Med. Phys., vol. 3, pp. 38 8, Aug. 8. [3] B. O Donoghue and E. Candès, Adaptive restart for accelerated gradient schemes, Found. Comp. Math., vol., pp. 7 3, June. [36] Z. Chang, R. Zhang, J.-B. Thibault, K. Sauer, and C. Bouman, Statistical x-ray computed tomography from photon-starved measurements, in Proc. SPIE 9 Computational Imaging XII, p. 9G, 4. [37] W. P. Shuman, D. E. Green, J. M. Busey, O. Kolokythas, L. M. Mitsumori, K. M. Koprowicz, J.-B. Thibault, J. Hsieh, A. M. Alessio, E. Choi, and P. E. Kinahan, Model-based iterative reconstruction versus adaptive statistical iterative reconstruction and filtered back projection in 64-MDCT: Focal lesion detection, lesion conspicuity, and image noise, Am. J. Roentgenol., vol., pp. 7 6, May 3. [38] J. M. Rosen, J. Wu, T. F. Wenisch, and J. A. Fessler, Iterative helical CT reconstruction in the cloud for ten dollars in five minutes, in Proc. Intl. Mtg. on Fully 3D Image Recon. in Rad. and Nuc. Med, pp. 4 4, 3.

10 Relaxed Linearized Algorithms for Faster X-Ray CT Image Reconstruction: Supplementary Material Hung Nien, Member, IEEE, and Jeffrey A. Fessler, Fellow, IEEE arxiv:.464v [math.oc] 4 Dec This supplementary material for [] has three parts. The first part analyzes the convergence rate of the simple and proposed relaxed linearized augmented Lagrangian (AL) methods (LALM s) in [] for solving an equality-constrained composite convex optimization problem. We demonstrate the convergence rate bound and the effect of relaxation with a numerical example (LASSO regression). The second part derives the continuation sequence we used in []. The third part shows additional experimental results of applying the proposed relaxed LALM with ordered subsets (OS) for solving model-based X-ray computed tomography (CT) image reconstruction problems. The additional experimental results are consistent with the results we showed in [], illustrating the efficiency and stability of the proposed relaxed OS-LALM over existing methods. I. CONVERGENCE RATE ANALYSES OF THE SIMPLE AND PROPOSED LALM S We begin by considering a more general equality-constrained composite convex optimization problem (for which the equalityconstrained minimization problem considered in [] is a special case): { } (ˆx,û) arg min f(x,u) g(u)+h(x) s.t. Kx+Bu = b, () x,u where both g and h are closed and proper convex functions. We further decompose h φ+ψ into two convex functions φ and ψ, where φ is simple in the sense that it has an efficient proximal mapping, e.g., soft-shrinkage for the l -norm, and ψ is continuously differentiable with D ψ -Lipschitz gradients (defined in []). One example of h is the edge-preserving regularizer with a non-negativity constraint (e.g., sum of a corner-rounded total-variation [TV] regularizer and the characteristic function of the non-negativity set) used in statistical image reconstruction methods [, ]. As mentioned in [], solving a composite convex optimization problem with equality constraints like () is equivalent to finding a saddle-point of the Lagrangian: L(x,u,µ) f(x,u) µ,kx+bu b, () where µ is the Lagrange multiplier of the equality constraint [3, p. 37]. In other words, (ˆx, û, ˆµ) solves the minimax problem: (ˆx,û, ˆµ) arg min max x,u µ Moreover, since (ˆx, û, ˆµ) is a saddle-point of L, the following inequalities hold for any x, u, and µ, and the duality gap function: L(x,u,µ). (3) L(x,u, ˆµ) L(ˆx,û, ˆµ) L(ˆx,û,µ) (4) G(x,u,µ;ˆx,û, ˆµ) L(x,u, ˆµ) L(ˆx,û,µ) = [ f(x,u) f(ˆx,û) ] ˆµ,Kx+Bu b () characterizes the accuracy of an approximate solution (x,u,µ) to the saddle-point problem (3). Note that Kˆx+Bû b = due to the equality constraint. We consider the following (generalized alternating direction method of multipliers [ADMM]) iteration: { x (k+) arg min φ(x)+ ψ ( x (k)),x + x { u (k+) arg min g(u) µ (k),bu + ρ u x x (k) µ (k),kx + ρ D ψ αkx (k+) +( α) ( b Bu (k)) +Bu b Kx+Bu (k) b + } x x (k) } P µ (k+) = µ (k) ρ ( αkx (k+) +( α) ( b Bu (k)) +Bu (k+) b ) (6) and show that the duality gap of the time-averaged solution w K = (x K,u K,µ K ) it generates converges to zero at rate O(/K), where K is the number of iterations, c K K K k= c(k) (7) denotes the time-average of some iterate c (k) for k =,...,K, ρ > is the corresponding AL penalty parameter, P is a positive semi-definite weighting matrix, and < α < is the relaxation parameter. This work is supported in part by National Institutes of Health (NIH) grants U-EB-873 and by equipment donations from Intel Corporation. Hung Nien and Jeffrey A. 
Fessler are with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 489, USA ( {hungnien,fessler}@umich.edu). Date of current version: December,.
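For orientation, the simple and proposed relaxed LALMs of the main text are recovered from the generalized formulation above by particular choices of K, B, b, and the proximity weighting P; the following summary is our reading of the substitutions stated in Theorems 2 and 3 below (the minus signs and zero vectors, which the transcription drops, are assumptions on our part):

\[
(\hat{x},\hat{u}) \in \arg\min_{x,u}\; \bigl\{ g(u) + h(x) \bigr\} \quad \text{s.t.}\quad Kx + Bu = b,
\]
\[
\text{simple relaxation: } K = A,\; B = -I_m,\; b = 0_m,\; P = \rho G; \qquad
\text{proposed relaxation: } K = \begin{bmatrix} A \\ G^{1/2} \end{bmatrix},\; B = -I_{m+n},\; b = 0_{m+n},\; P = 0,
\]

where G ≜ D_A − A'A and D_A is a diagonal majorizing matrix of A'A.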

11 A. Preliminaries The convergence rate analysis of the iteration (6) is inspired by previous work [4 9]. For simplicity, we use the following notations: w x u, w x u, λ (k+) µ (k) ρ ( Kx (k+) +Bu (k) b ) K λ, and F(w) B λ. (8) µ λ Kx+Bu b We also introduce three matrices: D ψ +P H ρ α B B α α α B α B αρ I, M I D ψ +P I, and Q HM = ρb B ( α)b. (9) ρb αi B ρ I The following lemmas show the properties of vectors and matrices defined in (8) and (9) and an identity used in our derivation. Lemma. The matrix H defined in (9) is positive semi-definite for any < α < and ρ >. Proof. For any w, completing the square yields w Hw = x (D ψ +P)x+ ρ α u B Bu+ ( α) α u B µ+ αρ µ µ = x D ψ +P + ( α ρbu + sgn( α) α ( ρbu) ( ρ µ ) + ρ µ ) = x D ψ +P + α( α ρbu+sgn( α) ρ µ +( α )( ρbu + ρ µ )). () All terms in () are non-negative for any < α < and ρ >. Thus under such conditions, w Hw for any w, and H is positive semi-definite. Lemma. For any k, we have w (k) w (k+) = M ( w (k) w (k+)). Proof. Since two stacked vectors (x and u) of w and w are the same, we need only show that µ (k) µ (k+) is equal to α ( µ (k) λ (k+)) ρb ( u (k) u (k+)) for any k. By the definition of λ (k+) in (8), we have Then, by the definition of the µ-update in (6), we get Thus the lemma holds. µ (k) λ (k+) = ρ ( Kx (k+) +Bu (k) b ). () µ (k) µ (k+) = ρ ( αkx (k+) +( α) ( b Bu (k)) +Bu (k+) b ) = ρ ( α ( Kx (k+) +Bu (k) b ) +B ( u (k+) u (k))) = α ( ρ ( Kx (k+) +Bu (k) b )) ρb ( u (k) u (k+)) = α ( µ (k) λ (k+)) ρb ( u (k) u (k+)). () Lemma 3. For any positive semi-definite matrix M and vectors x, x, x 3, and x 4, we have (x x ) M(x 3 x 4 ) = x x 4 M x x 3 M + x x 3 M x x 4 M. (3) Proof. The proof is omitted here. It can be verified by expanding out all the inner product and norms on both sides. B. Main results In the following theorem, we show that the duality gap defined in () of the time-averaged iterates w K = (x K,u K,µ K ) in (6) converges at rate O(/K), where K denotes the number of iterations. Theorem. Let w K = (x K,u K,µ K ) be the time-averages of iterates in (6) where ρ >, < α <, and P is positive semi-definite. We have G ( w K ;ŵ ) = [ f ( ) ] x K,u K f(ˆx,û) ˆµ,KxK +Bu K b { K x () ˆx + D ψ x () ˆx P + α [ ρ B ( u () û ) + ρ µ () ˆµ ] }. (4)

12 3 Proof. We first focus on the x-update in (6). By the convexity of ψ, we have ψ ( x (k+)) ψ ( x (k)) + ψ ( x (k)),x (k+) x (k) + x (k+) x (k) D ψ = ψ ( x (k)) + ψ ( x (k)),x x (k) + ψ ( x (k)),x (k+) x + x (k+) x (k) D ψ ψ(x)+ ψ ( x (k)),x (k+) x + x (k+) x (k) () D ψ for any x. Moving ψ(x) to the left-hand side leads to ψ ( x (k+)) ψ(x) ψ ( x (k)),x (k+) x + x (k+) x (k). (6) D ψ Moreover, by the optimality condition of the x-update in (6), we have so φ ( x (k+)) + ψ ( x (k)) +D ψ ( x (k+) x (k)) K ( µ (k) ρ ( Kx (k+) +Bu (k) b )) +P ( x (k+) x (k)), (7) φ ( x (k+)) ψ ( x (k)) D ψ ( x (k+) x (k)) +K λ (k+) P ( x (k+) x (k)). (8) By the definition of subgradient for the convex function φ, it follows that φ(x) φ ( x (k+)) + φ ( x (k+)),x x (k+) = φ ( x (k+)) + x (k+) x, K λ (k+) + ψ ( x (k)),x (k+) x + x (k+) x,(d ψ +P) ( x (k+) x (k)) (9) for all x. Rearranging (9) leads to [ φ ( x (k+) ) φ(x) ] + x (k+) x, K λ (k+) ψ ( x (k)),x (k+) x + x (k+) x,(d ψ +P) ( x (k) x (k+)). () Summing (6) and (), we get the first inequality: [ h ( x (k+) ) h(x) ] + x (k+) x, K λ (k+) x (k+) x,(d ψ +P) ( x (k) x (k+)) + x (k+) x (k) D ψ. () Following the same procedure, by the optimality condition of the u-update in (6), we have g(u) g ( u (k+)) + g ( u (k+)),u u (k+) = g ( u (k+)) + u (k+) u, B µ (k+) () for any u. To substitute µ (k+) in (), subtracting and adding λ (k+) on the left-hand side of () and rearranging it yield µ (k+) = λ (k+) +( α) ( µ (k) λ (k+)) +ρb ( u (k) u (k+)). (3) Substituting (3) into () and rearranging it, we get the second inequality: [ g ( u (k+) ) g(u) ] + u (k+) u, B λ (k+) u (k+) u,ρb B ( u (k) u (k+)) +( α)b ( µ (k) λ (k+)). (4) The third step differes a bit from the previous ones because the µ-update in (6) is not a minimization problem. By (), we have Kx (k+) +Bu (k+) b = B ( u (k) u (k+)) + ρ( µ (k) λ (k+)). () This gives the third equality: λ (k+) µ,kx (k+) +Bu (k+) b = λ (k+) µ, B ( u (k) u (k+)) + ρ( µ (k) λ (k+)) (6) for any µ. Summing (), (4), and (6), we can write it compactly as [ f ( x (k+),u (k+)) f(x,u) ] + w (k+) w,f ( w (k+)) w (k+) w,q ( w (k) w (k+)) + x (k+) x (k) D ψ. (7) By Lemma (note that Q = HM) and Lemma 3, the first term on the right-hand side of (7) can be expressed as w (k+) w,h ( w (k) w (k+)) = w (k+) w (k+) H w (k+) w (k) H + w (k) w H w (k+) w H. (8) Moreover, the first term on the right-hand side of (8) is λ (k+) µ (k+) = ( ρb u (k+) u (k)) +( α) ( λ (k+) µ (k)) (9) αρ αρ

13 4 by (3), and the second term on the right-hand side of (8) is x (k+) x (k) D ψ +P + αρ ρb ( u (k+) u (k)) + ( α) αρ ρb ( u (k+) u (k)),λ (k+) µ (k) + αρ λ (k+) µ (k) = x (k+) x (k) D ψ +P + ( αρ ρb u (k+) u (k)) +( α) ( λ (k+) µ (k)) + α ρ λ (k+) µ (k). () Substituting (9) and () into (8), we can upper bound the inequality (7) by [ ( f x (k+),u (k+)) f(x,u) ] + w (k+) w,f ( w (k+)) w (k) w H w (k+) w H x (k+) x (k) α D ψ +P ρ λ (k+) µ (k) + x (k+) x (k) D ψ w (k) w H w (k+) w H x (k+) x (k) P α ρ λ (k+) µ (k) w (k) w H w (k+) w (3) H because P is positive semi-definite and α > for α (,). To show the convergence rate of (6), let (x,u,µ) = ŵ (ˆx,û, ˆµ). The last term on the left-hand side of (3) can be represented as w (k+) ŵ,f ( w (k+)) = x (k+) ˆx, K λ (k+) + u (k+) û, B λ (k+) + λ (k+) ˆµ,Kx (k+) +Bu (k+) b = λ (k+),kˆx Kx (k+) +Bû Bu (k+) +Kx (k+) +Bu (k+) b ˆµ,Kx (k+) +Bu (k+) b = λ (k+),kˆx+bû b ˆµ,Kx (k+) +Bu (k+) b = ˆµ,Kx (k+) +Bu (k+) b. (3) Note that Kˆx+Bû b = due to the equality constraint. Using (3) yields G ( w (k+) ;ŵ ) = [ f ( x (k+),u (k+)) f(ˆx,û) ] + w (k+) ŵ,f ( w (k+)) w (k) ŵ H w (k+) ŵ H. (33) Summing (33) from k =,...,K, dividing both sides by K, and applying Jensen s inequality to the convex function f, we have G ( w K ;ŵ ) = [ f ( ) ] x K,u K f(ˆx,û) ˆµ,KxK +Bu K b K( w () ŵ H w (K) ŵ H) K w () ŵ (34) H since H is positive semi-definite for any α (,) and ρ > (Lemma ). To finish the analysis, the remaining task is to upper bound w () ŵ. Note that H w () ŵ can be expressed as H x () ˆx + D ψ x () ˆx [ ] u P + () [ û ρb B ( α)b ][ ] u () û α µ () ˆµ ( α)b ρ I µ (). (3) ˆµ The last term in (3) can be further expressed as and upper bounded by [ α ρ ( B u () û ) +( α) B( u () û ),µ () ˆµ + ρ µ () ˆµ ] [ α ρ B ( u () û ) + α B ( u () û ) µ () ˆµ + ρ µ () ˆµ ] [ α ρ ( B u () û ) + ( B u () û ) µ () ˆµ + ρ µ () ˆµ ] [ ρ ( B u () û ) + µ () ρ ˆµ ] (36) = α due to the fact that < α <. Combining (34), (3), and (36), we get our final convergence rate bound: G ( w K ;ŵ ) = [ f ( ) ] x K,u K f(ˆx,û) ˆµ,KxK +Bu K b { K x () ˆx + D ψ x () ˆx P + α [ ρ B ( u () û ) + ρ µ () ˆµ ] }. (37) Theorem can be used to show the convergence rates of other AL-based algorithms. The following theorems show the convergence rates of the simple and the proposed relaxed LALM s in []. From now on, suppose A is an m n matrix, and let G D A A A, where D A is a diagonal majorizing matrix of A A.
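The phrase "diagonal majorizing matrix of A'A" can be made concrete with the standard choice D_A = diag{|A|'|A|1} mentioned in the main text. The following quick NumPy check (a sketch for a random dense A, not part of the analysis) confirms that G = D_A − A'A is then positive semi-definite:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 12))

absA = np.abs(A)
d_A = absA.T @ absA.sum(axis=1)                # diagonal of D_A = diag{|A|'|A| 1}
G = np.diag(d_A) - A.T @ A                     # G = D_A - A'A

print(np.linalg.eigvalsh(G).min() >= -1e-10)   # True: D_A majorizes A'A, so G is PSD
```

This inequality is what makes the x-update separable after the implicit linearization: the Hessian ρA'A + ρG of the quadratic penalty collapses to the diagonal matrix ρD_A.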

14 Theorem ([, Theorem ]). Let K = A, B = I m, b = m, and P = ρg. The iteration (6) with ρ > and < α < reduces to the simple relaxed LALM that achieves a convergence rate ( ) G(w K ;ŵ) K ADψ +B ρ,da +C α,ρ, (38) where the first two constants depend on how far the initial guess is from a minimizer, and the last constant depends on the relaxation parameter. C α,ρ α A Dψ x () ˆx (39) D ψ B ρ,da ρ x () ˆx D A A () A [ ρ u () û + ρ µ () ˆµ ] (4) Proof. One just uses the substitutions K = A, B = I m, b = m, and P = ρg in Theorem to prove the theorem. As seen in Theorem, the convergence rate of the simple relaxed LALM scales well with the relaxation parameter α iff C α,ρ A Dψ and C α,ρ B ρ,da. When ψ has large curvature or D A is a loose majorizing matrix of A A (like in X-ray CT), the above inequalities do not hold, leading to worse scalability of convergence rate with the relaxation parameter α. This motivated the proposed relaxed LALM [] whose convergence rate analysis is shown below. Theorem 3 ([, Theorem ]). Let K = [ A G /], B = Im+n, b = m+n, and P =. The iteration (6) with ρ > and < α < reduces to the proposed relaxed LALM [] that achieves a convergence rate G ( ) (w K ;ŵ) K ADψ +B α,ρ,da +C α,ρ, (4) where A Dψ and C α,ρ were defined in (39) and (4), and B α,ρ,da ρ v () ˆv = ρ x () ˆx D A A (43) A when initializing v and ν as v () = G / x () and ν () = n, respectively. α Proof. Applying the substitutions K = [ A G /], B = Im+n, b = m+n, and P = to Theorem, except for the upper bounding (36), yields G ( w K ;ŵ ) K( ADψ +D α,ρ ), (44) where α u () û ρi m ( α)i m u () û D α,ρ v () ˆv ρi n ( α)i n α µ () ˆµ ( α)i m ρ I v () ˆv m µ () ˆµ, (4) ν () ˆν ( α)i n ρ I n ν () ˆν and v and ν are the auxiliary variable and Lagrange multiplier of the additional redundant equality constraint v = G / x in [], respectively. Note that ν (k) = n for k =,,... if we initialize ν as ν () = n, and ˆν = n []. We have ν () ˆν = n. Hence, (4) is further upper bounded by Let u () û ρi m ( α)i m u () û v () ˆv ρi n v () ˆv µ () ˆµ ( α)i m ρ I m µ () ˆµ ( = α ρ u () û ( α) u() û,µ () ˆµ + ρ µ () ˆµ ) + ρ α v () ˆv [ ρ α u () û + ρ µ () ˆµ ] + ρ α G / x () G /ˆx = C α,ρ + ρ x () ˆx D A A A. (46) D α,ρ = α α Thus, the convergence rate of the proposed relaxed LALM [] is upper bounded by B α,ρ,da ρ x () ˆx D A A A. (47) α G ( w K ;ŵ ) K( ADψ +B α,ρ,da +C α,ρ ). (48)
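Before moving on to the practical implementation, here is a quick numerical sanity check of Lemma 3, whose proof was omitted above. Reading the garbled display as the standard polarization-type identity with 1/2 factors (an assumption on our part),

\[
\langle x_1 - x_2,\, M(x_3 - x_4)\rangle
= \tfrac12\bigl(\|x_1 - x_4\|_M^2 - \|x_1 - x_3\|_M^2\bigr)
+ \tfrac12\bigl(\|x_2 - x_3\|_M^2 - \|x_2 - x_4\|_M^2\bigr),
\]

a minimal NumPy check is:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
S = rng.standard_normal((n, n))
M = S.T @ S                              # a random positive semi-definite matrix
x1, x2, x3, x4 = rng.standard_normal((4, n))

def sqnorm(v):
    return v @ M @ v                     # ||v||_M^2

lhs = (x1 - x2) @ M @ (x3 - x4)
rhs = 0.5 * (sqnorm(x1 - x4) - sqnorm(x1 - x3)) \
    + 0.5 * (sqnorm(x2 - x3) - sqnorm(x2 - x4))
print(abs(lhs - rhs))                    # ~1e-15, i.e., the identity holds
```

Expanding the squared M-norms on the right-hand side and cancelling the quadratic terms recovers the left-hand side directly, which is presumably the omitted proof.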

15 6 C. Practical implementation of the proposed relaxed LALM Although the proposed relaxed LALM shows better scalability of the convergence rate with the relaxation parameter α, a straightforward implementation with substitutions in Theorem 3 is not recommended because there is no efficient way to compute the square root of G for any A in general. For practical implementation, we must avoid using multiplication by G / in both the x- and v-updates. To derive the practical implementation, we first substitue K = [ A G /], B = I, b =, and P = in (6). This leads to the following iterates (i.e., [, Eqn. ]): { ( x (k+) arg min φ(x)+q ) ψ x;x (k) µ (k),ax ν (k),g / x + ρ x Ax u (k) + ρ G / x v (k) } { u (k+) arg min g(u)+ µ (k),u + ρ (k+) u r u,α u } µ (k+) = µ (k) ρ ( r (k+) u,α u (k+)) (49) v (k+) = r v,α (k+) ρ ν (k) ν (k+) = ν (k) ρ ( r v,α (k+) v (k+)), where Q ψ is a separable quadratic surrogate (SQS) of ψ at x (k) [, Eqn. 7], r u,α is the relaxation variable of u, and r v,α is the relaxation variable of v. Suppose ν () =. Then ν (k) = for k =,,..., and (49) can be further simplified as { ( x (k+) arg min φ(x)+q ) ψ x;x (k) µ (k),ax + ρ x Ax u (k) + ρ G / x v (k) } { u (k+) arg min g(u)+ µ (k),u + ρ (k+) u r u,α u } µ (k+) = µ (k) ρ ( r u,α (k+) u (k+)) () v (k+) = αg / x (k+) +( α)v (k). Let h G / v+a y. By the v-update in (), we have h (k+) = G / v (k+) +A y = G /( αg / x (k+) +( α)v (k)) +A y = α ( Gx (k+) +A y ) +( α) ( G / v (k) +A y ) = α ( D A x (k+) A ( Ax (k+) y )) +( α)h (k). () To avoid multiplication by G / in the x-update in (), we rewrite the last three terms in the x-update cost function using Taylor s expansion around x (k). That is, µ (k),ax + ρ Ax u (k) + ρ G / x v (k) ρ Ax u (k) ρ µ (k) + ρ G / x v (k) ( x x (k)) ( ρa ( Ax (k) u (k) ρ µ (k)) +ρ ( Gx (k) G / v (k))) + x x (k) ρa A+ρG = ( x x (k)) ( ( ρ A A+G ) x (k) ρa ( u (k) +ρ µ (k)) ρg / v (k)) + x x (k) ρ(a A+G) = ( x x (k)) ( ρda x (k) ρa ( u (k) y+ρ µ (k)) ρh (k)) + x x (k) ρd A x x (k) +(ρd A ) ( ρd A x (k) ρa ( u (k) y+ρ µ (k)) ρh (k)) ρd A x (ρda ) ( ρa ( u (k) y+ρ µ (k)) +ρh (k)) ρd A. () = The practical relaxed LALM without multiplication by G / becomes { ( x (k+) arg min φ(x)+q ) ψ x;x (k) + x x (ρda ) ( ρa ( u (k) y+ρ µ (k)) +ρh (k)) } { ρd A u (k+) arg min g(u)+ µ (k),u + ρ (k+) u r u,α u } µ (k+) = µ (k) ρ ( r (k+) u,α u (k+)) h (k+) = α ( D A x (k+) A ( Ax (k+) y )) +( α)h (k). (3) D. Numerical example: LASSO regression Here we describe a numerical example that demonstrates the convergence rate bound and the effect of relaxation. Consider the following l -regularized linear regression problem: { } ˆx arg min x y Ax +λ x, (4)

Fig. 1: LASSO regression: duality gap curves of relaxed LALM with different relaxation parameters and AL penalty parameters. (a) Bound (42) vs. ergodic gap (E-gap); (b) ergodic gap (E-gap) vs. non-ergodic gap (NE-gap).

This is a widely studied problem in statistics (where it is known as LASSO regression) and in compressed sensing, used for seeking a sparse solution of a linear system with small measurement errors. To solve this problem using the proposed relaxed LALM (53), we focused on the following equivalent equality-constrained minimization problem:

(x̂, û) ∈ arg min_{x,u} { (1/2)‖y − u‖²₂ + λ‖x‖₁ } s.t. u = Ax   (55)

with φ = λ‖·‖₁, ψ = 0, D_A = λ_max(A'A)I, and g = (1/2)‖y − ·‖²₂. We set x^(0) = A'y, u^(0) = Ax^(0), µ^(0) = y − u^(0), and h^(0) = D_A x^(0) − A'(Ax^(0) − y).

Data for the numerical instance were generated as follows. The entries of the system matrix A were sampled from an iid standard normal distribution. The hidden vector x_s was a randomly generated sparse vector, and the noisy measurement was y = Ax_s + n, where n was sampled from an iid zero-mean Gaussian distribution with small variance. The regularization parameter λ was set to unity in our experiment.

Figure 1 shows the duality gap curves of relaxed LALM with different relaxation parameters (α = 1 and α = 1.999) and different AL penalty parameters ρ. As seen in Figure 1(a), the ergodic duality gap G(w̄_K; ŵ) converges at rate O(1/K), and the bound derived in Theorem 3 becomes a tighter upper bound as the number of iterations grows. Furthermore, as seen in Figure 1(b), the non-ergodic duality gap G(w^(K); ŵ) converges much faster than the ergodic one, and empirically we achieve about a two-times speed-up by using α close to two.

II. CONTINUATION WITH OVER-RELAXATION

This section describes the rationale for the continuation sequence in []. Consider solving a simple quadratic problem:

x̂ ∈ arg min_x (1/2)‖Ax‖²₂,   (56)

using [, Eqn. 33] with h^(0) = 0 and y = 0. If A'A is positive definite (for this analysis only), then (56) has the unique solution x̂ = 0. Let VΛV' be the eigenvalue decomposition of A'A, where Λ ≜ diag{λ_i} with 0 < λ₁ ≤ λ₂ ≤ ··· ≤ λ_n = L_A. Updates generated by [, Eqn. 33] (with D_A = L_A I) simplify as follows:

x^(k+1) = (ρL_A)⁻¹((ρ − 1)g^(k) + ρh^(k))
g^(k+1) = (ρ/(ρ + 1))(αA'Ax^(k+1) + (1 − α)g^(k)) + (1/(ρ + 1))g^(k)
h^(k+1) = α(L_A x^(k+1) − A'Ax^(k+1)) + (1 − α)h^(k).   (57)

Let x̄ ≜ V'x, ḡ ≜ V'g, and h̄ ≜ V'h. The linear system (57) can be diagonalized, and the ith components of x̄, ḡ, and h̄ evolve as follows:

x̄_i^(k+1) = (ρL_A)⁻¹((ρ − 1)ḡ_i^(k) + ρh̄_i^(k))   (58)

and

ḡ_i^(k+1) = (ρ/(ρ + 1))(αλ_i x̄_i^(k+1) + (1 − α)ḡ_i^(k)) + (1/(ρ + 1))ḡ_i^(k)
h̄_i^(k+1) = α(L_A − λ_i)x̄_i^(k+1) + (1 − α)h̄_i^(k).   (59)
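Before analyzing this recursion in closed form below, a quick numerical sketch of (58)-(59) illustrates the damped, oscillatory decay of the slowest eigencomponent and how it depends on both ρ and α. The spectrum values, initial values, and parameter grid in the sketch are assumptions chosen only for illustration.

```python
import numpy as np

def simulate_mode(lam_i, L_A, rho, alpha, n_iter=300, g0=1.0, h0=0.0):
    """Iterate the scalar recursion (58)-(59) for one eigencomponent and
    return |x_i| over the iterations; the initial values are assumptions."""
    g, h = g0, h0
    traj = []
    for _ in range(n_iter):
        x = ((rho - 1.0) * g + rho * h) / (rho * L_A)                         # (58)
        g = (rho / (rho + 1.0)) * (alpha * lam_i * x + (1.0 - alpha) * g) \
            + g / (rho + 1.0)                                                 # (59), g-update
        h = alpha * (L_A - lam_i) * x + (1.0 - alpha) * h                     # (59), h-update
        traj.append(abs(x))
    return np.array(traj)

L_A, lam1 = 1.0, 1e-3   # assumed spectrum: largest and smallest eigenvalues of A'A
for rho in (0.5, 0.1, 0.02):
    for alpha in (1.0, 1.999):
        traj = simulate_mode(lam1, L_A, rho, alpha)
        # estimate the per-iteration decay factor from window maxima of the damped oscillation
        a, b = traj[100:150].max(), traj[250:300].max()
        print(f"rho={rho:5.2f}  alpha={alpha:5.3f}  est. decay factor ~ {(b / a) ** (1 / 150):.4f}")
```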

Plugging (58) into (59) leads to a second-order recursion (in ḡ_i and h̄_i) with transition matrix

T_i ≜ [ αλ_i(ρ − 1)/((ρ + 1)L_A) + ((1 − α)ρ + 1)/(ρ + 1) ,  αλ_iρ/((ρ + 1)L_A) ;
        α(L_A − λ_i)(ρ − 1)/(ρL_A) ,  α(L_A − λ_i)/L_A + (1 − α) ],   (60)

and x̄_i^(k+1) is just a linear combination of ḡ_i^(k) and h̄_i^(k). The eigenvalues of the transition matrix T_i defined in (60) determine the convergence rate of the second-order recursion, and we can analyze the second-order recursive system by studying its characteristic polynomial:

r_i² − ([T_i]₁₁ + [T_i]₂₂)r_i + ([T_i]₁₁[T_i]₂₂ − [T_i]₁₂[T_i]₂₁).   (61)

The proposed α-dependent continuation sequence is based on the critical value ρ_c and the damping frequency ω₀ (as ρ → 0) of the eigencomponent corresponding to the smallest eigenvalue λ₁ []. The critical value ρ_c solves

([T₁]₁₁ + [T₁]₂₂)² − 4([T₁]₁₁[T₁]₂₂ − [T₁]₁₂[T₁]₂₁) = 0,   (62)

and the damping frequency ω₀ satisfies [, p. 8]

cos ω₀ = ([T₁]₁₁ + [T₁]₂₂) / √(4([T₁]₁₁[T₁]₂₂ − [T₁]₁₂[T₁]₂₁)).   (63)

We solve (62) and (63) using MATLAB's symbolic toolbox. For (62), we found that

ρ_c = 2√(λ₁(L_A − λ₁)) / L_A   (64)

is independent of α. Hence, the optimal AL penalty parameter ρ ≈ ρ_c depends only on the geometry of A'A and does not change for different values of the relaxation parameter α. For (63), we found that

cos ω₀ ≈ (1 − αλ₁/L_A) / (1 − (α − α²/2)λ₁/L_A)   (65)

for ρ → 0. When α = 1, cos ω₀ ≈ 1 − λ₁/(2L_A), and thus ω₀ ≈ √(λ₁/L_A) due to the small angle approximation

cos θ ≈ 1 − θ²/2 for small |θ|.   (66)

When α ≈ 2, cos ω₀ ≈ 1 − 2λ₁/L_A, and ω₀ ≈ 2√(λ₁/L_A), also due to (66). For general 0 < α < 2, we can approximate cos ω₀ in (65) using a Taylor series as

cos ω₀ ≈ (1 − αλ₁/L_A)(1 + (α − α²/2)λ₁/L_A) + [higher-order terms] = 1 − (α²/2)λ₁/L_A + [higher-order terms].   (67)

We ignore the higher-order terms in (67) since λ₁/L_A is usually very small in practice. Hence, cos ω₀ ≈ 1 − (α√(λ₁/L_A))²/2, and ω₀ ≈ α√(λ₁/L_A) due to the small angle approximation (66). This expression covers both the previous unrelaxed (α = 1) and the proposed relaxed (α ≈ 2) cases. Suppose we use the same restart condition as in []; that is, restarts occur about every (1/2)(π/ω₀) iterations. If we restart at the kth iteration, we have the approximation √(λ₁/L_A) ≈ π/(2αk), and the ideal AL penalty parameter at the kth iteration is

2(π/(2αk))√(1 − (π/(2αk))²) = (π/(αk))√(1 − (π/(2αk))²).   (68)

That is, the values of ρ_k(α) are scaled by 1/α relative to the unrelaxed schedule.

To demonstrate the speed-up resulting from combining continuation with over-relaxation, Figure 2 shows the convergence rate curves of the proposed relaxed OS-LALM using different values of the over-relaxation parameter α when reconstructing the simulated XCAT dataset. For comparison, convergence rate curves that do not use continuation (i.e., with a fixed AL penalty parameter ρ) are also shown. As seen in Figure 2(b), the RMS difference of the green curve (relaxed OS-LALM with α = 1.5) after a given number of iterations is about the same as that of the blue curve (unrelaxed OS-LALM) after roughly 1.5 times as many iterations, exhibiting an approximately 1.5-times speed-up. Using larger α (up to two) can further accelerate convergence; however, the speed-up can be somewhat less than α-times due to the dominance of the constant B in [] and the accumulation of gradient errors with ordered subsets. For instance, the red curve (relaxed OS-LALM with α = 1.999) falls slightly short of a full two-times speed-up over the blue curve (unrelaxed OS-LALM).
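For reference, the α-dependent schedule in (68) is straightforward to implement. The sketch below is one possible reading of it; the ρ₀ = 1 convention, the k → k+1 shift that keeps the square root real from the first iteration, and the small positive floor are assumptions, not taken from the text above.

```python
import numpy as np

def rho_k(k, alpha=1.999):
    """AL penalty schedule suggested by (68):
    rho_k ~ (pi/(alpha*k)) * sqrt(1 - (pi/(2*alpha*k))^2).
    The rho_0 = 1 convention, the k -> k+1 shift, and the positive floor are assumptions."""
    if k == 0:
        return 1.0
    t = np.pi / (alpha * (k + 1))
    return max(t * np.sqrt(max(1.0 - (t / 2.0) ** 2, 0.0)), 1e-4)

for alpha in (1.0, 1.999):
    print(f"alpha = {alpha}:", [round(rho_k(k, alpha), 4) for k in range(8)])
```

For moderate k the schedule behaves like π/(αk), so the relaxed schedule is roughly the unrelaxed one divided by α, consistent with the remark after (68).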

Fig. 2: XCAT: Convergence rate curves of relaxed OS-LALM using different values of the over-relaxation parameter α with (a) a fixed AL penalty parameter ρ and (b) the proposed decreasing sequence ρ_k. (Curves: unrelaxed OS-LALM (α = 1) and relaxed OS-LALM with α = 1.5 and α = 1.999.)

III. ADDITIONAL EXPERIMENTAL RESULTS

A. XCAT phantom

Additional experimental results for the simulated XCAT phantom dataset shown in [] are reported here. Figure 3 shows the difference images (in the central transaxial plane) of FBP (i.e., x^(0) − x^⋆) and of the OS algorithms after a fixed number of iterations (i.e., x^(k) − x^⋆). As seen in Figure 3, low-frequency components converge faster than high- and mid-frequency components such as streaks and edges for all algorithms. This is common for gradient-based algorithms when the Hessian matrix of the cost function is low-pass/band-suppressing, as in X-ray CT. The difference image of the proposed relaxed OS-LALM shows less residual edge structure and looks more uniform in flat regions. Figure 4 shows the difference images after more iterations. We can see that the proposed relaxed OS-LALM produces very uniform difference images, while subtle noise-like artifacts remain with OS-OGM.

To demonstrate the improvement of our modified relaxed LALM (i.e., with ordered subsets and continuation) for X-ray CT image reconstruction problems, Figure 5 shows convergence rate curves of unrelaxed/relaxed (OS-)LALM using different parameter settings with (a) one subset and (b) M subsets. All algorithms run the same number of subiterations; however, those with OS should be faster in runtime because they perform fewer forward/back-projections. As seen in Figure 5, the convergence rate curves of the OS algorithms scale almost perfectly (along the horizontal axis) when using a modest number of subsets. However, the scalability might degrade when using more subsets (more severe gradient error accumulation) or on other datasets. Moreover, the solid lines (relaxed algorithms) consistently show about two-times faster convergence than the dashed lines (unrelaxed algorithms), both without and with continuation. Note that the solid blue line (relaxed LALM, ρ = 1/6) and the dashed green line (unrelaxed LALM, ρ = 1/12) overlap in both cases after the early subiterations, implying that halving the AL penalty parameter ρ and setting the relaxation parameter α close to two have a similar effect on convergence speed in this CT problem (where the data-fidelity term dominates the cost function). Note that when the data-fidelity term dominates the cost function, the constant B dominates the constant multiplying 1/K in [], leading to the better speed-up with α.

We also investigated the effect of majorization (for both the data-fidelity term and the regularization term) on convergence speed. Figure 6 shows the convergence rate curves of the proposed relaxed OS-LALM with different (a) data-fidelity term majorizations and (b) regularization term majorizations. As seen in Figure 6(a), the proposed algorithm diverges when D_L is too small, violating the majorization condition. Larger D_L slows down the algorithm. However, multiplying D_L by a factor κ does not necessarily slow down the algorithm by κ-times, since the weighting matrix of B_{α,ρ,D_L} is D_L − A'WA. Besides, larger D_L helps reduce the gradient error accumulation in fast OS algorithms [].
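The majorization condition that Figure 6(a) probes is easy to check numerically. The sketch below uses the standard SQS row-sum majorizer diag{A'WA1} for a nonnegative toy system matrix; whether this matches the exact D(loss) construction scaled in Figure 6(a) is an assumption, and the scaling values are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 30
A = rng.random((m, n))                 # nonnegative entries, as for a CT system matrix
w = rng.random(m)                      # nonnegative statistical weights (diagonal of W)
AWA = A.T @ (w[:, None] * A)           # A'WA

# standard SQS row-sum majorizer D = diag{A'WA 1}; it satisfies D >= A'WA for nonnegative A, W
D = np.diag(AWA @ np.ones(n))

for kappa in (0.8, 1.0, 1.5, 2.0):     # illustrative scalings only
    eigmin = np.linalg.eigvalsh(kappa * D - AWA).min()
    print(f"kappa = {kappa:3.1f}   min eig of kappa*D - A'WA = {eigmin:+.3e}")
```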
Figure 6(b) shows the convergence rate curves of the proposed relaxed OS-LALM with regularizer majorization using the maximum curvature and Huber's curvature, respectively. We can see that the speed-up from using Huber's curvature is very significant. Note that ρD_L + D_R determines the step sizes of the image updates of the proposed relaxed OS-LALM. Better majorization of R (i.e., smaller [D_R]_ii for those voxels that are still far from the optimum) leads to larger image update step sizes, especially when ρ is small.

B. Chest scan

Additional experimental results for the chest scan dataset shown in [] are reported here. Figure 7 shows convergence rate curves of different relaxed algorithms (with ordered subsets and α = 1.999) with (a) a fixed AL penalty parameter ρ and (b) the decreasing sequence ρ_k proposed in [].

Fig. 3: XCAT: Cropped difference images (in HU) from the central transaxial plane of the initial FBP image (x^(0) − x^⋆) and of the images reconstructed by the OS algorithms (x^(k) − x^⋆) after fewer iterations. Panels: FBP, OS-SQS, OS-FGM, OS-LALM, OS-OGM, relaxed OS-LALM.

Fig. 4: XCAT: Cropped difference images (in HU) from the central transaxial plane of the initial FBP image (x^(0) − x^⋆) and of the images reconstructed by the OS algorithms (x^(k) − x^⋆) after more iterations. Panels: FBP, OS-SQS, OS-FGM, OS-LALM, OS-OGM, relaxed OS-LALM.

Like the experimental results with the simulated CT scan shown in [], the simple relaxation does not provide much acceleration with a fixed AL penalty parameter, but it works somewhat better when using the decreasing ρ_k. Figure 8 and Figure 9 show the difference images (in the central transaxial plane) of FBP and of the OS algorithms after fewer and more iterations, respectively. The difference images of the proposed relaxed OS-LALM show the fewest structured artifacts among all algorithms compared.

C. Shoulder scan

We reconstructed an image volume from a helical CT scan of the shoulder region; the sinogram covers about 7.3 rotations of the helix. The initial FBP image x^(0) has many streak artifacts due to the low signal-to-noise ratio (SNR), and we tuned the statistical weights and regularization parameters using [, 3] to emulate [4, ]. We used fewer subsets for the relaxed OS-LALM than the number that [, Eqn. 7] suggests for the unrelaxed OS-LALM. Figure 10 shows the cropped images from the central transaxial plane of the initial FBP image x^(0), the reference reconstruction x^⋆, and the image reconstructed by the proposed algorithm (relaxed OS-LALM) after a fixed number of iterations. Figure 11 shows the RMS differences between the reference reconstruction x^⋆ and the reconstructed image x^(k) using different OS algorithms as a function of iteration, for two different numbers of subsets.

Fig. 5: XCAT: Convergence rate curves of unrelaxed/relaxed (OS-)LALM using different parameter settings with (a) one subset and (b) M subsets. All algorithms run the same number of subiterations. (Curves: (OS-)SQS; (OS-)LALM and relaxed (OS-)LALM with ρ = 1/6, with ρ = 1/12, and with the continuation sequence.)

Fig. 6: XCAT: Convergence rate curves of the proposed relaxed OS-LALM with different (a) data-fidelity term majorizations and (b) regularization term majorizations. (Curves in (a): D(loss) set to several scalings of diag{A'WA}; curves in (b): D(reg.) with the maximum curvature and with Huber's curvature.)

As seen in Figure 11, the proposed relaxed OS-LALM shows a faster convergence rate with a moderate number of subsets, but the speed-up diminishes as the iterates approach the solution. Figure 12 and Figure 13 show the difference images (in the central transaxial plane) of FBP and of the OS algorithms after fewer and more iterations, respectively. The proposed relaxed OS-LALM removes more streak artifacts than the other OS algorithms.

Fig. 7: Chest: Convergence rate curves of different relaxed algorithms (with ordered subsets and α = 1.999) with (a) a fixed AL penalty parameter ρ and (b) the decreasing sequence ρ_k proposed in []. (Curves: OS-SQS, OS-LALM, and two relaxed OS-LALM variants.)

Fig. 8: Chest: Cropped difference images (in HU) from the central transaxial plane of the initial FBP image (x^(0) − x^⋆) and of the images reconstructed by the OS algorithms (x^(k) − x^⋆) after fewer iterations. Panels: FBP, OS-SQS, OS-FGM, OS-LALM, OS-OGM, relaxed OS-LALM.

Fig. 9: Chest: Cropped difference images (in HU) from the central transaxial plane of the initial FBP image (x^(0) − x^⋆) and of the images reconstructed by the OS algorithms (x^(k) − x^⋆) after more iterations. Panels: FBP, OS-SQS, OS-FGM, OS-LALM, OS-OGM, relaxed OS-LALM.

Fig. 10: Shoulder: Cropped images from the central transaxial plane of the initial FBP image x^(0) (left), the reference reconstruction x^⋆ (center), and the image reconstructed by the proposed algorithm (relaxed OS-LALM) after a fixed number of iterations (right).

Fig. 11: Shoulder: Convergence rate curves of different OS algorithms with (a) a smaller and (b) a larger number of subsets. The proposed relaxed OS-LALM with the smaller number of subsets exhibits a convergence rate similar to that of the unrelaxed OS-LALM with the larger number of subsets. (Curves: OS-SQS, OS-FGM, OS-LALM, OS-OGM, relaxed OS-LALM.)

Fig. 12: Shoulder: Cropped difference images (in HU) from the central transaxial plane of the initial FBP image (x^(0) − x^⋆) and of the images reconstructed by the OS algorithms (x^(k) − x^⋆) after fewer iterations. Panels: FBP, OS-SQS, OS-FGM, OS-LALM, OS-OGM, relaxed OS-LALM.

Fig. 13: Shoulder: Cropped difference images (in HU) from the central transaxial plane of the initial FBP image (x^(0) − x^⋆) and of the images reconstructed by the OS algorithms (x^(k) − x^⋆) after more iterations. Panels: FBP, OS-SQS, OS-FGM, OS-LALM, OS-OGM, relaxed OS-LALM.
