A FISTA-like scheme to accelerate GISTA?
C. Cloquet¹, I. Loris², C. Verhoeven² and M. Defrise¹
¹ Dept. of Nuclear Medicine, Vrije Universiteit Brussel
² Dept. of Mathematics, Université Libre de Bruxelles
MIMS seminar, Manchester, January 11, 2013.
ccloquet@vub.ac.be, igloris@ulb.ac.be, cverhoev@ulb.ac.be, mdefrise@vub.ac.be
tiny.cc/cloquet
Cone Beam CT @ VUB [Philips Brightview XCT]
Figure 1: X-ray beam projection scheme comparing a single detector array fan-beam CT (a) and cone-beam CT (b) geometry. [newscenter.philips.com; W. Scarfe et al., J Can Dent Assoc 2006, 72(1):75-80]
Cone Beam CT @ VUB [Bruker Skyscan microct 1178]
Challenges
CT: 0.4 to 2.0% of the cancers in the US caused by CT studies? [Brenner and Hall, NEJM, 2007] → lower the dose, i.e. do more with less.
Cone-beam specific: cone-beam artifacts.
Medical images are piecewise constant... ar.in.tum.de
... i.e. the gradient of medical images is sparse
Sparsity: the gradient ∇f = (∂f/∂x, ∂f/∂y, ∂f/∂z)ᵗ is sparse, i.e. Σᵢ ‖∇f(xᵢ)‖₂ is small: few transitions, sharp edges between flat regions. [Defrise et al., 2011; Rudin et al., 1992; Sidky and Pan, 2008; Sidky et al., 2006] brsoc.org.uk
CT acquisition and reconstruction
Attenuation of an X-ray beam: I = I₀ exp(−∫ μ(s) ds), so the data are y := log(I₀/I) = ∫ μ(s) ds.
CT acquisition: image x ∈ ℝᴹ; data y ∈ ℝ^{P·J} (P projections); projector K : ℝᴹ → ℝ^{P·J}, x ↦ {y_p = K_p x}_{p=1..P}.
Cost function: Φ(x, y) = G(x, y) + λ H(Ax).
Reconstruction: x* = argmin_x Φ(x, y).
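The log transform that turns transmitted counts into line-integral data can be sketched in a couple of lines (a hypothetical helper, not from the talk; NumPy assumed):

```python
import numpy as np

def sinogram(I, I0=1.0e4):
    """Line-integral data from transmitted X-ray counts: y = log(I0 / I).

    Inverts the Beer-Lambert law I = I0 * exp(-integral of mu), so y equals
    the line integral of the attenuation map mu along each ray."""
    return np.log(I0 / np.asarray(I, dtype=float))
```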
Cost function Φ(x, y) = G(x, y) + λ H(Ax)
Data term: G convex + smooth, G(x, y) = ½ ‖Kx − y‖₂².
Penalty term: H convex + non-smooth, with A any linear operator.
Total variation penalty [isotropic]: A = ∇, with
(Ax)ᵢ = (∇x)ᵢ = (x_{i+[100]} − xᵢ, x_{i+[010]} − xᵢ, x_{i+[001]} − xᵢ)ᵗ
and H(z) = ‖z‖₁ = Σᵢ ‖zᵢ‖₂ (Euclidean norm per voxel).
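As a concrete illustration, a minimal NumPy sketch of the finite-difference gradient ∇ and the isotropic TV penalty, in 2D for brevity (the slides use the 3D analogue; function names are mine, not from the talk):

```python
import numpy as np

def grad(x):
    """Forward-difference gradient of a 2D image: shifts along [10] and [01]
    (zero beyond the boundary), stacked into an array of shape (D, M1, M2)."""
    gx = np.zeros_like(x)
    gy = np.zeros_like(x)
    gx[:-1, :] = x[1:, :] - x[:-1, :]
    gy[:, :-1] = x[:, 1:] - x[:, :-1]
    return np.stack([gx, gy])

def tv(x):
    """Isotropic TV: sum over pixels of the Euclidean norm of (grad x)_i."""
    g = grad(x)
    return np.sqrt((g ** 2).sum(axis=0)).sum()
```

A flat image has zero TV; a single vertical step edge contributes its edge length times the step height, which is why piecewise-constant images score low.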
How to solve?
Simultaneous Algebraic Reconstruction Technique [Andersen and Kak, 1984]
Initialization: x⁰ arbitrary image ∈ ℝᴹ; 0 < τ_p < 2/‖K_p K_pᵀ‖.
Iteration x^{n+1} = I_SART(x^n):
x^{(n,0)} = x^n
for p = 0 ... P−1: x^{(n,p+1)} = x^{(n,p)} + τ_p K_pᵀ (y_p − K_p x^{(n,p)})
x^{n+1} = x^{(n,P)}
Takes no account of any penalty.
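A minimal sketch of the sweep above, with small dense matrices K_p standing in for the real per-projection operators (names are mine):

```python
import numpy as np

def sart(K_blocks, y_blocks, x0, n_iter=10):
    """SART sweep: for each projection p, a relaxed back-projection of the
    residual y_p - K_p x, with step tau_p < 2/||K_p K_p^T||."""
    x = x0.copy()
    for _ in range(n_iter):
        for K_p, y_p in zip(K_blocks, y_blocks):
            tau_p = 1.0 / np.linalg.norm(K_p @ K_p.T, 2)
            x = x + tau_p * K_p.T @ (y_p - K_p @ x)
    return x
```

On a consistent toy system with orthogonal single-row blocks, one sweep already recovers the solution exactly, since each update is an exact projection onto one measurement.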
Current algorithms suited for TV
Algorithms:
- PICCS, ASD-POCS: alternate the minimization of the data term and of TV
- ISTA: x_k = argmin_x { H(x) + (1/2t_k) ‖x − (x_{k−1} − t_k ∇G(x_{k−1}))‖² }
- SART-TV: uses a surrogate and a differentiable TV
[PICCS: Chen et al., 2008; ASD-POCS: Ramani and Fessler, 2012; Sidky and Pan, 2008; ISTA: Beck and Teboulle, 2009b; Daubechies et al., 2004; SART-TV: Defrise et al., 2011]
Drawbacks:
- nested inner iterations needed when using TV (all)
- a differentiable penalty needed (SART-TV)
- no proof of convergence (PICCS, ASD-POCS)
- slow convergence (ISTA): Φ(x_n, y) − Φ(x*, y) ∼ n⁻¹
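For reference, ISTA in the separable case H = ‖·‖₁ with A = I, where the arg min above has a closed form: the soft-thresholding map. (A sketch with a dense K; for A = ∇ the prox has no closed form, which is what motivates GISTA.)

```python
import numpy as np

def soft(u, t):
    """Soft-thresholding: the proximal map of t * ||.||_1."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def ista(K, y, lam, n_iter=200):
    """ISTA for (1/2)||Kx - y||^2 + lam * ||x||_1: a gradient step on the
    data term followed by soft-thresholding, with fixed step 1/||K^T K||."""
    t = 1.0 / np.linalg.norm(K.T @ K, 2)
    x = np.zeros(K.shape[1])
    for _ in range(n_iter):
        x = soft(x - t * K.T @ (K @ x - y), lam * t)
    return x
```

With K = I the minimizer is soft(y, λ), so components smaller than λ are set exactly to zero, which is the mechanism that produces sparsity.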
FISTA accelerates the convergence of ISTA [Beck and Teboulle, 2009a]
Initialization: x⁻¹ = 0; x⁰ arbitrary image ∈ ℝᴹ; t₀ = 1; θ₀ = 0.
Iteration x^{n+1} = I_FISTA(x^n, x^{n−1}):
x^{n+1} = I_ISTA((1 + θ_n) x^n − θ_n x^{n−1})
(t_{n+1}, θ_{n+1}) = s(t_n)
with s(t_n) = ((1 + √(1 + 4 t_n²))/2, (t_n − 1)/t_{n+1}).
Speed of convergence: Φ(x_n, y) − Φ(x*, y) ∼ n⁻².
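The parameter update s(t_n) in code (a sketch; the name is mine). Note θ_n → 1, so the extrapolation (1 + θ_n) x^n − θ_n x^{n−1} approaches full momentum:

```python
import numpy as np

def s(t_n):
    """FISTA parameter update: returns (t_{n+1}, theta_{n+1}) with
    t_{n+1} = (1 + sqrt(1 + 4 t_n^2))/2 and theta_{n+1} = (t_n - 1)/t_{n+1}."""
    t_next = (1.0 + np.sqrt(1.0 + 4.0 * t_n ** 2)) / 2.0
    return t_next, (t_n - 1.0) / t_next
```

Starting from t₀ = 1 gives θ₁ = 0, so the very first step is a plain ISTA step; afterwards t_n grows roughly like n/2 and θ_n climbs toward 1.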
A way to overcome the difficulties: Generalized ISTA (GISTA) [Loris and Verhoeven, 2011]
Cost function Φ(x, y) = G(x, y) + λ H(Ax):
- suitable for A = ∇
- reduces to ISTA for A orthogonal
- no internal iteration
- proven convergence
GISTA [Loris and Verhoeven, 2011]
Initialization: x⁰ arbitrary image ∈ ℝᴹ; w⁰ = 0 ∈ ℝ^{D·M}; τ < 2/‖KᵀK‖; σ < 1/‖AAᵀ‖.
Iteration (x^{n+1}, w^{n+1}) = I_GISTA(x^n, w^n), with A = ∇:
x̄^{n+1} = x^n + τ Kᵀ(y − K x^n) − τ ∇ᵀ w^n
w^{n+1} = P_{λ/τ}(w^n + σ ∇ x̄^{n+1})
x^{n+1} = x^n + τ Kᵀ(y − K x^n) − τ ∇ᵀ w^{n+1}
with P_λ(u) = { λ uᵢ/‖uᵢ‖ if ‖uᵢ‖ > λ; uᵢ if ‖uᵢ‖ ≤ λ }_{i=1..M}, uᵢ ∈ ℝᴰ, ‖uᵢ‖ = √(u²_{i,x} + u²_{i,y} + u²_{i,z}).
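A compact sketch of the iteration with dense matrices and the D = 1 case of P_λ, so the projection reduces to componentwise clipping (K, A, and all names here are toy stand-ins, not the real projector or gradient):

```python
import numpy as np

def gista(K, A, y, lam, n_iter=100):
    """GISTA sketch for (1/2)||Kx - y||^2 + lam * ||Ax||_1, dense K and A.

    D = 1 here, so P_lam clips each component of w to [-lam, lam]; for the
    isotropic 3D TV one clips the Euclidean norm of each 3-vector instead."""
    tau = 1.0 / np.linalg.norm(K.T @ K, 2)      # tau < 2/||K^T K||
    sigma = 0.9 / np.linalg.norm(A @ A.T, 2)    # sigma < 1/||A A^T||
    x = np.zeros(K.shape[1])
    w = np.zeros(A.shape[0])
    for _ in range(n_iter):
        g = tau * K.T @ (y - K @ x)             # gradient step on the data term
        xb = x + g - tau * A.T @ w              # trial point with the old w
        w = np.clip(w + sigma * A @ xb, -lam / tau, lam / tau)  # P_{lam/tau}
        x = x + g - tau * A.T @ w               # final step with the new w
    return x, w
```

For K = A = I the scheme reduces to solving the ℓ₁-denoising problem, whose minimizer is soft-thresholding of y, which gives a cheap sanity check.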
But GISTA is slow
Cross section of a mouse, short scan, 98 projections, acquired on the Skyscan 1178, after 260 (left) and 1000 iterations (right), reconstructed with GISTA and λ = 0.01.
This work: how to go as fast as possible?
- initialization
- restart
- FGISTA
Numerical experiment
Dataset: FORBILD thorax phantom (imp.uni-erlangen.de); Poisson noise, 10⁴ photons/LOR; P = 200 projections of J = 600 pixels; 2D image, M = 600 × 600.
GISTA: initialization matters
How many initial SART iterations lead to the lowest cost within N < 10⁴ iterations?
[Plots: cost function vs. iteration for λ = 0.0025, starting with 0, 1 or 4 iterations of SART; # of initial SART iterations vs. λ]
Finding: the optimal number of initial SART iterations grows when λ decreases.
Restarted GISTA
Initialization: M1 iterations of SART.
Iteration:
(1) perform 1 iteration of SART
(2) run GISTA during N iterations
(3) set w = 0
(4) back to (1)
Inspired by the restart of the conjugate gradient method (see also [O'Donoghue and Candès, 2012; Powell, 1977; Sidky and Pan, 2008]).
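The restart scheme above is pure control flow; here is a sketch with callables standing in for one SART sweep and one GISTA iteration (all names hypothetical):

```python
def rgista(sart_pass, gista_iter, x, w0, m_init=3, n_gista=2, n_outer=5):
    """Restarted GISTA: initial SART passes, then cycles of one SART pass
    followed by n_gista GISTA iterations, with the dual variable w reset
    to w0 (i.e. zero) at each restart."""
    for _ in range(m_init):              # initial SART iterations
        x = sart_pass(x)
    for _ in range(n_outer):
        x = sart_pass(x)                 # (1) one iteration of SART
        w = w0                           # (3) w = 0, reset at each restart
        for _ in range(n_gista):         # (2) N iterations of GISTA
            x, w = gista_iter(x, w)
    return x
```

Resetting w discards the accumulated dual information, which is exactly what the restart of conjugate gradient does with its search-direction history.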
Restarted GISTA (RGISTA)
Does RGISTA lead to a lower cost within N < 10² iterations?
[Plots: cost function vs. iteration; left: λ = 0.5, no restart vs. restart after 1 vs. restart after 30 → NO; right: λ = 0.0025, no restart vs. restart after 2 → YES]
e.g. restart after 2 iterations → 6 iterations instead of 18.
The efficiency of RGISTA depends on λ.
Restarted GISTA (RGISTA)
[Images and profiles for λ = 0.5 and λ = 0.0025, after 4 and 5000 iterations]
FGISTA
Initialization: x⁻¹ = 0; x⁰ arbitrary image ∈ ℝᴹ; w⁻¹ = w⁰ = 0 ∈ ℝ^{D·M}; t₀ = 1, θ₀ = 0.
Iteration (x^{n+1}, w^{n+1}) = I_FGISTA(x^n, w^n, x^{n−1}, w^{n−1}):
v^n = (1 + θ_n) x^n − θ_n x^{n−1}
z^n = (1 + θ_n) w^n − θ_n w^{n−1}
(x^{n+1}, w^{n+1}) = I_GISTA(v^n, z^n)
(t_{n+1}, θ_{n+1}) = s(t_n)
Properties:
- same fixed points as GISTA
- reduces to FISTA when A is orthogonal
- no proof of convergence
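One FGISTA step written as a function of the previous two iterates, with I_GISTA passed in as a callable (a sketch; remember there is no convergence proof for this scheme):

```python
def fgista_step(x, x_prev, w, w_prev, t, theta, gista_iter):
    """Extrapolate both the primal x and the dual w FISTA-style, apply one
    GISTA iteration at the extrapolated pair (v, z), then update (t, theta)
    with the FISTA rule s(t)."""
    v = (1.0 + theta) * x - theta * x_prev
    z = (1.0 + theta) * w - theta * w_prev
    x_new, w_new = gista_iter(v, z)
    t_next = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
    theta_next = (t - 1.0) / t_next
    return x_new, x, w_new, w, t_next, theta_next
```

The only difference from a plain FISTA wrapper is that the dual variable w is extrapolated alongside x, so both sequences carry momentum.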
FGISTA
[Plot: cost function vs. iteration, λ = 0.25: GISTA, FGISTA, and FGISTA switched to GISTA after 15 iterations]
FGISTA
[Images and profiles: FGISTA vs. GISTA, λ = 0.25, 100 iterations]
Discussion
FGISTA and GISTA share the same fixed points.
Why do FGISTA and GISTA not converge to the same values?
- rounding errors in the algorithm?
- limit cycle?
- other update of the parameters? (cf. Chambolle-Pock)
A fixed-point algorithm that appears to converge numerically does not necessarily minimize Φ.
Open issues
How to determine, on the fly:
- the optimal initialization?
- the optimal number and position of the restarts?
Why is cost(FGISTA) > cost(GISTA)?
Remember
GISTA reconstructs CT images with proven convergence and no internal iteration.
Initialization matters.
Restart and FGISTA may help further.
References
A. H. Andersen and A. C. Kak. Simultaneous algebraic reconstruction technique (SART): a superior implementation of the ART algorithm. Ultrasonic Imaging, 6:81-94, 1984.
Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sciences, 2:183-202, 2009a.
Amir Beck and Marc Teboulle. Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Transactions on Image Processing, 18(11):2419-2434, 2009b.
G.-H. Chen, J. Tang, and S. Leng. Prior image constrained compressed sensing (PICCS): a method to accurately reconstruct dynamic CT images from highly undersampled projection data sets. Med. Phys., 35:660-663, 2008.
I. Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics, 57(11):1413-1457, 2004.
Michel Defrise, Christian Vanhove, and Xuan Liu. An algorithm for total variation regularization in high-dimensional linear problems. Inverse Problems, 27(6):065002, 2011.
Ignace Loris and Caroline Verhoeven. On a generalization of the iterative soft-thresholding algorithm for the case of non-separable penalty. Inverse Problems, 27:125007, 2011.
B. O'Donoghue and E. Candès. Adaptive restart for accelerated gradient schemes. arXiv:1204.3982, April 2012.
M. J. D. Powell. Restart procedures for the conjugate gradient method. Mathematical Programming, 12:241-254, 1977.
S. Ramani and J. A. Fessler. A splitting-based iterative algorithm for accelerated statistical X-ray CT reconstruction. IEEE Trans. Med. Imaging, 31(3):677-688, 2012.
L. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 60:259-268, 1992.
Emil Y. Sidky and Xiaochuan Pan. Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization. Physics in Medicine and Biology, 53(17):4777-4807, 2008.
Emil Y. Sidky, Chien-Min Kao, and Xiaochuan Pan. Accurate image reconstruction from few-views and limited-angle data in divergent-beam CT. J. X-Ray Sci. Technol., 14:119-139, 2006.
Chambolle-Pock derived algorithm
Initialization: p⁰ = 0 ∈ ℝ^{P·J}; w⁰ = 0 ∈ ℝ^{D·M}; x⁰ = 0 ∈ ℝᴹ; τσ < 1/‖K‖.
Iteration (p^{n+1}, w^{n+1}, x^{n+1}) = I_CP(p^n, w^n, x^n):
p^{n+1} = (p^n + σ(y − K x^n))/(1 + σ)
w^{n+1} = P_λ(w^n + σ ∇ x^n)
x^{n+1} = x^n + τ Kᵀ p^{n+1} − τ ∇ᵀ w^{n+1}
Caution must be taken to adapt the dimensions of K to those of ∇.
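A sketch of this iteration with dense stand-ins for K and ∇ and the D = 1 clipping as P_λ. (Caveat: the standard Chambolle-Pock scheme evaluates the dual updates at an extrapolated point x̄ⁿ = 2xⁿ − xⁿ⁻¹, a detail the slide's notation leaves ambiguous; this sketch uses xⁿ as written above.)

```python
import numpy as np

def cp_iter(p, w, x, K, A, y, lam, tau, sigma):
    """One iteration of the Chambolle-Pock-derived scheme (D = 1 clipping)."""
    p = (p + sigma * (y - K @ x)) / (1.0 + sigma)   # dual of the data term
    w = np.clip(w + sigma * A @ x, -lam, lam)       # P_lam, dual of the TV term
    x = x + tau * K.T @ p - tau * A.T @ w           # primal update
    return p, w, x
```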
Philips Brightview XCT
TV leads to less noise, flat regions and sharp edges.
Flat panel: 1024 × 768 square elements, side of 0.388 mm. Images: 300 × 270 × 256, resolution of 0.8 mm.
Figure: (a) SART, P = 720 views, 5 iterations. (b) SART, P = 100 views, 5 iterations. (c) profiles (blue = SART, P = 720; red = GISTA, P = 720). (d) GISTA, P = 720 views, λ = 0.018, 30 iterations. The images are averages of 3 consecutive slices.
Philips Brightview XCT: RISTA vs GISTA
Figure: (a) (LS, TV) curves for GISTA, P = 100 views, λ = 0.01; the 41st iterations are highlighted by a cross. (b) Profiles at the 41st iteration. (c) GISTA, 41 iterations. (d) RISTA, 41 iterations. The images are averages of 3 consecutive slices. In this figure, blue = GISTA without restart, green = GISTA with restart after the 10th iteration (RISTA). See Fig. 1 for SART with 100 views.
Skyscan