Sparsifying Transform Learning for Compressed Sensing MRI Saiprasad Ravishankar and Yoram Bresler Department of Electrical and Computer Engineering and Coordinated Science Laboratory University of Illinois at Urbana-Champaign April 8, 2013
Outline Why compressed sensing MRI (CSMRI)? Nonadaptive CSMRI Synthesis Dictionary Learning MRI Transform vs. Synthesis Model Transform Learning MRI Formulations Algorithms Results (Static MRI) Conclusions
Motivation for Compressed Sensing MRI Data are samples in k-space, acquired sequentially in time. Acquisition rate limited by MR physics, etc. CS allows recovery of images from limited measurements Sparsity in transform domain or dictionary Acquisition incoherent with sparse model Reconstruction non-linear, non-convex Fig. from Lustig et al. 07
Compressed Sensing MRI (Nonadaptive)
min_x ‖F_u x − y‖₂² + λ ‖Ψx‖₁   (1)
x ∈ ℂ^P: image as a vector; y ∈ ℂ^m: measurements. F_u ∈ ℂ^{m×P}: undersampled Fourier encoding matrix (m < P). Ψ ∈ ℂ^{T×P}: global, orthonormal transform. A Total Variation penalty is also added to (1) [Lustig et al. 07]. CSMRI with non-adaptive transforms is limited to 2.5-3 fold undersampling [Ma et al. 08].
Synthesis Dictionary Learning
The DL problem:
min_{D,{α_j}} Σ_j ‖R_j x − D α_j‖₂²  s.t. ‖α_j‖₀ ≤ s ∀ j   (2)
R_j ∈ ℂ^{n×P} extracts the j-th √n×√n patch of x. D ∈ ℂ^{n×K}: patch-based dictionary. α_j ∈ ℂ^K: sparse, with R_j x ≈ D α_j. s: sparsity level. The DL problem is NP-hard. Algorithms such as K-SVD¹ alternate between finding D and {α_j}. ¹[Aharon et al. 06]
Learning Dictionaries from Undersampled Data (DLMRI)²
(P0) min_{x,D,{α_j}} Σ_j ‖R_j x − D α_j‖₂² (sparse fitting) + ν ‖F_u x − y‖₂² (data fidelity)  s.t. ‖α_j‖₀ ≤ s ∀ j.
(P0) learns D and reconstructs x from only the undersampled y. But (P0) is NP-hard, and non-convex even if the ℓ₀ quasi-norm is relaxed to ℓ₁. DLMRI solves (P0) by alternating between DL (solving for D, {α_j}) and a reconstruction update (solving for x). ²[Ravishankar & Bresler 11]
2D Random Sampling Example - 6x Undersampling
[Figure: LDP³ reconstruction (22 dB) with error magnitude map; DLMRI reconstruction (32 dB) with error magnitude map.]
Data from Miki Lustig, UC Berkeley. ³LDP - Lustig, Donoho, and Pauly ('07).
Drawbacks of DLMRI
DLMRI computations do not scale well: O(Kn²P) for a P-pixel image and D ∈ ℂ^{n×K}. The cost is dominated by dictionary learning, particularly sparse coding, which is itself an NP-hard problem. DL algorithms such as K-SVD can get stuck in bad local minima or even saddle points. Can we learn better, more efficient sparse models for MR images?
Synthesis Model for Sparse Representation
Given a signal y ∈ ℂⁿ and dictionary D ∈ ℂ^{n×K}, we assume y = Dx with ‖x‖₀ ≪ K. Real-world signals are modeled as y = Dx + e, where e is a deviation term. Given D and sparsity level s,
x̂ = argmin_x ‖y − Dx‖₂²  s.t. ‖x‖₀ ≤ s
This is the NP-hard synthesis sparse coding problem. Greedy and ℓ₁-relaxation algorithms are computationally expensive.
Transform Model for Sparse Representation
Given a signal y ∈ ℂⁿ and transform W ∈ ℂ^{m×n}, we model Wy = x + η with ‖x‖₀ ≪ m and η an error term. Natural images are approximately sparse in Wavelets and the DCT. Given W and sparsity s, transform sparse coding is
x̂ = argmin_x ‖Wy − x‖₂²  s.t. ‖x‖₀ ≤ s
x̂ is computed exactly by thresholding Wy. Sparse coding is cheap! The signal is recovered as W† x̂. Sparsifying transforms are exploited for compression (JPEG2000), etc.
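Transform sparse coding is just a projection: compute Wy and zero all but the s largest-magnitude entries. A minimal numpy sketch (the random orthonormal W here is only a stand-in for a learned or analytic transform):

```python
import numpy as np

def transform_sparse_code(W, y, s):
    """Exact transform sparse coding: keep the s largest-magnitude
    entries of Wy and zero the rest (closed-form projection)."""
    z = W @ y
    xhat = np.zeros_like(z)
    keep = np.argsort(np.abs(z))[-s:]   # indices of the s largest magnitudes
    xhat[keep] = z[keep]
    return xhat

rng = np.random.default_rng(0)
W, _ = np.linalg.qr(rng.standard_normal((8, 8)))  # stand-in orthonormal transform
y = rng.standard_normal(8)
xhat = transform_sparse_code(W, y, s=3)
```

Unlike synthesis sparse coding, there is no NP-hard search: one matrix-vector product and a sort.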
Square Transform Learning
(P1) min_{W,{α_j}} Σ_j ‖W R_j x − α_j‖₂² (sparsification error) + λ(‖W‖²_F − log |det W|) (regularizers)  s.t. ‖α_j‖₀ ≤ s ∀ j
The sparsification error measures the deviation of a patch in the transform domain from perfect sparsity; λ > 0. The −log |det W| term restricts the solution to full-rank transforms. ‖W‖²_F keeps the objective function bounded from below. Minimizing λ(‖W‖²_F − log |det W|) encourages reduction of the condition number. The solution to (P1) is perfectly conditioned (κ = 1) as λ → ∞. (P1) is non-convex.
Transform Learning (TL) Algorithm
The algorithm for (P1) alternates between updating {α_j} and W.
Sparse Coding Step solves (P1) with fixed W:
min_{α_j} Σ_j ‖W R_j x − α_j‖₂²  s.t. ‖α_j‖₀ ≤ s ∀ j   (3)
Easy problem: the solution α̂_j is computed exactly by thresholding W R_j x and retaining the s largest-magnitude coefficients.
Transform Update Step solves (P1) with the α_j fixed:
min_W Σ_j ‖W R_j x − α_j‖₂² − λ log |det W| + µ ‖W‖²_F   (4)
Closed-form solution: Ŵ = (U/2) (Σ + (Σ² + 2λ Iₙ)^{1/2}) Qᴴ L⁻¹, where Σ_j R_j x xᴴ R_jᴴ + µ Iₙ = L Lᴴ, and L⁻¹ Σ_j R_j x α_jᴴ = Q Σ Uᴴ (full SVD).
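The transform update (4) can be implemented directly from the closed form above. A numpy sketch, assuming the training patches are stacked as columns of X and the sparse codes as columns of A (variable names are illustrative):

```python
import numpy as np

def transform_update(X, A, lam, mu):
    """Closed-form solution of
    min_W ||W X - A||_F^2 - lam * log|det W| + mu * ||W||_F^2.
    Follows the slide: L L^H = X X^H + mu I (Cholesky),
    L^{-1} X A^H = Q Sigma U^H (full SVD),
    W = (U/2)(Sigma + (Sigma^2 + 2 lam I)^{1/2}) Q^H L^{-1}."""
    n = X.shape[0]
    L = np.linalg.cholesky(X @ X.conj().T + mu * np.eye(n))
    B = np.linalg.solve(L, X @ A.conj().T)      # B = L^{-1} X A^H
    Q, sig, Uh = np.linalg.svd(B)               # B = Q diag(sig) U^H
    D = 0.5 * (sig + np.sqrt(sig**2 + 2.0 * lam))
    return Uh.conj().T @ np.diag(D) @ Q.conj().T @ np.linalg.inv(L)
```

Once the n×n Gram matrices X Xᴴ and X Aᴴ are formed, the cost is one Cholesky factorization and one small SVD, independent of the number of training patches.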
TL Properties
The objective converges for our exact alternating algorithm. Empirical evidence suggests convergence to the same objective value regardless of initialization. The computational cost of TL, O(MNn²) for N training signals, M iterations, and W ∈ ℂ^{n×n}, is significantly lower than the cost of DL, O(MNn³): a reduction in order by n for a √n×√n patch. Large values of λ enforce well-conditioning of the transform.
Transform Learning MRI (TLMRI)
(P2) min_{x,W,{α_j}} Σ_j ‖W R_j x − α_j‖₂² + λ Q(W) + ν ‖F_u x − y‖₂²  s.t. ‖α_j‖₀ ≤ s_j ∀ j.
Similar to the DLMRI formulation, but uses the transform model. Q(W) = ‖W‖²_F − log |det W|. We modify (P2) by introducing extra variables x̂_j in a penalty-type formulation, which leads to efficient algorithms:
(P2) min_{x,W,{x̂_j},{α_j}} Σ_j ‖W x̂_j − α_j‖₂² + λ Q(W) + ν ‖F_u x − y‖₂² + τ Σ_j ‖R_j x − x̂_j‖₂²  s.t. ‖α_j‖₀ ≤ s_j ∀ j.
The penalty Σ_j ‖R_j x − x̂_j‖₂² will also help us adaptively choose the sparsity levels s_j.
TLMRI Algorithm - Denoising Step
The penalty formulation of (P2) is solved using alternating minimization. For a given (corrupted) image x, it reduces to a denoising problem, with the x̂_j being the denoised patches.
Denoising Step:
(P3) min_{W,{x̂_j},{α_j}} Σ_j ‖W x̂_j − α_j‖₂² + λ Q(W) + τ Σ_j ‖R_j x − x̂_j‖₂²  s.t. ‖α_j‖₀ ≤ s_j ∀ j.
Denoising involves: transform learning (solving for W, {α_j} with fixed x̂_j = R_j x and s_j = s), and a variable-sparsity patch update (solving for x̂_j and s_j).
TLMRI Algorithm - Denoising Step
The variable-sparsity patch update involves solving
(P3a) min_{x̂_j} Σ_j ‖W x̂_j − H_{s_j}(W R_j x)‖₂² + τ Σ_j ‖R_j x − x̂_j‖₂²
H_s(b) thresholds b ∈ ℂⁿ to its s largest-magnitude elements. For fixed s_j, (P3a) reduces to separate least squares problems in each x̂_j. As s_j ↗ n, the denoising error ‖R_j x − x̂_j^{LS}‖₂ ↘ 0, with x̂_j^{LS} the least squares solution for a specific s_j. We pick s_j so that the error is below a threshold C; this can be done efficiently [Ravishankar & Bresler 12]. C decreases over iterations, as the iterates become more refined.
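For a fixed s_j, each patch subproblem in (P3a) is a small regularized least squares solve. A naive numpy sketch of the variable-sparsity update, increasing s until the denoising error drops below C (the efficient selection of [Ravishankar & Bresler 12] avoids this brute-force loop; names are illustrative):

```python
import numpy as np

def patch_update(W, p, tau, C):
    """Naive variable-sparsity patch update for one patch p = R_j x:
    for increasing s, solve the normal equations
    (W^H W + tau I) xhat = W^H H_s(W p) + tau p,
    and return the smallest s whose error ||p - xhat|| is <= C."""
    n = p.shape[0]
    z = W @ p
    order = np.argsort(np.abs(z))[::-1]          # coefficients by magnitude
    M = W.conj().T @ W + tau * np.eye(n)
    for s in range(1, n + 1):
        zs = np.zeros_like(z)
        zs[order[:s]] = z[order[:s]]             # H_s(W p)
        xhat = np.linalg.solve(M, W.conj().T @ zs + tau * p)
        if np.linalg.norm(p - xhat) <= C:
            break
    return xhat, s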
TLMRI Algorithm - Reconstruction Update Step
Reconstruction Update Step:
(P4) min_x τ Σ_j ‖R_j x − x̂_j‖₂² + ν ‖F_u x − y‖₂²
The update is performed directly in k-space:
Fx(k_x, k_y) = S(k_x, k_y), for (k_x, k_y) ∉ Ω
Fx(k_x, k_y) = (S(k_x, k_y) + ν′ S₀(k_x, k_y)) / (1 + ν′), for (k_x, k_y) ∈ Ω   (5)
Fx(k_x, k_y): updated k-space value; S₀(k_x, k_y): measured value; Ω: sampled subset of k-space. Here S = F (Σ_j R_jᴴ x̂_j) / β and ν′ = ν/(τβ), where β is the number of patches covering any pixel.
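Because the Fourier encoding diagonalizes the patch-average penalty, the update is a per-frequency blend. A numpy sketch, assuming a normalized 2D FFT as the encoding F, a boolean mask for Ω, and the measured data S0 placed on the k-space grid (function and argument names are illustrative):

```python
import numpy as np

def reconstruction_update(patch_avg, S0, mask, nu_prime):
    """k-space reconstruction update, eq. (5): keep S = F(patch average)
    at unsampled frequencies, and blend S with the measured values S0
    at sampled frequencies."""
    S = np.fft.fft2(patch_avg, norm="ortho")
    S_new = np.where(mask, (S + nu_prime * S0) / (1.0 + nu_prime), S)
    return np.fft.ifft2(S_new, norm="ortho")
```

As ν′ → ∞ the update enforces exact data consistency on Ω; a finite ν′ trades the measured data against the patch-average prior.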
TLMRI Algorithm Properties
Every step of our algorithm involves efficient closed-form solutions. The per-iteration computational cost of TLMRI is lower than that of DLMRI by a factor of order n (the patch size).
Cartesian Sampling with 4x Undersampling
[Figure: original image; zero-filling reconstruction (PSNR = 28.94 dB); PSNR vs. iteration number for TLMRI, DLMRI, LDP, and zero-filling.]
TLMRI with a square transform is better and 12x faster than 4x overcomplete DLMRI with 6×6 patches. TLMRI is also significantly better and faster than LDP, which employs fixed transforms.
Cartesian Sampling with 4x Undersampling
[Figure: TLMRI reconstruction (PSNR = 32.54 dB) and DLMRI reconstruction (PSNR = 32.40 dB), with corresponding error magnitude maps.]
Unconstrained TLMRI
The TLMRI algorithm requires setting error thresholds for the variable-sparsity update, and uses a penalty-method-type approach. An alternative scheme employs an ℓ₀ penalty instead, and uses the Augmented Lagrangian:
(P5) min_{x,W,{x̂_j},{α_j},µ} Σ_j ‖W x̂_j − α_j‖₂² + λ Q(W) + ν ‖F_u x − y‖₂² + η² Σ_j ‖α_j‖₀ + Σ_j Re{µ_jᴴ (R_j x − x̂_j)} + (τ/2) Σ_j ‖R_j x − x̂_j‖₂²
µ is a Lagrange multiplier matrix with the µ_j ∈ ℂⁿ as its columns. This is an unconstrained formulation, which is still non-convex. We solve it using the alternating direction method of multipliers (ADMM). We can group some terms together in (P5) and work with the scaled multipliers µ_j/τ.
Algorithm for Unconstrained TLMRI
The update of the α_j uses simple hard thresholding of W x̂_j at threshold level η:
min_{α_j} Σ_j ‖W x̂_j − α_j‖₂² + η² Σ_j ‖α_j‖₀   (6)
The update of W uses the closed-form solution:
min_W Σ_j ‖W x̂_j − α_j‖₂² − λ log |det W| + λ ‖W‖²_F   (7)
The update of {x̂_j} involves a least squares problem in each x̂_j:
min_{x̂_j} Σ_j ‖W x̂_j − α_j‖₂² + (τ/2) Σ_j ‖R_j x − x̂_j − µ_j‖₂²   (8)
The update of the scaled multipliers: µ_j ← µ_j − (R_j x − x̂_j). The update of x is done efficiently in k-space:
min_x (τ/2) Σ_j ‖R_j x − x̂_j − µ_j‖₂² + ν ‖F_u x − y‖₂²   (9)
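The α update (6) decouples per coefficient: zeroing entry z_i costs |z_i|², keeping it costs η², so the exact minimizer is hard thresholding at level η. A one-line numpy sketch:

```python
import numpy as np

def hard_threshold(z, eta):
    """Exact solution of min_a ||z - a||_2^2 + eta^2 * ||a||_0:
    keep entries with |z_i| > eta, zero the rest."""
    return np.where(np.abs(z) > eta, z, 0.0)
```

Note the contrast with the constrained formulations: no sparsity level s needs to be chosen, only the threshold η.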
Unconstrained TLMRI - Cartesian 4x Undersampling
[Figure: PSNR vs. iteration number for unconstrained TLMRI and DLMRI; unconstrained TLMRI error map (PSNR = 32.55 dB) and DLMRI error map (PSNR = 32.40 dB).]
Our algorithm for unconstrained TLMRI is better and 19x faster than DLMRI. The penalty approach performs similarly to the unconstrained one with an appropriate choice of error thresholds, but is slower.
Unconstrained TLMRI - 2D Random 5x Undersampling
[Figure: TLMRI⁴ reconstruction (PSNR = 30.52 dB) and DLMRI reconstruction (PSNR = 28.70 dB), with error magnitude maps.]
Data from Miki Lustig, UC Berkeley. ⁴12x speedup over DLMRI.
Conclusions
We proposed transform learning for undersampled MRI (TLMRI). Each step in our algorithms involves simple closed-form solutions. TLMRI provides comparable or better reconstructions than DLMRI, and is significantly faster. The unconstrained TLMRI algorithm is faster than the penalty-based approach. Speedups over DLMRI increase with patch size (>40x for 8×8 patches), motivating application to 3D/4D reconstruction. The iterates in our TLMRI algorithms are empirically observed to converge. Future work: adaptive overcomplete transforms and doubly sparse transforms for MRI; extension to dynamic MRI, functional MRI, etc.