ITERATED SHRINKAGE ALGORITHM FOR BASIS PURSUIT MINIMIZATION
Michael Elad, The Computer Science Department, The Technion - Israel Institute of Technology, Haifa 32000, Israel
SIAM Conference on Imaging Science, May 5-7, 2005, Minneapolis, Minnesota
Sparse Representations: Theory and Applications in Image Processing
Joint work with Michael Zibulevsky and Boaz Matalon
Noise Removal
Our story begins with signal/image denoising: remove additive noise.
Many decades of activity, numerous algorithms. Considered directions include: PDE methods, statistical estimators, adaptive filters, inverse problems & regularization, example-based restoration, sparse representations, ...
Shrinkage For Denoising
Shrinkage is a simple yet effective sparsity-based denoising algorithm [Donoho & Johnstone, 1993].
Pipeline: Apply Wavelet Transform -> LUT -> Apply Inverse Wavelet Transform.
Justification 1: minimax near-optimal over the Besov (smoothness) signal space (complicated!!!!).
Justification 2: Bayesian (MAP) optimal [Simoncelli & Adelson 1996, Moulin & Liu 1999].
In both justifications, additive white Gaussian noise and a unitary transform are crucial assumptions for the optimality claims.
Redundant Transforms?
Pipeline: Apply Redundant Transform -> LUT -> Apply its (pseudo) Inverse Transform.
Here the number of coefficients is (much!) greater than the number of input samples (pixels).
This scheme is still applicable, and it works fine (tested with curvelets, contourlets, undecimated wavelets, and more). However, it is no longer the optimal solution for the MAP criterion.
TODAY'S FOCUS: IS SHRINKAGE STILL RELEVANT WHEN HANDLING REDUNDANT (OR NON-UNITARY) TRANSFORMS? HOW? WHY?
WE SHOW THAT THE ABOVE SHRINKAGE METHOD IS THE FIRST ITERATION IN A VERY EFFECTIVE AND SIMPLE ALGORITHM THAT MINIMIZES THE BASIS PURSUIT OBJECTIVE, AND AS SUCH, IT IS A NEW PURSUIT TECHNIQUE.
Agenda
1. Bayesian Point of View for a Unitary Transform: optimality of shrinkage.
2. What About Redundant Representations? Is shrinkage still relevant? Why? How?
3. Conclusions
Thomas Bayes, 1702-1761
The MAP Approach
Minimize the following function with respect to x:
  f(x) = (1/2)*||x - y||_2^2 + lambda*Pr(x)
where the first term is the log-likelihood term, Pr(x) is the prior (regularization), x is the unknown to be recovered, and y is the given measurements.
Image Priors?
During the past several decades we have made all sorts of guesses about the prior Pr(x):
  Pr(x) = lambda*||x||_2^2          Energy
  Pr(x) = lambda*||Lx||_2^2         Smoothness
  Pr(x) = lambda*||Lx||_W^2         Adapt + Smooth
  Pr(x) = lambda*rho{Lx}            Robust statistics
  Pr(x) = lambda*||grad x||_1       Total Variation
  Pr(x) = lambda*||Wx||_1           Wavelet sparsity
  Pr(x) = lambda*||Tx||_1           Sparse & redundant -- today's focus
Also: the Mumford & Shah formulation, compression algorithms as priors, ...
(Unitary) Wavelet Sparsity
  f(x) = (1/2)*||x - y||_2^2 + lambda*||Wx||_1
Define x_hat = Wx and y_hat = Wy. Since W is unitary, the L2 norm is unitarily invariant:
  f(x_hat) = (1/2)*||W^H (x_hat - y_hat)||_2^2 + lambda*||x_hat||_1
           = (1/2)*||x_hat - y_hat||_2^2 + lambda*||x_hat||_1
We got a separable set of 1-D optimization problems; the signal is recovered by x = W^H x_hat.
Why Shrinkage?
We want to minimize this 1-D function with respect to z:
  f(z) = (1/2)*(z - a)^2 + lambda*|z|
The optimal solution is a soft threshold (a LUT):
  z_opt = a - lambda   if a >= lambda
        = 0            if |a| < lambda
        = a + lambda   if a <= -lambda
A LUT can be built for any other robust function (replacing the |z|), including nonconvex ones (e.g., the L0 norm)!!
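The soft-threshold rule above is one line of NumPy. A minimal sketch (the function name is mine, not from the talk):

```python
import numpy as np

def soft_threshold(a, lam):
    """Closed-form minimizer of f(z) = 0.5*(z - a)**2 + lam*|z|.

    Returns a - lam for a >= lam, 0 for |a| < lam, a + lam for a <= -lam.
    Works elementwise on arrays, which is exactly why the unitary case
    reduces to a bank of independent 1-D problems.
    """
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

# The three regimes of the shrinkage curve, with lambda = 1:
print(soft_threshold(np.array([3.0, 0.5, -2.0]), 1.0))  # -> [ 2.  0. -1.]
```

Replacing `np.abs(a) - lam` with another penalty's resolvent gives the LUT for other robust functions.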
Agenda
1. Bayesian Point of View for a Unitary Transform: optimality of shrinkage.
2. What About Redundant Representations? Is shrinkage still relevant? Why? How?
3. Conclusions
An Overcomplete Transform
With a redundant transform T and alpha = Tx, the objective becomes
  f(x) = (1/2)*||x - y||_2^2 + lambda*||Tx||_1
Redundant transforms are important because they can (i) lead to a shift-invariance property, (ii) represent images better (because of orientation/scale analysis), and (iii) enable deeper sparsity (when used in conjunction with the BP).
Analysis versus Synthesis
Analysis prior:
  f(x) = (1/2)*||x - y||_2^2 + lambda*||Tx||_1
Define alpha = Tx, so x = T^+ alpha. With D = T^+ this gives the synthesis prior:
  f~(alpha) = (1/2)*||D*alpha - y||_2^2 + lambda*||alpha||_1
which is the Basis Pursuit objective. However, alpha must satisfy alpha = T*T^+ alpha (i.e., lie in the range of T) for
  Argmin_alpha f~(alpha) to match Argmin_x f(x).
Basis Pursuit As Objective
Our objective:
  f~(alpha) = (1/2)*||D*alpha - y||_2^2 + lambda*||alpha||_1
Getting a sparse solution implies that y is composed of few atoms from D.
Sequential Coordinate Descent
Our objective: f~(alpha) = (1/2)*||D*alpha - y||_2^2 + lambda*||alpha||_1.
The unknown alpha has many entries (one per atom). How about optimizing with respect to each of them sequentially?
  Set j = 1.
  Fix all entries of alpha apart from the j-th one.
  Optimize with respect to alpha_j; the objective per coordinate becomes
    f~_j(z) = (1/2)*||z*d_j - y~||_2^2 + lambda*|z|
  (d_j is the j-th column of D, y~ the residual with the j-th atom's contribution removed).
  j <- (j+1) mod (number of entries); repeat.
We Get Sequential Shrinkage
BEFORE: we had this 1-D function to minimize,
  f(z) = (1/2)*(z - a)^2 + lambda*|z|,
and the solution was the soft threshold z_opt = S{a, lambda}:
  z_opt = a - lambda (a >= lambda), 0 (|a| < lambda), a + lambda (a <= -lambda).
NOW: our 1-D objective is
  f~_j(z) = (1/2)*||z*d_j - y~||_2^2 + lambda*|z|,
and the solution is again a shrinkage,
  z_opt = S{ d_j^H y~ / ||d_j||_2^2 , lambda / ||d_j||_2^2 },
where y~ = y - D*alpha + d_j*alpha_j.
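One full sweep of this sequential shrinkage can be sketched in NumPy as follows (a sketch under my own naming; the talk gives only the per-coordinate update):

```python
import numpy as np

def seq_shrinkage_pass(D, y, alpha, lam):
    """One sweep of sequential coordinate descent on
    0.5*||D @ alpha - y||^2 + lam*||alpha||_1.

    Each coordinate update is a scalar soft threshold: with the residual
    y_tilde = y - D @ alpha + d_j * alpha_j (all other atoms fixed),
    alpha_j <- S( d_j^T y_tilde / ||d_j||^2 , lam / ||d_j||^2 ).
    """
    soft = lambda a, t: np.sign(a) * np.maximum(np.abs(a) - t, 0.0)
    residual = y - D @ alpha
    for j in range(D.shape[1]):
        d = D[:, j]                       # drawing one atom at a time -- the
        nrm2 = d @ d                      # step the talk argues is impractical
        y_tilde = residual + d * alpha[j]
        new_aj = soft(d @ y_tilde / nrm2, lam / nrm2)
        residual += d * (alpha[j] - new_aj)   # keep residual consistent
        alpha[j] = new_aj
    return alpha
```

Each update is a 1-D exact minimization, so the objective never increases within a sweep; the explicit `D[:, j]` access is precisely what the next slide objects to.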
Sequential? Not Good!!
The loop
  Set j = 1; fix all entries of alpha apart from the j-th one;
  alpha_j^opt = S{ d_j^H y~ / ||d_j||_2^2 , lambda / ||d_j||_2^2 }, with y~ = y - D*alpha + d_j*alpha_j;
  j <- (j+1) mod (number of entries)
requires drawing one column at a time from D. In most transforms this is not convenient at all!!! This also means that MP and its variants are inadequate.
How About Parallel Shrinkage?
Our objective: f~(alpha) = (1/2)*||D*alpha - y||_2^2 + lambda*||alpha||_1.
Assume a current solution alpha_n. Using the previous method, we have descent directions obtained by a simple shrinkage. How about taking all of them at once, with a proper relaxation? A little bit of math leads to:
  For all j: compute the descent direction v_j per alpha_j.
  Update the solution by alpha_{n+1} = alpha_n + mu*v, where v collects the v_j.
Parallel Coordinate Descent (PCD)
The update is
  e = S{ alpha + Q*D^H*(y - D*alpha) , lambda*q } - alpha
  alpha <- alpha + mu*e
where D^H*(y - D*alpha) back-projects the synthesis error to the coefficient domain, S{., lambda*q} is the shrinkage operation, Q = diag{D^H D}^-1 normalizes by a diagonal matrix (q is its diagonal), and mu is found by an exact line search.
At all stages, the dictionary is applied as a whole, either directly or via its adjoint.
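A single PCD step can be sketched as below; the grid-based line search is my stand-in for the exact 1-D line search the talk assumes (all names are mine):

```python
import numpy as np

def pcd_step(D, y, alpha, lam):
    """One Parallel Coordinate Descent (PCD) step for
    0.5*||D @ alpha - y||^2 + lam*||alpha||_1.

    The dictionary is applied only as a whole (D and D.T),
    never column by column.
    """
    soft = lambda a, t: np.sign(a) * np.maximum(np.abs(a) - t, 0.0)
    q = 1.0 / np.sum(D * D, axis=0)          # diagonal of Q = diag(D^H D)^-1
    e = soft(alpha + q * (D.T @ (y - D @ alpha)), lam * q) - alpha
    # The exact line search over mu is 1-D and cheap; a coarse grid
    # (including mu = 0, which guarantees monotone descent) stands in here.
    f = lambda a: 0.5 * np.sum((D @ a - y) ** 2) + lam * np.sum(np.abs(a))
    mus = np.linspace(0.0, 2.0, 41)
    mu = mus[np.argmin([f(alpha + m * e) for m in mus])]
    return alpha + mu * e
```

In practice `D @ alpha` and `D.T @ r` would be fast transform operators rather than explicit matrices, which is the whole point of PCD.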
PCD: The First Iteration
Assume: zero initialization (alpha_0 = 0); D is a tight frame with normalized columns (so Q is proportional to I); the line search is replaced with mu = 1. Then the update
  e = S{ alpha + Q*D^H*(y - D*alpha) , lambda*q } - alpha,  alpha <- alpha + mu*e
collapses to
  alpha_1 = S{ D^H y , lambda },  x = D * S{ D^H y , lambda }.
Relation to Simple Shrinkage?
Apply Redundant Transform (D^H y) -> LUT (alpha = S{ D^H y, lambda }) -> Apply its (pseudo) Inverse Transform (D*alpha).
The first iteration in our algorithm = the intuitive shrinkage!!!
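This equivalence is easy to verify numerically. The sketch below builds a tight frame with unit-norm columns (my construction, not from the talk: the identity stacked next to a random orthonormal basis, so D D^T = 2I and Q = I) and checks that the first PCD iteration reproduces one-shot shrinkage:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
# Tight frame with unit-norm columns: identity next to an orthonormal basis.
W, _ = np.linalg.qr(rng.standard_normal((n, n)))
D = np.hstack([np.eye(n), W])

y = rng.standard_normal(n)
lam = 0.3
soft = lambda a, t: np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

# Simple (one-shot) shrinkage: analyze with D^H, threshold.
alpha_simple = soft(D.T @ y, lam)

# First PCD iteration: alpha_0 = 0, mu = 1, Q = I.
alpha0 = np.zeros(2 * n)
alpha_pcd = soft(alpha0 + D.T @ (y - D @ alpha0), lam)

print(np.allclose(alpha_simple, alpha_pcd))  # -> True
```

From the second iteration on, the residual y - D*alpha is no longer y, and PCD departs from (and improves on) the one-shot scheme.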
PCD: Convergence Analysis
We have proven convergence to the global minimizer of the BPDN objective function (with smoothing): alpha_n -> alpha*.
Approximate asymptotic convergence-rate analysis yields
  [ f(alpha_{n+1}) - f(alpha*) ] <= ((M - m)/(M + m)) * [ f(alpha_n) - f(alpha*) ]
where M and m are the largest and smallest eigenvalues of Q^0.5 * H * Q^0.5, respectively (H is the Hessian). This rate equals that of the Steepest-Descent algorithm, preconditioned by the Hessian's diagonal.
Substantial further speed-up can be obtained using the subspace optimization algorithm (SESOP) [Zibulevsky and Narkiss 2004].
Image Denoising
Minimize f~(alpha) = (1/2)*||D*alpha - y||_2^2 + lambda*||M*alpha||_1.
The matrix M gives a variance per each coefficient, learned from the corrupted image. D is the contourlet transform (recent version). The length of alpha: ~1e+6. The sequential shrinkage algorithm cannot be simulated for this dimension.
[Plot: objective function vs. iterations for Iterative Shrinkage, Steepest Descent, Conjugate Gradient, and Truncated Newton.]
Image Denoising
Minimize f~(alpha) = (1/2)*||D*alpha - y||_2^2 + lambda*||M*alpha||_1; evaluate the denoising PSNR of D*alpha_hat against the true x.
Even though one iteration of our algorithm is equivalent in complexity to that of the SD, the performance is much better.
[Plot: denoising PSNR vs. iterations for Iterative Shrinkage, Steepest Descent, Conjugate Gradient, and Truncated Newton.]
Image Denoising
[Images: original image; noisy image with sigma = 20; iterated shrinkage, first iteration, PSNR = 28.30dB; iterated shrinkage, second iteration, PSNR = 31.05dB.]
Closely Related Work
Several recent works have devised iterative shrinkage algorithms, each with a different motivation:
  E-M algorithm for image deblurring, minimizing (1/2)*||KW*alpha - y||_2^2 + lambda*||alpha||_1 [Figueiredo & Nowak 2003].
  Surrogate functionals for deblurring, as above [Daubechies, Defrise & De Mol, 2004] and [Figueiredo & Nowak 2005].
  PCD minimization for denoising (as shown above) [Elad, 2005].
While these algorithms are similar, they are in fact different. Our recent work has shown that:
  PCD gives faster convergence, compared to the surrogate algorithms.
  All the above methods can be further improved by SESOP, leading to
    [ f(alpha_{n+1}) - f(alpha*) ] <= ((M - m)/(M + m)) * [ f(alpha_n) - f(alpha*) ].
Agenda
1. Bayesian Point of View for a Unitary Transform: optimality of shrinkage.
2. What About Redundant Representations? Is shrinkage still relevant? Why? How?
3. Conclusions
Conclusions
Shrinkage is an appealing signal denoising technique. When is it optimal? For additive Gaussian noise and unitary transforms.
What if the transform is redundant? Option 1: apply sequential coordinate descent, which leads to a sequential shrinkage algorithm. How to avoid the need to extract atoms? Go parallel: compute all the CD directions, and use the average (with a proper relaxation).
Getting what? We obtain an easy-to-implement iterated shrinkage algorithm (PCD). This algorithm has been thoroughly studied (convergence, rate, comparisons).
THANK YOU!!
These slides and accompanying papers can be found at http://www.cs.technion.ac.il/~elad