ITERATED SHRINKAGE ALGORITHM FOR BASIS PURSUIT MINIMIZATION


Transcription:

ITERATED SHRINKAGE ALGORITHM FOR BASIS PURSUIT MINIMIZATION
Michael Elad, The Computer Science Department, The Technion, Israel Institute of Technology, Haifa 32000, Israel.
SIAM Conference on Imaging Science, May 15-17, 2005, Minneapolis, Minnesota. Sparse Representations: Theory and Applications in Image Processing.
Joint work with Michael Zibulevsky and Boaz Matalon.

Noise Removal
Our story begins with signal/image denoising: remove additive noise. Roughly 100 years of activity, with numerous algorithms. Considered directions include: PDE methods, statistical estimators, adaptive filters, inverse problems and regularization, example-based restoration, sparse representations, and more.

Shrinkage For Denoising
Shrinkage is a simple yet effective sparsity-based denoising algorithm [Donoho & Johnstone, 1993]. The processing chain is: apply the wavelet transform, pass the coefficients through a scalar look-up table (LUT), and apply the inverse wavelet transform.
Justification 1: minimax near-optimal over the Besov (smoothness) signal space (complicated!!!!).
Justification 2: Bayesian (MAP) optimal [Simoncelli & Adelson 1996, Moulin & Liu 1999].
In both justifications, additive white Gaussian noise and a unitary transform are crucial assumptions for the optimality claims.

Redundant Transforms?
The same processing chain still applies: apply a redundant transform, pass the coefficients through a LUT, and apply its (pseudo-) inverse transform. The number of coefficients is now (much!) greater than the number of input samples (pixels). This scheme works fine in practice (tested with curvelets, contourlets, the undecimated wavelet transform, and more). However, it is no longer the optimal solution for the MAP criterion.
Today's focus: is shrinkage still relevant when handling redundant (or non-unitary) transforms? How? Why?
We show that the above shrinkage method is the first iteration in a very effective and simple algorithm that minimizes the basis pursuit objective, and as such, it is a new pursuit technique.

Agenda
1. Bayesian point of view for a unitary transform: optimality of shrinkage.
2. What about redundant representations? Is shrinkage still relevant? Why? How?
3. Conclusions.
(Portrait: Thomas Bayes, 1702-1761.)

The MAP Approach
Minimize the following function with respect to x:
f(x) = (1/2)||x - y||_2^2 + λ·Pr(x)
The first term is the log-likelihood term, Pr(x) is the prior (regularization), x is the unknown to be recovered, and y is the given measurements.

Image Prior?
During the past several decades we have made all sorts of guesses about the prior Pr(x):
Pr(x) = λ||x||_2^2 (Energy)
Pr(x) = λ||Lx||_2^2 (Smoothness)
Pr(x) = λ||Lx||_W^2 (Adapt + Smooth)
Pr(x) = λ·ρ{Lx} (Robust Statistics)
Pr(x) = λ||∇x||_1 (Total Variation)
Pr(x) = λ||Wx||_1 (Wavelet Sparsity)
Pr(x) = λ||Tx||_1 (Today's Focus: Sparse & Redundant)
Other options include the Mumford & Shah formulation, compression algorithms as priors, and more.

(Unitary) Wavelet Sparsity
f(x) = (1/2)||x - y||_2^2 + λ||Wx||_1
Define x̂ = Wx (so x = W^H·x̂) and ŷ = Wy. Since the L2 norm is unitarily invariant,
f̂(x̂) = (1/2)||W^H(x̂ - ŷ)||_2^2 + λ||x̂||_1 = (1/2)||x̂ - ŷ||_2^2 + λ||x̂||_1 = Σ_k [ (1/2)(x̂_k - ŷ_k)^2 + λ|x̂_k| ].
We got a separable set of 1-D optimization problems.

Why Shrinkage?
We want to minimize this 1-D function with respect to z:
f(z) = (1/2)(z - a)^2 + λ|z|
The minimizer is given by a look-up table (LUT), z_opt = S{a, λ}:
z_opt = a - λ for a ≥ λ; z_opt = 0 for |a| < λ; z_opt = a + λ for a ≤ -λ.
A LUT can be built for any other robust function (replacing the |z|), including nonconvex ones (e.g., the L0 norm)!!
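For concreteness, here is a minimal NumPy sketch of this shrinkage LUT (the soft threshold), together with a brute-force check of one 1-D problem; the function name and the test values are illustrative and not part of the original slides.

    import numpy as np

    def soft_threshold(a, lam):
        """Minimizer of 0.5*(z - a)**2 + lam*|z| (the shrinkage LUT)."""
        return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

    # Sanity check: compare with a brute-force scan of the 1-D objective.
    a, lam = 1.3, 0.5
    z = np.linspace(-3.0, 3.0, 60001)
    objective = 0.5 * (z - a) ** 2 + lam * np.abs(z)
    print(soft_threshold(a, lam), z[np.argmin(objective)])  # both are ~0.8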

Agenda
1. Bayesian point of view for a unitary transform: optimality of shrinkage.
2. What about redundant representations? Is shrinkage still relevant? Why? How?
3. Conclusions.

An Overcomplete Transform
Now T is a redundant (overcomplete) transform with Tx = α, and the objective is
f(x) = (1/2)||x - y||_2^2 + λ||Tx||_1.
Redundant transforms are important because they can (i) lead to a shift-invariance property, (ii) represent images better (because of orientation/scale analysis), and (iii) enable deeper sparsity (when used in conjunction with the BP).

Analysis versus Synthesis
Analysis prior: f(x) = (1/2)||x - y||_2^2 + λ||Tx||_1.
Define α = Tx, so that x = T^+·α.
Synthesis prior: f̃(α) = (1/2)||T^+·α - y||_2^2 + λ||α||_1 = (1/2)||Dα - y||_2^2 + λ||α||_1, with D = T^+. This is the Basis Pursuit formulation. Note, however, that the two problems coincide only when α is restricted to the range of T; the unconstrained synthesis problem is the more general one. The analysis solution is obtained from the synthesis one via x_opt = D·argmin_α f̃(α), to be compared with argmin_x f(x).
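To make the substitution α = Tx, x = T^+·α concrete, here is a small NumPy sketch; the random full-column-rank analysis operator and all names are illustrative only.

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 8, 20                       # k > n: a redundant analysis operator
    T = rng.standard_normal((k, n))    # full column rank with probability 1
    D = np.linalg.pinv(T)              # synthesis dictionary D = T^+ (n x k)

    x = rng.standard_normal(n)
    alpha = T @ x                      # analysis coefficients of x
    print(np.allclose(D @ alpha, x))   # True: x is recovered in the synthesis view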

Basis Pursuit As Objective
Our objective: f̃(α) = (1/2)||Dα - y||_2^2 + λ||α||_1, where D is a wide matrix (the dictionary) and α is a long coefficient vector. Getting a sparse solution implies that y is composed of few atoms from D.

Sequential Coordinate Descent
Our objective: f̃(α) = (1/2)||Dα - y||_2^2 + λ||α||_1.
The unknown α has k entries. How about optimizing with respect to each of them sequentially? The objective per each entry becomes
g(z) = (1/2)||z·d_j - ỹ||_2^2 + λ|z|,
where d_j is the j-th column of D and ỹ is the residual with the j-th atom removed.
The loop: set j = 1; fix all entries of α apart from the j-th one; optimize with respect to α_j; set j ← (j + 1) mod k and repeat.

We Get Sequential Shrinkage
BEFORE: we had the 1-D function f(z) = (1/2)(z - a)^2 + λ|z| to minimize, and the solution was the shrinkage z_opt = S{a, λ} (a - λ for a ≥ λ, 0 for |a| < λ, a + λ for a ≤ -λ).
NOW: our 1-D objective is g(z) = (1/2)||z·d_j - ỹ||_2^2 + λ|z|, and the solution is
z_opt = S{ d_j^H·ỹ / ||d_j||_2^2 , λ / ||d_j||_2^2 }.

Sequential? Not Good!!
The loop: set j = 1; form ỹ = y - Dα + d_j·α_j; fix all entries of α apart from the j-th one; update
α_j^opt = S{ d_j^H·ỹ / ||d_j||_2^2 , λ / ||d_j||_2^2 };
set j ← (j + 1) mod k and repeat.
This method requires drawing one column at a time from D. In most transforms this is not comfortable at all!!! This also means that MP and its variants are inadequate.
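For reference, a minimal NumPy sketch of this sequential coordinate-descent shrinkage; the names are illustrative, and an explicit matrix D is assumed, which is precisely the inconvenient requirement noted above.

    import numpy as np

    def soft(a, lam):
        return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

    def sequential_shrinkage(D, y, lam, n_sweeps=10):
        """One coefficient at a time: minimize 0.5*||D a - y||^2 + lam*||a||_1."""
        k = D.shape[1]
        alpha = np.zeros(k)
        r = y - D @ alpha                   # current residual y - D*alpha
        for _ in range(n_sweeps):
            for j in range(k):
                dj = D[:, j]                # requires drawing one column at a time
                nrm2 = dj @ dj
                y_tilde = r + dj * alpha[j]   # residual with the j-th atom removed
                zj = soft(dj @ y_tilde / nrm2, lam / nrm2)
                r += dj * (alpha[j] - zj)     # keep the residual consistent
                alpha[j] = zj
        return alpha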

How About Parallel Shrinkage?
Our objective: f̃(α) = (1/2)||Dα - y||_2^2 + λ||α||_1.
Assume a current solution α_n. Using the previous method, we have k descent directions, each obtained by a simple shrinkage. How about taking all of them at once, with a proper relaxation? A little bit of math leads to:
For j = 1, ..., k: compute the descent direction per α_j, denoted v_j.
Update the solution by α_{n+1} = α_n + μ·Σ_j v_j.

Parallel Coordinate Descent (PCD)
The PCD iteration computes the direction
e = S{ α_n + Q·D^H·(y - Dα_n) , λ·q } - α_n
and updates α_{n+1} = α_n + μ·e.
Here y - Dα_n is the synthesis error, D^H·(y - Dα_n) is its back-projection to the coefficient domain, S{·, λ·q} is the shrinkage operation, Q = diag{D^H·D}^(-1) normalizes by a diagonal matrix (q denotes its diagonal), and μ is found by an exact line search. At all stages, the dictionary is applied as a whole, either directly or via its adjoint.
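A minimal NumPy sketch of the PCD iteration under simplifying assumptions: an explicit real-valued matrix D stands in for the operator, and a fixed step μ replaces the exact line search used in the talk. Names and defaults are illustrative.

    import numpy as np

    def soft(a, lam):
        return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

    def pcd(D, y, lam, n_iter=50, mu=1.0):
        """Parallel Coordinate Descent for 0.5*||D a - y||^2 + lam*||a||_1."""
        k = D.shape[1]
        q = 1.0 / np.sum(D * D, axis=0)       # diagonal of Q = diag(D^H D)^(-1)
        alpha = np.zeros(k)
        for _ in range(n_iter):
            back = D.T @ (y - D @ alpha)      # back-projection of the synthesis error
            e = soft(alpha + q * back, lam * q) - alpha   # parallel shrinkage direction
            alpha = alpha + mu * e            # relaxed update (line search in the talk)
        return alpha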

PCD: The First Iteration
Assume: zero initialization (α_0 = 0); D is a tight frame with normalized columns (so Q = I); and the line search is replaced with μ = 1. Then the update
α_{n+1} = α_n + μ·( S{ α_n + Q·D^H·(y - Dα_n), λ·q } - α_n )
reduces to
α_1 = S{ D^H·y, λ }, and the signal estimate is x̂ = Dα_1 = D·S{ D^H·y, λ }.
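A small hedged check of this reduction, using a tight frame built from two orthonormal bases; the construction and names are illustrative only.

    import numpy as np

    def soft(a, lam):
        return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

    rng = np.random.default_rng(1)
    n = 32
    U1 = np.eye(n)
    U2, _ = np.linalg.qr(rng.standard_normal((n, n)))
    D = np.hstack([U1, U2])        # unit-norm columns, D D^T = 2I: a tight frame, so Q = I

    y = rng.standard_normal(n)
    lam = 0.3
    alpha0 = np.zeros(2 * n)
    q = 1.0 / np.sum(D * D, axis=0)    # all ones here, since the columns are normalized
    # One PCD step with mu = 1, starting from alpha_0 = 0 ...
    alpha1 = alpha0 + 1.0 * (soft(alpha0 + q * (D.T @ (y - D @ alpha0)), lam * q) - alpha0)
    # ... equals the plain "transform, shrink, inverse-transform" estimate:
    print(np.allclose(D @ alpha1, D @ soft(D.T @ y, lam)))   # True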

Relation to Simple Shrinkage?
Apply the redundant transform (D^H·y), pass the coefficients through the LUT (α = S{D^H·y, λ}), and apply its (pseudo-) inverse transform (Dα). The first iteration of our algorithm is exactly the intuitive shrinkage!!!

PCD Convergence Analysis
We have proven convergence to the global minimizer of the BPDN objective function (with smoothing): α_k → α*.
An approximate asymptotic convergence-rate analysis yields
f(α_{k+1}) - f(α*) ≤ [ (M - m) / (M + m) ]^2 · [ f(α_k) - f(α*) ],
where M and m are the largest and smallest eigenvalues of Q^0.5·H·Q^0.5, respectively (H is the Hessian). This rate equals that of the Steepest-Descent algorithm, preconditioned by the Hessian's diagonal.
Substantial further speed-up can be obtained using the subspace optimization algorithm (SESOP) [Zibulevsky and Narkiss 2004].

Image Denoising
Minimize f̃(α) = (1/2)||Dα - y||_2^2 + λ||Mα||_1. The matrix M gives a variance per each coefficient, learned from the corrupted image. D is the contourlet transform (recent version). The length of α is on the order of 10^6, so the sequential shrinkage algorithm cannot be simulated for this dimension.
(Plot: objective function versus iterations for Iterative Shrinkage, Steepest Descent, Conjugate Gradient, and Truncated Newton.)

Image Denoising
Minimize f̃(α) = (1/2)||Dα - y||_2^2 + λ||Mα||_1. The denoising PSNR is evaluated on Dα̂ against the true image.
(Plot: denoising PSNR versus iterations for Iterative Shrinkage, Steepest Descent, Conjugate Gradient, and Truncated Newton.)
Even though one iteration of our algorithm is equivalent in complexity to that of the SD, the performance is much better.

Image Denoising
(Images: the original image; the noisy image with σ = 20; the iterated-shrinkage result after the first iteration, PSNR = 28.30 dB; the iterated-shrinkage result after the second iteration, PSNR = 31.05 dB.)

Closely Related Work
Several recent works have devised iterative shrinkage algorithms, each with a different motivation:
The E-M algorithm for image deblurring [Figueiredo & Nowak 2003], minimizing (1/2)||KWα - y||_2^2 + λ||α||_1.
Surrogate functionals for deblurring as above [Daubechies, Defrise & De Mol 2004] and [Figueiredo & Nowak 2005].
PCD minimization for denoising (as shown above) [Elad 2005].
While these algorithms are similar, they are in fact different. Our recent work has shown that:
PCD gives faster convergence, compared to the surrogate algorithms.
All the above methods can be further improved by SESOP, leading to
f(α_{k+1}) - f(α*) ≤ [ (√M - √m) / (√M + √m) ]^2 · [ f(α_k) - f(α*) ].

Agenda
1. Bayesian point of view for a unitary transform: optimality of shrinkage.
2. What about redundant representations? Is shrinkage still relevant? Why? How?
3. Conclusions.

Conclusion
Shrinkage is an appealing signal denoising technique. Getting what? When optimal? How? For additive Gaussian noise and unitary transforms.
What if the transform is redundant? Option 1: apply sequential coordinate descent, which leads to a sequential shrinkage algorithm.
How to avoid the need to extract atoms? Go parallel: compute all the CD directions, and use the average.
We obtain an easy-to-implement iterated shrinkage algorithm (PCD). This algorithm has been thoroughly studied (convergence, rate, comparisons).

THANK YOU!!
These slides and the accompanying papers can be found at http://www.cs.technion.ac.il/~elad