Sparse & Redundant Representations by Iterated-Shrinkage Algorithms

Sparse & Redundant Representations by Iterated-Shrinkage Algorithms
Michael Elad*, The Computer Science Department, The Technion - Israel Institute of Technology, Haifa 32000, Israel.
Wavelet XII (SPIE 6701), 26-30 August 2007, San Diego Convention Center, San Diego, CA, USA.
* Joint work with: Boaz Matalon, Joseph Shtok, and Michael Zibulevsky.

Today's Talk is About the Minimization of the Function f(α) = ½||Dα - x||₂² + λρ(α)
Today we will discuss:
- Why is this minimization task important?
- Which applications could benefit from this minimization?
- How can it be minimized effectively?
- What iterated-shrinkage methods are out there?
Why this topic? It is key for turning theory into applications in Sparse-land; shrinkage is intimately coupled with wavelets; the applications we target are fundamental signal-processing ones; and this is a hot topic in harmonic analysis.

Agenda
1. Motivating the Minimization of f(α): describing various applications that need this minimization.
2. Some Motivating Facts: general-purpose optimization tools, and the unitary case.
3. Iterated-Shrinkage Algorithms: we describe five versions of those in detail.
4. Some Results: image deblurring results.
5. Conclusions.
(Sidebar: α̂ = ArgMin_α ½||Dα - x||₂² + λρ(α) ... Why?)

Let's Start with Image Denoising
Given measurements y = x + v, where x is the unknown to be recovered and v ~ N(0, σ²I) is additive noise to be removed. Many of the existing image denoising algorithms are related to the minimization of an energy function of the form
f(x) = ½||x - y||₂² + Pr(x),
where the first term is the likelihood (relation to the measurements) and the second is the prior (regularization). We will use a Sparse & Redundant Representation prior.

Our MAP Energy Function
We assume that x is created by the model M: x = Dα, where α is a sparse & redundant representation and D is a known dictionary. This leads to:
α̂ = ArgMin_α ½||Dα - y||₂² + λρ(α),  x̂ = Dα̂.
This MAP denoising algorithm is known as Basis Pursuit Denoising [Chen, Donoho & Saunders 1995].
The term ρ(α) measures the sparsity of the solution α:
- The L0-norm (||α||₀) leads to a non-smooth & non-convex problem.
- The Lp norm (||α||_p^p) with 0 < p ≤ 1 is often found to be equivalent.
- Many other ADDITIVE sparsity measures are possible.
This is Our Problem!!!

General (Linear) Inverse Problems
Assume that x ∈ R^N is known to emerge from the model M, as before. Suppose we observe y = Hx + v, a blurred and noisy version of x. How could we recover x? A MAP estimator leads to:
α̂ = ArgMin_α ½||HDα - y||₂² + λρ(α),  x̂ = Dα̂.
This is Our Problem!!!

Inverse Problems of Interest
De-Noising, De-Blurring, In-Painting, De-Mosaicing, Tomography, Image Scale-Up & Super-Resolution, and more. All of them lead to
α̂ = ArgMin_α ½||Dα - x||₂² + λρ(α).
This is Our Problem!!!

Signal Separation
Given a mixture z = x₁ + x₂ + v of two sources, emerging from the models M₁ and M₂ (with dictionaries D₁ and D₂), and white Gaussian noise v, we desire to separate it into its ingredients. Written differently: z = [D₁ D₂][α; β] + v. Thus, solving this problem using MAP leads to Morphological Component Analysis (MCA) [Starck, Elad & Donoho 2005]:
{α̂, β̂} = ArgMin_{α,β} ½|| [D₁, D₂][α; β] - z ||₂² + λρ([α; β]).
This is Our Problem!!!

Compressed-Sensing [Candes et al. 2006], [Donoho 2006]
In compressed-sensing we compress the signal x by exploiting its origin. This is done by p << n random projections: y = Px + v, with P of size p × n. The core idea: y holds all the information about the original signal x, even though p << n. Reconstruction? Use MAP again and solve
α̂ = ArgMin_α ½||PDα - y||₂² + λρ(α),  x̂ = Dα̂.
This is Our Problem!!!

Brief Summary #1
The minimization of the function f(α) = ½||Dα - x||₂² + λρ(α) is a worthy task, serving many & various applications. So, how should this be done?

Agenda
1. Motivating the Minimization of f(α): describing various applications that need this minimization.
2. Some Motivating Facts: general-purpose optimization tools, and the unitary case.
3. Iterated-Shrinkage Algorithms: we describe five versions of those in detail.
4. Some Results: image deblurring results.
5. Conclusions.
(Sidebar diagram: Apply Wavelet Transform → LUT → Apply Inverse Wavelet Transform.)

Is There a Problem?   f(α) = ½||Dα - x||₂² + λρ(α)
The first thought: with all the existing knowledge in optimization, we could find a solution. Methods to consider:
- (Normalized) Steepest Descent: compute the gradient and follow its path.
- Conjugate Gradient: use the gradient and the previous update direction, combined by a preset formula.
- Pre-Conditioned SD: weight the gradient by the inverse of the Hessian's diagonal.
- Truncated Newton: use the gradient and Hessian to define a linear system, and solve it approximately by a set of CG steps.
- Interior-Point Algorithms: separate into positive and negative entries, and use both the primal and the dual problems, plus a barrier for forcing positivity.

General-Purpose Software?
So, simply download one of many general-purpose packages: L1-Magic (interior-point solver), SparseLab (interior-point solver), MOSEK (various tools), Matlab Optimization Toolbox (various tools), and so on.
A Problem: general-purpose software packages (algorithms) typically perform poorly on our task. Possible reasons:
- The fact that the solution is expected to be sparse (or nearly so) in our problem is not exploited in such algorithms.
- The Hessian of f(α) tends to be highly ill-conditioned near the (sparse) solution:
  ∇f(α) = D^H(Dα - x) + λρ'(α),  ∇²f(α) = D^H D + λ diag{ρ''(α)}.
  (Plot: the non-zero entries in the solution have relatively small values.)
So, are we stuck? Is this problem really that complicated?

Consider the Unitary Case (DD^H = I)
Define β = D^H x. Then:
f(α) = ½||Dα - x||₂² + λρ(α)
     = ½||D(α - D^H x)||₂² + λρ(α)      [using DD^H = I]
     = ½||α - β||₂² + λρ(α)             [the L2 norm is unitarily invariant]
     = Σ_{j=1..m} [ ½(α_j - β_j)² + λρ(α_j) ].
We get a separable set of m identical 1-D optimization problems.

The 1-D Task
We need to solve the following 1-D problem:
α_opt = ArgMin_α ½(α - β)² + λρ(α),  i.e.  α_opt = S_{ρ,λ}(β).
Such a Look-Up-Table (LUT), α_opt = S_{ρ,λ}(β), can be built for ANY sparsity-measure function ρ(), including non-convex ones and non-smooth ones (e.g., the L0 norm), giving in all cases the GLOBAL minimizer of the 1-D objective g().
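Since the talk leans on this LUT idea repeatedly, here is a minimal NumPy sketch (not the authors' code) of how such a table can be built by brute-force 1-D minimization, so that it also handles non-convex ρ. The function names, grid ranges, and the particular smoothed-L1 ρ used in the demo are my own assumptions.

```python
import numpy as np

def build_shrinkage_lut(rho, lam, beta_grid, alpha_grid):
    """Tabulate S_{rho,lam}(beta) = argmin_a 0.5*(a - beta)**2 + lam*rho(a)
    by exhaustive search over alpha_grid (handles non-convex, non-smooth rho)."""
    lut = np.empty_like(beta_grid)
    for i, b in enumerate(beta_grid):
        cost = 0.5 * (alpha_grid - b) ** 2 + lam * rho(alpha_grid)
        lut[i] = alpha_grid[np.argmin(cost)]
    return lut

def apply_shrinkage(beta, beta_grid, lut):
    """Evaluate the tabulated shrinkage on an arbitrary array beta (linear interpolation)."""
    return np.interp(beta, beta_grid, lut)

# Demo with a smoothed-L1 sparsity measure (S is a small smoothing constant).
S = 0.01
rho = lambda a: np.abs(a) - S * np.log(1.0 + np.abs(a) / S)
beta_grid = np.linspace(-5.0, 5.0, 1001)
alpha_grid = np.linspace(-5.0, 5.0, 10001)
lut = build_shrinkage_lut(rho, lam=0.075, beta_grid=beta_grid, alpha_grid=alpha_grid)
print(apply_shrinkage(np.array([-1.0, 0.05, 2.0]), beta_grid, lut))
```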

The Unitary Case: A Summary
Minimizing f(α) = ½||Dα - x||₂² + λρ(α) = ½||α - β||₂² + λρ(α) is done by:
x → [multiply by D^H] → β → [LUT: S_{ρ,λ}(β)] → α̂ → [multiply by D] → x̂.
DONE! The obtained solution is the GLOBAL minimizer of f(α), even if f() is non-convex.
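To make the unitary-case pipeline concrete, here is a small sketch under two assumptions of mine: ρ(α) = |α| (so the LUT collapses to the closed-form soft threshold), and a random orthonormal matrix standing in for a unitary dictionary such as an orthonormal wavelet transform.

```python
import numpy as np

def soft_threshold(beta, lam):
    """Closed-form shrinkage S_{rho,lam} for rho(a) = |a|."""
    return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)

def unitary_denoise(x, D, lam):
    """Global minimizer of 0.5*||D a - x||^2 + lam*||a||_1 when D is unitary."""
    beta = D.conj().T @ x                  # multiply by D^H
    alpha_hat = soft_threshold(beta, lam)  # scalar shrinkage (the LUT)
    return D @ alpha_hat, alpha_hat        # multiply by D

# Toy demo: sparse alpha, unitary D, additive Gaussian noise.
rng = np.random.default_rng(0)
D, _ = np.linalg.qr(rng.standard_normal((128, 128)))   # random orthonormal "dictionary"
alpha_true = np.zeros(128)
alpha_true[rng.choice(128, size=5, replace=False)] = 3.0
x = D @ alpha_true + 0.1 * rng.standard_normal(128)
x_hat, alpha_hat = unitary_denoise(x, D, lam=0.3)
```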

Brief Summary #2
The minimization of f(α) = ½||Dα - x||₂² + λρ(α) leads to two very contradicting observations:
1. The problem is quite hard: classic optimization tools find it hard.
2. The problem is trivial for the case of a unitary D.
How can we enjoy this simplicity in the general case?

Agenda
1. Motivating the Minimization of f(α): describing various applications that need this minimization.
2. Some Motivating Facts: general-purpose optimization tools, and the unitary case.
3. Iterated-Shrinkage Algorithms: we describe five versions of those in detail.
4. Some Results: image deblurring results.
5. Conclusions.

Iterated-Shrinkage Algorithms?
We will present THE PRINCIPLES of several leading methods:
- Bound-Optimization and EM [Figueiredo & Nowak '03],
- Separable Surrogate Functions (SSF) [Daubechies, Defrise & De Mol '04],
- the Parallel-Coordinate-Descent (PCD) algorithm [Elad '05], [Matalon et al. '06],
- an IRLS-based algorithm [Adeyemi & Davies '06], and
- Stagewise Orthogonal Matching Pursuit (StOMP) [Donoho et al. '07].
Common to all is a set of operations in every iteration that includes: (i) multiplication by D, (ii) multiplication by D^H, and (iii) a scalar shrinkage S_{ρ,λ}() on the solution. Some of these algorithms pose a direct generalization of the unitary case: their 1st iteration is the solver we have seen.

1. The Proximal-Point Method
Aim: minimize f(α). Suppose it is found to be too hard. Define a surrogate function g(α, α_0) = f(α) + dist(α - α_0), using a general (uni-modal, non-negative) distance function. Then, the following algorithm necessarily converges to a local minimum of f(α) [Rockafellar '76]:
α_0 → minimize g(α, α_0) → α_1 → minimize g(α, α_1) → ... → α_k → minimize g(α, α_k) → α_{k+1} → ...
Comments: (i) Is the minimization of g(α, α_0) easier? It had better be! (ii) Looks like it will slow down convergence. Really?

The Proposed Surrogate Functions
Our original function is f(α) = ½||Dα - x||₂² + λρ(α). The distance to use, proposed by [Daubechies, Defrise & De Mol '04], is
dist(α, α_0) = (c/2)||α - α_0||₂² - ½||Dα - Dα_0||₂²,  requiring c > r(D^H D) (the largest eigenvalue of D^H D) so that the distance is non-negative.
The beauty in this choice: the quadratic term ½||Dα||₂² vanishes, and
g(α, α_0) = λρ(α) + (c/2)||α - β||₂² + const,  where β = (1/c)·D^H(x - Dα_0) + α_0.
This is a separable sum of m 1-D problems, g = Σ_{j=1..m} [ (c/2)(α_j - β_j)² + λρ(α_j) ] + const. Thus, we have a closed-form solution by THE SAME SHRINKAGE!! Minimization of g(α, α_k) is done in closed form by shrinkage applied to the vector β_k, and this generates the solution α_{k+1} of the next iteration.

The Resulting SSF Algorithm
While the unitary-case solution is given by
α̂ = S_{ρ,λ}(D^H x),  x̂ = Dα̂,
the general case, by SSF, requires the iteration
α_{k+1} = S_{ρ,λ/c}( (1/c)·D^H(x - Dα_k) + α_k ),
i.e., compute the residual x - Dα_k, multiply by D^H/c, add α_k to obtain β_k, apply the shrinkage LUT, and repeat; finally x̂ = Dα_k.
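The SSF iteration is only a few lines of NumPy. The sketch below is my own illustration, not the talk's implementation: it uses the soft threshold (the shrinkage for ρ(α) = |α|) rather than the talk's smoothed-L1 LUT, picks c slightly above the largest eigenvalue of D^H D, and assumes an explicit matrix D.

```python
import numpy as np

def ssf(x, D, lam, n_iter=100):
    """SSF iteration: a_{k+1} = S_{rho, lam/c}( a_k + (1/c) D^H (x - D a_k) )."""
    soft = lambda b, t: np.sign(b) * np.maximum(np.abs(b) - t, 0.0)  # shrinkage for rho = |.|
    c = 1.01 * np.linalg.norm(D, 2) ** 2   # c > largest eigenvalue of D^H D
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        beta = a + D.conj().T @ (x - D @ a) / c
        a = soft(beta, lam / c)
    return D @ a, a
```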

2. Bound-Optimization Technique
Aim: minimize f(α). Suppose it is found to be too hard. Define a function Q(α, α_0) that satisfies the following conditions: Q(α_0, α_0) = f(α_0); Q(α, α_0) ≥ f(α) for all α; and Q(·, α_0) is tangent to f(·) at α_0. Then, the following algorithm necessarily converges to a local minimum of f(α) [Hunter & Lange (review) '04]:
α_0 → minimize Q(α, α_0) → α_1 → minimize Q(α, α_1) → ... → α_k → minimize Q(α, α_k) → α_{k+1} → ...
Well, regarding this method: the above is closely related to the EM algorithm [Neal & Hinton '98]. Figueiredo & Nowak's method ('03) uses the BO idea to minimize f(α); they use the VERY SAME surrogate functions we saw before.

3. Start With Coordinate Descent
We aim to minimize f(α) = ½||Dα - x||₂² + λρ(α). First, consider the Coordinate Descent (CD) algorithm: update one entry α_j at a time, keeping all the others fixed. This is a 1-D minimization problem:
α_j^opt = ArgMin_{α_j} ½||e_j - d_j·α_j||₂² + λρ(α_j),
where d_j is the j-th column of D and e_j is the residual with the j-th atom's contribution removed. It has a closed-form solution, using a simple SHRINKAGE (LUT) as before, applied to the scalar <e_j, d_j>:
α_j^opt = S_{ρ, λ/||d_j||₂²}( d_j^H e_j / ||d_j||₂² ).

Parallel Coordinate Descent (PCD)
At the current solution α_k of the minimization of f(α), let {v_j}_{j=1..m} be the m descent directions obtained by the previous CD algorithm (one per coordinate). We will take the sum of these m descent directions, v = Σ_{j=1..m} v_j, for the update step (a single step in the m-dimensional space). Line search is mandatory. This leads to ...

The PCD Algorithm [Elad '05], [Matalon, Elad & Zibulevsky '06]
α_{k+1} = α_k + μ·[ S_{ρ,Qλ}( Q·D^H(x - Dα_k) + α_k ) - α_k ],
where Q = diag(D^H D)^{-1} and μ represents a line search (LS). The flow: compute x - Dα_k, multiply by QD^H, add α_k to obtain β_k, apply the shrinkage LUT, line-search along the resulting direction, and repeat; finally x̂ = Dα_k.
Note: Q can be computed quite easily off-line; its storage is just like storing the vector α_k.
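A matching sketch of the PCD iteration, again under my own simplifying assumptions: soft threshold as the shrinkage, an explicit matrix D, and a crude grid line search in place of a proper 1-D minimization.

```python
import numpy as np

def pcd(x, D, lam, n_iter=100):
    """PCD: direction v = S_{rho, q*lam}( a + q * D^H (x - D a) ) - a,
    with q = 1/diag(D^H D) (computed off-line), followed by a line search."""
    soft = lambda b, t: np.sign(b) * np.maximum(np.abs(b) - t, 0.0)
    q = 1.0 / np.sum(np.abs(D) ** 2, axis=0)              # diag(D^H D)^{-1}
    f = lambda a: 0.5 * np.linalg.norm(D @ a - x) ** 2 + lam * np.abs(a).sum()
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        beta = a + q * (D.conj().T @ (x - D @ a))
        v = soft(beta, lam * q) - a                        # parallel CD direction
        mus = np.linspace(0.0, 2.0, 41)                    # crude grid line search
        a = a + mus[np.argmin([f(a + mu * v) for mu in mus])] * v
    return D @ a, a
```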

Algorithms Speed-Up
Each iteration proposes a step direction; for example, for SSF, v_k = S_{ρ,λ/c}( α_k + (1/c)·D^T(x - Dα_k) ) - α_k. Options for using it (see the sketch after this list):
- Option 1: use it as is: α_{k+1} = α_k + v_k.
- Option 2: use a line search: α_{k+1} = α_k + μ·v_k.
- Option 3: Sequential Subspace OPtimization (SESOP) [Zibulevsky & Narkiss '05], [Elad, Matalon & Zibulevsky '07]: optimize f over the subspace spanned by the current direction and the last M update steps, α_{k+1} = α_k + Σ_{i=0..M} μ_i·v_{k-i}.
Surprising as it may sound, these very effective acceleration methods can be implemented with no additional cost (i.e., no extra multiplications by D or D^T).
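To illustrate the subspace idea only (not an efficient SESOP implementation, which would cache the D-products of the stored directions to avoid extra multiplications), here is a naive sketch that optimizes the step coefficients over the current SSF direction plus the last M update steps, using scipy's generic Nelder-Mead search; everything here is my own simplification.

```python
import numpy as np
from scipy.optimize import minimize

def ssf_sesop(x, D, lam, n_iter=50, M=5):
    """SSF directions accelerated by a SESOP-style search over a small subspace."""
    soft = lambda b, t: np.sign(b) * np.maximum(np.abs(b) - t, 0.0)
    f = lambda a: 0.5 * np.linalg.norm(D @ a - x) ** 2 + lam * np.abs(a).sum()
    c = 1.01 * np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    past_steps = []
    for _ in range(n_iter):
        v = soft(a + D.T @ (x - D @ a) / c, lam / c) - a   # current SSF direction
        V = np.stack([v] + past_steps[-M:], axis=1)        # subspace of directions
        res = minimize(lambda mu: f(a + V @ mu), np.zeros(V.shape[1]), method="Nelder-Mead")
        step = V @ res.x
        a = a + step
        past_steps.append(step)
    return D @ a, a
```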

Brief Summary #3
For an effective minimization of the function f(α) = ½||Dα - x||₂² + λρ(α) we saw several iterated-shrinkage algorithms, built using:
1. the Proximal Point Method,
2. Bound Optimization,
3. Parallel Coordinate Descent,
4. Iterative Reweighted LS,
5. Fixed Point Iteration,
6. Greedy Algorithms.
How are they performing?

Agenda
1. Motivating the Minimization of f(α): describing various applications that need this minimization.
2. Some Motivating Facts: general-purpose optimization tools, and the unitary case.
3. Iterated-Shrinkage Algorithms: we describe five versions of those in detail.
4. Some Results: image deblurring results.
5. Conclusions.

A Deblurring Experiment
The original image x is blurred by a 15×15 kernel of the form 1/(n² + m² + 1), -7 ≤ n,m ≤ 7, and contaminated by white Gaussian noise (σ = √2), producing the measurement y. The restoration solves
α̂ = ArgMin_α ½||HDα - y||₂² + λρ(α),  x̂ = Dα̂.
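For reference, here is a sketch of how this degradation can be simulated in Python; normalizing the kernel to unit sum and using scipy.ndimage.convolve with reflective boundaries are my assumptions, not details taken from the talk.

```python
import numpy as np
from scipy.ndimage import convolve

def blur_kernel(radius=7):
    """15x15 kernel h(n, m) proportional to 1/(n^2 + m^2 + 1), for -7 <= n, m <= 7."""
    n, m = np.meshgrid(np.arange(-radius, radius + 1), np.arange(-radius, radius + 1))
    h = 1.0 / (n ** 2 + m ** 2 + 1.0)
    return h / h.sum()                      # normalized to unit sum (assumption)

def degrade(x0, sigma=np.sqrt(2.0), seed=0):
    """Blur x0 with the kernel above and add white Gaussian noise of std sigma."""
    y = convolve(x0, blur_kernel(), mode="reflect")
    return y + np.random.default_rng(seed).normal(0.0, sigma, x0.shape)
```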

Penalty Function: More Details
f(α) = ½||HDα - y||₂² + λρ(α), where:
- H is the blur operator;
- y is a given blurred and noisy image of size 256×256;
- λ = 0.075;
- D is an un-decimated Haar wavelet transform, 3 resolution layers, 7:1 redundancy;
- ρ is an approximation of L1 with slight smoothing (S = 0.01): ρ(α) = |α| - S·log(1 + |α|/S).
Note: this experiment is similar (but not equivalent) to one of the tests done in [Figueiredo & Nowak '05], which leads to state-of-the-art results.

So, The Results: The Function Value
[Plot: f(α) - f_min on a log scale versus iterations/computations (0 to 50), for SSF, SSF-LS, and SSF-SESOP-5.]

So, The Results: The Function Value
[Plot: f(α) - f_min on a log scale versus iterations/computations (0 to 50), for SSF, SSF-LS, SSF-SESOP-5, PCD-LS, and PCD-SESOP-5.]
Comment: both SSF and PCD (and their accelerated versions) provably converge to the minima of f(α).

So, The Results: The Function Value
[Plot: f(α) - f_min on a log scale versus iterations/computations (0 to 250), for StOMP and PDCO.]

So, The Results: ISNR
[Plot: ISNR in dB versus iterations/computations (0 to 50), for SSF, SSF-LS, and SSF-SESOP-5; the best curves reach about 6.4 dB.]
ISNR = 10·log₁₀( ||y - x₀||² / ||Dα_k - x₀||² ),  where x₀ is the original image.
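The ISNR reported in these plots is straightforward to compute; a small helper (names are my own) is:

```python
import numpy as np

def isnr(x0, y, x_hat):
    """Improvement in SNR: 10*log10(||y - x0||^2 / ||x_hat - x0||^2), in dB."""
    return 10.0 * np.log10(np.sum((y - x0) ** 2) / np.sum((x_hat - x0) ** 2))
```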

So, The Results: ISNR
[Plot: ISNR in dB versus iterations/computations (0 to 50), for SSF, SSF-LS, SSF-SESOP-5, PCD-LS, and PCD-SESOP-5; the best curves reach about 7.03 dB.]

So, The Results: ISNR
[Plot: ISNR in dB versus iterations/computations (0 to 250), for StOMP and PDCO.]
Comments: StOMP is inferior in speed and final quality (ISNR = 5.9 dB) due to an over-estimated support. PDCO is very slow due to the numerous inner Least-Squares iterations done by CG; it is not competitive with the Iterated-Shrinkage methods.

Visual Results
PCD-SESOP-5 results: original (left), measured (middle), and restored (right), shown over the iterations, with the ISNR improving up to about 7.03 dB.

Agenda
1. Motivating the Minimization of f(α): describing various applications that need this minimization.
2. Some Motivating Facts: general-purpose optimization tools, and the unitary case.
3. Iterated-Shrinkage Algorithms: we describe five versions of those in detail.
4. Some Results: image deblurring results.
5. Conclusions.

Conclusions: The Bottom Line
If your work leads you to the need to minimize the problem f(α) = ½||Dα - x||₂² + λρ(α), then:
- We recommend you use an Iterated-Shrinkage algorithm.
- SSF and PCD are preferred: both provably converge to the (local) minima of f(α), and their performance is very good, getting a reasonable result in few iterations.
- Use SESOP acceleration: it is very effective, and comes with hardly any cost.
- There is room for more work on various aspects of these algorithms; see the accompanying paper.

Thank You for Your Time & Attention
This field of research is very hot. More information, including these slides and the accompanying paper, can be found on my web-page: http://www.cs.technion.ac.il/~elad
THE END!!


3. The IRLS-Based Algorithm
Use the following principles [Adeyemi & Davies '06]: (1) Iterative Reweighted Least-Squares (IRLS) & (2) Fixed-Point Iteration.
For f(α) = ½||Dα - x||₂² + λρ(α), express the gradient of the regularizer as ρ'(α) = W(α)·α, where W(α) is diagonal with entries ρ'(α_j)/α_j, j = 1, ..., m. Setting ∇f(α) = D^H(Dα - x) + λW(α)α = 0 and freezing the weights at the current iterate, solve somehow:
D^H(Dα_{k+1} - x) + λW(α_k)α_{k+1} = 0,  i.e.  α_{k+1} = [D^H D + λW(α_k)]^{-1} D^H x.
This is the IRLS algorithm, used in FOCUSS [Gorodnitsky & Rao '97].

The IRLS-Based Algorithm (cont.): Fixed-Point Iteration
To avoid inverting D^H D + λW, rewrite the zero-gradient condition D^H(x - Dα) - λW(α)α = 0 by adding and subtracting cα and dividing by c:
(1/c)·D^H(x - Dα) + α = [ (λ/c)·W(α) + I ]·α.
The Fixed-Point-Iteration method freezes the weights and the residual at α_k:
α_{k+1} = [ (λ/c)·W(α_k) + I ]^{-1} ( (1/c)·D^H(x - Dα_k) + α_k ),
where the bracketed matrix is diagonal, with entries (λ/c)·ρ'(α_j)/α_j + 1 evaluated at α_k, and is therefore trivially inverted.
Notes: (1) For convergence, we should require c > r(D^H D)/2. (2) This algorithm cannot guarantee a local minimum.
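A sketch of the fixed-point IRLS iteration above, under my assumptions: an explicit matrix D, the smoothed-L1 ρ from the deblurring section, and a small epsilon to keep the weights finite near zero entries.

```python
import numpy as np

def irls_fixed_point(x, D, lam, rho_prime, n_iter=200, eps=1e-8):
    """a_{k+1} = ((lam/c) W(a_k) + I)^{-1} ( (1/c) D^H (x - D a_k) + a_k ),
    where W(a) = diag(rho'(a_j)/a_j) and c > r(D^H D)/2."""
    c = 0.51 * np.linalg.norm(D, 2) ** 2          # slightly above r(D^H D)/2
    a = np.full(D.shape[1], 1e-3)                 # start away from exact zeros
    for _ in range(n_iter):
        w = np.abs(rho_prime(a)) / np.maximum(np.abs(a), eps)   # diagonal of W(a_k)
        a = (D.conj().T @ (x - D @ a) / c + a) / (lam * w / c + 1.0)
    return D @ a, a

# rho'(a) for the smoothed L1, rho(a) = |a| - S*log(1 + |a|/S):
S = 0.01
rho_prime = lambda a: np.sign(a) * np.abs(a) / (np.abs(a) + S)
```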

4. Stagewise-OMP (StOMP) [Donoho, Drori, Starck & Tsaig '07]
Each iteration: compute the residual x - Dα_k, multiply by D^H, apply a hard threshold (LUT) to the correlations S(D^H(x - Dα_k)) to detect significant entries, merge them into the support S_{k+1}, and then minimize ||Dα - x||₂² on the support S_{k+1} by Least-Squares; finally x̂ = Dα.
StOMP is originally designed to solve min_α ||Dα - x||₂² + λ||α||₀, and especially so for a random dictionary D (Compressed-Sensing). Nevertheless, it is used elsewhere (restoration) [Fadili & Starck '06]. If S grows by one item at each iteration, this becomes OMP. The LS uses K₀ CG steps, each equivalent to an iterated-shrinkage step.
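A rough StOMP sketch for intuition only: the thresholding rule (a multiple of a per-stage noise-level estimate) and the use of a dense least-squares solve instead of a few CG steps are my simplifications, not the authors' settings.

```python
import numpy as np

def stomp(x, D, n_stages=10, t=2.5):
    """Stagewise OMP: threshold the correlations of the residual, grow the support,
    then least-squares on the detected support."""
    m = D.shape[1]
    support = np.zeros(m, dtype=bool)
    a = np.zeros(m)
    for _ in range(n_stages):
        r = x - D @ a
        corr = D.conj().T @ r
        sigma = np.linalg.norm(r) / np.sqrt(len(x))   # formal noise-level estimate
        support |= np.abs(corr) > t * sigma           # hard-threshold detection
        if not support.any():
            break
        a_s, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        a = np.zeros(m)
        a[support] = a_s
    return D @ a, a
```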

So, The Results: The Function Value
[Plot: f(α) - f_min on a log scale versus iterations (0 to 50), for SSF, SSF-LS, SSF-SESOP-5, IRLS, and IRLS-LS.]

So, The Results: ISNR
[Plot: ISNR in dB versus iterations (0 to 50), for SSF, SSF-LS, SSF-SESOP-5, IRLS, and IRLS-LS.]