Sparse & Redundant Signal Representation, and its Role in Image Processing


Sparse & Redundant Signal Representation, and its Role in Image Processing
Michael Elad, The Computer Science Department, The Technion - Israel Institute of Technology, Haifa 32000, Israel
Wavelets and Applications (Wave 2006), Ecole Polytechnique Federale de Lausanne (EPFL), July 10th-14th, 2006

Today's Talk is About Sparsity and Redundancy
We will try to show today that sparsity & redundancy can be used to design new (or renewed) and powerful signal/image processing tools (e.g., transforms, priors, models, compression, ...). The obtained machinery works very well, and we will show these ideas deployed in image processing applications.

Agenda
1. A Visit to Sparseland: Motivating Sparsity & Overcompleteness
2. Problem 1: Transforms & Regularizations. How and why should this work?
3. Problem 2: What About D? The quest for the origin of signals
4. Problem 3: Applications. Image filling-in, denoising, separation, compression, ...
Welcome to Sparseland!

Generating Signals in Sparseland
A fixed dictionary D of size N×K (N < K) multiplies a sparse random vector α to generate x = Dα. Every column in D is a prototype signal (an atom). The vector α is generated randomly, with few non-zeros in random locations and with random values.

Sparseland Signals Are Special
Simple: every generated signal x = Dα is built as a linear combination of few atoms from our dictionary D.
Rich: a general model; the obtained signals are a special type of mixture-of-Gaussians (or Laplacians).

Transforms in Sparseland?
Assume that x is known to emerge from the model M. We desire simplicity, independence, and expressiveness. How about: given x ∈ R^N, find the α ∈ R^K that generated it in M? The transform T maps x to α.

So, In Order to Transform...
We need to solve an under-determined linear system of equations: Dα = x (D and x known). Among all the (infinitely many) possible solutions we want the sparsest!! We will measure sparsity using the L0 "norm": \|\alpha\|_0 = \#\{j : \alpha_j \ne 0\}.

Measure of Sparsity?
\|\alpha\|_p^p = \sum_{j=1}^{K} |\alpha_j|^p. As p → 0 we get a count of the non-zeros in the vector: \|\alpha\|_0 = \lim_{p \to 0} \|\alpha\|_p^p.
(Figure: the scalar penalty f(x) = |x|^p plotted on [-1, +1] for p = 2, p = 1, p < 1, and p → 0.)
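
A quick numerical check of the p → 0 statement (a minimal sketch, not from the talk):

```python
import numpy as np

# As p -> 0, the l_p^p measure of a vector approaches the count of its
# non-zero entries (the l0 "norm").
alpha = np.array([0.0, 3.0, 0.0, -0.5, 0.0, 1.2])
for p in [2.0, 1.0, 0.5, 0.1, 0.01]:
    lp_p = np.sum(np.abs(alpha) ** p)
    print(f"p={p:5.2f}  ||alpha||_p^p = {lp_p:.4f}")
print("non-zero count:", np.count_nonzero(alpha))  # -> 3
```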

A Signal's Transform in Sparseland
A sparse random vector α is multiplied by D to give x = Dα; the transform solves \hat{\alpha} = \arg\min_\alpha \|\alpha\|_0 \ \text{s.t.}\ x = D\alpha.
Four major questions: Is \hat{\alpha} = α? Under which conditions? Are there practical ways to get \hat{\alpha}? How effective are those ways? How would we get D?

Inverse Problems in Sparseland?
Assume that x is known to emerge from the model M. Suppose we observe y = Hx + v, a blurred and noisy version of x, with \|v\|_2 \le ε. How will we recover x? How about finding the α that generated x again? Here y ∈ R^M, x ∈ R^N, and \hat{\alpha} ∈ R^K.

Inverse Problems in Sparseland?
A sparse random vector α is multiplied by D (x = Dα), blurred by H, and corrupted by noise v, giving y = Hx + v. We solve \hat{\alpha} = \arg\min_\alpha \|\alpha\|_0 \ \text{s.t.}\ \|y - HD\alpha\|_2 \le ε.
Four major questions (again!): Is \hat{\alpha} ≈ α? How far can it go? Are there practical ways to get \hat{\alpha}? How effective are those ways? How would we get D?

Back Home... Any Lessons?
Several recent trends worth looking at:
- JPEG to JPEG2000: from the (L2-norm) KLT to wavelets and non-linear approximation -> Sparsity.
- From Wiener to robust restoration: from the L2-norm (Fourier) to L1 (e.g., TV, Beltrami, wavelet shrinkage) -> Sparsity.
- From unitary to richer representations: frames, shift-invariance, steerable wavelets, contourlets, curvelets -> Overcompleteness.
- Approximation theory: non-linear approximation -> Sparsity & Overcompleteness.
- ICA and related models -> Independence and Sparsity.
Sparseland is HERE, at the meeting point of these trends.

To Summarize So Far
The Sparseland model M for signals is interesting since it is rich, and yet every signal has a simple description. Who cares? We do: this model is relevant to us just as well. Sparsity looks complicated: we need to solve (or approximate the solution of) a linear system with more unknowns than equations, while finding the sparsest possible solution. So, what are the implications? (a) Practical solvers? (b) How will we get D? (c) Applications?

Agenda
1. A Visit to Sparseland: Motivating Sparsity & Overcompleteness
2. Problem 1: Transforms & Regularizations. How and why should this work?
3. Problem 2: What About D? The quest for the origin of signals
4. Problem 3: Applications. Image filling-in, denoising, separation, compression, ...

Let's Start with the Transform
Our dream for now: find the sparsest solution of Dα = x, with D and x known. Put formally:
(P_0): \min_\alpha \|\alpha\|_0 \ \text{s.t.}\ x = D\alpha.

Question 1: Uniqueness?
Suppose we can solve \hat{\alpha} = \arg\min_\alpha \|\alpha\|_0 \ \text{s.t.}\ x = D\alpha exactly. Why should we necessarily get \hat{\alpha} = α? It might happen that eventually \|\hat{\alpha}\|_0 < \|\alpha\|_0.

Matrix Spark [Donoho & E. ('02)]*
Definition: Given a matrix D, σ = Spark{D} is the smallest number of columns that are linearly dependent.
Example:
D = [ 1 0 0 0 1
      0 1 0 0 1
      0 0 1 0 0
      0 0 0 1 0 ]
Rank = 4, Spark = 3 (columns 1, 2, and 5 are linearly dependent).
* In tensor decomposition, Kruskal defined something similar already in 1977.
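
Unlike the rank, the spark requires examining all column subsets, which is combinatorial; here is a brute-force sketch (illustrative only, exponential in the number of columns) that verifies the slide's example:

```python
import numpy as np
from itertools import combinations

def spark(D, tol=1e-10):
    """Smallest number of linearly dependent columns (brute force)."""
    N, K = D.shape
    for k in range(1, K + 1):
        for cols in combinations(range(K), k):
            # k columns are dependent iff their submatrix is rank-deficient
            if np.linalg.matrix_rank(D[:, cols], tol=tol) < k:
                return k
    return K + 1  # all columns independent (only possible when K <= N)

D = np.array([[1, 0, 0, 0, 1],
              [0, 1, 0, 0, 1],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 1, 0]], dtype=float)
print(np.linalg.matrix_rank(D), spark(D))  # -> 4 3
```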

Uniqueness Rule [Donoho & E. ('02)]
Suppose the problem (P_0): \min_\alpha \|\alpha\|_0 \ \text{s.t.}\ x = D\alpha has been solved somehow. If we found a representation that satisfies \|\alpha\|_0 < σ/2 (half the Spark), then it is necessarily unique (the sparsest). This result implies that if M generates signals using sparse enough α, solving the above problem will find it exactly.
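
For completeness, the standard one-line argument behind this rule (implicit in the slide):

```latex
% If D\alpha_1 = D\alpha_2 = x with \alpha_1 \neq \alpha_2, then
% D(\alpha_1 - \alpha_2) = 0, so the support of \alpha_1 - \alpha_2 indexes a
% linearly dependent set of columns of D, which gives
\[
\sigma \;\le\; \|\alpha_1 - \alpha_2\|_0 \;\le\; \|\alpha_1\|_0 + \|\alpha_2\|_0 .
\]
% Hence at most one solution can satisfy \|\alpha\|_0 < \sigma/2.
```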

Question 2: A Practical P_0 Solver?
Given x = Dα with a sparse α, we wish to solve \hat{\alpha} = \arg\min_\alpha \|\alpha\|_0 \ \text{s.t.}\ x = D\alpha. Are there reasonable ways to find \hat{\alpha}?

Matching Pursuit (MP) [Mallat & Zhang (1993)]
MP is a greedy algorithm that finds one atom at a time. Step 1: find the one atom that best matches the signal. Next steps: given the previously found atoms, find the next one that best fits the residual. Orthogonal MP (OMP) is an improved version that re-evaluates the coefficients after each round.
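
To make OMP's select-then-refit loop concrete, here is a minimal sketch; it assumes D has unit-norm columns and omits the usual efficiency tricks (e.g., Cholesky updates):

```python
import numpy as np

def omp(D, x, L, tol=1e-6):
    """Orthogonal Matching Pursuit: at most L atoms of D to approximate x."""
    residual = x.copy()
    support = []
    coeffs = np.zeros(D.shape[1])
    sol = np.zeros(0)
    for _ in range(L):
        # greedy step: the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        support.append(j)
        # the "orthogonal" part: re-fit all active coefficients by least squares
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
        if np.linalg.norm(residual) < tol:
            break
    coeffs[support] = sol
    return coeffs
```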

Basis Pursuit (BP) [Chen, Donoho, & Saunders (1995)]
Instead of solving \min_\alpha \|\alpha\|_0 \ \text{s.t.}\ x = D\alpha, solve \min_\alpha \|\alpha\|_1 \ \text{s.t.}\ x = D\alpha. The newly defined problem is convex (linear programming). Very efficient solvers can be deployed: interior-point methods [Chen, Donoho, & Saunders ('95)], sequential shrinkage for unions of ortho-bases [Bruce et al. ('98)], iterated shrinkage [Figueiredo & Nowak ('03), Daubechies, Defrise, & De Mol ('04), E. ('05), E., Matalon, & Zibulevsky ('06)].
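
For illustration, the l1 problem can be cast as a standard linear program by splitting α = u - v with u, v ≥ 0; a minimal sketch using scipy's generic LP solver (not one of the specialized solvers cited above):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(D, x):
    """min ||alpha||_1 s.t. D alpha = x, via the split alpha = u - v."""
    N, K = D.shape
    c = np.ones(2 * K)               # objective: sum of u and v = ||alpha||_1
    A_eq = np.hstack([D, -D])        # equality constraint [D, -D] [u; v] = x
    res = linprog(c, A_eq=A_eq, b_eq=x, bounds=(0, None))
    uv = res.x
    return uv[:K] - uv[K:]           # recover alpha = u - v
```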

Question 3: Approximation Quality?
Given x = Dα, how effective are MP/BP in finding \hat{\alpha} = \arg\min_\alpha \|\alpha\|_0 \ \text{s.t.}\ x = D\alpha?

Evaluating the Spark
Compute the Gram matrix D^T D (assuming normalized columns). The mutual coherence M is the largest off-diagonal entry of D^T D in absolute value. The mutual coherence is a property of the dictionary (just like the Spark). In fact, the following relation can be shown: σ ≥ 1 + 1/M.
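
The coherence computation is essentially a one-liner; a sketch following the slide's recipe:

```python
import numpy as np

def mutual_coherence(D):
    """Largest off-diagonal entry of the Gram matrix, in absolute value."""
    Dn = D / np.linalg.norm(D, axis=0)   # unit-norm columns
    G = np.abs(Dn.T @ Dn)                # Gram matrix, absolute value
    np.fill_diagonal(G, 0.0)             # ignore the unit diagonal
    return G.max()

D = np.random.default_rng(0).standard_normal((20, 30))
M = mutual_coherence(D)
print(M, 1 + 1 / M)  # coherence, and the resulting lower bound on the spark
```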

BP and MP Equivalence [Donoho & E. ('02), Gribonval & Nielsen ('03), Tropp ('03), Temlyakov ('03)]
Given a signal x with a representation x = Dα, and assuming that \|\alpha\|_0 < \frac{1}{2}(1 + 1/M), BP and MP are guaranteed to find the sparsest solution. MP and BP are different in general (it is hard to say which is better). The above result corresponds to the worst case; average-performance results are available too, showing much better bounds [Donoho ('04), Candes et al. ('04), Tanner et al. ('05), Tropp et al. ('06)].

What About Inverse Problems?
With y = Hx + v and x = Dα, we solve \hat{\alpha} = \arg\min_\alpha \|\alpha\|_0 \ \text{s.t.}\ \|y - HD\alpha\|_2 \le ε. We had similar questions regarding uniqueness, practical solvers, and their efficiency. It turns out that similar answers are applicable here, due to several recent works [Donoho, E., & Temlyakov ('04), Tropp ('04), Fuchs ('04), Gribonval et al. ('05)].

To Summarize So Far
The Sparseland model for signals is relevant to us; we can design transforms and priors based on it. How do we find the sparsest solution? Use pursuit algorithms. Why do they work so well? A sequence of works during the past 3-4 years gives theoretical justification for these tools' behavior. What next? (a) How shall we find D? (b) Will this work for applications? Which?

Agenda
1. A Visit to Sparseland: Motivating Sparsity & Overcompleteness
2. Problem 1: Transforms & Regularizations. How and why should this work?
3. Problem 2: What About D? The quest for the origin of signals
4. Problem 3: Applications. Image filling-in, denoising, separation, compression, ...

Problem Setting
We are given P examples X = \{x_j\}_{j=1}^{P}, each generated as x_j = Dα_j with \|\alpha_j\|_0 \le L. Given these P examples and a fixed-size [N×K] dictionary D: 1. Is D unique? 2. How would we find D?

Uniqueness? [Aharon, E., & Bruckstein ('05)]
If the set of examples \{x_j\}_{j=1}^{P} is rich enough* and if L < Spark{D}/2, then D is unique.
* "Rich enough": the signals from M(D) can be clustered into groups that share the same support; at least L+1 examples per group are needed.
Comments: This result is proved constructively, but the number of examples needed to pull this off is huge; we will show a far better method next. A parallel result that takes noise into account is yet to be constructed.

Practical Approach: Objective
\min_{D, A} \sum_{j=1}^{P} \|D\alpha_j - x_j\|_2^2 \ \text{s.t.}\ \forall j,\ \|\alpha_j\|_0 \le L.
Each example is a linear combination of atoms from D; each example has a sparse representation with no more than L atoms (N, K, and L are assumed known; D has normalized columns). [Field & Olshausen ('96), Engan et al. ('99), Lewicki & Sejnowski ('00), Cotter et al. ('03), Gribonval et al. ('04)]

K-Means for Clustering
Clustering is an extreme case of sparse coding: each example uses exactly one atom, with coefficient 1. Initialize D, then iterate: Sparse Coding - assign each example in X to its nearest-neighbor atom; Dictionary Update - recompute D column-by-column, each column as the mean of the examples assigned to it.
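
To make the "clustering as extreme sparse coding" analogy concrete, a minimal K-means sketch in the dictionary-learning vocabulary (illustrative only):

```python
import numpy as np

def kmeans_dictionary(X, K, n_iter=20, seed=0):
    """K-means on the columns of X, phrased as sparse coding with one atom."""
    rng = np.random.default_rng(seed)
    N, P = X.shape
    D = X[:, rng.choice(P, K, replace=False)].copy()  # init atoms from examples
    labels = np.zeros(P, dtype=int)
    for _ in range(n_iter):
        # sparse coding stage: nearest-neighbor assignment (one atom, weight 1)
        dists = ((X[:, None, :] - D[:, :, None]) ** 2).sum(axis=0)  # K x P
        labels = np.argmin(dists, axis=0)
        # dictionary update stage: column-by-column mean over assigned examples
        for k in range(K):
            members = X[:, labels == k]
            if members.shape[1] > 0:
                D[:, k] = members.mean(axis=1)
    return D, labels
```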

The K-SVD Algorithm: General [Aharon, E., & Bruckstein ('04)]
Initialize D, then iterate: Sparse Coding - use MP or BP on each example in X; Dictionary Update - update D column-by-column by an SVD computation.

K-SVD: Sparse Coding Stage
Holding D fixed, minimize \sum_{j=1}^{P} \|D\alpha_j - x_j\|_2^2 \ \text{s.t.}\ \forall j,\ \|\alpha_j\|_0 \le L. For the j-th example we solve \min_{\alpha_j} \|D\alpha_j - x_j\|_2^2 \ \text{s.t.}\ \|\alpha_j\|_0 \le L - a pursuit problem!!!

K-SVD: Dictionary Update Stage
Let G_k be the set of examples in X = \{x_j\}_{j=1}^{P} that use the column d_k. The content of d_k influences only the examples in G_k. Let us fix all of A apart from the coefficients that multiply d_k, and seek both d_k and those coefficients to better fit the residual!

K-SVD: Dictionary Update Stage
We should solve \min_{d_k, \alpha^k} \|E_k - d_k (\alpha^k)^T\|_F^2, where E_k is the residual of the examples in G_k once the contribution of all atoms other than d_k has been removed. The optimal d_k (and its coefficients) is obtained by a rank-1 SVD of this residual.
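
Putting the two stages together, here is a compact K-SVD sketch; it assumes the omp() function from the pursuit sketch above, and is an illustrative reconstruction of the slides' description rather than the authors' implementation:

```python
import numpy as np

def ksvd(X, K, L, n_iter=10, seed=0):
    """Learn an N x K dictionary D and sparse codes A so that X ~ D A."""
    rng = np.random.default_rng(seed)
    N, P = X.shape
    D = rng.standard_normal((N, K))
    D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
    A = np.zeros((K, P))
    for _ in range(n_iter):
        # --- sparse coding stage: pursuit on each example ---
        for j in range(P):
            A[:, j] = omp(D, X[:, j], L)
        # --- dictionary update stage: one rank-1 SVD per atom ---
        for k in range(K):
            users = np.nonzero(A[k, :])[0]     # the group G_k
            if users.size == 0:
                continue
            A[k, users] = 0.0                  # residual without atom k
            E_k = X[:, users] - D @ A[:, users]
            U, s, Vt = np.linalg.svd(E_k, full_matrices=False)
            D[:, k] = U[:, 0]                  # best rank-1 fit: new atom
            A[k, users] = s[0] * Vt[0, :]      # and its coefficients
    return D, A
```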

K-SVD: A Synthetic Experiment
Create a 20×30 random dictionary with normalized columns. Generate 2000 signal examples, each using 3 atoms, and add noise. Train a dictionary using the K-SVD and MOD and compare it to the true D.
(Figure: relative number of atoms recovered vs. iteration, 0 to 50; the K-SVD curve rises above the MOD curve.)

To Summarize So Far
The Sparseland model for signals is relevant to us, but in order to use it effectively we need to know D. How can D be found? Use the K-SVD algorithm: we have established a uniqueness result and shown how to practically train D. Will it work well? What next? (a) Deploy to applications; (b) generalize in various ways: multiscale, non-negative factorization, speed-up, ...; (c) performance guarantees?

Agenda
1. A Visit to Sparseland: Motivating Sparsity & Overcompleteness
2. Problem 1: Transforms & Regularizations. How and why should this work?
3. Problem 2: What About D? The quest for the origin of signals
4. Problem 3: Applications. Image filling-in, denoising, separation, compression, ...

Application 1: Image Inpainting
Assume the signal x has been created by x = Dα_0 with a very sparse α_0. Missing values in x imply missing rows in this linear system. By removing these rows we get \tilde{x} = \tilde{D}\alpha. Now solve \min_\alpha \|\alpha\|_0 \ \text{s.t.}\ \tilde{x} = \tilde{D}\alpha. If α_0 was sparse enough, it will be the solution of the above problem! Thus, computing Dα_0 recovers x perfectly.

Application 1: The Practice
Given a noisy image y, we can clean it using the maximum a-posteriori (MAP) estimator by solving \hat{\alpha} = \arg\min_\alpha \|\alpha\|_p^p + λ\|y - D\alpha\|_2^2, with \hat{x} = D\hat{\alpha}. What if some of the pixels in that image are missing (filled with zeros)? Define a mask operator as the diagonal matrix W, and solve instead \hat{\alpha} = \arg\min_\alpha \|\alpha\|_p^p + λ\|y - WD\alpha\|_2^2, again with \hat{x} = D\hat{\alpha}. When handling general images, two dictionaries need to be concatenated to treat both texture and cartoon content effectively; this leads to separation [E., Starck, & Donoho ('05)].
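
A minimal inpainting sketch in the spirit of the row-deletion formulation above (it reuses the omp() sketch from earlier, takes the mask as a boolean vector of known pixels, and assumes no atom is masked out entirely):

```python
import numpy as np

def inpaint(D, x_observed, mask, L):
    """Fill in missing samples of x by pursuit on the row-deleted system."""
    D_tilde = D[mask, :]                                 # remove missing rows
    norms = np.linalg.norm(D_tilde, axis=0)              # renormalize columns
    beta = omp(D_tilde / norms, x_observed[mask], L)     # pursuit on known pixels
    alpha = beta / norms                                 # undo renormalization
    return D @ alpha                                     # reconstruct all pixels
```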

Inpainting Results
(Figure: source image and inpainted outcome; predetermined dictionary: curvelets (cartoon) + overlapped DCT (texture).)

Inpainting Results
(Figure: a second source/outcome example with the same predetermined dictionary: curvelets (cartoon) + overlapped DCT (texture).)

Inpainting Results
(Figure: inpainting with 20%, 50%, and 80% of the pixels removed.)

Application 2: Image Denoising
Given a noisy image y, we have already mentioned the ability to clean it by solving \hat{\alpha} = \arg\min_\alpha \|\alpha\|_p^p + λ\|y - D\alpha\|_2^2, with \hat{x} = D\hat{\alpha}. When using the K-SVD, we cannot train a dictionary for large-support images. How do we go from local treatment of patches to a global prior? The solution: force shift-invariant sparsity on each patch of size N-by-N (e.g., N = 8) in the image, including overlaps.

Application 2: Image Denoising
Our MAP penalty becomes
\hat{x} = \arg\min_{x, \{\alpha_{ij}\}} \frac{1}{2}\|x - y\|_2^2 + \mu \sum_{ij} \|R_{ij}x - D\alpha_{ij}\|_2^2 \ \text{s.t.}\ \forall ij,\ \|\alpha_{ij}\|_0 \le L,
where R_{ij} extracts the patch at location ij; the second term is our prior.

Application 2: Image Denoising
\hat{x} = \arg\min_{x, \{\alpha_{ij}\}, D} \frac{1}{2}\|x - y\|_2^2 + \mu \sum_{ij} \|R_{ij}x - D\alpha_{ij}\|_2^2 \ \text{s.t.}\ \forall ij,\ \|\alpha_{ij}\|_0 \le L.
We minimize by alternating over the three sets of unknowns:
- x and D known: compute each \alpha_{ij} per patch by \min_{\alpha} \|R_{ij}x - D\alpha\|_2^2 \ \text{s.t.}\ \|\alpha\|_0 \le L, using matching pursuit.
- x and \alpha_{ij} known: compute D to minimize \sum_{ij} \|R_{ij}x - D\alpha_{ij}\|_2^2 using the SVD, updating one column at a time (K-SVD).
- D and \alpha_{ij} known: compute x = (I + \mu \sum_{ij} R_{ij}^T R_{ij})^{-1} (y + \mu \sum_{ij} R_{ij}^T D\alpha_{ij}), which is a simple averaging of shifted patches.
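
The x-update above has a closed form because \sum_{ij} R_{ij}^T R_{ij} is diagonal: it just counts how many patches cover each pixel. A minimal sketch of that averaging step, assuming the denoised patch estimates D\alpha_{ij} are held in a dict keyed by the patch's top-left corner (a hypothetical data layout, not from the talk):

```python
import numpy as np

def average_patches(y, patch_estimates, n, mu):
    """Closed-form x update: blend y with the overlapping patch estimates."""
    num = y.astype(float).copy()        # accumulates y + mu * sum R^T D alpha
    den = np.ones_like(num)             # accumulates I + mu * sum R^T R (diagonal)
    for (i, j), patch in patch_estimates.items():
        num[i:i + n, j:j + n] += mu * patch
        den[i:i + n, j:j + n] += mu
    return num / den                    # pixelwise division = diagonal inverse
```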

Application 2: The Algorithm
Sparse-code every patch of 8-by-8 pixels, update the dictionary based on the above codes, and compute the output image x. Complexity of this algorithm: O(N² × L × Iterations) per pixel. For N = 8, L = 1, and 10 iterations, we need 640 operations per pixel.

Denoising Results
The results of this algorithm compete favorably with the state of the art (e.g., GSM + steerable wavelets [Portilla, Strela, Wainwright, & Simoncelli ('03)]), giving roughly 0.5-1 dB better results.
(Figure: source image; noisy image with σ = 20; result at 30.89 dB; the initial 64×256 overcomplete-DCT dictionary and the dictionary obtained after 10 iterations.)

Application 3: Compression
The problem: compressing photo-ID images. General-purpose methods (JPEG, JPEG2000) do not take the specific image family into account; by adapting to the image content (PCA/K-SVD), better results can be obtained. For these techniques to operate well, dictionaries must be trained locally (per patch) using a training set of images. In PCA, only the (quantized) coefficients are stored, whereas the K-SVD requires storing the atom indices as well. Geometric alignment of the images is very helpful and should be done.

Application 3: The Algorithm
On the training set (500 images): detect the main features and warp the images to a common reference (20 parameters); divide each image into disjoint 15-by-15 patches, and for each patch compute a mean and a dictionary; per patch, find the operating parameters (number of atoms L, quantization Q).
On the test image: warp, remove the mean from each patch, sparse-code using L atoms, and apply Q.

Compression Results
(Figure: reconstructed face images at 820 bytes per file, each annotated with its error score.)

Compression Results
(Figure: reconstructed face images at 550 bytes per file, each annotated with its error score.)

Compression Results
(Figure: reconstructed face images at 400 bytes per file, each annotated with its error score; some entries are missing.)

Today We Have Discussed
1. A Visit to Sparseland: Motivating Sparsity & Overcompleteness
2. Problem 1: Transforms & Regularizations. How and why should this work?
3. Problem 2: What About D? The quest for the origin of signals
4. Problem 3: Applications. Image filling-in, denoising, separation, compression, ...

Summary
Sparsity and overcompleteness are important ideas that can be used to design better tools in signal/image processing, but there are difficulties in using them. We are working on resolving those difficulties: performance of pursuit algorithms, speed-up of those methods, training the dictionary, demonstrating applications. The dream? Future transforms and regularizations will be data-driven, non-linear, overcomplete, and sparsity-promoting.