Sparse Approximation of Signals with Highly Coherent Dictionaries

Bishnu P. Lamichhane and Laura Rebollo-Neira
b.p.lamichhane@aston.ac.uk, rebollol@aston.ac.uk
Support from EPSRC (EP/D062632/1) is acknowledged. http://www.epsrc.co.uk
New Trends and Directions in Harmonic Analysis, Approximation Theory, and Image Analysis, Inzell Summer School, Germany, Sept. 17-21, 2007

Outline
1. Sparse Approximation of a Signal
   - Introduction
   - Optimized Orthogonal Matching Pursuit
   - Adaptive Biorthogonalization
   - Swapping
   - Controlling the Condition Number
2. Oblique Matching Pursuit
   - Oblique Projections
   - Structured Noise Filtering
3. Conclusion

Introduction

The problem of non-linear approximation consists of representing a signal f as a linear superposition of a minimal number of basis functions, called atoms, selected from a redundant set called a dictionary, D = {α_i}_{i ∈ J}. We want to minimize ‖x‖_0 subject to Σ_{i ∈ J} α_i x_i = f, or, for a given ε > 0, to obtain ‖Σ_{i ∈ J} α_i x_i − f‖ ≤ ε. (This is an NP-hard and highly non-linear problem.) Matching pursuit methodologies yield a trade-off between optimality and computational complexity. When working with a highly coherent dictionary, the subset selected by matching pursuit methods can have a large condition number. We want to control the condition number in a step-wise manner.

Optimized Orthogonal Matching Pursuit

Given a signal f ∈ H, we select a set of atoms {α_{l_i}}_{i=1}^{k-1} spanning the space V_{k-1}. With J_{k-1} = {l_1, ..., l_{k-1}}, the new index l_k is the maximizer over all n ∈ J \ J_{k-1} of

|⟨γ_n, f⟩| (OMP),   or   |⟨γ_n, f⟩| / ‖γ_n‖, γ_n ≠ 0 (OOMP),   where γ_n = α_n − P_{V_{k-1}} α_n.

The orthogonal projection of the signal f onto V_{k-1} is

P_{V_{k-1}} f = Σ_{i=1}^{k-1} α_{l_i} ⟨β_i^{k-1}, f⟩ = Σ_{i=1}^{k-1} c_i^{k-1} α_{l_i},

where the set {β_i^{k-1}}_{i=1}^{k-1} is biorthogonal to {α_{l_i}}_{i=1}^{k-1} and V_{k-1} = span{β_i^{k-1}}_{i=1}^{k-1}.

The OOMP criterion is based on minimizing the norm of the residual at each step:

‖R^{k-1} f‖² = ‖R^k f‖² + |⟨γ_n, f⟩|² / ‖γ_n‖².
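
As an illustration only (not the authors' code), a minimal NumPy sketch of one OMP/OOMP selection step; the dictionary is assumed to be stored column-wise in an array D, and all names are illustrative:

import numpy as np

def select_atom(D, f, selected, oomp=True):
    # One greedy selection step.
    # D: (m, N) array with atoms alpha_n as columns; f: (m,) signal;
    # selected: list of already chosen column indices (spanning V_{k-1}).
    # Returns the index maximizing |<gamma_n, f>| (OMP) or
    # |<gamma_n, f>| / ||gamma_n|| (OOMP), with gamma_n = alpha_n - P_{V_{k-1}} alpha_n.
    if selected:
        Q, _ = np.linalg.qr(D[:, selected])   # orthonormal basis of V_{k-1}
        G = D - Q @ (Q.T @ D)                 # gamma_n for every atom
    else:
        G = D.copy()
    norms = np.linalg.norm(G, axis=0)
    score = np.abs(G.T @ f)
    if oomp:
        score = score / np.maximum(norms, 1e-12)
    score[norms < 1e-12] = -np.inf            # atoms (numerically) already in V_{k-1}
    score[selected] = -np.inf                 # never re-select an atom
    return int(np.argmax(score))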

Recursive Biorthogonalization

We want to update the orthogonal projection when an atom is added to the set {α_{l_i}}_{i=1}^{k-1}, and to downdate it when an atom is deleted from the set {α_{l_i}}_{i=1}^{k}.

Forward step: After selecting the k-th atom we have {α_{l_i}}_{i=1}^{k}. The biorthogonal functions {β_i^k}_{i=1}^{k} can be constructed from {β_i^{k-1}}_{i=1}^{k-1} using

β_k^k = γ_{l_k} / ‖γ_{l_k}‖²,   γ_{l_k} = α_{l_k} − P_{V_{k-1}} α_{l_k},
β_i^k = β_i^{k-1} − β_k^k ⟨α_{l_k}, β_i^{k-1}⟩,   i = 1, ..., k−1.

Backward step: The biorthogonal set {β_i^k}_{i=1}^{k} spanning V_k is given and one atom, say α_{l_j}, is to be removed from V_k. Denote by V_{k\j} the reduced subspace V_{k\j} = span{α_{l_1}, ..., α_{l_{j-1}}, α_{l_{j+1}}, ..., α_{l_k}}. The biorthogonal set is to be modified according to the equation

β_i^{k\j} = β_i^k − β_j^k ⟨β_j^k, β_i^k⟩ / ‖β_j^k‖²,   i = 1, ..., j−1, j+1, ..., k.
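
The forward and backward updates translate directly into a few lines of NumPy. The following sketch (illustrative names, not the authors' implementation) keeps the selected atoms and their duals as columns of two matrices:

import numpy as np

def add_atom(A, B, a_new):
    # Forward step: A holds alpha_{l_1..k-1} as columns, B the duals beta_i^{k-1}.
    # Returns the enlarged matrices for V_k = V_{k-1} + span{alpha_{l_k}}.
    if A.shape[1] > 0:
        gamma = a_new - A @ (B.T @ a_new)      # gamma_{l_k} = alpha_{l_k} - P_{V_{k-1}} alpha_{l_k}
    else:
        gamma = a_new.copy()
    b_new = gamma / np.dot(gamma, gamma)       # beta_k^k = gamma_{l_k} / ||gamma_{l_k}||^2
    B = B - np.outer(b_new, a_new @ B)         # beta_i^k = beta_i^{k-1} - beta_k^k <alpha_{l_k}, beta_i^{k-1}>
    return np.column_stack([A, a_new]), np.column_stack([B, b_new])

def remove_atom(A, B, j):
    # Backward step: delete atom j and downdate the duals,
    # beta_i^{k\j} = beta_i^k - beta_j^k <beta_j^k, beta_i^k> / ||beta_j^k||^2.
    bj = B[:, j]
    B = B - np.outer(bj, bj @ B) / np.dot(bj, bj)
    return np.delete(A, j, axis=1), np.delete(B, j, axis=1)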

The orthogonal projector onto V_k = V_{k-1} + span{α_{l_k}}, P_{V_k}, is then given by

P_{V_k} = Σ_{i=1}^{k} α_{l_i} ⟨β_i^k, ·⟩ = Σ_{i=1}^{k} β_i^k ⟨α_{l_i}, ·⟩,

and the orthogonal projector onto V_{k\j} by

P_{V_{k\j}} = Σ_{i=1, i≠j}^{k} α_{l_i} ⟨β_i^{k\j}, ·⟩ = Σ_{i=1, i≠j}^{k} β_i^{k\j} ⟨α_{l_i}, ·⟩.

The coefficients are to be modified as follows. When including the atom α_{l_k}:

c_k^k = ⟨β_k^k, f⟩,
c_i^k = ⟨β_i^k, f⟩ = c_i^{k-1} − c_k^k ⟨β_i^{k-1}, α_{l_k}⟩,   i = 1, ..., k−1,

and when deleting an atom α_{l_j}:

c_i^{k\j} = ⟨β_i^{k\j}, f⟩ = c_i^k − ⟨β_i^k, β_j^k⟩ c_j^k / ‖β_j^k‖²,   i = 1, ..., j−1, j+1, ..., k.
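
A matching sketch for the coefficient updates (again illustrative names; c is the vector of current coefficients, B_old the duals before the new atom is added):

import numpy as np

def coeffs_after_add(c, B_old, a_new, b_new, f):
    # c_k^k = <beta_k^k, f>;  c_i^k = c_i^{k-1} - c_k^k <beta_i^{k-1}, alpha_{l_k}>.
    c_k = np.dot(b_new, f)
    return np.append(c - c_k * (a_new @ B_old), c_k)

def coeffs_after_remove(c, B, j):
    # c_i^{k\j} = c_i^k - <beta_i^k, beta_j^k> c_j^k / ||beta_j^k||^2.
    bj = B[:, j]
    c = c - (bj @ B) * c[j] / np.dot(bj, bj)
    return np.delete(c, j)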

Swapping

Swapping is based on comparing whether removing one already chosen atom and including a new one improves the approximation. Backward and forward steps are made easy with adaptive biorthogonalization.

Backward step: Downdate P_{V_k} f to P_{V_{k\j}} f. The index to be removed is the minimizer over all n = 1, ..., k of |c_n^k| / ‖β_n^k‖.

Forward step: Update P_{V_{k-1}} f to P_{V_k} f. The new index l_k is the maximizer over all n ∈ J \ J_{k-1} of |⟨γ_n, f⟩| (OMP), or |⟨γ_n, f⟩| / ‖γ_n‖, γ_n ≠ 0 (OOMP), where J_{k-1} = {l_1, ..., l_{k-1}}.
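
One swapping pass could be sketched as follows, reusing the select_atom, add_atom and remove_atom sketches above. Here the swap is accepted only if the residual norm actually decreases; the acceptance rule and all names are illustrative assumptions, not the talk's exact procedure:

import numpy as np

def try_swap(D, f, A, B, c, selected):
    # Remove the atom whose deletion perturbs the approximation least
    # (minimizer of |c_n^k| / ||beta_n^k||), re-select by OOMP, and keep the
    # swap only if the residual norm decreases.
    res_old = np.linalg.norm(f - A @ c)
    j = int(np.argmin(np.abs(c) / np.linalg.norm(B, axis=0)))
    A2, B2 = remove_atom(A, B, j)
    sel2 = selected[:j] + selected[j + 1:]
    n = select_atom(D, f, sel2, oomp=True)
    A2, B2 = add_atom(A2, B2, D[:, n])
    c2 = B2.T @ f                              # coefficients of P_{V_k} f in the new basis
    if np.linalg.norm(f - A2 @ c2) < res_old:
        return A2, B2, c2, sel2 + [n]
    return A, B, c, selected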

Theoretical Bound

OMP selects only the optimal atoms if ‖Φ_opt^† Φ_npt‖_1 < 1 (Tropp 2004). The bound is obtained from the inequality

‖Φ_npt^* R^n f‖_∞ / ‖Φ_opt^* R^n f‖_∞ < 1,

where Φ_opt and Φ_npt collect the optimal and non-optimal atoms, respectively, and R^n f = f − P_{V_n} f. Using ⟨φ, R^n f⟩ = ⟨R^n φ, f⟩, we get

‖(R^n Φ_npt)^* f‖_∞ / ‖(R^n Φ_opt)^* f‖_∞ < 1,

which gives a better bound.

Figure 1: Signal-independent and signal-dependent check for OMP and OOMP.
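
A small sketch of the signal-independent check (Tropp's exact recovery condition), where opt_idx is the assumed optimal support; since the matrix 1-norm is the maximum absolute column sum, the condition can be evaluated atom by atom (illustrative names):

import numpy as np

def exact_recovery_condition(D, opt_idx):
    # Returns True if ||pinv(Phi_opt) Phi_npt||_1 < 1, i.e. OMP is guaranteed
    # to pick only optimal atoms (Tropp 2004).
    pinv_opt = np.linalg.pinv(D[:, opt_idx])
    others = [n for n in range(D.shape[1]) if n not in set(opt_idx)]
    return max(np.linalg.norm(pinv_opt @ D[:, n], 1) for n in others) < 1.0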

More Coherent B-spline Dictionary

The theoretical bound does not apply at all. Signals are generated by selecting atoms randomly and combining them linearly. With swapping the signals are exactly recovered; without swapping only the signals up to sparsity 80 are exactly recovered.

Figure 2: Exact signal recovery with and without the swapping operation (error versus number of selected atoms; Babel function).

Controlling the Condition Number

Restrict the search space to atoms with ‖γ_n‖ ≥ ε_1 and stop when |⟨γ_n, f⟩| / ‖γ_n‖ < ε_2, where γ_n = α_n − P_{V_k} α_n. Both criteria depend on the dictionary and the signal.

If B_k = [β_1^k, ..., β_k^k] collects the duals, then 1/‖B_k‖ gives the smallest singular value of the selected basis. The search can be stopped if the condition number goes beyond some limit. What is the optimal trade-off between the condition number and the approximation? We need a cheap condition estimator.

Implementing the matching pursuit on top of an incremental QR-decomposition and the biorthogonal functions enables the use of the incremental norm estimator proposed by Duff and Voemel.

Let W_k = [α_{l_1}, ..., α_{l_k}]; in general W_k^T W_k ≠ I_k. Suppose that W_k = Q_k R_k (QR-decomposition of W_k). The condition number of the basis is κ(W_k) = ‖W_k‖ ‖W_k^†‖ = ‖R_k‖ ‖R_k^{-1}‖.

W_k = [α_{l_1}, ..., α_{l_k}] = Q_k R_k. Update the QR-decomposition of W_k to get the QR-decomposition of W_k^n = [α_{l_1}, ..., α_{l_k}, α_n]. Hence W_k^n = Q_k^n R_k^n with

R_k^n = [ R_k  v_k^n ; 0  r_k^n ].

Let σ_k = ‖R_k z‖ be the norm of the matrix R_k, attained at a unit vector z, ‖z‖ = 1. The idea of the incremental norm estimator is to maximize the quantity ‖R_k^n ẑ‖ with ẑ = (s z, c)^T and s² + c² = 1. An analytical expression for the maximum of ‖R_k^n ẑ‖ is easily obtained [Duff and Voemel]. Moreover, ‖(R_k^n)^{-1}‖ = 1/σ_k^n, with σ_k^n the smallest singular value of the matrix R_k^n, so the same incremental norm estimator can be used to compute the smallest singular value of R_k^n.
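
A sketch of the QR update for an appended atom. For brevity the condition number of the small triangular factor is obtained here from a direct SVD, where the talk uses the incremental norm estimator of Duff and Voemel; names are illustrative:

import numpy as np

def append_column(Q, R, a):
    # Extend W_k = Q R to [W_k, a] = Q' R' with
    # R' = [ R  v ; 0  r ],  v = Q^T a,  r = ||a - Q Q^T a||.
    v = Q.T @ a
    w = a - Q @ v
    r = np.linalg.norm(w)                      # assumes a is not already in span(W_k)
    Qn = np.column_stack([Q, w / r])
    Rn = np.block([[R, v[:, None]],
                   [np.zeros((1, R.shape[1])), np.array([[r]])]])
    return Qn, Rn

def condition_number(R):
    # kappa(W_k) = ||R_k|| ||R_k^{-1}|| = sigma_max(R_k) / sigma_min(R_k).
    s = np.linalg.svd(R, compute_uv=False)
    return s[0] / s[-1]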

The smallest and largest singular values of R_k and the corresponding right singular vectors can be cheaply estimated by applying a few power iterations to R_k and R_k^{-1}, initialized by some intelligent guess.

Let J_k^ε = {n : |⟨γ_n, f⟩| / ‖γ_n‖ ≥ ε_k}, and let κ_k^n be the estimated condition number of the basis [W_k, α_n], n ∈ J_k^ε. Select the atom with index l_{k+1} = arg min_{n ∈ J_k^ε} κ_k^n.

Alternative way: assume that D_k := {α_n, n ∈ J \ J_k} is sorted in such a way that {|⟨γ_n, f⟩| / ‖γ_n‖, n ∈ J \ J_k} is in descending order, and let κ_k be the condition number of the basis at the k-th step; then, starting from the first atom in D_k, estimate the condition number κ_k^n of [W_k, α_n], n ∈ J \ J_k, until κ_k^n / κ_k is less than some fixed number.

Swapping can be done to improve the approximation or the condition number of the basis.
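
Combining the OOMP score with the condition estimate gives a selection step in the spirit of the alternative way above: candidates are scanned in descending order of the score and the first one with an acceptable estimated condition number is taken. A sketch, reusing append_column and condition_number from the previous block; threshold names are illustrative:

import numpy as np

def condition_controlled_step(D, f, selected, Q, R, kappa_max, eps1=1e-6):
    # Returns the index of the accepted atom, or None if every candidate would
    # push the estimated condition number of [W_k, alpha_n] above kappa_max.
    G = D - Q @ (Q.T @ D)                      # gamma_n = alpha_n - P_{V_k} alpha_n
    norms = np.linalg.norm(G, axis=0)
    score = np.abs(G.T @ f) / np.maximum(norms, 1e-12)
    score[norms < eps1] = -np.inf              # restrict the search space
    score[selected] = -np.inf
    for n in np.argsort(score)[::-1]:          # descending OOMP score
        if not np.isfinite(score[n]):
            break
        _, Rn = append_column(Q, R, D[:, n])
        if condition_number(Rn) <= kappa_max:
            return int(n)
    return None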

Numerical Results

Given a set, find a subset whose condition number stays within a given threshold a·x^n, a ∈ {1, 2}, n ∈ {1, 2, 3, 4, 5}, x = 10. (The old algorithm is based on the SVD: Golub and Van Loan.)

Thrs           x    2x   x^2  2x^2  x^3  2x^3  x^4  2x^4  x^5  2x^5
B-spline  Old  37   41   49   52    82   93    109  114   122  125   (143)
B-spline  New  34   39   48   50    75   87    105  110   119  122
Wavelet   Old  152  163  182  189   202  207   216  219   226  229   (635)
Wavelet   New  158  169  185  194   205  209   217  221   228  230

Figure 3: The ratios of the thresholds (thrs/cnd) and the computed condition numbers, B-spline dictionary (left), Wavelet dictionary (right).

Numerical Results for OOMP

We apply OOMP with and without condition control to approximate the modulated chirp function f = exp(x) cos(kπx²) for k = 5, 10, 15, 20, 25, 30, 35, 40, and 45. The search is stopped when the condition number is greater than 10^6.

Figure 4: Chirp function (left), error (middle), and number of selected atoms (right) versus k for the approximation of the modulated chirp function.

Numerical Results for Swapping

Signals are chosen by selecting atoms randomly and taking a linear combination of them.

Figure 5: Error and condition growth with the swapping operation.

Oblique Projection

We are given three subspaces V, W, and W^⊥ of H with H = W ⊕ W^⊥ = V ⊕ W^⊥. The oblique projector E_{VW^⊥} : H → V is uniquely defined by the properties E_{VW^⊥} v = v, v ∈ V, and E_{VW^⊥} w = 0, w ∈ W^⊥, provided V ∩ W^⊥ = {0}.

For f ∈ H:
Orthogonal projection: P_{V_k} f = arg min_{g ∈ V_k} ‖f − g‖.
Oblique projection: E_{V_k W^⊥} f = arg min_{g ∈ V_k} ‖f − g‖_{P_W}, with ⟨f, g⟩_{P_W} = ⟨f, P_W g⟩.

The following error bound holds:

‖f − P_{V_k} f‖ ≤ ‖f − E_{V_k W^⊥} f‖ ≤ (1/cos θ) ‖f − P_{V_k} f‖,

where θ is the angle between the subspaces V_k and W.
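
For a concrete picture, a sketch of the oblique projection onto V = range(A) along W^⊥ = range(N): it solves the weighted least-squares problem above, E_{VW^⊥} f = A (P_W A)^† P_W f. This is illustrative code with assumed names, not the implementation used in the talk:

import numpy as np

def oblique_projection(A, N, f):
    # A: (m, k) atoms spanning V;  N: (m, q) basis of the noise subspace W^perp.
    # P_W = I - Q_N Q_N^T is the orthogonal projector onto W = (W^perp)^perp.
    # The result reproduces every v in V and annihilates every h in W^perp,
    # provided V and W^perp intersect only at {0}.
    Qn, _ = np.linalg.qr(N)
    proj_W = lambda x: x - Qn @ (Qn.T @ x)
    coeffs, *_ = np.linalg.lstsq(proj_W(A), proj_W(f), rcond=None)
    return A @ coeffs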

Oblique Matching Pursuit

If a signal f ∈ H is corrupted with noise, and the noise is known to lie in W^⊥, i.e. f = g + h with g ∈ V and h ∈ W^⊥, then g = E_{VW^⊥} f.

In general only a redundant set D of atoms, called a dictionary, is known, which spans V, and the signal may even be corrupted with other noise. How to select atoms from the set D to represent g? Note the non-uniqueness of V in H = W ⊕ W^⊥ = V ⊕ W^⊥.

Use OOMP on the modified dictionary P_W D to select atoms and construct the oblique projector E_{V_k W^⊥}. We call this Oblique Matching Pursuit. An efficient implementation can be obtained by using biorthogonal functions, as before, which now span the space P_W V_k.
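
Oblique Matching Pursuit then amounts to running the OOMP selection on the modified dictionary P_W D and forming the oblique approximation with the original atoms. A sketch reusing select_atom and oblique_projection from the earlier blocks (k_max and all names are illustrative):

import numpy as np

def oblique_matching_pursuit(D, N, f, k_max):
    # Select k_max atoms by OOMP applied to P_W D (and P_W f), then return
    # the oblique approximation E_{V_k W^perp} f and the chosen indices.
    Qn, _ = np.linalg.qr(N)
    Dw = D - Qn @ (Qn.T @ D)                   # modified dictionary P_W D
    fw = f - Qn @ (Qn.T @ f)                   # P_W f
    selected = []
    for _ in range(k_max):
        selected.append(select_atom(Dw, fw, selected, oomp=True))
    return oblique_projection(D[:, selected], N, f), selected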

Figure 6: Oblique matching pursuit applied to a signal corrupted with only structured noise, called background (left), and corrupted with structured and white noise, σ = 0.001 (right).

Note: The direct oblique projection (computed by T-SVD) explodes in the case of the signal with white noise.

Conclusion and Remarks

The recursive biorthogonal approach yields an efficient implementation of the Matching Pursuit method. Forward and backward basis selection can be combined to introduce a swapping operation based on the minimal-residual condition. The incremental condition estimator can be easily adapted within the recursive biorthogonal approach to deal with ill-conditioning. Oblique Matching Pursuit (Weighted Matching Pursuit) can be used to filter out structured noise.

Future work: theoretical investigation of the swapping operation; generalizing the results to multiple selection; developing a domain decomposition algorithm; combining different algorithms for efficiency and flexibility.

Thank You