An Introduction to Sparse Approximation


An Introduction to Sparse Approximation Anna C. Gilbert Department of Mathematics University of Michigan

Basic image/signal/data compression: transform coding

Approximate signals sparsely. Compress images, signals, and data accurately (mathematics), concisely (statistics), and efficiently (algorithms).

Redundancy: if one orthonormal basis is good, surely two (or more) are better... especially for images.

Dictionary. Definition: A dictionary D in R^n is a collection {ϕ_l}_{l=1}^d ⊂ R^n of unit-norm vectors, ‖ϕ_l‖_2 = 1 for all l. The elements are called atoms. If span{ϕ_l} = R^n, the dictionary is complete. If the {ϕ_l} are linearly dependent, the dictionary is redundant.

Matrix representation. Form a matrix Φ = [ϕ_1 ϕ_2 ... ϕ_d] so that Φc = Σ_l c_l ϕ_l.

Examples: Fourier-Dirac. Φ = [F I], with Fourier atoms ϕ_l(t) = (1/√n) e^{2πilt/n} for l = 1, 2, ..., n and Dirac atoms ϕ_l(t) = δ_{l−n}(t) for l = n+1, n+2, ..., 2n.

Sparse Problems.
Exact: Given a vector x ∈ R^n and a complete dictionary Φ, solve min_c ‖c‖_0 s.t. x = Φc, i.e., find a sparsest representation of x over Φ.
Error: Given ɛ ≥ 0, solve min_c ‖c‖_0 s.t. ‖x − Φc‖_2 ≤ ɛ, i.e., find a sparsest approximation of x that achieves error ɛ.
Sparse: Given k ≥ 1, solve min_c ‖x − Φc‖_2 s.t. ‖c‖_0 ≤ k, i.e., find the best approximation of x using k atoms.
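As a concrete (if naive) illustration of the Sparse problem, the sketch below solves it by exhaustive search over all size-k supports, fitting each candidate support by least squares. This is not part of the original slides; it is a minimal Python/NumPy sketch whose exponential cost in d foreshadows the hardness results that follow.

```python
import numpy as np
from itertools import combinations

def sparse_k_term(Phi, x, k):
    """Solve the 'Sparse' problem min ||x - Phi c||_2 s.t. ||c||_0 <= k
    by brute force over all supports of size k (exponential in d)."""
    n, d = Phi.shape
    best_err, best_c = np.inf, np.zeros(d)
    for support in combinations(range(d), k):
        cols = list(support)
        # Least-squares fit of x on this candidate support.
        coef, *_ = np.linalg.lstsq(Phi[:, cols], x, rcond=None)
        err = np.linalg.norm(x - Phi[:, cols] @ coef)
        if err < best_err:
            best_err = err
            best_c = np.zeros(d)
            best_c[cols] = coef
    return best_c, best_err
```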

NP-hardness. Theorem: Given an arbitrary redundant dictionary Φ and a signal x, it is NP-hard to solve the sparse representation problem Sparse. [Natarajan 95, Davis 97] Corollary: Error is NP-hard as well. Corollary: It is NP-hard to determine whether the optimal error is zero for a given sparsity level k.

Exact Cover by 3-Sets (X3C). Definition: Given a finite universe U and a collection X of subsets X_1, X_2, ..., X_d with |X_i| = 3 for each i, does X contain a disjoint collection of subsets whose union equals U? A classic NP-hard problem. Proposition: Any instance of X3C is reducible in polynomial time to Sparse. (Figure: subsets X_1, X_2, X_3, ..., X_N covering the universe U.)

Bad news, good news.
Bad news: Given any polynomial-time algorithm for Sparse, there is a dictionary Φ and a signal x for which the algorithm returns an incorrect answer. Pessimistic: this is a worst-case statement. We cannot hope to approximate the solution, either.
Good news: Natural dictionaries are far from arbitrary, so perhaps natural dictionaries admit polynomial-time algorithms. Optimistic: we rarely see the worst case, and we can leverage our intuition from orthogonal bases.

Hardness depends on the instance:
  NP-hard: dictionary Φ arbitrary, input signal x arbitrary.
  Depends on the choice of Φ (example: spikes and sines): Φ fixed, x fixed.
  Compressive sensing: Φ random (from what distribution?), x random (random signal model).

Sparse algorithms exploit geometry. Orthogonal case: pull off atoms one at a time, by dot products in decreasing order of magnitude. Why is the orthogonal case easy? Inner products between atoms are small, so it is easy to tell which atom is the best choice. When atoms are (nearly) parallel, we can't tell which one is best.

Coherence. Definition: The coherence of a dictionary is µ = max_{j ≠ l} |⟨ϕ_j, ϕ_l⟩|. (Figure: two three-atom dictionaries; small coherence is good, large coherence is bad.)
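A minimal Python/NumPy sketch (not from the slides) of this definition: the coherence is the largest off-diagonal entry, in absolute value, of the Gram matrix of the normalized dictionary.

```python
import numpy as np

def coherence(Phi):
    """Coherence mu = max_{j != l} |<phi_j, phi_l>| of a dictionary
    whose columns are the atoms (normalized to unit l2 norm here)."""
    Phi = Phi / np.linalg.norm(Phi, axis=0)   # ensure unit-norm atoms
    G = np.abs(Phi.conj().T @ Phi)            # absolute Gram matrix
    np.fill_diagonal(G, 0.0)                  # ignore <phi_l, phi_l> = 1
    return G.max()
```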

Large, incoherent dictionaries. Fourier-Dirac: d = 2n, µ = 1/√n. Wavelet packets: d = n log n, µ = 1/√2. There are large dictionaries with coherence close to the lower (Welch) bound; e.g., Kerdock codes, d = n^2, µ = 1/√n.
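To make the Fourier-Dirac example concrete, here is a small sketch (assumed Python/NumPy, not from the slides) that builds Φ = [F I] and checks its coherence numerically against 1/√n, using the coherence function above.

```python
import numpy as np

def fourier_dirac(n):
    """Fourier + Dirac dictionary Phi = [F I] in C^n with d = 2n atoms."""
    F = np.fft.fft(np.eye(n)) / np.sqrt(n)    # orthonormal Fourier basis (unit-norm columns)
    I = np.eye(n)                             # Dirac (spike) basis
    return np.hstack([F, I])

n = 64
Phi = fourier_dirac(n)
print(coherence(Phi), 1 / np.sqrt(n))         # both approximately 0.125
```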

Greedy algorithms: build the approximation one step at a time... choose the best atom at each step.

Orthogonal Matching Pursuit (OMP) [Mallat 93, Davis 97]
Input: dictionary Φ, signal x, number of steps k.
Output: coefficient vector c with k nonzeros, Φc ≈ x.
Initialize: counter t = 1, c = 0.
1. Greedy selection: l_t = argmax_l |(Φ*(x − Φc))_l|.
2. Update: find c_{l_1}, ..., c_{l_t} to solve min ‖x − Σ_s c_{l_s} ϕ_{l_s}‖_2; the new approximation is a_t = Φc.
3. Iterate: t ← t + 1; stop when t > k.
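The pseudocode above translates almost line for line into the following Python/NumPy sketch (an illustrative implementation, not code from the talk): greedy selection against the residual, then a least-squares update over the selected atoms.

```python
import numpy as np

def omp(Phi, x, k):
    """Orthogonal Matching Pursuit: k rounds of greedy selection
    followed by a least-squares update on the chosen atoms."""
    n, d = Phi.shape
    c = np.zeros(d)
    support = []
    residual = x.copy()
    for _ in range(k):
        # 1. Greedy selection: atom most correlated with the residual.
        l = int(np.argmax(np.abs(Phi.T @ residual)))
        support.append(l)
        # 2. Update: least-squares fit of x on the selected atoms.
        coef, *_ = np.linalg.lstsq(Phi[:, support], x, rcond=None)
        c = np.zeros(d)
        c[support] = coef
        residual = x - Phi @ c
    return c, support
```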

Many greedy algorithms follow a similar outline.
Matching Pursuit: replace step 2 by c_{l_t} ← c_{l_t} + ⟨x − Φc, ϕ_{l_t}⟩.
Thresholding: choose the m atoms for which |⟨x, ϕ_l⟩| is among the m largest.
Alternate stopping rules: ‖x − Φc‖_2 ≤ ɛ, or max_l |⟨x − Φc, ϕ_l⟩| ≤ ɛ.
Many other variations.
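For comparison with OMP, here are sketches of the two variants just mentioned (again illustrative Python/NumPy, not from the slides); the thresholding variant is written with a final least-squares fit on the selected atoms, which is one common choice the slide leaves unspecified.

```python
import numpy as np

def matching_pursuit(Phi, x, k):
    """Matching Pursuit: same greedy selection as OMP, but the update only
    adds <x - Phi c, phi_l> to the chosen coefficient (no least squares)."""
    c = np.zeros(Phi.shape[1])
    residual = x.copy()
    for _ in range(k):
        l = int(np.argmax(np.abs(Phi.T @ residual)))
        c[l] += Phi[:, l] @ residual
        residual = x - Phi @ c
    return c

def thresholding(Phi, x, m):
    """Thresholding: keep the m atoms with largest |<x, phi_l>|, then fit
    their coefficients by least squares (one common variant)."""
    idx = np.argsort(-np.abs(Phi.T @ x))[:m]
    coef, *_ = np.linalg.lstsq(Phi[:, idx], x, rcond=None)
    c = np.zeros(Phi.shape[1])
    c[idx] = coef
    return c
```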

Convergence of OMP. Theorem: Suppose Φ is a complete dictionary for R^n. For any vector x, the residual after t steps of OMP satisfies ‖x − Φc‖_2 ≤ C/√t. [DeVore-Temlyakov] Even if x can be expressed sparsely, OMP may take n steps before the residual is zero. But sometimes OMP correctly identifies sparse representations.

Exact Recovery Condition and coherence.
Theorem (ERC): A sufficient condition for OMP to identify Λ after k steps is that max_{l ∉ Λ} ‖Φ_Λ^+ ϕ_l‖_1 < 1, where A^+ = (A*A)^{-1}A*. [Tropp 04]
Theorem: The ERC holds whenever k < (1/2)(µ^{-1} + 1). Therefore, OMP can recover any sufficiently sparse signal. [Tropp 04]
For most redundant dictionaries, this means k < (1/2)(√n + 1).
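The ERC quantity is easy to evaluate for a given dictionary and support, which makes it a useful sanity check. A small Python/NumPy sketch (an assumption of this write-up, not from the slides):

```python
import numpy as np

def erc_value(Phi, Lambda):
    """Compute max_{l not in Lambda} ||Phi_Lambda^+ phi_l||_1.
    The ERC (sufficient for OMP/BP to recover Lambda) holds when this is < 1."""
    Lambda = list(Lambda)
    pinv = np.linalg.pinv(Phi[:, Lambda])      # Phi_Lambda^+ = (A*A)^{-1} A*
    others = [l for l in range(Phi.shape[1]) if l not in Lambda]
    return max(np.linalg.norm(pinv @ Phi[:, l], 1) for l in others)
```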

Sparse representation with OMP. Suppose x has a k-sparse representation x = Σ_{l ∈ Λ} b_l ϕ_l with |Λ| = k. It is sufficient to find Λ; when can OMP do so? Define Φ_Λ = [ϕ_{l_1} ϕ_{l_2} ... ϕ_{l_k}] with l_s ∈ Λ, and Ψ_Λ as the matrix whose columns are the atoms outside Λ. Define the greedy selection ratio
ρ(r) = (max_{l ∉ Λ} |⟨r, ϕ_l⟩|) / (max_{l ∈ Λ} |⟨r, ϕ_l⟩|) = ‖Ψ_Λ* r‖_∞ / ‖Φ_Λ* r‖_∞ = (max inner product with bad atoms) / (max inner product with good atoms).
OMP chooses a good atom iff ρ(r) < 1.

Sparse. Theorem: Assume k ≤ 1/(3µ). For any vector x, the approximation x̂ after k steps of OMP satisfies ‖x − x̂‖_2 ≤ √(1 + 6k) ‖x − x_k‖_2, where x_k is the best k-term approximation to x. [Tropp 04]
Theorem: Assume 4k ≤ 1/µ. Two-phase greedy pursuit produces x̂ s.t. ‖x − x̂‖_2 ≤ 3 ‖x − x_k‖_2. Assume k ≤ 1/µ. Two-phase greedy pursuit produces x̂ s.t. ‖x − x̂‖_2 ≤ √(1 + 2µk^2/(1 − 2µk)^2) ‖x − x_k‖_2. [Gilbert, Strauss, Muthukrishnan, Tropp 03]

Alternative algorithmic approach.
Exact: non-convex optimization min ‖c‖_0 s.t. x = Φc; convex relaxation: min ‖c‖_1 s.t. x = Φc.
Error: non-convex optimization argmin ‖c‖_0 s.t. ‖x − Φc‖_2 ≤ ɛ; convex relaxation: argmin ‖c‖_1 s.t. ‖x − Φc‖_2 ≤ δ.

Convex relaxation: algorithmic formulation. A well-studied formulation [Donoho, Donoho-Elad-Temlyakov, Tropp, and many others]. The optimization problem is a linear program: a linear objective function (in the split variables c^+, c^−) with linear (or quadratic) constraints. We still need an algorithm to solve the optimization problem. The hard part of the analysis is showing that the solution of the convex problem equals the solution of the original problem.
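To make the c^+, c^− splitting concrete, here is a sketch (assumed Python/SciPy, not from the slides) of the noiseless relaxation min ‖c‖_1 s.t. x = Φc written as a linear program and handed to scipy.optimize.linprog:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, x):
    """min ||c||_1 s.t. x = Phi c, via the split c = c_plus - c_minus with
    c_plus, c_minus >= 0, so the objective sum(c_plus + c_minus) is linear."""
    n, d = Phi.shape
    cost = np.ones(2 * d)                      # ||c||_1 = 1^T (c+ + c-)
    A_eq = np.hstack([Phi, -Phi])              # [Phi, -Phi] [c+; c-] = x
    res = linprog(cost, A_eq=A_eq, b_eq=x,
                  bounds=[(0, None)] * (2 * d), method="highs")
    return res.x[:d] - res.x[d:]
```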

Exact Recovery Condition. Theorem (ERC): A sufficient condition for BP to recover the sparsest representation of x is that max_{l ∉ Λ} ‖Φ_Λ^+ ϕ_l‖_1 < 1, where A^+ = (A^T A)^{-1} A^T. [Tropp 04] Theorem: The ERC holds whenever k < (1/2)(µ^{-1} + 1). Therefore, BP can recover any sufficiently sparse signal. [Tropp 04]

Alternate optimization formulations.
Constrained minimization: argmin ‖c‖_1 s.t. ‖x − Φc‖_2 ≤ δ.
Unconstrained minimization: minimize L(c; γ, x) = (1/2)‖x − Φc‖_2^2 + γ‖c‖_1.
There are many algorithms for ℓ_1-regularization.
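One of the simplest such algorithms is iterative soft-thresholding (ISTA), a proximal-gradient method for the unconstrained formulation above. The sketch below is an illustrative Python/NumPy implementation with my own choice of step size and iteration count, not a method prescribed by the slides:

```python
import numpy as np

def ista(Phi, x, gamma, n_iter=500):
    """Minimize L(c; gamma, x) = 1/2 ||x - Phi c||_2^2 + gamma ||c||_1
    by proximal gradient: gradient step on the smooth term, then soft-threshold."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2   # 1 / Lipschitz constant of the gradient
    c = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = c - step * (Phi.T @ (Phi @ c - x))                       # gradient step
        c = np.sign(z) * np.maximum(np.abs(z) - step * gamma, 0.0)   # soft threshold
    return c
```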

Sparse approximation: optimization vs. greedy. Exact and Error are amenable to convex relaxation and convex optimization. Sparse, argmin ‖Φc − x‖_2 s.t. ‖c‖_0 ≤ k, is not amenable to convex relaxation, but it is well suited to greedy algorithms.

Connection between... Sparse Approximation and Statistical Learning

Sparsity in statistical learning. (Setup: an N × p matrix X of predictor variables with columns X_j = (X_{1j}, ..., X_{Nj})^T, a response vector y, and coefficients β.) Goal: Given X and y, find α and coefficients β ∈ R^p for a linear model that minimizes the error, (α̂, β̂) = argmin ‖(y − α) − Xβ‖_2^2. Solution: least squares has low bias but large variance and is hard to interpret (lots of non-zero coefficients). Shrink the β_j's; make β sparse.

Algorithms in statistical learning. Brute force: calculate Mallows' C_p for every subset of predictor variables and choose the best one. Greedy algorithms: forward selection, forward stagewise, least angle regression (LARS), backward elimination. Constrained optimization: a quadratic program with linear constraints (e.g., LASSO). Unconstrained optimization: regularization techniques. Sparse approximation and the SVM are equivalent [Girosi 96].
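For readers who want to try these statistical counterparts, the snippet below is an illustrative sketch using scikit-learn's Lasso (ℓ_1-penalized regression) and Lars (least angle regression) on synthetic data; the data, the regularization value alpha=0.1, and the sparsity level are arbitrary choices of this write-up, not values from the talk.

```python
import numpy as np
from sklearn.linear_model import Lasso, Lars

# Synthetic data chosen only for illustration.
rng = np.random.default_rng(0)
N, p = 100, 50
X = rng.standard_normal((N, p))
beta_true = np.zeros(p)
beta_true[:5] = rng.standard_normal(5)            # sparse ground truth
y = X @ beta_true + 0.1 * rng.standard_normal(N)

lasso = Lasso(alpha=0.1).fit(X, y)                # l1-regularized shrinkage
lars = Lars(n_nonzero_coefs=5).fit(X, y)          # greedy least angle regression
print(np.count_nonzero(lasso.coef_), np.count_nonzero(lars.coef_))
```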

Connection between... Sparse Approximation and Compressed Sensing

Interchange roles: now Φ maps the data/image to the measurements/coefficients.

Problem statement (TCS perspective). Construct a matrix Φ: R^n → R^m with m as small as possible, together with a decoding algorithm D. Assume x has low complexity: x is k-sparse (with noise). Given Φx for any signal x ∈ R^n, we can, with high probability, quickly recover x̂ with ‖x − x̂‖_p ≤ (1 + ɛ) min_{y k-sparse} ‖x − y‖_q = (1 + ɛ) ‖x − x_k‖_q.
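A toy compressed-sensing experiment in this spirit, reusing the OMP sketch from earlier as the decoder D (the Gaussian measurement matrix, the sizes, and the sparsity level are illustrative assumptions, not values from the talk):

```python
import numpy as np

# Illustrative sizes: n-dimensional signal, m measurements, sparsity k.
rng = np.random.default_rng(1)
n, m, k = 400, 80, 8
Phi = rng.standard_normal((m, n)) / np.sqrt(m)    # random Gaussian measurement matrix
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.standard_normal(k)               # k-sparse signal
y = Phi @ x                                       # m measurements

x_hat, found = omp(Phi, y, k)                     # decode with the earlier OMP sketch
print(np.linalg.norm(x - x_hat))                  # typically near zero at these sizes
```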

Comparison with sparse approximation.
Sparse approximation: Given y and Φ, find a (sparse) x̂ such that y ≈ Φx̂. Return x̂ with a guarantee that ‖Φx̂ − y‖_2 is small compared with ‖y − Φx_k‖_2.
CS: Given y and Φ, find a (sparse) x̂ such that y ≈ Φx̂. Return x̂ with a guarantee that ‖x − x̂‖_p is small compared with ‖x − x_k‖_q. The norms p and q are not always the same, and not always 2.

Comparison with statistical learning. (Same setup: an N × p matrix X of predictor variables, response y, coefficients β.) Goal: Given X and y, find α and coefficients β ∈ R^p for a linear model that minimizes the error, (α̂, β̂) = argmin ‖(y − α) − Xβ‖_2^2. Statistics: X is drawn i.i.d. from a distribution (i.e., generation is cheap); characterize the mathematical performance as a function of the distribution. TCS: X (or its distribution) is constructed (i.e., expensive); characterize the algorithmic performance as a function of space, time, and randomness.

Analogy: root-finding. (Figure: near the root p, either return p̂ with |p − p̂| ≤ ɛ or p̂ with |f(p̂)| ≤ ɛ.) Sparse approximation: Given f (and y = 0), find p such that f(p) = 0; return p̂ with the guarantee that |f(p̂)| is small. CS: Given f (and y = 0), find p such that f(p) = 0; return p̂ with the guarantee that |p − p̂| is small.

Parameters 1. Number of measurements m 2. Recovery time 3. Approximation guarantee (norms, mixed) 4. One matrix vs. distribution over matrices 5. Explicit construction 6. Universal matrix (for any basis, after measuring) 7. Tolerance to measurement noise

Applications.
Data stream algorithms: x_i = number of items with index i; we can maintain Φx under increments to x and recover an approximation to x.
Efficient data sensing: digital/analog cameras, analog-to-digital converters, high-throughput biological screening (pooling designs).
Error-correcting codes: the code is {y ∈ R^n : Φy = 0}, x is the error vector, and Φx is the syndrome.

Two approaches.
Geometric [Donoho 04], [Candes-Tao 04, 06], [Candes-Romberg-Tao 05], [Rudelson-Vershynin 06], [Cohen-Dahmen-DeVore 06], and many others: dense recovery matrices that satisfy the RIP (e.g., Gaussian, Fourier); geometric recovery methods (ℓ_1 minimization, LP), x̂ = argmin ‖z‖_1 s.t. Φz = Φx; uniform guarantee: one matrix Φ that works for all x.
Combinatorial [Gilbert-Guha-Indyk-Kotidis-Muthukrishnan-Strauss 02], [Charikar-Chen-Farach-Colton 02], [Cormode-Muthukrishnan 04], [Gilbert-Strauss-Tropp-Vershynin 06, 07]: sparse random matrices (typically); combinatorial recovery methods or weak greedy algorithms; per-instance guarantees, later uniform guarantees.

Summary
* Sparse approximation, statistical learning, and compressive sensing are intimately related.
* There are many models of computation and many scientific/technological problems in which they all arise.
* The algorithms for all of them are similar: optimization and greedy.
* The community has made progress on geometric and statistical models for matrices Φ and signals x, and on different types of problem instances.
* Explicit constructions?
* Better/different geometric/statistical models?
* Better connections with coding and complexity theory?