Sparse analysis Lecture V: From Sparse Approximation to Sparse Signal Recovery

Anna C. Gilbert, Department of Mathematics, University of Michigan

Connection between... Sparse Approximation and Compressed Sensing

Encoding schemes. Compressed sensing: image → linear encoding Φ (matrix is image independent) → nonlinear decoding → approximate image. Sparse approximation: image → nonlinear encoding Ω (matrix is image dependent) → linear decoding → approximate image.

Linear encoding, nonlinear decoding. The encoding is linear: Φx = c and Φy = d; the decoding that maps c back to an approximation of x (and d to an approximation of y) is highly nonlinear. Measure the accuracy of the decoded signal with respect to x_k, the best k-term approximation of x (in some orthonormal basis).

Problem statement. Design a matrix Φ: R^n → R^m, with m as small as possible, so that given the measurements Φx = y for any signal x ∈ R^n, there is an algorithm to recover an approximation x̂ with ‖x − x̂‖_p ≤ C ‖x − x_k‖_q.

Parameters: number of measurements m; recovery time; approximation guarantee (p, q norms, mixed); one matrix vs. a distribution over matrices; explicit construction; tolerance to measurement noise.

Comparison with sparse approximation. Sparse: given y and Φ, find a (sparse) x̂ such that y ≈ Φx̂; return x̂ with the guarantee that ‖Φx̂ − y‖_2 is small compared with ‖y − Φx_k‖_2. CS: given y = Φx and Φ, find a (sparse) x̂; return x̂ with the guarantee that ‖x − x̂‖_2 is small compared with (1/√k) ‖x − x_k‖_1.
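
Restating the two guarantees as displayed formulas (x̂ is the algorithm's output and x_k the best k-term approximation; this is only a cleaned-up restatement of the slide):

```latex
\text{Sparse approximation: } \|\Phi\hat{x} - y\|_2 \le C\,\|y - \Phi x_k\|_2,
\qquad
\text{CS: } \|x - \hat{x}\|_2 \le \frac{C}{\sqrt{k}}\,\|x - x_k\|_1 .
```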

Analogy: root-finding. [Figure: around the root p, the set of points p̂ with |p̂ − p| ≤ ɛ versus the set of points p̂ with |f(p̂) − 0| ≤ ɛ.] Sparse: given f (and y = 0), find p such that f(p) = 0; return p̂ with the guarantee that |f(p̂) − 0| is small. CS: given f (and y = 0), find p such that f(p) = 0; return p̂ with the guarantee that |p̂ − p| is small.

Root-finding analogy? Φx = x_1 + (1/2) x_2. [Figure: contour plot of Φx over the (x_1, x_2) plane.]

Algorithms for CS: convex optimization and greedy iterative methods. Problem formulations: recover the entire signal, or recover the k significant terms. Role of probability: the probabilistic method (if Φ is chosen from a suitable distribution, whp it satisfies certain properties), and the use of randomness inside the algorithm. A sparse representation is central to successful algorithms.

CS geometric methods. Suppose Φ satisfies the RIP(2, 2k, δ_k) condition: for any 2k-sparse vector x, (1 − δ_k) ‖x‖_2 ≤ ‖Φx‖_2 ≤ (1 + δ_k) ‖x‖_2. Given Φx = y, the solution x̂ of the convex relaxation problem x̂ = argmin ‖z‖_1 s.t. Φz = Φx satisfies ‖x − x̂‖_2 ≤ (C/√k) ‖x − x_k‖_1. Dense recovery matrices: if Φ is drawn as a random matrix with iid (sub-)Gaussian entries, or as random rows of the Fourier matrix, with m = O(k log(n/k)) rows and n columns, then Φ satisfies RIP(2) with high probability. This is an example of the probabilistic method for generating a matrix; it is not constructive. Uniform guarantee: one matrix Φ works for all x. [Donoho 04], [Candes-Tao 04, 06], [Candes-Romberg-Tao 05], [Rudelson-Vershynin 06], [Cohen-Dahmen-DeVore 06], and many others...
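
A minimal numerical sketch of this route, assuming NumPy and SciPy: draw a Gaussian Φ, measure a k-sparse x, and solve the ℓ1 relaxation as a linear program via the standard split z = u − v with u, v ≥ 0 (an illustrative reformulation, not code from the lecture).

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, k = 200, 60, 5                        # signal length, measurements, sparsity

# k-sparse signal and iid Gaussian measurement matrix
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
y = Phi @ x

# Basis pursuit: min ||z||_1 s.t. Phi z = y, as an LP in (u, v) with z = u - v, u, v >= 0
c = np.ones(2 * n)
A_eq = np.hstack([Phi, -Phi])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n))
x_hat = res.x[:n] - res.x[n:]

print("recovery error:", np.linalg.norm(x - x_hat))
```

For an exactly k-sparse x and m on the order of k log(n/k) Gaussian measurements, the right-hand side of the guarantee above is zero, so the recovery is typically exact.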

CS greedy algorithms. Suppose Φ satisfies the RIP(2, 2k, δ) condition. Given Φx = y, there are greedy iterative algorithms that produce x̂ with ‖x̂‖_0 = k and ‖x − x̂‖_2 ≤ C (‖x − x_k‖_2 + (1/√k) ‖x − x_k‖_1). [Tropp-Needell 07], [Blumensath-Davies 08], and others. The architecture of the algorithms is that of greedy pursuit (OMP): Maximize: choose λ = a set of columns of Φ with large dot products Φ^T r. Update: Λ = Λ ∪ λ, and a = the best linear combination over Φ_Λ. Iterate: r = y − Φa.
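
A bare-bones sketch of that greedy loop (OMP), assuming NumPy; the function name omp and its interface are illustrative, not the lecture's code.

```python
import numpy as np

def omp(Phi, y, k):
    """Recover a k-sparse approximation x_hat from y = Phi x by orthogonal matching pursuit."""
    m, n = Phi.shape
    support = []                       # Lambda: indices chosen so far
    r = y.copy()                       # residual
    x_hat = np.zeros(n)
    for _ in range(k):
        # Maximize: pick the column most correlated with the residual (largest entry of Phi^T r)
        j = int(np.argmax(np.abs(Phi.T @ r)))
        if j not in support:
            support.append(j)
        # Update: best linear combination over the chosen columns (least squares)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        # Iterate: new residual
        r = y - Phi[:, support] @ coef
    x_hat[support] = coef
    return x_hat
```

With the Φ, y, k from the previous sketch, omp(Phi, y, k) typically recovers the support of x exactly.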

Computational costs. Computational time: all dominated by the matrix-vector product Φ^T r; LP: O(Tn), greedy: O(nk log(n/k) log(‖x‖_2)). Storage: have to store Φ, a k log(n/k) × n matrix (unless it has special structure!); store x̂ as you build it (only the non-zero entries, or the entire vector). Randomness: have to generate Φ; theoretically, all entries are iid, truly random. Note: drand48 is pseudo-random.
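
One practical consequence of the last point: since the entries of Φ are pseudo-random in practice anyway, Φ need not be stored at all; it can be regenerated on demand from a short seed. A minimal sketch of that idea, assuming NumPy (phi_column and measure are hypothetical helper names, not from the lecture):

```python
import numpy as np

def phi_column(seed, m, j):
    """Regenerate column j of an m x n Gaussian measurement matrix from a seed,
    instead of storing the whole matrix."""
    rng = np.random.default_rng((seed, j))   # per-column stream derived from the seed
    return rng.standard_normal(m) / np.sqrt(m)

def measure(x, seed, m):
    """Compute y = Phi x without ever materializing Phi."""
    y = np.zeros(m)
    for j in np.flatnonzero(x):              # only non-zero entries of x contribute
        y += x[j] * phi_column(seed, m, int(j))
    return y
```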

Connection between... Sparse Approximation/Compressive Sensing and Streaming/Sublinear Algorithms

Data streams. [Figure: stream of IP packets from sources 129.132.69.131, 192.16.1.201, 172.16.254.1.] Each IP packet has a header (source/destination address, payload size, protocol, etc.) and a payload (the transmitted data). Backbone link bandwidth = 40 Gbits/sec; over 90% of packets are no more than 500 bytes; monitor 10 million packets/second.

Heavy hitters. [Figure: stream of (IP address, byte count) pairs, e.g. (129.132.69.131, 128), (192.16.1.201, 1024), (172.16.254.1, 64), and the resulting histogram of # bytes per IP address.]

Linear measurements. [Figure: as each packet (129.132.69.131, 128), (192.16.1.201, 1024), (172.16.254.1, 64) arrives, the measurement vector y = Φx is updated by adding the corresponding scaled column of Φ.]
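
A small sketch of that update rule, assuming NumPy: by linearity, each stream item (index, value) adds value times one column of Φ to the sketch, and sketches of separate streams simply add, which is the composability required by the streaming model below. The random-sign choice of Φ and the names update_sketch and packets are illustrative.

```python
import numpy as np

n, m = 1000, 50                             # universe size (e.g. hashed IP ids), sketch size
rng = np.random.default_rng(1)
Phi = rng.choice([-1.0, 1.0], size=(m, n))  # one possible choice: random sign matrix

def update_sketch(y, index, value):
    """Process one stream item (index, value): y <- y + value * Phi[:, index]."""
    y += value * Phi[:, index]
    return y

# Stream of (item id, byte count) pairs, e.g. hashed IP addresses
packets = [(17, 128), (42, 1024), (308, 64)]
y = np.zeros(m)
for idx, val in packets:
    y = update_sketch(y, idx, val)

# Composability: the sketch of a merged stream is the sum of the per-stream sketches
y1 = update_sketch(np.zeros(m), 17, 128)
y2 = update_sketch(np.zeros(m), 42, 1024)
assert np.allclose(update_sketch(y1 + y2, 308, 64), y)
```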

Streaming algorithms. Resource constraints: (1) per-item processing time, (2) storage space / auxiliary memory, (3) time to produce the output, (4) what is stored in memory should be composable. The constraints on (1), (2), and (3) should be poly log(d).

Streaming/sub-linear algorithms. There are sub-linear algorithms for CS: running time O(k polylog n); measurements/storage O(k log(n/k)); error guarantees that match, or are ℓ1/ℓ1 or ℓ2/ℓ2 (with different probabilistic constructions); quite different geometric restrictions on Φ; they use and exploit pseudo-randomness to reduce storage space and speed up the algorithms; all of these conditions are sufficient, none seem to be necessary. [Gilbert-Guha-Indyk-Kotidis-Muthukrishnan-Strauss 02], [Charikar-Chen-Farach-Colton 02], [Cormode-Muthukrishnan 04], [Gilbert-Strauss-Tropp-Vershynin 06, 07], [Gilbert-Li-Porat-Strauss 10]

New algorithms, phase transitions, random models. [Figure: phase diagram of sparsity k versus coherence μ. Regions: 1d histogram V-OPT with a (2,1) bi-criteria guarantee; L1 optimization for random subdictionaries; and the sufficient condition k ≤ (1/3) μ^{-1} for OMP and L1 optimization.]

Alternative problem formulations + algorithms. Dictionary Φ = piecewise constants (in 1 dimension): extremely high coherence, µ = 1 − 1/d. [Figure: a linear combination of piecewise constants with endpoints L1, L2, R2, R1.] But it is more natural to count buckets in histograms (L1–R1, L2–R2, L3–R3); we use at most twice as many buckets as there are piecewise constants in the sparse representation.

V-OPT. Theorem: there is a dynamic programming algorithm which produces the k-bucket histogram H_k that minimizes ‖x − H_k‖_2. The algorithm runs in time O(kd^2). [Jagadish, et al 98]. Dynamic programming is a different algorithmic technique from both greedy iterative algorithms and convex optimization.

V-OPT. [Figure: signal on [1, d] with the last bucket on [j+1, d]; OPT(j, k−1) + cost(j+1, d).] Idea: within a bucket, the mean of the signal values is the best approximation. Assume the last bucket is on [j + 1, d]. What can we say about the remaining k − 1 buckets? They must be optimal for the range [1, j] with k − 1 buckets.

V-OPT. The recurrence: opt[d, k] = min_{1 ≤ j < d} { opt[j, k − 1] + cost[(j + 1), d] }, where opt[j, k] is the minimum cost of representing the values on [1, j] by a histogram with k buckets, and cost[(j + 1), d] is the ℓ2 error of using a single bucket on [j + 1, d].

V-OPT [Jagadish, et al 98]
Input: signal x, number of buckets k
Output: k-bucket histogram H_k ≈ x
for i = 1 to d
  for j = 1 to k
    for l = 1 to i − 1   (split point between the (j−1)-bucket histogram and the last bucket)
      OPT[i, j] = min( OPT[i, j], OPT[l, j−1] + cost[l+1, i] )
(OPT is a d × k table.)
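
A compact runnable version of that dynamic program, assuming NumPy: the single-bucket cost comes from prefix sums of x and x², so the table fills in O(kd²) time as in the theorem. This sketch follows the recurrence above rather than any particular published implementation, and it returns the optimal error (back-pointers would be needed to also return the bucket boundaries).

```python
import numpy as np

def v_opt(x, k):
    """Minimal squared l2 error of a k-bucket histogram of x, via dynamic programming, O(k d^2)."""
    x = np.asarray(x, dtype=float)
    d = len(x)
    s1 = np.concatenate(([0.0], np.cumsum(x)))        # prefix sums of x
    s2 = np.concatenate(([0.0], np.cumsum(x ** 2)))   # prefix sums of x^2

    def cost(a, b):
        """Squared l2 error of approximating x[a..b] (1-indexed, inclusive) by its mean."""
        n = b - a + 1
        tot, tot2 = s1[b] - s1[a - 1], s2[b] - s2[a - 1]
        return tot2 - tot * tot / n

    opt = np.full((d + 1, k + 1), np.inf)
    opt[0, 0] = 0.0
    for i in range(1, d + 1):                          # opt[i, j]: best j-bucket cost on x[1..i]
        for j in range(1, min(k, i) + 1):
            for l in range(j - 1, i):                  # last bucket covers x[l+1..i]
                opt[i, j] = min(opt[i, j], opt[l, j - 1] + cost(l + 1, i))
    return opt[d, k]
```

For example, v_opt([1, 1, 5, 5, 5, 2], 2) should return 6.75, the squared error of the best two-bucket split [1, 1] | [5, 5, 5, 2].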

Images vs. signals. There is a big difference between one and any higher dimension for histograms: finding the optimal k-rectangle histogram in 2d is NP-hard, but there is an efficient algorithm which achieves the minimal error (for k rectangles) using at most 4k rectangles. [Muthukrishnan and Strauss, 2003]

Summary. Sparse approximation has spawned many research directions, activities, and applications in many fields: new algorithms and new applications, with many new veins to be mined in the next lectures.