Near Optimal Signal Recovery from Random Projections


1 Near Optimal Signal Recovery from Random Projections
Emmanuel Candès, California Institute of Technology
Multiscale Geometric Analysis in High Dimensions: Workshop #2, IPAM, UCLA, October 2004
Collaborators: Justin Romberg (Caltech), Terence Tao (UCLA)

2 Recovery Problem
Object $f \in \mathbb{R}^N$ we wish to reconstruct: digital signal, image, dataset.
Can take linear measurements $y_k = \langle f, \psi_k \rangle$, $k = 1, 2, \ldots, K$.
How many measurements do we need to recover $f$ to within accuracy $\epsilon$, i.e. $\|f^\# - f\|_{\ell_2} \le \epsilon$, for typical objects $f$ taken from some class $\mathcal{F} \subset \mathbb{R}^N$?
Interested in practical reconstruction methods.

3 Agenda
Background: exact reconstruction of sparse signals
Near-optimal reconstruction of compressible signals
Uniform uncertainty principles
Exact reconstruction principles
Relationship with coding theory
Numerical experiments

4 Sparse Signals
Vector $f \in \mathbb{R}^N$: digital signal, coefficients of a digital signal/image, etc.
$|T|$ nonzero coordinates ($|T|$ spikes), $T := \{t : f(t) \neq 0\}$
Do not know the locations of the spikes
Do not know the amplitudes of the spikes

5 Recovery of Sparse Signals
Sparse signal $f$: $|T|$ spikes
Available information: $y = Ff$, where $F$ is $K \times N$ with $K \ll N$
Can we recover $f$ from $K$ measurements?

6 Fourier Ensemble
Random set $\Omega \subset \{0, \ldots, N-1\}$, $|\Omega| = K$.
Random frequency measurements: observe $(Ff)_k = \hat{f}(k)$ for $k \in \Omega$, where
$\hat{f}(k) = \sum_{t=0}^{N-1} f(t)\, e^{-i 2\pi k t / N}.$
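A minimal numpy sketch of this ensemble (the sizes and the 5-spike test signal below are illustrative choices, not from the slides): keep the $K$ rows of the DFT matrix indexed by a random set $\Omega$ and apply them to a sparse $f$.

import numpy as np

rng = np.random.default_rng(0)
N, K = 256, 60                                   # illustrative sizes
Omega = rng.choice(N, size=K, replace=False)     # random frequency set, |Omega| = K

# Partial DFT: only the rows of the N x N DFT matrix indexed by Omega.
t = np.arange(N)
F_Omega = np.exp(-2j * np.pi * np.outer(Omega, t) / N)   # K x N

f = np.zeros(N)
f[rng.choice(N, size=5, replace=False)] = 1.0    # a 5-sparse test signal
y = F_Omega @ f                                  # observed \hat f(k), k in Omega
print(y.shape)                                   # (60,)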

7 Exact Recovery from Random Frequency Samples
Available information: $y_k = \hat{f}(k)$, $\Omega$ random and $|\Omega| = K$.
To recover $f$, simply solve
$(P_1) \quad f^\# = \operatorname{argmin}_{g \in \mathbb{R}^N} \|g\|_{\ell_1} \quad \text{subject to} \quad Fg = Ff,$
where $\|g\|_{\ell_1} := \sum_{t=0}^{N-1} |g(t)|$.
Theorem 1 (C., Romberg, Tao). Suppose $K \ge \alpha \cdot |T| \log N$. Then the reconstruction is exact, $f^\# = f$, with probability greater than $1 - O(N^{-\rho\alpha})$ for some fixed $\rho > 0$. (N.b. $\rho = 1/29$ works.)

8 Exact Recovery from Gaussian Measurements
Gaussian random matrix: $F(k,t) = X_{k,t}$, with $X_{k,t}$ i.i.d. $N(0,1)$. This will be called the Gaussian ensemble.
Solve $(P_1)$: $f^\# = \operatorname{argmin}_{g \in \mathbb{R}^N} \|g\|_{\ell_1}$ subject to $Fg = Ff$.
Theorem 2 (C., Tao). Suppose $K \ge \alpha \cdot |T| \log N$. Then the reconstruction is exact, $f^\# = f$, with probability greater than $1 - O(N^{-\rho\alpha})$ for some fixed $\rho > 0$.

9 Gaussian Random Measurements
$y_k = \langle f, X_k \rangle$, entries of $X_k$ i.i.d. $N(0,1)$
[Figure: a random basis element.]

10 Equivalence
Combinatorial optimization problem: $(P_0) \quad \min_g \|g\|_{\ell_0} := \#\{t : g(t) \neq 0\}$, subject to $Fg = Ff$
Convex optimization problem (LP): $(P_1) \quad \min_g \|g\|_{\ell_1}$, subject to $Fg = Ff$
Equivalence: for $K \gtrsim |T| \log N$, the solutions to $(P_0)$ and $(P_1)$ are unique and are the same!

11 About the $\ell_1$-norm
Minimum $\ell_1$-norm reconstruction is in widespread use.
Santosa and Symes (1986) proposed this rule to reconstruct spike trains from incomplete data.
Connected with Total-Variation approaches, e.g. Rudin, Osher, Fatemi (1992).
More recently, $\ell_1$-minimization (Basis Pursuit) has been proposed as a convex alternative to the combinatorial $\ell_0$ norm: Chen, Donoho, Saunders (1996).
Relationships with uncertainty principles: Donoho & Huo (01), Gribonval & Nielsen (03), Tropp (03, 04), Donoho & Elad (03).

12 min $\ell_1$ as LP
$\min \|x\|_{\ell_1}$ subject to $Ax = b$
Reformulated as an LP (at least since the 50's). Split $x$ into $x = x^+ - x^-$:
$\min\ \mathbf{1}^T x^+ + \mathbf{1}^T x^- \quad \text{subject to} \quad \begin{pmatrix} A & -A \end{pmatrix} \begin{pmatrix} x^+ \\ x^- \end{pmatrix} = b, \quad x^+ \ge 0,\ x^- \ge 0$
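A minimal sketch of exactly this split reformulation, using SciPy's generic LP solver (the solver choice and the small test sizes are my assumptions; the slide only specifies the LP itself).

import numpy as np
from scipy.optimize import linprog

def l1_min_eq(A, b):
    """Solve min ||x||_1 subject to A x = b via the split x = x+ - x-."""
    K, N = A.shape
    c = np.ones(2 * N)                        # objective 1'x+ + 1'x-
    A_eq = np.hstack([A, -A])                 # [A, -A] [x+; x-] = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None), method="highs")
    return res.x[:N] - res.x[N:]

# Tiny usage example: recover a 3-sparse vector from 40 Gaussian measurements.
rng = np.random.default_rng(0)
N, K = 128, 40
f = np.zeros(N)
f[rng.choice(N, size=3, replace=False)] = rng.standard_normal(3)
F = rng.standard_normal((K, N))               # Gaussian ensemble
x_hat = l1_min_eq(F, F @ f)
print(np.max(np.abs(x_hat - f)))              # ~ 0 when recovery is exact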

13 Reconstruction of Spike Trains from Fourier Samples
Gilbert et al. (04), Santosa & Symes (86), Dobson & Santosa (96), Bresler & Feng (96), Vetterli et al. (03)

14 Why Does This Work? Geometric Viewpoint
Suppose $f \in \mathbb{R}^2$, $f = (0, 1)$.
[Figure: two panels showing the constraint set $\{g : Fg = Ff\}$ against the $\ell_1$ ball; in one case the minimizer is $f$ itself ("Exact"), in the other it is not ("Miss").]

15 Higher Dimensions

16 Duality in Linear/Convex Programming
$f$ is the unique solution if and only if the dual is feasible.
The dual is feasible if there is $P \in \mathbb{R}^N$ such that
$P$ is in the row space of $F$;
$P$ is a subgradient of the $\ell_1$ norm at $f$, $P \in \partial \|f\|_{\ell_1}$:
$P(t) = \operatorname{sgn}(f(t))$ for $t \in T$,
$|P(t)| < 1$ for $t \in T^c$.

17 Interpretation: Dual Feasibility with Freq. Samples
[Figure: the dual vector $P(t)$ in space and its Fourier transform $\hat{P}(\omega)$ in frequency.]

18 Numerical Results
Signal length $N = 256$
Randomly place $|T|$ spikes, observe $K$ Gaussian coefficients
Measure % recovered perfectly; white = always recovered, black = never recovered
[Figure: empirical probabilities of exact reconstruction (spikes and random basis), as a function of the number of measurements and the sparsity-to-measurement ratio.]

19 Numerical Results
Signal length $N = 1024$
Randomly place $|T|$ spikes, observe $K$ random frequencies
Measure % recovered perfectly; white = always recovered, black = never recovered
[Figure: recovery rate as a function of $|T|/K$ for various $K$.]
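A hedged sketch of the kind of experiment behind these figures: for each $(K, |T|)$ pair, draw random spikes and random Gaussian measurements, solve $(P_1)$, and record the fraction of trials recovered exactly. The grid, trial count, tolerance, and the use of cvxpy are my choices, not the slides'.

import numpy as np
import cvxpy as cp

def recovers(N, K, T, rng, tol=1e-4):
    """Draw a T-sparse signal and K Gaussian measurements; return True if (P1) recovers it."""
    f = np.zeros(N)
    f[rng.choice(N, size=T, replace=False)] = rng.standard_normal(T)
    F = rng.standard_normal((K, N))
    g = cp.Variable(N)
    cp.Problem(cp.Minimize(cp.norm1(g)), [F @ g == F @ f]).solve()
    return np.max(np.abs(g.value - f)) < tol

rng = np.random.default_rng(1)
N, trials = 256, 10
for K in (30, 60):
    for T in (5, 10, 20):
        rate = np.mean([recovers(N, K, T, rng) for _ in range(trials)])
        print(f"K={K:3d}  |T|={T:3d}  empirical recovery rate {rate:.2f}")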

20 Reconstruction of Piecewise Polynomials, I
Randomly select a few jump discontinuities
Randomly select a cubic polynomial in between jumps
Observe about 500 random coefficients
Minimize the $\ell_1$ norm of the wavelet coefficients
[Figure: exact reconstruction of a random piecewise cubic polynomial from 500 random coefficients; original and reconstructed signals.]

21 Reconstruction of Piecewise Polynomials, II
Randomly select 8 jump discontinuities
Randomly select a cubic polynomial in between jumps
Observe about 200 Fourier coefficients at random
[Figure: exact reconstruction of a random piecewise cubic polynomial from random Fourier samples; original and reconstructed signals.]

22 Reconstruction of Piecewise Polynomials, III
[Figure: two further examples of exact reconstruction of random piecewise cubic polynomials; original and reconstructed signals.]
About 200 Fourier coefficients only!

23 Minimum TV Reconstruction
Many extensions: $\min_g \|g\|_{TV}$ s.t. $\hat{g}(\omega) = \hat{f}(\omega)$, $\omega \in \Omega$
[Figure: naive reconstruction (min $\ell_2$) vs. reconstruction by min BV/TV with a nonnegativity constraint; the min TV reconstruction is exact!]

24 Other Phantoms
[Figure: classical reconstruction (min $\ell_2$) vs. total variation reconstruction (min TV); the min TV reconstruction is exact!]

25 Compressible Signals
In real life, signals are not sparse, but most of them are compressible.
Compressible signals: rearrange the entries in decreasing order, $|f|^2_{(1)} \ge |f|^2_{(2)} \ge \ldots \ge |f|^2_{(N)}$
$\mathcal{F}_p(C) = \{f : |f|_{(n)} \le C\, n^{-1/p},\ \forall n\}$
This is what makes transform coders work (sparse coding).
[Figure: power decay law of the sorted coefficients.]

26 Compressible Signals I: Wavelets in 1D
[Figure: the Doppler signal and the decay of its reordered wavelet coefficients (log-log scale).]

27 Compressible Signals II: Wavelets in 2D
[Figure: decay of the reordered wavelet coefficients of the Lena and Barbara images (log-log scale).]

28 Compressible Signals III: Curvelets
[Figure: decay of the reordered curvelet coefficients of the Lena and Barbara images (log-log scale).]

29 Examples of Compressible Signals
Smooth signals. If the continuous-time object has $s$ bounded derivatives, then the $n$th largest entry of the wavelet or Fourier coefficient sequence obeys
$|f|_{(n)} \le C\, n^{-s-1/2}$ (1 dimension), $\quad |f|_{(n)} \le C\, n^{-s/d-1/2}$ ($d$ dimensions).
Signals with bounded variations. In 2 dimensions, the BV norm of a continuous-time object is approximately $\|f\|_{BV} \approx \|\nabla f\|_{L_1}$; in the wavelet domain, $|\theta(f)|_{(n)} \le C\, n^{-1}$.
Many other examples: e.g. Gabor atoms and certain classes of oscillatory signals, curvelets and images with edges, etc.

30 Nonlinear Approximation of Compressible Signals
$f \in \mathcal{F}_p(C)$, $|f|_{(n)} \le C\, n^{-1/p}$
Keep the $K$ largest entries in $f$, giving $f_K$:
$\|f - f_K\| \le C\, K^{-r}, \quad r = 1/p - 1/2.$
E.g. $p = 1$: $\|f - f_K\| \le C\, K^{-1/2}$.
Recovery of Compressible Signals: how many measurements to recover $f$ to within precision $\epsilon = K^{-r}$?
Intuition: at least $K$, probably many more.
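A quick numpy check of the best $K$-term bound above, on a signal sitting exactly on the $\mathcal{F}_p(1)$ boundary (sizes are illustrative assumptions):

import numpy as np

N, K, p = 1000, 50, 1.0                     # illustrative sizes
r = 1.0 / p - 0.5
n = np.arange(1, N + 1)
f = n ** (-1.0 / p)                         # already sorted: |f|_(n) = n^(-1/p), so f is in F_p(1)
tail_err = np.sqrt(np.sum(f[K:] ** 2))      # ||f - f_K||_2 for the best K-term approximation f_K
print(tail_err, K ** (-r))                  # the error is on the order of K^(-r)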

31 Where Are the Largest Coefficients?
[Figure.]

32 Near Optimal Recovery of Compressible Signals
Select $K$ Gaussian random vectors $(X_k)$, $k = 1, \ldots, K$, with $X_k \sim N(0, I_N)$
Observe $y_k = \langle f, X_k \rangle$
Reconstruct by solving $(P_1)$: minimize the $\ell_1$-norm subject to the constraints.
Theorem 3 (C., Tao). Suppose that $f \in \mathcal{F}_p(C)$ for $0 < p \le 1$ (or $\|f\|_{\ell_1} \le C$, the case $p = 1$). Then with overwhelming probability,
$\|f^\# - f\|_2 \le C \cdot (K / \log N)^{-r}.$
See also Donoho (2004).
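A hedged numerical sketch of this setting (sizes, seed, and the synthetic power-law signal are my choices): draw a compressible $f$, take $K$ Gaussian measurements, solve $(P_1)$, and compare the error to the $(K/\log N)^{-r}$ scale.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(5)
N, K, p = 512, 100, 1.0                        # illustrative sizes
r = 1.0 / p - 0.5

# A compressible signal with |f|_(n) = n^(-1/p), random signs and positions (so f is in F_p(1)).
f = rng.permutation(np.arange(1, N + 1) ** (-1.0 / p)) * rng.choice([-1.0, 1.0], size=N)

X = rng.standard_normal((K, N))                # K Gaussian measurement vectors as rows
g = cp.Variable(N)
cp.Problem(cp.Minimize(cp.norm1(g)), [X @ g == X @ f]).solve()
print(np.linalg.norm(g.value - f), (K / np.log(N)) ** (-r))   # error vs. the (K/log N)^(-r) scale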

33 Big Surprise
Want to know an object up to an error $\epsilon$; e.g. an object whose wavelet coefficients are sparse.
Strategy 1: An oracle tells you exactly (or you collect all $N$ wavelet coefficients) which $K$ coefficients are large, and you measure those: $\|f - f_K\| \le \epsilon$.
Strategy 2: Collect $K \log N$ random coefficients and reconstruct using $\ell_1$.
Surprising claim: same performance, but with only $K \log N$ coefficients! Performance is achieved by solving an LP.

34 Optimality
Can you do with fewer than $K \log N$ measurements for accuracy $K^{-r}$?
Simple answer: NO (at least in the range $K \ll N$).

35 Optimality: Example
$f \in B_1 := \{f : \|f\|_{\ell_1} \le 1\}$
Entropy numbers: for a given set $\mathcal{F} \subset \mathbb{R}^N$, $N(\mathcal{F}, r)$ is the smallest number of Euclidean balls of radius $r$ which cover $\mathcal{F}$, and
$e_k = \inf\{r > 0 : N(\mathcal{F}, r) \le 2^{k-1}\}.$
Interpretation in coding theory: to encode a signal from $\mathcal{F}$ to within precision $e_k$, one needs at least $k$ bits.
Entropy estimates (Schütt (1984), Kühn (2001)):
$e_k \asymp \left( \frac{\log(N/k + 1)}{k} \right)^{1/2}, \quad \log N \le k \le N.$
To encode an object $f$ in the $\ell_1$-ball to within precision $1/\sqrt{K}$, one needs to spend at least $O(K \log(N/K))$ bits. For $K \le N^\beta$, $\beta < 1$, this is $O(K \log N)$ bits.
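As a back-of-the-envelope check (my rearrangement of the Schütt/Kühn estimate above, not a statement from the slides): setting $e_k \approx 1/\sqrt{K}$ and solving for $k$,
$$\sqrt{\frac{\log(N/k + 1)}{k}} \approx \frac{1}{\sqrt{K}} \iff k \approx K \log(N/k + 1) \approx K \log(N/K),$$
which is the $O(K \log(N/K))$ bit count quoted above.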

36 Gelfand n-width (Optimality)
$k$ measurements $Ff$: this sets the constraint that $f$ live on an affine space $f_0 + S$, where $S$ is a linear subspace of co-dimension less than or equal to $k$. The data available for the problem cannot distinguish any object belonging to that plane.
For our problem, the data cannot distinguish between any two points in the intersection $B_1 \cap (f_0 + S)$. Therefore, any reconstruction procedure $f^\#(y)$ based upon $y = F_\Omega f$ would obey
$\sup_{f \in \mathcal{F}} \|f^\# - f\| \ge \frac{\operatorname{diam}(B_1 \cap S)}{2}.$
The Gelfand numbers of a set $\mathcal{F}$ are defined as
$c_k = \inf_S \{\sup_{f \in \mathcal{F} \cap S} \|f\| : \operatorname{codim}(S) < k\}.$

37 Gelfand Width

38 Gelfand and entropy numbers (Optimality)
Gelfand numbers dominate the entropy numbers (Carl, 1981):
$e_k \lesssim c_k$, so $c_k \gtrsim \left( \frac{\log(N/k)}{k} \right)^{1/2}.$
Therefore, to achieve error $1/\sqrt{K}$,
$\#\text{meas} \gtrsim K \log(N/K).$
Similar argument for $\mathcal{F}_p$.

39 Something Special about Gaussian Measurements?
Works with the other measurement ensembles.
Binary ensemble, $F(k,t) = \pm 1$ with prob. $1/2$:
$\|f^\# - f\|_2 \le C \cdot (K / \log N)^{-r}.$
Fourier ensemble:
$\|f^\# - f\|_2 \le C \cdot (K / \log^3 N)^{-r}.$

40 Axiomatization I: Uniform Uncertainty Principle (UUP)
A measurement matrix $F$ obeys the UUP with oversampling factor $\lambda$ if for all subsets $T$ such that $|T| \le \alpha \cdot K/\lambda$, the matrix $F_T$ obtained by extracting the columns indexed by $T$ obeys the bounds
$\frac{1}{2} \cdot \frac{K}{N} \le \lambda_{\min}(F_T^* F_T) \le \lambda_{\max}(F_T^* F_T) \le \frac{3}{2} \cdot \frac{K}{N}.$
This must be true w.p. at least $1 - O(N^{-\rho/\alpha})$.

41 UUP: Interpretation
Suppose $F$ is the randomly sampled DFT, $\Omega$ a set of random frequencies. Signal $f$ with support $T$ obeying $|T| \le \alpha K/\lambda$.
UUP says that with overwhelming probability
$\|\hat{f}\|^2_{\ell_2(\Omega)} \le \frac{3K}{2N} \|f\|^2.$
W. Heisenberg: no concentration is possible unless $K \approx N$.
Uniform because it must hold for all such $T$'s.

42 Axiomatization II: Exact Reconstruction Principle (ERP)
A measurement matrix $F$ obeys the ERP with oversampling factor $\lambda$ if for each fixed subset $T$ with $|T| \le \alpha \cdot K/\lambda$, and each sign vector $\sigma$ defined on $T$, $|\sigma(t)| = 1$, there exists $P \in \mathbb{R}^N$ s.t.
(i) $P$ is in the row space of $F$;
(ii) $P(t) = \sigma(t)$ for all $t \in T$;
(iii) $|P(t)| \le \frac{1}{2}$ for all $t \in T^c$.
Interpretation: gives exact reconstruction for sparse signals.

43 Near-optimal Recovery
Theorem [C., Tao]. Suppose that
the measurement ensemble obeys the UUP with oversampling factor $\lambda_1$,
the measurement ensemble obeys the ERP with oversampling factor $\lambda_2$,
and the object $f \in \mathcal{F}_p(C)$. Then
$\|f^\# - f\| \le C \cdot (K/\lambda)^{-r}, \quad \lambda = \max(\lambda_1, \lambda_2).$

44 UUP for the Gaussian Ensemble
$F(k,t) = X_{k,t} / \sqrt{N}$
Singular values of random Gaussian matrices:
$(1 - c)\sqrt{\tfrac{K}{N}} \le \sigma_{\min}(F_T) \le \sigma_{\max}(F_T) \le (1 + c)\sqrt{\tfrac{K}{N}}$
with overwhelming probability (exceeding $1 - e^{-\beta K}$).
Reference: S. J. Szarek, Condition numbers of random matrices, J. Complexity (1991). See also Ledoux (2001), Johnstone (2002), El Karoui (2004).
Marchenko-Pastur law: $c \approx \sqrt{|T|/K}$.
A union bound gives the result for all $T$ provided $|T| \le \gamma \cdot K / \log(N/K)$.
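A quick numpy experiment in the spirit of this slide (sizes and seed are illustrative assumptions): the extreme singular values of a $K \times |T|$ Gaussian submatrix with entries $N(0,1)/\sqrt{N}$ concentrate around $\sqrt{K/N}$ when $|T| \ll K$.

import numpy as np

rng = np.random.default_rng(2)
N, K, T = 4096, 512, 32                       # illustrative sizes with |T| << K
F = rng.standard_normal((K, N)) / np.sqrt(N)  # Gaussian ensemble, F(k,t) = X_{k,t}/sqrt(N)
cols = rng.choice(N, size=T, replace=False)   # a column subset T
s = np.linalg.svd(F[:, cols], compute_uv=False)
print(s.min(), s.max(), np.sqrt(K / N))       # both extremes close to sqrt(K/N), as the bound predicts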

45 ERP for the Gaussian Ensemble
$P = F^* F_T (F_T^* F_T)^{-1} \operatorname{sgn}(f) := F^* V.$
$P$ is in the row space of $F$.
$P$ agrees with $\operatorname{sgn}(f)$ on $T$:
$P_T = F_T^* F_T (F_T^* F_T)^{-1} \operatorname{sgn}(f) = \operatorname{sgn}(f).$
On the complement of $T$: $P_{T^c} = F_{T^c}^* V$, i.e.
$P(t) = \frac{1}{\sqrt{N}} \sum_{k=1}^{K} X_{k,t} V_k.$
By independence of $F_{T^c}$ and $V$, the conditional distribution is
$\mathcal{L}(P(t) \mid V) \sim N(0, \|V\|^2 / N).$

46 With overwhelming probability (UUP),
$\|V\|^2 = \|F_T (F_T^* F_T)^{-1} \operatorname{sgn}(f)\|^2 \le 6N|T|/K,$
so that $\mathcal{L}(P(t) \mid V) \sim N(0, \sigma^2)$ with $\sigma^2 \le 6|T|/K$.
In conclusion: for $t \in T^c$,
$\mathbb{P}(|P(t)| > 1/2) \le e^{-\beta K/|T|}.$
ERP holds if $K \ge \alpha \cdot |T| \log N$.
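A hedged numpy sketch of the dual-vector construction on slides 45-46 (sizes and seed are illustrative): build $P = F^* F_T (F_T^* F_T)^{-1} \operatorname{sgn}(f)$ and check its behavior on and off the support $T$.

import numpy as np

rng = np.random.default_rng(3)
N, K, S = 1024, 200, 8                        # illustrative sizes
F = rng.standard_normal((K, N)) / np.sqrt(N)  # Gaussian ensemble
T = rng.choice(N, size=S, replace=False)      # support of f
sgn = rng.choice([-1.0, 1.0], size=S)         # sign pattern sgn(f) on T

F_T = F[:, T]
V = F_T @ np.linalg.solve(F_T.T @ F_T, sgn)   # V = F_T (F_T^* F_T)^{-1} sgn(f)
P = F.T @ V                                   # P = F^* V lies in the row space of F

off = np.setdiff1d(np.arange(N), T)
print(np.allclose(P[T], sgn))                 # P agrees with sgn(f) on T
print(np.max(np.abs(P[off])))                 # typically well below 1 off T; the ERP asks for <= 1/2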

47 Universal Codes
Want to compress sparse signals.
Encoder. To encode a discrete signal $f$, the encoder simply calculates the coefficients $y_k = \langle f, X_k \rangle$ and quantizes the vector $y$.
Decoder. The decoder then receives the quantized values and reconstructs a signal by solving the linear program $(P_1)$.
Conjecture: asymptotically, this nearly achieves the information theoretic limit.
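A minimal sketch of this encoder/decoder pair. The uniform quantizer, the step size, and all sizes below are my assumptions (the slide does not specify them), and cvxpy stands in for a generic LP solver.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(4)
N, K, q = 512, 100, 0.05                       # sizes and quantizer step are illustrative
f = np.zeros(N)
f[rng.choice(N, size=8, replace=False)] = rng.standard_normal(8)   # a sparse signal to compress

X = rng.standard_normal((K, N))                # measurement matrix shared by encoder and decoder
y_q = q * np.round((X @ f) / q)                # encoder: measure, then uniformly quantize

g = cp.Variable(N)                             # decoder: solve (P1) with the quantized data
cp.Problem(cp.Minimize(cp.norm1(g)), [X @ g == y_q]).solve()
print(np.linalg.norm(g.value - f))             # small error that shrinks with the step q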

48 Information Theoretic Limit: Example
Want to encode the unit $\ell_1$ ball: $f \in \mathbb{R}^N$, $\sum_t |f(t)| \le 1$.
Want to achieve distortion $D$: $\|f^\# - f\|_2 \le D$.
How many bits? Lower bounded by the entropy of the unit $\ell_1$ ball:
$\#\text{bits} \gtrsim \frac{C}{D^2}\left(\log(N D^2) + 1\right)$
Same as the number of measured coefficients.

49 Robustness
Say with $K$ coefficients, $\|f^\# - f\|_2 \le 1/\sqrt{K}$.
Say we lose half of the bits (packet loss). How bad is the reconstruction?
$\|f^\#_{50\%} - f\|_2 \le \sqrt{2/K}$
Democratic!

50 Reconstruction from 100 Random Coefficients
[Figure: reconstruction of "Blocks" from 100 random coefficients; original and reconstructed signals.]

51 Reconstruction from Random Coefficients
[Figure: reconstruction of "Cusp" from 150 random coefficients and of "Doppler" from 200 random coefficients; original and reconstructed signals.]

52 Reconstruction from Random Coefficients (Method II)
[Figure: reconstruction of "Cusp" from 150 random coefficients; original and reconstructed signals.]

53 Summary
Possible to reconstruct a compressible signal from only a few measurements
Need to randomize the measurements
Need to solve an LP
This strategy is nearly optimal
Many applications