Introduction to compressive sampling


Introduction to compressive sampling Sparsity and the equation Ax = y Emanuele Grossi DAEIMI, Università degli Studi di Cassino e-mail: e.grossi@unicas.it Gorini 2010, Pistoia

Outline 1 Introduction Traditional data acquisition Compressive data acquisition 2 Compressed sensing Measurement protocol Recovery procedure Recovery conditions Sensing matrices 3 Discussion Connections with other fields Numerical methods Applications Conclusions

Traditional data acquisition. Uniformly sample (or sense) the data at the Nyquist rate, then compress the data (adaptive, non-linear). Pipeline: data → sample (size n) → compress (size k ≪ n) → transmit/store → receive (size k) → decompress → recovered data (size n).

Sparsity/Compressibility. Many signals can be well approximated by a sparse expansion in terms of a suitable basis, i.e., by a few non-zero coefficients. Example: Fourier transform of a signal with n = 512 samples in the time domain and only k = 6 ≪ n non-zero coefficients in the frequency domain.

Sparsity/Compressibility: wavelet transform. [Figure: a 1.5 MB image and its representation in the wavelet domain.]

Sparsity/Compressibility: wavelet transform. [Figure: sorted wavelet coefficients of the image (n = 6.016 × 10^6 coefficients) with a threshold line; only k = 7% of n coefficients lie above the threshold.]

Sparsity/Compressibility: wavelet transform. [Figure: original image vs. the image after compressing and decompressing, i.e., keeping only the largest wavelet coefficients.]

Traditional data acquisition. Pipeline: data → sample (size n) → compress (size k ≪ n) → transmit/store → receive (size k) → decompress → recovered data (size n).
Pro: simple data recovery.
Cons: inefficient:
- n can be very large even if k is small
- n transform coefficients must be computed, but only the largest k are stored
- the locations of the k largest coefficients must also be encoded (= overhead)
- in some applications measurements can be costly, lengthy, or otherwise difficult (e.g., radar, MRI, etc.)

Outline 1 Introduction Traditional data acquisition Compressive data acquisition 2 Compressed sensing Measurement protocol Recovery procedure Recovery conditions Sensing matrices 3 Discussion Connections with other fields Numerical methods Applications Conclusions

Compressive data acquisition. Why spend so much effort to acquire all the data when most of it will be discarded? Wouldn't it be possible to acquire the data in a compressed form, so that one does not need to throw anything away? Yes: compressed sensing (CS). Compressed sensing, a.k.a. compressive sensing or compressive sampling, is a simple and efficient signal acquisition protocol. E. J. Candès, J. Romberg, and T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inform. Theory, 2006; D. L. Donoho, Compressed sensing, IEEE Trans. Inform. Theory, 2006.

Compressive data acquisition. CS samples in a signal-independent fashion at low rate, and later uses computational power and exploits sparsity to reconstruct from what appears to be an incomplete set of measurements. Pipeline: data → compressed sensing (size m = O(k ln n)) → transmit/store → receive (size m) → sparsity-aware reconstruction → recovered data (size n). Benefits: reduced measurement time, reduced sampling rates, reduced ADC resource usage.

Outline 1 Introduction Traditional data acquisition Compressive data acquisition 2 Compressed sensing Measurement protocol Recovery procedure Recovery conditions Sensing matrices 3 Discussion Connections with other fields Numerical methods Applications Conclusions

CS recipe Sparse signal representation Coded measurements (sampling process) Recovery algorithms (non-linear)

Sparse signal representation. Many types of real-world signals (e.g., sound, images, video) can be viewed as an n-dimensional vector of real numbers, where n is large (e.g., n = 10^6). They may have a concise representation in a suitable basis:
s = (s_1 ... s_n)^T = Σ_{i=1}^n x_i ψ_i = Ψx
where s is the signal to be sensed, the basis vectors (not necessarily orthogonal) are collected into an n × n matrix Ψ = (ψ_1 ⋯ ψ_n) (e.g., spikes, sinusoids, wavelets, etc.), and the signal coefficients are collected into an n-dimensional vector x.
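
As a side illustration (not part of the original slides), a minimal NumPy/SciPy sketch of the representation s = Ψx, here taking Ψ to be an orthonormal DCT basis; all variable names are illustrative.

    import numpy as np
    from scipy.fft import idct

    n = 512
    Psi = idct(np.eye(n), axis=0, norm='ortho')   # columns psi_i: orthonormal DCT basis
    x = np.zeros(n)
    x[[3, 47, 200]] = [1.0, -0.5, 2.0]            # k = 3 non-zero coefficients
    s = Psi @ x                                   # signal to be sensed
    print(np.allclose(Psi.T @ s, x))              # True: coefficients recovered as Psi^T s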

Sparsity and compressibility. ‖x‖_p = (Σ_{i=1}^n |x_i|^p)^{1/p} is the ℓ_p norm; e.g.,
‖x‖_2 = (Σ_{i=1}^n |x_i|^2)^{1/2}, the Euclidean norm;
‖x‖_1 = Σ_{i=1}^n |x_i|, which gives the Manhattan distance;
‖x‖_0 = card{ i ∈ {1, ..., n} : x_i ≠ 0 }, the number of non-zero entries (a slight abuse of notation: it is not a norm).

Sparsity and compressibility. ‖x‖_p = (Σ_{i=1}^n |x_i|^p)^{1/p}, the ℓ_p norm. Definitions:
- x is sparse if ‖x‖_0 ≪ n
- x is k-sparse if ‖x‖_0 ≤ k ≪ n
- the best k-term approximation of x is x_k = arg min_{w: ‖w‖_0 ≤ k} ‖x - w‖_p, i.e., x with its n - k smallest entries set to 0
- x is compressible if ‖x - x_k‖_p ≤ c k^{-r} for some c > 0 and r > 1 (namely, the approximation errors decay quickly in k)
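
A minimal sketch (not from the slides) of the best k-term approximation: keep the k largest-magnitude entries of x and set the remaining n - k entries to zero.

    import numpy as np

    def best_k_term(x, k):
        xk = np.zeros_like(x)
        idx = np.argsort(np.abs(x))[-k:]   # indices of the k largest |x_i|
        xk[idx] = x[idx]
        return xk

    x = np.array([0.1, -3.0, 0.02, 1.5, -0.4])
    print(best_k_term(x, 2))               # [ 0.  -3.   0.   1.5  0. ]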

Traditional compression. If x is compressible, then encode x with x_k. Inefficiencies of the protocol: it is adaptive (i.e., x must be known to select its largest k entries) and can be non-linear.

Measurement model. Take m linear measurements: y = Φs = ΦΨx = Ax (a linear model), where Φ is the m × n measurement matrix and y is the m-dimensional vector of measurements. Common sensing matrices:
- the rows of Φ are Dirac deltas: y contains the samples of s
- the rows of Φ are sinusoids: y contains the Fourier coefficients of s (typical in MRI)
- many others
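
An illustrative sketch of the two sensing matrices just mentioned, with rows of Φ that are Dirac deltas (y contains samples of s) or sampled sinusoids (y contains DFT coefficients of s); variable names and sizes are illustrative.

    import numpy as np

    n, m = 256, 32
    rng = np.random.default_rng(1)
    s = rng.standard_normal(n)                      # signal to be sensed
    rows = rng.choice(n, m, replace=False)

    Phi_dirac = np.eye(n)[rows]                     # rows of Phi are Dirac deltas
    assert np.allclose(Phi_dirac @ s, s[rows])      # y = samples of s

    F = np.exp(-2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n) / np.sqrt(n)
    Phi_fourier = F[rows]                           # rows of Phi are sinusoids
    assert np.allclose(Phi_fourier @ s, np.fft.fft(s, norm='ortho')[rows])  # y = DFT coefficients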

Traditional data acquisition (again). Sensor side: s (size n) → measurement process y = Ax (size m ≥ n) → compress to x_k (size k ≪ n). Receiver side: x_k (size k) → decompress → ŝ (size n). Drawbacks:
- compression is adaptive (or signal-dependent): x must be known
- inversion of Ax = y happens at the sensor side
- sparsity is neglected: m ≥ n is needed for matrix inversion

CS intuition. If x is k-sparse, then it should have k degrees of freedom, not n: only k measurements or so are needed. Analogy with the 12-coin problem: of 12 coins, one is counterfeit and weighs either more or less than the others; find the counterfeit coin and say whether it is lighter or heavier with 3 weighings on a balance scale. General problem: (3^p - 3)/2 coins and p weighings. 3 coins: possible weighing plan: 1st weighing, coin 1 vs. coin 2; 2nd weighing, coin 1 vs. coin 3.

CS intuition. 12 coins: possible weighing plan:
1st weighing: coins 1, 2, 3, 10 vs. coins 4, 5, 6, 11
2nd weighing: coins 1, 2, 3, 11 vs. coins 7, 8, 9, 10
3rd weighing: coins 1, 4, 7, 10 vs. coins 2, 5, 8, 12
Key points: the counterfeit data is sparse; weigh the coins in suitably chosen batches; each measurement picks up a little information about many coins.

CS protocol. Sensor side: s (size n) → measurement process y = Ax (size m = O(k ln n)). Receiver side: y (size m) → reconstruct → ŝ (size n). Key features:
- sparsity is exploited: m can be comparable with k
- inversion of Ax = y happens at the receiver side, through non-linear processing
- measurements have to be suitably designed; remarkably, random measurement matrices work!
- non-adaptive sensing (i.e., signal-independent): no need to know x

CS protocol. Sensor side: s (size n) → measurement process y = Ax (size m = O(k ln n)). Receiver side: y (size m) → reconstruct → ŝ (size n). What is needed then is: a reconstruction algorithm to invert Ax = y, and a sensing matrix Φ that gives a good A.

Outline 1 Introduction Traditional data acquisition Compressive data acquisition 2 Compressed sensing Measurement protocol Recovery procedure Recovery conditions Sensing matrices 3 Discussion Connections with other fields Numerical methods Applications Conclusions

The equation Ax = y. Assume w.l.o.g. rank(A) = min{m, n}.
- m = n, determined system: solution x = A^{-1} y
- m > n, over-determined system; two cases:
  - y ∈ Im(A): solution x = (A^T A)^{-1} A^T y = A^† y
  - y ∉ Im(A) (e.g., noisy measurements): no solution; the least-squares (LS) one is arg min_x ‖Ax - y‖_2 = (A^T A)^{-1} A^T y = A^† y
- m < n, under-determined system: infinitely many solutions; the LS (minimum-norm) one is arg min_{x: Ax=y} ‖x‖_2 = A^T (AA^T)^{-1} y = A^† y
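
A minimal NumPy sketch (illustrative) of the over- and under-determined LS solutions quoted above; both are given by the pseudo-inverse A^†, which np.linalg.pinv computes.

    import numpy as np

    rng = np.random.default_rng(3)

    # over-determined (m > n): arg min_x ||Ax - y||_2 = (A^T A)^{-1} A^T y = pinv(A) y
    A, y = rng.standard_normal((8, 3)), rng.standard_normal(8)
    x_ls = np.linalg.pinv(A) @ y
    assert np.allclose(x_ls, np.linalg.solve(A.T @ A, A.T @ y))

    # under-determined (m < n): arg min_{x: Ax=y} ||x||_2 = A^T (A A^T)^{-1} y = pinv(A) y
    A, y = rng.standard_normal((3, 8)), rng.standard_normal(3)
    x_mn = np.linalg.pinv(A) @ y
    assert np.allclose(x_mn, A.T @ np.linalg.solve(A @ A.T, y))
    assert np.allclose(A @ x_mn, y)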

The under-determined case. Recovery of x is possible only if prior information is available. If x is low-energy (i.e., ‖x‖_2 is small), then LS is reasonable. Least squares: min_x ‖x‖_2 s.t. Ax = y. [Picture: ℓ_2 ball touching the affine set Ax = y.] Unique solution for any A and y; solution in closed form.

The under-determined case. Recovery of x is possible only if prior information is available. If x is sparse (i.e., ‖x‖_0 is small): Problem (P_0): min_x ‖x‖_0 s.t. Ax = y. The solution is not always unique, and the problem is in general NP-hard.

Uniqueness. Proposition: if any 2k ≤ m columns of A are linearly independent, then any k-sparse signal x can be recovered uniquely from Ax. Proof: if not, there would exist k-sparse x_1 ≠ x_2 such that Ax_1 = Ax_2; this implies A(x_1 - x_2) = 0 with x_1 - x_2 2k-sparse, which is not possible. Observation: if the (A)_{i,j} are i.i.d. Gaussian (or drawn from another continuous distribution), then the condition is satisfied with probability 1.

Computational complexity. [Figure: |x|^p vs. x for p = 2, 1, 1/3, 0.] ‖x‖_p^p is convex if p ≥ 1 and non-convex otherwise. That's why (P_0) is hard!

Computational complexity. Possible ways out: iterative algorithms (greedy algorithms); convex relaxation (use the convex ℓ_p norm with the lowest p, i.e., p = 1).

ℓ_1 regularization. Problem (P_1): min_x ‖x‖_1 s.t. Ax = y. (P_1) is a convex optimization problem and admits a solution. It can be recast as
min_{t,x} Σ_{i=1}^n t_i s.t. |x_i| ≤ t_i ∀i, Ax = y,
a linear program (LP) in the real case and a second-order cone program (SOCP) in the complex case; fast (polynomial-time), accurate algorithms are available.
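
A minimal sketch (not from the slides; assumes SciPy's linprog is available and the problem is small and dense) of this LP recast of (P_1), with variables (x, t) and constraints |x_i| ≤ t_i.

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)
    n, m, k = 100, 40, 5
    x_true = np.zeros(n)
    x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    A = rng.standard_normal((m, n)) / np.sqrt(m)     # random sensing matrix
    y = A @ x_true

    c = np.concatenate([np.zeros(n), np.ones(n)])    # minimize sum_i t_i
    I = np.eye(n)
    A_ub = np.block([[I, -I], [-I, -I]])             # x_i - t_i <= 0 and -x_i - t_i <= 0
    b_ub = np.zeros(2 * n)
    A_eq = np.hstack([A, np.zeros((m, n))])          # Ax = y (t does not enter)
    bounds = [(None, None)] * n + [(0, None)] * n    # x free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y, bounds=bounds)
    x_hat = res.x[:n]
    print(np.linalg.norm(x_hat - x_true))            # near zero here (recovery is typical, not guaranteed)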

ℓ_1 regularization. Problem (P_1): min_x ‖x‖_1 s.t. Ax = y. [Picture: the ℓ_1 ball touching the line Ax = y in 2-D.] A heuristic way to obtain sparse solutions. In the example, the solution is always unique and sparse unless the line has ±45° slope; if A is sampled from an i.i.d. continuous distribution, this happens with probability 0.

ℓ_0, ℓ_1, and ℓ_2 together. [Figure: three panels showing the feasible set {z : Az = y} and the recovered point for ℓ_0 (x̂ = x), ℓ_2 (x̂ ≠ x), and ℓ_1 (x̂ = x).] Here x is k-sparse and y = Ax. Example: k = 1, A ∈ R^{2×3}, any 2 columns of A linearly independent.

ℓ_0, ℓ_1, and ℓ_2 together. With x k-sparse and y = Ax:
- ℓ_0 works if any 2k columns of A are linearly independent
- ℓ_2 never works
- ℓ_1 works if the condition on A is strengthened

Example. Reconstruction of a 512-long signal from 120 random measurements. [Plots: superposition of 10 cosines in the time domain; the corresponding sparse spectrum in the frequency domain.]

Example. Reconstruction of a 512-long signal from 120 random measurements. [Plots: the frequency-domain signal, its ℓ_2 reconstruction, and its ℓ_1 reconstruction.]

Example. Reconstruction of a 256 × 256 image (= 65536 pixels) from 5481 measurements in the Fourier domain. [Figures: the Shepp-Logan phantom (a toy model for MRI) and the sampling pattern in the frequency domain (22 approximately radial lines).] E. J. Candès, J. Romberg, and T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inform. Theory, 2006.

Example. Reconstruction of a 256 × 256 image (= 65536 pixels) from 5481 measurements in the Fourier domain. [Figures: original, min-energy reconstruction, min-TV reconstruction.] E. J. Candès, J. Romberg, and T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inform. Theory, 2006.

Outline 1 Introduction Traditional data acquisition Compressive data acquisition 2 Compressed sensing Measurement protocol Recovery procedure Recovery conditions Sensing matrices 3 Discussion Connections with other fields Numerical methods Applications Conclusions

Sparse recovery & incoherence. Theorem: let (a_1 ⋯ a_n) be the columns of A, normalized so that ‖a_i‖_2 = 1 ∀i, let M = max_{i≠j} |a_i^T a_j|, and let y = Ax. If ‖x‖_0 < (1 + 1/M)/2, then x is the unique solution of (P_1). M = max_{i≠j} |a_i^T a_j| is called the mutual coherence. Easy to check, but coarse/pessimistic. D. L. Donoho and M. Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization, Proc. Nat. Acad. Sci., 2003.
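
A small sketch (illustrative) of computing the mutual coherence M of a column-normalized A, together with the sparsity level (1 + 1/M)/2 guaranteed by the theorem above.

    import numpy as np

    def mutual_coherence(A):
        An = A / np.linalg.norm(A, axis=0)   # normalize columns to unit l2 norm
        G = np.abs(An.T @ An)                # absolute Gram matrix |a_i^T a_j|
        np.fill_diagonal(G, 0.0)             # ignore the i = j terms
        return G.max()

    A = np.random.default_rng(4).standard_normal((40, 100))
    M = mutual_coherence(A)
    print(M, (1 + 1 / M) / 2)                # coherence and the guaranteed sparsity bound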

Sparse recovery & RIP. For k ∈ {1, 2, ..., n}, let δ_k be the smallest δ such that
(1 - δ)‖x‖_2^2 ≤ ‖Ax‖_2^2 ≤ (1 + δ)‖x‖_2^2, ∀x : ‖x‖_0 ≤ k.
A satisfies the restricted isometry property (RIP) of order k if δ_k ∈ [0, 1), i.e., any k columns are nearly orthogonal. Theorem: let y = Ax and let x̂ be the solution of (P_1). If δ_{2k} < √2 - 1, then ‖x - x̂‖_1 ≤ C_0 ‖x - x_k‖_1 and ‖x - x̂‖_2 ≤ C_0 ‖x - x_k‖_1 / √k, for some constant C_0. In particular, if x is k-sparse, x̂ = x. E. J. Candès, The restricted isometry property and its implications for compressed sensing, Compte Rendus de l'Academie des Sciences, 2008.
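
For very small problems the restricted isometry constant can be estimated by brute force, since δ_k is the largest deviation from 1 of the eigenvalues of A_S^T A_S over all supports S of size k; a minimal sketch (illustrative, exponential cost in k).

    import numpy as np
    from itertools import combinations

    def rip_constant(A, k):
        n = A.shape[1]
        delta = 0.0
        for S in combinations(range(n), k):
            AS = A[:, list(S)]
            eig = np.linalg.eigvalsh(AS.T @ AS)
            delta = max(delta, np.max(np.abs(eig - 1.0)))
        return delta

    A = np.random.default_rng(5).standard_normal((20, 12)) / np.sqrt(20)
    print(rip_constant(A, 2))   # exhaustive, so feasible only for tiny n and k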

Noisy measurements. Any measurement process introduces noise (say n): y = Ax + n. In this case, if ‖n‖_2 ≤ ε: Problem (P_1^ε): min_x ‖x‖_1 s.t. ‖Ax - y‖_2 ≤ ε. (P_1^ε) is a convex optimization problem and can be recast as the SOCP
min_{t,x} Σ_{i=1}^n t_i s.t. |x_i| ≤ t_i ∀i, ‖Ax - y‖_2 ≤ ε.

Noisy measurements. Problem (P_1^ε): min_x ‖x‖_1 s.t. ‖Ax - y‖_2 ≤ ε. [Picture: the ℓ_1 ball touching the feasible set ‖Ax - y‖_2 ≤ ε.]

Approximate recovery & RIP. Theorem: let y = Ax + n, with ‖n‖_2 ≤ ε, and let x̂ be the solution of (P_1^ε). If δ_{2k} < √2 - 1, then ‖x - x̂‖_2 ≤ C_0 ‖x - x_k‖_1 / √k + C_1 ε for some constant C_1 (C_0 same as before). Stable recovery: the reconstruction error is bounded by 2 terms, one the same as in the noiseless case, the other proportional to the noise level. C_0 and C_1 are rather small, e.g., if δ_{2k} = 0.25, then C_0 ≤ 5.5 and C_1 ≤ 6. E. J. Candès, The restricted isometry property and its implications for compressed sensing, Compte Rendus de l'Academie des Sciences, 2008.

Sparse recovery & NSP. A has the null space property (NSP) of order k if, for some γ ∈ (0, 1),
‖η_T‖_1 ≤ γ ‖η_{T^c}‖_1, ∀η ∈ ker(A), ∀T ⊂ {1, ..., n} with card(T) ≤ k.
The elements in the null space should have no structure (they should look like noise). NSP is actually equivalent to sparse ℓ_1-recovery, since: Theorem: let y = Ax. If A has the NSP of order k, then x is the solution of (P_1) for every k-sparse x. Conversely, if x is the solution of (P_1) for every k-sparse x, then A has the NSP of order 2k. A. Cohen, W. Dahmen, and R. DeVore, Compressed sensing and best k-term approximation, J. Amer. Math. Soc., 2009.

Recovery conditions.
- Mutual coherence: easy to check, but coarse/pessimistic.
- RIP: maybe almost sharp, works in the noisy case, but hard to compute; not invariant to invertible linear transformations G, i.e., y = Ax ⇔ Gy = GAx, but A satisfying the RIP does not imply that GA satisfies the RIP.
- NSP: tight, but hard to compute (usually NSP is verified through RIP); not available in the noisy case.
- Others: many conditions are present in the literature (e.g., incoherence between Φ and Ψ).

How many measurements? If ‖x‖_1 ≤ R, the reconstruction error from m linear measurements of any recovery method is lower bounded by C_2 R √((ln(n/m) + 1)/m), for some constant C_2. If A is such that δ_{2k} ≤ √2 - 1, then ‖x - x̂‖_2 ≤ C_0 ‖x - x_k‖_1 / √k ≤ C_0 R / √k. Thus C_0 R / √k ≥ C_2 R √((ln(n/m) + 1)/m), and then, for a constant C, m ≥ C k (ln(n/m) + 1): O(k ln n) measurements are sufficient to recover the signal with an accuracy comparable to that attainable with direct knowledge of the k largest coefficients. B. Kashin, Diameters of some finite-dimensional sets and classes of smooth functions, Izv. Akad. Nauk SSSR, Ser. Mat., 1977; A. Y. Garnaev and E. D. Gluskin, On widths of the Euclidean ball, Sov. Math. Dokl., 1984.

Outline 1 Introduction Traditional data acquisition Compressive data acquisition 2 Compressed sensing Measurement protocol Recovery procedure Recovery conditions Sensing matrices 3 Discussion Connections with other fields Numerical methods Applications Conclusions

Good sensing matrices. Goal: find A which satisfies the RIP. Deterministic constructions of A have been proposed, but m is much larger than the optimal value. Try random matrices instead, and accept a (hopefully small) probability of failure. Key property, the concentration inequality: the random matrix A satisfies the concentration inequality if, ∀x and ∀ε ∈ (0, 1),
P( | ‖Ax‖_2^2 - ‖x‖_2^2 | ≥ ε ‖x‖_2^2 ) ≤ 2 e^{-m c_ε}, where c_ε > 0.

Good sensing matrices. Theorem: let δ ∈ (0, 1). If A satisfies the concentration inequality, then there exist constants c_1, c_2 > 0, depending only on δ, such that the restricted isometry constant of A satisfies δ_k ≤ δ with probability exceeding 1 - 2e^{-c_1 m}, provided that m ≥ c_2 k ln(n/k). Observation: m ≥ c_2 k ln(n/k) ⇒ m ≥ (c_2/(1 + c_2)) k (ln(n/m) + 1). R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, A Simple Proof of the Restricted Isometry Property for Random Matrices, Constr. Approx., 2009.

Random sensing. Random matrices ⇒ concentration inequality ⇒ RIP ⇒ solution of (P_1)-(P_1^ε). Random matrices allow perfect/approximate recovery of k-sparse/compressible signals with overwhelming probability using O(k ln n) measurements. Examples: two important cases satisfy the concentration inequality:
- Gaussian: (A)_{i,j} ~ N(0, 1/m) i.i.d.
- Bernoulli: (A)_{i,j} ~ B(1/2) i.i.d. with values ±1/√m
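
An empirical check (illustrative, not a proof) that Gaussian and Bernoulli matrices with the scalings above concentrate ‖Ax‖_2^2 around ‖x‖_2^2 for a fixed x.

    import numpy as np

    m, n, trials = 128, 1024, 200
    rng = np.random.default_rng(6)
    x = rng.standard_normal(n)

    draws = {
        "Gaussian": lambda: rng.standard_normal((m, n)) / np.sqrt(m),
        "Bernoulli": lambda: rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m),
    }
    for name, draw in draws.items():
        ratios = [np.sum((draw() @ x) ** 2) / np.sum(x ** 2) for _ in range(trials)]
        print(name, np.mean(ratios), np.std(ratios))   # mean close to 1, small spread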

The sensing matrix. Recall that y = Φs = ΦΨx = Ax, with Φ the sensing matrix and Ψ the sparsifying matrix. If Ψ is orthogonal, Φ = AΨ^T. Not actually needed: just take Φ Gaussian or Bernoulli. If Φ satisfies the concentration inequality, so does A:
P( | ‖Ax‖_2^2 - ‖x‖_2^2 | ≥ ε‖x‖_2^2 ) = P( | ‖Φs‖_2^2 - ‖Ψ^T s‖_2^2 | ≥ ε‖Ψ^T s‖_2^2 ) = P( | ‖Φs‖_2^2 - ‖s‖_2^2 | ≥ ε‖s‖_2^2 ).
Random sensing is universal: it does not matter in which basis the signal is sparse (Ψ is not needed at the sensor side).

Random partial Fourier matrices. Gaussian and Bernoulli matrices provide the minimal number of measurements, but: physical constraints on the sensor may preclude Gaussian measurements; both are unstructured, so there is no fast matrix-vector multiplication algorithm. Possible alternative: select m rows uniformly at random from an n × n Fourier matrix F, with (F)_{h,k} = (1/√n) e^{2πi hk/n}; this is equivalent to observing m random entries of the DFT of the signal. The RIP holds with high probability, but m ≥ C k ln^4 n.
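
A minimal sketch (illustrative) of random partial Fourier sensing applied through an FFT, i.e., observing m random entries of the DFT of the signal without forming the m × n matrix explicitly; this is the fast matrix-vector multiplication that Gaussian and Bernoulli matrices lack.

    import numpy as np

    n, m = 4096, 300
    rng = np.random.default_rng(2)
    rows = rng.choice(n, m, replace=False)        # random subset of DFT rows

    def partial_dft(s):
        return np.fft.fft(s, norm='ortho')[rows]  # A s in O(n log n) operations

    s = rng.standard_normal(n)
    y = partial_dft(s)
    print(y.shape)                                # (m,)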

More random matrices.
- (A)_{i,j} i.i.d. with distribution (1/6) δ_{√(3/m)} + (2/3) δ_0 + (1/6) δ_{-√(3/m)}; in this case m ≥ C k ln(n/k).
- A is formed by selecting n column vectors uniformly at random from the surface of the unit ℓ_2 sphere in R^m; in this case m ≥ C k ln(n/k).
- A is formed by selecting m rows uniformly at random from an n × n unitary matrix U and re-normalizing the columns to unit ℓ_2 norm; in this case m ≥ C μ^2 k ln^4 n, where μ = √n max_{i,j} |(U)_{i,j}| (in the Fourier matrix, μ = 1).

More random matrices. If U = ΦΨ, with Φ and Ψ both unitary, then m ≥ C μ^2 k ln^4 n, with μ = √n max_{i,j} |⟨φ_i, ψ_j⟩|. μ is a measure of the mutual incoherence between the measurement basis and the sparsity basis; μ ∈ [1, √n], and low coherence is good: e.g., Φ = Fourier and Ψ = I gives μ = 1, i.e., maximal incoherence. The basis vectors of Ψ must be spread out in the basis Φ (e.g., in the Fourier-identity case, a spike δ corresponds to a complex exponential). Sparse signal + incoherent measurements.

Outline 1 Introduction Traditional data acquisition Compressive data acquisition 2 Compressed sensing Measurement protocol Recovery procedure Recovery conditions Sensing matrices 3 Discussion Connections with other fields Numerical methods Applications Conclusions

Three equivalent problems. There is considerable interest in solving the unconstrained optimization problem (convex but non-differentiable)
min_x { ‖Ax - y‖_2^2 + λ‖x‖_1 }   (1)
Example: Bayesian estimation. If x is Laplace and n is white Gaussian, the MAP estimate of x from y = Ax + n solves (1), since
max_x f(x|y) ⇔ max_x f(y|x) f(x) ⇔ max_x e^{-‖Ax-y‖_2^2/(2σ^2)} e^{-γ‖x‖_1}.

Three equivalent problems. min_x { ‖Ax - y‖_2^2 + λ‖x‖_1 }   (1). Problem (1) is closely related to
min_x ‖x‖_1 s.t. ‖Ax - y‖_2^2 ≤ ε   (2a)
min_x ‖Ax - y‖_2^2 s.t. ‖x‖_1 ≤ η   (2b)
The solution of (2a), which is just (P_1^ε), is either x = 0 or a solution of (1) for some λ > 0. The solution of (2b) is also a solution of (1) for some λ.
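
One standard way to attack problem (1) numerically is iterative soft thresholding (ISTA); a minimal NumPy sketch (not from the slides): a plain gradient step on the quadratic term followed by the ℓ_1 proximal operator.

    import numpy as np

    def soft_threshold(v, t):
        # proximal operator of t*||.||_1 (entrywise soft thresholding)
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def ista(A, y, lam, n_iter=500):
        L = 2 * np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
        x = np.zeros(A.shape[1])
        for _ in range(n_iter):
            grad = 2 * A.T @ (A @ x - y)           # gradient of ||Ax - y||_2^2
            x = soft_threshold(x - grad / L, lam / L)
        return x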

Geophysics: early references. min_x { ‖Ax - y‖_2^2 + λ‖x‖_1 }   (1). Claerbout and Muir wrote in 1973: "In deconvolving any observed seismic trace, it is rather disappointing to discover that there is a nonzero spike at every point in time regardless of the data sampling rate. One might hope to find spikes only where real geologic discontinuities take place. Perhaps the ℓ_1 norm can be utilized to give a [sparse] output trace..." Santosa and Symes proved in 1986 that (1) succeeds under mild conditions in recovering spike trains from seismic traces. J. F. Claerbout and F. Muir, Robust modeling of erratic data, Geophysics, 1973. F. Santosa and W. W. Symes, Linear inversion of band-limited reflection seismograms, SIAM J. Sci. Statist. Comput., 1986.

Signal processing: basis pursuit. y = (a_1 ⋯ a_n)(x_1, ..., x_n)^T = Ax, where y is the signal to be represented, the a_i are (overcomplete) basis vectors, and x contains the (sparse) coefficients. There is a very large number of basis functions (called a dictionary), so that x is likely to be sparse. Goal: find a good fit of the signal as a linear combination of a small number of the basis functions, i.e., basis pursuit (BP). BP finds signal representations in overcomplete dictionaries by solving min_x ‖x‖_1 s.t. Ax = y. S. S. Chen, D. L. Donoho, and M. A. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput., 1999.

Signal processing: basis pursuit. With noisy measurements of the signal, y = Ax + n, basis pursuit denoising can be used. In this case one solves min_x ‖x‖_1 s.t. ‖Ax - y‖_2^2 ≤ ε, which is (P_1^ε), or equivalently min_x { ‖Ax - y‖_2^2 + λ‖x‖_1 }. S. S. Chen, D. L. Donoho, and M. A. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput., 1999.

Statistics: linear regression. A common problem in statistics is linear regression: y = Σ_{i=1}^n a_i x_i + n = Ax + n, where y contains the measurements (response variables), the a_i are the regressors (explanatory variables), x is the parameter vector (regression coefficients), and n is the noise (error term), i.i.d. and zero-mean. In order to mitigate modeling biases, a large number of regressors can be included ⇒ m < n. Goals: minimize the prediction error ‖y - Ax‖_2 (good data fit); identify the significant regressors (variable selection).

Regularization. Penalized regression can be used: min_x { ‖Ax - y‖_2^2 + λ‖x‖_p }. As the parameter λ varies over (0, ∞), its solution traces out the optimal trade-off curve. The most common is ridge regression, min_x { ‖Ax - y‖_2^2 + λ‖x‖_2^2 }. The solution is x̂ = (A^T A + λI)^{-1} A^T y, but it cannot produce model selection. A. E. Hoerl and R. W. Kennard, Ridge regression: applications to nonorthogonal problems, Technometrics, 1970.
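
A two-line sketch (illustrative) of the ridge solution in closed form; unlike the lasso, every entry of x̂ is generically non-zero, which is why ridge cannot perform model selection.

    import numpy as np

    def ridge(A, y, lam):
        n = A.shape[1]
        # x_hat = (A^T A + lam*I)^{-1} A^T y
        return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)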

Regularization. Bridge regression is more general: min_x { ‖Ax - y‖_2^2 + λ‖x‖_p^p }. If p ≤ 1 and λ is sufficiently large, it combines parameter estimation and model selection. The p = 1 case is related to the least absolute shrinkage and selection operator (lasso): min_x ‖Ax - y‖_2^2 s.t. ‖x‖_1 ≤ η. Lasso and problem (P_1^ε) are formally identical, and equivalent to min_x { ‖Ax - y‖_2^2 + λ‖x‖_1 }. I. E. Frank and J. H. Friedman, A statistical view of some chemometrics regression tools, Technometrics, 1993. R. Tibshirani, Regression shrinkage and selection via the lasso, J. Roy. Statist. Soc. B, 1996.

Variants of lasso. If more prior information (other than sparsity) is available on x, it can be included in the optimization problem through proper penalizing terms. The fused lasso preserves local constancy when the regressors are properly arranged:
min_x { ‖Ax - y‖_2^2 + λ_1 ‖x‖_1 + λ_2 Σ_{i=2}^n |x_i - x_{i-1}| }.
In reconstruction/denoising problems this can be used to recover sparse and piece-wise constant signals. If the signal is smooth, the total variation Σ_{i=2}^n |x_i - x_{i-1}| can be substituted with a quadratic smoothing term Σ_{i=2}^n |x_i - x_{i-1}|^2. R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight, Sparsity and smoothness via the fused lasso, J. Roy. Stat. Soc. B, 2005.

Variants of lasso. The group lasso promotes group selection:
min_x { ‖Ax - y‖_2^2 + λ Σ_{i=1}^k ‖x_i‖_2 },
where x = (x_1^T ⋯ x_k^T)^T has been partitioned into k groups; effective for the recovery of sparse signals whose coefficients appear in groups. The elastic net is a stabilized version of the lasso:
min_x { ‖Ax - y‖_2^2 + λ_1 ‖x‖_1 + λ_2 ‖x‖_2^2 }.
It can select more than m variables even when m < n; it is in between ridge and lasso. M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, J. Roy. Stat. Soc. B, 2006. H. Zou and T. Hastie, Regularization and variable selection via the elastic net, J. Roy. Stat. Soc. B, 2005.

Outline 1 Introduction Traditional data acquisition Compressive data acquisition 2 Compressed sensing Measurement protocol Recovery procedure Recovery conditions Sensing matrices 3 Discussion Connections with other fields Numerical methods Applications Conclusions

Convex programs.
1. min_x { ‖Ax - y‖_2^2 + λ‖x‖_1 } can be recast as a perturbed LP (a quadratic program (QP) with structure similar to an LP)
2. min_x ‖Ax - y‖_2^2 s.t. ‖x‖_1 ≤ η is a QP
3. min_x ‖x‖_1 s.t. Ax = y can be cast as an LP
4. min_x ‖x‖_1 s.t. ‖Ax - y‖_2^2 ≤ ε can be cast as a SOCP

Algorithms. All of the above can be solved through standard convex optimization methods, e.g., interior point methods (primal-dual, log-barrier, etc.): general-purpose solvers can handle small to medium size problems; optimized algorithms (with fast matrix-vector operations) can scale to large problems. Homotopy methods, e.g., least angle regression (LARS): compute the entire solution path (i.e., for any λ > 0), exploit the piece-wise linear property of the regularization path, and are fast if the solution is very sparse. Greedy algorithms for signal reconstruction, e.g., matching pursuit (MP) and orthogonal MP (OMP): not based on optimization; iteratively choose the dictionary element with the highest inner product with the current residual; low complexity but less powerful.
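
A minimal orthogonal matching pursuit (OMP) sketch (not from the slides; NumPy only): at each iteration pick the column most correlated with the residual, then re-fit on the selected support by least squares.

    import numpy as np

    def omp(A, y, k):
        residual = y.copy()
        support = []
        x = np.zeros(A.shape[1])
        for _ in range(k):
            j = int(np.argmax(np.abs(A.T @ residual)))   # most correlated column
            if j not in support:
                support.append(j)
            coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
            residual = y - A[:, support] @ coef          # update the residual
        x[support] = coef
        return x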

Outline 1 Introduction Traditional data acquisition Compressive data acquisition 2 Compressed sensing Measurement protocol Recovery procedure Recovery conditions Sensing matrices 3 Discussion Connections with other fields Numerical methods Applications Conclusions

Applications. Compressed sensing is advantageous whenever: signals are sparse in a known basis; measurements (or computation at the sensor end) are expensive, but computations at the receiver end are cheap. Such situations can arise in: compressive imaging (e.g., the single-pixel camera), medical imaging (e.g., MRI and computed tomography), AD conversion, computational biology (e.g., DNA microarrays), geophysical data analysis (e.g., seismic data recovery), radar, sensor networks, astronomy, and others.

Compressive imaging: the single-pixel camera http://dsp.rice.edu/cscamera

Compressive imaging: the single-pixel camera http://dsp.rice.edu/cscamera

Compressive imaging: the single-pixel camera original image and CS reconstruction (65536 pixels) from 3300 measurements (5%) http://dsp.rice.edu/cscamera

Compressive imaging: the single-pixel camera original image and CS reconstruction (65536 pixels) from 6600 measurements (10%) http://dsp.rice.edu/cscamera

Medical imaging: MRI. (1) MRI scans the patient by collecting coefficients in the frequency domain; (2) these coefficients are very sparse; (3) an inverse Fourier transform produces the medical image.

Medical imaging: MRI original image Rapid acquisition of a mouse heart beating in dynamic MRI M.E. Davies and T. Blumensath, Faster & greedier: algorithms for sparse reconstruction of large datasets, IEEE ISCCSP 2008

Medical imaging: MRI Reconstruction from 20% of available measurements (linear and CS) Rapid acquisition of a mouse heart beating in dynamic MRI M.E. Davies and T. Blumensath, Faster & greedier: algorithms for sparse reconstruction of large datasets, IEEE ISCCSP 2008

Medical imaging: MRI original image Angiogram with observations along 80 lines in the Fourier domain and 16129 measurements E. J. Candès and J. Romberg, Practical signal recovery from random projections, SPIE Conf. on Wavelet App. in Signal and Image Process. 2008

Medical imaging: MRI minimum energy and CS reconstructions Angiogram with observations along 80 lines in the Fourier domain and 16129 measurements E. J. Candès and J. Romberg, Practical signal recovery from random projections, SPIE Conf. on Wavelet App. in Signal and Image Process. 2008

Medical imaging: MRI detail Angiogram with observations along 80 lines in the Fourier domain and 16129 measurements E. J. Candès and J. Romberg, Practical signal recovery from random projections, SPIE Conf. on Wavelet App. in Signal and Image Process. 2008

AD conversion: the random demodulator. [Block diagram: x(t) is multiplied by p(t), a high-rate pseudo-noise sequence, low-pass filtered, and sampled at rate R, producing y(n).] x(t) = Σ_{l∈Λ} a_l e^{2πi l t} is a multi-tone signal, with Λ ⊂ {0, ±1, ..., ±(W/2 - 1), W/2}, W/2 ∈ N, card(Λ) = k ≪ W. Sampling rate: R = O(k ln W), so there is no need for a high-rate ADC. J. A. Tropp, J. N. Laska, M. F. Duarte, J. K. Romberg, and R. G. Baraniuk, Beyond Nyquist: efficient sampling of sparse bandlimited signals, IEEE Trans. Inform. Theory, 2010.

AD conversion: the random demodulator. [Figure: the block diagram above and the spectra X(f), X(f)P(f), and Y(f), over the bands (-W/2, W/2) and (-R/2, R/2).] Each frequency receives a unique signature that can be discerned by examining the filter output. J. A. Tropp, J. N. Laska, M. F. Duarte, J. K. Romberg, and R. G. Baraniuk, Beyond Nyquist: efficient sampling of sparse bandlimited signals, IEEE Trans. Inform. Theory, 2010.

AD conversion: the modulated wideband converter. [Block diagram: x(t) is mixed with m functions p_1(t), ..., p_m(t), low-pass filtered to bandwidth B, and sampled at rate B, producing y_1(n), ..., y_m(n); the spectrum X(f) occupies a few bands of width B within (-W/2, W/2), with W on the order of tens of GHz.] Practical sampling stage for sparse wideband analog signals. It enables generating a low-rate sequence corresponding to each of the bands, without going through the high Nyquist rate. M. Mishali and Y. C. Eldar, From theory to practice: sub-Nyquist sampling of sparse wideband analog signals, IEEE Trans. Signal Process., 2010.

AD conversion: the modulated wideband converter. [Figure: the block diagram above and a photograph of the hardware implementation.] M. Mishali and Y. C. Eldar, From theory to practice: sub-Nyquist sampling of sparse wideband analog signals, IEEE Trans. Signal Process., 2010.

CDMA synchronization and channel estimation. Known code matrix A (pilot symbols), whose blocks collect shifts of the signatures of users 1, ..., K; unknown channel vector x (the channel responses of the users); received multiplex y; noise n. Model: y = Ax + n, with x sparse. Standard method: m > n and LS. Sparse recovery allows m < n ⇒ higher data rates. D. Angelosante, E. Grossi, G. G. Giannakis, and M. Lops, Sparsity-Aware Estimation of CDMA System Parameters, EURASIP J. Adv. Signal. Process., 2010.

CDMA synchronization and channel estimation. [Plots: normalized mean square error (NMSE) vs. SNR for Lasso, LS, and OMP; left: P = 10, N = 15, K = 5, ISR = 0 dB (over-determined case); right: P = 4, N = 15, K = 5, ISR = 0 dB (under-determined case).] NMSE in channel estimation for a known number of active users. D. Angelosante, E. Grossi, G. G. Giannakis, and M. Lops, Sparsity-Aware Estimation of CDMA System Parameters, EURASIP J. Adv. Signal. Process., 2010.

CDMA synchronization and channel estimation. [Plots: left, receiver operating characteristics (P_D vs. P_FA) for P = 10, N = 15, K = 10, S = 5, SNR = 20 dB, ISR = 0 dB; right, probability of miss P_MD vs. SNR for P_FA = 0.01; curves for Lasso, LS, and OMP.] User activity detection for an unknown number of active users: receiver operating characteristics (left) and probability of miss versus SNR (right). D. Angelosante, E. Grossi, G. G. Giannakis, and M. Lops, Sparsity-Aware Estimation of CDMA System Parameters, EURASIP J. Adv. Signal. Process., 2010.

Outline 1 Introduction Traditional data acquisition Compressive data acquisition 2 Compressed sensing Measurement protocol Recovery procedure Recovery conditions Sensing matrices 3 Discussion Connections with other fields Numerical methods Applications Conclusions

Conclusions. Compressed sensing is an efficient signal acquisition protocol that collects data in a compressed form. Linear measurements can be taken at low rate and non-adaptively (signal-independent). Sparsity is exploited for reconstruction. The measurement matrix must be properly chosen, but random matrices work.

Some on-line resources.
CS resources: http://dsp.rice.edu/cs
CS blog: http://nuit-blanche.blogspot.com/
Software:
- SparseLab: http://sparselab.stanford.edu/
- l1-magic: http://www.acm.caltech.edu/l1magic/
- GPSR: http://www.lx.it.pt/~mtf/gpsr/
- l1_ls: http://www.stanford.edu/~boyd/l1_ls/