Introduction to Compressed Sensing

Introduction to Compressed Sensing. Alejandro Parada, Gonzalo Arce. University of Delaware, August 25, 2016.

Motivation: Classical Sampling

Motivation: Classical Sampling Issues. Some applications: radar, spectral imaging, medical imaging, remote surveillance. The issue: the sampling rate is too high!

Goal: reduction in the computation costs of measuring signals that have a sparse representation.

Compressed Sensing (CS) vs. Classical Sampling. CS: x ∈ R^n, random measurements, each measurement an inner product ⟨x, φ⟩. Classical sampling: continuous, infinite-length signals.

Compressed Sensing (CS) vs. Classical Sampling. CS recovery: nonlinear. Classical sampling recovery: linear processing.

Compressed Sensing Basics. A signal x ∈ R^n is acquired by taking m < n measurements, y = Ax, (1) where y ∈ R^m is the measurement vector and A is the m × n CS sensing matrix.
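As a minimal illustration of the acquisition model y = Ax in (1), here is a short sketch; the sizes and the Gaussian choice of A are assumptions for the example, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 256, 64, 5              # assumed example sizes: ambient dimension, measurements, sparsity

# Build a k-sparse signal x in R^n: random support, Gaussian nonzero values.
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.standard_normal(k)

# Random Gaussian sensing matrix A (m x n) and the compressed measurements y = A x.
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x                         # y lives in R^m with m < n
print(y.shape)                    # (64,)
```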

The Spectral Imaging Problem. Push-broom spectral imaging: expensive, low sensing speed, senses N × N × L voxels. Optical filters: sequential sensing of N × N × L voxels, limited by the number of colors.

Why is this important? Remote sensing and surveillance in the visible, NIR, and SWIR bands. Devices are challenging in the NIR and SWIR due to SWaP (size, weight, and power) constraints. Medical imaging and other applications.

Introduction: datacube compressive measurements. The measurements g = Hf + w form an underdetermined system of equations, solved via f̂ = argmin_f { ‖g − Hf‖_2 + λ‖f‖_1 }.

Matrix CASSI representation: g = Hf. Data cube: N × N × L. Spectral bands: L. Spatial resolution: N × N. Sensor size: N × (N + L − 1), so V = N(N + L − 1).

Preliminary results, K = 1 random snapshot. [Figure: reconstructed spectral bands at 467, 477, 487, 497, 510, 525, 540, 557, 578, 602, and 628 nm.]

Ultrafast Photography

Since m < n, it is possible to have Ax′ = y = Ax with x′ ≠ x. Motivation: design A such that x is uniquely identifiable from y for x in a specific space of signals, even when m ≪ n.

Characteristics of this space? Write x = Σ_{i=1}^{n} φ_i θ_i, (2) i.e., x = Φθ, (3) where Φ = [φ_1, ..., φ_n] and θ = [θ_1, ..., θ_n]^T. Definition: a signal x is k-sparse in the basis (or frame) Φ if there exists θ ∈ R^n with k = |supp(θ)| ≪ n such that x = Φθ.

Characteristics of this space? Definition: the space of k-sparse signals Σ_k is defined as Σ_k = {x : ‖x‖_0 ≤ k}, (4) where ‖x‖_0 is the number of nonzero elements of x (called the ℓ_0 norm).

Examples of Sparse Signals. Figure: (a) original image; (b) wavelet representation. (From Compressed Sensing: Theory and Applications, Eldar.)

Compressible Signals. Real signals are not exactly sparse, but they admit good approximations in Σ_k; such signals are called compressible.

Compressible Signals. Figure: (a) original image; (b) wavelet representation keeping 10% of the coefficients.

Compressible Signals. The quality of the best k-term approximation of x is measured by σ_k(x)_p = min_{x̂ ∈ Σ_k} ‖x − x̂‖_p, (5) where ‖x‖_p = (Σ_{i=1}^{n} |x_i|^p)^{1/p}.
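The best k-term approximation error in (5) is easy to evaluate, since for any p ≥ 1 the optimal x̂ keeps the k largest-magnitude entries of x. A small sketch; the test vector is made up for illustration.

```python
import numpy as np

def sigma_k(x, k, p=1):
    """Best k-term approximation error: sigma_k(x)_p = min over k-sparse xhat of ||x - xhat||_p."""
    if k <= 0:
        return np.linalg.norm(x, ord=p)
    tail = np.argsort(np.abs(x))[:-k]     # indices of the n - k smallest-magnitude entries
    return np.linalg.norm(x[tail], ord=p)

x = np.array([5.0, -0.1, 2.0, 0.05, 0.0, -3.0])
print(sigma_k(x, k=3, p=1))               # ~0.15: only the small entries contribute
```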

Sensing Matrices A. Two questions: (1) how can x ∈ Σ_k be identified uniquely from y = Ax given the sensing matrix A? (2) How do we actually compute x from y?

Sensing Matrices. Let Λ ⊆ {1, 2, ..., n}. A_Λ denotes the matrix containing the columns of A indexed by Λ. Example: if Λ = {1, 3} and A = [a_{i,j}] is a 3 × 4 matrix, then A_Λ is the 3 × 2 matrix formed by its first and third columns.

Sensing Matrices: when supp(x) is known. Let Λ = supp(x), so that y = Ax = A_Λ x_Λ. (6) If A_Λ has full column rank (which requires m ≥ k), then x_Λ = A_Λ^† y, where A_Λ^† = (A_Λ^T A_Λ)^{-1} A_Λ^T. Thus when Λ is known, recovering x reduces to a least-squares problem on a k-dimensional subspace.
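Equation (6) says that with a known support, recovery is just least squares on the columns A_Λ. A minimal sketch, assuming the A, y, and support variables from the acquisition example above.

```python
import numpy as np

def recover_known_support(A, y, support):
    """x_Lambda = pinv(A_Lambda) y, zero elsewhere; valid when A_Lambda has full column rank."""
    x_sub, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)   # least squares on the indexed columns
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = x_sub
    return x_hat

# x_hat = recover_known_support(A, y, support)   # recovers x exactly up to numerical error
```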

Sensing matrices: when supp(x) is unknown. The central idea of CS: how should A be chosen so that the information in x is preserved and x can be recovered uniquely from y = Ax?

Null Space Conditions. The null space of the matrix A is N(A) = {z : Az = 0}. Uniqueness in recovery requires Ax ≠ Ax′ for all x ≠ x′ with x, x′ ∈ Σ_k, i.e., A(x − x′) ≠ 0, so (x − x′) ∉ N(A). Since (x − x′) ∈ Σ_{2k}, the desired condition is that N(A) contains no nonzero element of Σ_{2k}.

The spark of a matrix A is the smallest number of columns of A that are linearly dependent. The procedure: examine all combinations of r columns for r = 2, 3, ..., n; the spark is the smallest r for which some combination is linearly dependent.

The spark of a matrix A is the smallest number of columns of A that are linearly dependent. Example: A = [1 1 0 0 0; 0 0 1 0 0; 0 0 0 1 0; 0 0 0 0 1]. Then Rank(A) = 4 and Spark(A) = 2 (the first two columns are identical).
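The procedure above translates directly into code; it is only feasible for tiny matrices because it enumerates column subsets. A sketch (written for illustration, not from the slides) that reproduces the example:

```python
import itertools
import numpy as np

def spark(A, tol=1e-10):
    """Smallest number of linearly dependent columns of A."""
    m, n = A.shape
    for r in range(1, n + 1):
        for cols in itertools.combinations(range(n), r):
            if np.linalg.matrix_rank(A[:, cols], tol=tol) < r:
                return r
    return n + 1                                  # convention: all columns are linearly independent

A = np.array([[1, 1, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 1, 0],
              [0, 0, 0, 0, 1]], dtype=float)
print(np.linalg.matrix_rank(A), spark(A))         # 4 2
```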

Rank example: let A be of size m × n with m < n and all entries i.i.d. continuous random variables. Then, with probability one, Rank(A) = m and Spark(A) = m + 1: any m × m submatrix is nonsingular. [The rank of a random matrix, X. Feng]

Uniqueness of sparse recovery. Theorem: for any vector y ∈ R^m, there exists at most one signal x ∈ Σ_k such that y = Ax if and only if spark(A) > 2k. Since spark(A) ≤ m + 1, uniqueness requires m ≥ 2k.

Spark example: if A is m × n with m < n and i.i.d. continuous random entries, then Spark(A) = m + 1 with probability one. Unique recovery of x ∈ Σ_k from y = Ax holds when spark(A) > 2k, i.e., m + 1 > 2k, i.e., k < (m + 1)/2. [The rank of a random matrix, X. Feng]

Robust Signal Recovery. Where are we? The spark condition and the NSP cover exactly sparse signals. What about real signals?

The restricted isometry property (RIP). So far, y = Ax (20) with x exactly sparse. What happens if x is only approximately sparse, or if there is noise: y = Ax + noise? (21)

The restricted isometry property (RIP). Definition: a matrix A satisfies the RIP of order k if there exists a δ_k ∈ (0, 1) such that (1 − δ_k)‖x‖_2^2 ≤ ‖Ax‖_2^2 ≤ (1 + δ_k)‖x‖_2^2 (22) for all x ∈ Σ_k.
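Computing δ_k exactly requires examining all supports, which is intractable, but sampling random k-sparse unit vectors gives an empirical lower bound on δ_k. A Monte Carlo sketch; the matrix sizes and number of trials are arbitrary choices.

```python
import numpy as np

def rip_lower_bound(A, k, trials=2000, seed=0):
    """Monte Carlo lower bound on delta_k: worst observed | ||Ax||_2^2 - 1 | over unit-norm k-sparse x."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    worst = 0.0
    for _ in range(trials):
        x = np.zeros(n)
        support = rng.choice(n, size=k, replace=False)
        x[support] = rng.standard_normal(k)
        x /= np.linalg.norm(x)
        worst = max(worst, abs(np.linalg.norm(A @ x) ** 2 - 1.0))
    return worst                                   # the true delta_k is at least this large

A = np.random.default_rng(1).standard_normal((64, 256)) / np.sqrt(64)
print(rip_lower_bound(A, k=5))
```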

Coherence. The RIP, the NSP, and the spark condition share a disadvantage: the RIP constant and the NSP are NP-hard to calculate for a given matrix. Coherence provides a computable alternative.

Coherence. Definition: the coherence of a matrix A, denoted µ(A), is the largest absolute normalized inner product between any two columns a_i, a_j of A: µ(A) = max_{1 ≤ i < j ≤ n} |⟨a_i, a_j⟩| / (‖a_i‖_2 ‖a_j‖_2). (30)


Properties of the coherence. Theorem: let A be an m × n matrix with m ≤ n (n ≥ 2) whose columns are normalized so that ‖a_i‖_2 = 1 for all i. Then the coherence of A satisfies √((n − m)/(m(n − 1))) ≤ µ(A) ≤ 1. (31) The lower bound is the Welch bound.
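Unlike the RIP constant, µ(A) in (30) is cheap to compute from the Gram matrix of the normalized columns. A sketch comparing it against the Welch bound in (31) for a random Gaussian matrix; the matrix is an assumed example.

```python
import numpy as np

def coherence(A):
    """mu(A): largest absolute inner product between distinct normalized columns of A."""
    G = A / np.linalg.norm(A, axis=0)      # normalize the columns
    gram = np.abs(G.T @ G)
    np.fill_diagonal(gram, 0.0)            # ignore the <a_i, a_i> = 1 diagonal
    return gram.max()

rng = np.random.default_rng(0)
m, n = 64, 256
A = rng.standard_normal((m, n))
welch = np.sqrt((n - m) / (m * (n - 1)))
print(coherence(A), welch)                 # mu(A) always lies above the Welch bound
```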

Coherence and the Spark. Lemma: for any matrix A, spark(A) ≥ 1 + 1/µ(A). (35) Recall that unique recovery on Σ_k requires spark(A) > 2k.

Uniqueness via coherence. Theorem (uniqueness via coherence): if k < (1/2)(1 + 1/µ(A)), (36) then for each measurement vector y ∈ R^m there exists at most one signal x ∈ Σ_k such that y = Ax.

Theorem: let A be an m × n matrix that satisfies the RIP of order 2k with constant δ ∈ (0, 1/2]. Then m ≥ C k log(n/k), (37) where C = 1/(2 log(√24 + 1)) ≈ 0.28. (The argument is closely related to the Johnson-Lindenstrauss lemma.)

Where are we? Given y and A, find x ∈ Σ_k such that y = Ax. (43) What we have: conditions that guarantee a unique x (spark, NSP, RIP, coherence) and guidance on how to design A. What remains: how do we actually recover x?

Outline: 1. Recovery algorithms. 2. Recovery guarantees.

Recovery algorithms. The problem can be formulated as x̂ = argmin ‖x‖_0 s.t. x ∈ B(y), (2) where B(y) = {x : Ax = y} for noise-free recovery and B(y) = {x : ‖Ax − y‖_2 ≤ ε} in the presence of noise.

If x is represented in a basis Φ as x = Φθ, then the problem is written as θ̂ = argmin_θ ‖θ‖_0 s.t. θ ∈ B(y). (3)

Solving x̂ = argmin ‖x‖_0 s.t. x ∈ B(y) directly requires checking different values of k and, for each, minimizing ‖y − A_Λ x_Λ‖_2 over every support Λ with |Λ| = k, which has an extremely high (combinatorial) computational cost.

ℓ_1 recovery (relaxation): x̂ = argmin ‖x‖_1 s.t. y = Ax. (4) This is computationally feasible and can be formulated as a linear programming (LP) problem; it is known as Basis Pursuit.

In the presence of noise, B(y) = {x : ‖Ax − y‖_2 ≤ ε}. (5) A Lagrangian relaxation is x̂ = argmin_x ‖x‖_1 + λ‖Ax − y‖_2, (6) known as basis pursuit denoising.

Example (BP): a small underdetermined system y = Ax with sparse x ∈ R^5 is solved with CVX using the ℓ_1-relaxed formulation; the recovered solution is x̂ = (2.1071, 2.1071, 0, 0, 0)^T.
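A sketch of the same ℓ_1 relaxation (4) in Python, using CVXPY in place of CVX on a synthetic problem; the data below are assumptions of this example, not the values from the slide.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n, k = 20, 50, 3
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
y = A @ x_true

# Basis Pursuit: minimize ||x||_1 subject to A x = y.
x = cp.Variable(n)
problem = cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == y])
problem.solve()
print(np.round(x.value, 4))
# For noisy data, basis pursuit denoising (6) replaces the objective with
# cp.norm1(x) + lam * cp.norm(A @ x - y, 2) and drops the equality constraint.
```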

Because of this computational cost, it is clear why replacing x̂ = argmin ‖x‖_0 s.t. x ∈ B(y) with x̂ = argmin ‖x‖_1 s.t. y = Ax is convenient. It is less obvious why the solution of the ℓ_1 relaxation is a good approximation to the solution of the original ℓ_0 problem.

The ℓ_1 Norm and Sparsity. The ℓ_0 norm is defined by ‖x‖_0 = #{i : x(i) ≠ 0}: the sparsity of x is measured by its number of nonzero elements. The ℓ_1 norm is defined by ‖x‖_1 = Σ_i |x(i)|; it has two key properties: robust data fitting and sparsity induction. The ℓ_2 norm is defined by ‖x‖_2 = (Σ_i |x(i)|^2)^{1/2}; the ℓ_2 norm is not effective in measuring the sparsity of x.

Why does the ℓ_1 norm promote sparsity? Consider two N-dimensional signals: x_1 = (1, 0, ..., 0), a spike signal, and x_2 = (1/√N, 1/√N, ..., 1/√N), a comb signal. They have the same ℓ_2 norm, ‖x_1‖_2 = ‖x_2‖_2 = 1, but ‖x_1‖_1 = 1 while ‖x_2‖_1 = √N.

The ℓ_1 Norm in Regression. Linear regression is widely used in science and engineering: given A ∈ R^{m×n} and b ∈ R^m with m > n, find x such that b ≈ Ax (an overdetermined system).

ℓ_1 Norm Regression. Two approaches: minimize the ℓ_2 norm of the residuals, min_{x ∈ R^n} ‖b − Ax‖_2 (the ℓ_2 norm penalizes large residuals), or minimize the ℓ_1 norm of the residuals, min_{x ∈ R^n} ‖b − Ax‖_1 (the ℓ_1 norm puts much more weight on small residuals).

ℓ_1 Norm Regression: experiment with m = 500, n = 150, A = randn(m, n), b = randn(m, 1). [Figure: histograms of the ℓ_2 residuals and of the ℓ_1 residuals.]

Greedy Algorithms. Greedy pursuits: build an estimate of x iteratively, starting from 0 and adding new components at each iteration; at each iteration the nonzero components of x are optimized. Thresholding algorithms: build an estimate of x iteratively; at each iteration a subset of components is selected as nonzero while the remaining components are set to 0.

Greedy Algorithms. Greedy pursuits: Matching Pursuit (MP), Orthogonal Matching Pursuit (OMP). Thresholding algorithms: Compressive Sampling Matching Pursuit (CoSaMP), Iterative Hard Thresholding (IHT).

Intuition. OMP: find the column of A most correlated with the residual y − Ax̂, add it to the support, and update all coefficients over the support. MP: find the column of A most correlated with the residual y − Ax̂, add it to the support, and update only the coefficient associated with that column.
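A compact sketch of OMP following this intuition (greedy atom selection plus a least-squares refit over the current support); this is a minimal implementation written for illustration, not the authors' code, and it assumes roughly unit-norm columns.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: estimate a k-sparse x from y = A x."""
    n = A.shape[1]
    residual = y.copy()
    support = []
    for _ in range(k):
        correlations = np.abs(A.T @ residual)      # correlation with the current residual
        correlations[support] = 0.0                # do not reselect chosen atoms
        support.append(int(np.argmax(correlations)))
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)   # refit all coefficients
        residual = y - A[:, support] @ coeffs
    x_hat = np.zeros(n)
    x_hat[support] = coeffs
    return x_hat
```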

Matching Pursuit (MP). At each step, solve min_{i, x} ‖y − a_i x‖_2, which gives i = argmax_j (a_j^T y)^2 / ‖a_j‖_2^2 and x = a_i^T y / ‖a_i‖_2^2. MP update, with r^0 = y: r^l = r^{l−1} − (a_i^T r^{l−1} / ‖a_i‖_2^2) a_i and x̂^l_i = x̂^{l−1}_i + a_i^T r^{l−1} / ‖a_i‖_2^2.

Iterative Hard Thresholding (IHT), a variant of CoSaMP. IHT update: x̂^i = T(x̂^{i−1} + A^T(y − Ax̂^{i−1}), k), where T(·, k) keeps the k largest-magnitude entries and sets the rest to zero.
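A minimal sketch of the IHT update, with T(·, k) implemented by keeping the k largest-magnitude entries; the step size and iteration count are assumptions, and convergence typically requires ‖A‖_2 ≤ 1 or a suitably scaled step.

```python
import numpy as np

def hard_threshold(v, k):
    """T(v, k): keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]
    out[keep] = v[keep]
    return out

def iht(A, y, k, iters=100, step=1.0):
    """Iterative Hard Thresholding: x <- T(x + step * A^T (y - A x), k)."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = hard_threshold(x + step * A.T @ (y - A @ x), k)
    return x
```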

Example: comparison of ℓ_1 optimization, OMP, MP, IHT, and CoSaMP in terms of error versus sparsity level. A is of dimension 512 × 1024, with entries either drawn from N(0, 1) or taken from a random partial Fourier matrix. For each value of the sparsity k, a k-sparse vector of dimension n × 1 is built: the nonzero locations of x are selected at random and their values are drawn from N(0, 1).
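A rough sketch of how such a comparison could be reproduced, reusing the omp and iht sketches above; the trial settings, error metric, and sparsity grid are assumptions and need not match the original experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 512, 1024
A = rng.standard_normal((m, n)) / np.sqrt(m)       # Gaussian sensing matrix

def k_sparse_signal(n, k):
    x = np.zeros(n)
    x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
    return x

for k in (10, 50, 100, 150):
    x = k_sparse_signal(n, k)
    y = A @ x
    err_omp = np.linalg.norm(omp(A, y, k) - x) / np.linalg.norm(x)
    err_iht = np.linalg.norm(iht(A, y, k, iters=200) - x) / np.linalg.norm(x)
    print(k, err_omp, err_iht)
```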

Figure: comparison of the performance of the different reconstruction algorithms as a function of the sparsity level. (a) Gaussian matrix; (b) random partial Fourier matrix.

Recovery guarantees. The guarantees are RIP-based or coherence-based. They are pessimistic: in practice, recovery is possible under conditions much more relaxed than those stated by the theoretical results.

Signal recovery in noise. Theorem (RIP-based noisy ℓ_1 recovery): suppose A satisfies the RIP of order 2k with δ_2k < √2 − 1, and let y = Ax + e where ‖e‖_2 ≤ ε. Then, with B(y) = {z : ‖Az − y‖_2 ≤ ε}, the solution x̂ of x̂ = argmin ‖x‖_1 s.t. x ∈ B(y) (11) obeys ‖x̂ − x‖_2 ≤ C_0 σ_k(x)_1/√k + C_2 ε. (12)

RIP guarantees: the RIP constant is difficult to calculate for large matrices. Coherence guarantees: exploit the advantage that coherence can be computed easily, including for structured matrices.

Theorem (coherence-based ℓ_1 recovery with bounded noise): suppose A has coherence µ and x ∈ Σ_k with k < (1/µ + 1)/4. Furthermore, suppose we obtain measurements of the form y = Ax + e with γ = ‖e‖_2. Then, when B(y) = {z : ‖Az − y‖_2 ≤ ε} with ε > γ, the solution x̂ of x̂ = argmin ‖x‖_1 s.t. x ∈ B(y) (14) obeys ‖x − x̂‖_2 ≤ (γ + ε)/√(1 − µ(4k − 1)). (15)

Guarantees on greedy methods. Theorem (RIP-based OMP recovery): suppose A satisfies the RIP of order k + 1 with δ_{k+1} < 1/(3√k), and let y = Ax. Then OMP recovers any k-sparse signal x exactly in k iterations.

Theorem (RIP-based thresholding recovery): suppose A satisfies the RIP of order ck with constant δ, and let y = Ax + e where ‖e‖_2 ≤ ε. Then the output x̂ of the CoSaMP, subspace pursuit, or IHT algorithm obeys ‖x̂ − x‖_2 ≤ C_1 σ_k(x)_2 + C_2 σ_k(x)_1/√k + C_3 ε. (19) The requirements on the parameters c and δ of the RIP and the values of C_1, C_2, and C_3 are specific to each algorithm.