Sparse Recovery Beyond Compressed Sensing


Sparse Recovery Beyond Compressed Sensing
Carlos Fernandez-Granda (www.cims.nyu.edu/~cfgranda)
Applied Math Colloquium, MIT, 4/30/2018

Acknowledgements: project funded by NSF award DMS-1616340.

Outline
- Separable Nonlinear Inverse Problems
- Sparse Recovery for SNL Problems
- Super-resolution
- Deconvolution
- SNL Problems with Correlation Decay

Separable nonlinear inverse (SNL) problems

Aim: estimate parameters $\theta_1, \ldots, \theta_s \in \mathbb{R}^d$ from data $y \in \mathbb{R}^n$. The relation between the data and each $\theta_j$ is governed by a nonlinear function $\varphi$; the contributions of $\theta_1, \ldots, \theta_s$ combine linearly with unknown coefficients $c \in \mathbb{R}^s$:

$$y = \sum_{j=1}^{s} c^{(j)} \varphi(\theta_j) = [\, \varphi(\theta_1) \; \varphi(\theta_2) \; \cdots \; \varphi(\theta_s) \,]\, c$$

Since $n > s$, the problem is easy if we know $\theta_1, \ldots, \theta_s$.
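
For concreteness, a minimal numerical sketch of this forward model, assuming a hypothetical Gaussian-shaped $\varphi$; the map, the sample locations, and all parameter values below are illustrative choices, not from the talk:

```python
import numpy as np

# Hypothetical smooth nonlinear map phi: parameter theta -> measurement vector in R^n.
# Here a Gaussian bump sampled at n fixed locations (an illustrative stand-in for the
# talk's examples: super-resolution, deconvolution, EEG, MR fingerprinting, ...).
n = 50
sample_locs = np.linspace(0, 1, n)

def phi(theta, sigma=0.05):
    return np.exp(-((sample_locs - theta) ** 2) / (2 * sigma ** 2))

# s = 3 sources with unknown parameters and coefficients.
thetas = np.array([0.2, 0.5, 0.8])
coeffs = np.array([1.0, -0.7, 0.5])

# Data: linear combination of the nonlinear features, y = sum_j c_j phi(theta_j).
y = sum(c * phi(t) for c, t in zip(coeffs, thetas))

# If the theta_j were known, recovering c would be ordinary least squares (n > s).
Phi = np.column_stack([phi(t) for t in thetas])
c_ls, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.allclose(c_ls, coeffs))  # True
```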

SNL problems
- Super-resolution
- Deconvolution
- Source localization in EEG
- Direction of arrival in radar / sonar
- Magnetic-resonance fingerprinting

Magnetic-resonance fingerprinting (Ma et al., 2013)
Goal: estimate magnetic relaxation-time constants of tissues in a voxel.
[Figure: fingerprint signals φ(θ) over time (s) for relaxation times θ = 120, 300, 720, 3200 ms]

Methods to tackle SNL problems
- Nonlinear least squares solved by descent methods. Drawback: local minima.
- Prony-based / finite-rate-of-innovation methods. Drawback: challenging to apply beyond super-resolution.
- Reformulation as a sparse-recovery problem (drawbacks discussed at the end).

Linearization

Linearize the problem by lifting to a higher-dimensional space. True parameters: $\theta_{T_1}, \ldots, \theta_{T_s}$. Grid of parameters: $\theta_1, \ldots, \theta_N$, with $N \gg n$:

$$y = [\, \varphi(\theta_1) \; \cdots \; \varphi(\theta_{T_1}) \; \cdots \; \varphi(\theta_{T_s}) \; \cdots \; \varphi(\theta_N) \,] \begin{bmatrix} 0 \\ \vdots \\ c^{(1)} \\ \vdots \\ c^{(s)} \\ \vdots \\ 0 \end{bmatrix} = \sum_{j=1}^{s} c^{(j)} \varphi(\theta_{T_j})$$

Sparse recovery for SNL problems
Find a sparse $c$ such that $y = \Phi_{\text{grid}}\, c$: an underdetermined linear inverse problem with a sparsity prior.

Popular approach: $\ell_1$-norm minimization

$$\text{minimize } \|c\|_1 \quad \text{subject to } \Phi_{\text{grid}}\, c = y$$
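
On a grid this is a convex program; a minimal sketch using cvxpy (the solver choice and the Gaussian dictionary are assumptions for illustration):

```python
import numpy as np
import cvxpy as cp

# Grid dictionary Phi_grid: one column phi(theta) per grid point (N >> n).
n, N = 50, 200
sample_locs = np.linspace(0, 1, n)
grid = np.linspace(0, 1, N)
sigma = 0.05
Phi_grid = np.exp(-((sample_locs[:, None] - grid[None, :]) ** 2) / (2 * sigma ** 2))

# Sparse ground truth supported on grid points, and the corresponding data.
c_true = np.zeros(N)
c_true[[40, 100, 160]] = [1.0, -0.7, 0.5]
y = Phi_grid @ c_true

# l1-norm minimization: minimize ||c||_1 subject to Phi_grid c = y.
c = cp.Variable(N)
prob = cp.Problem(cp.Minimize(cp.norm1(c)), [Phi_grid @ c == y])
prob.solve()
print(np.flatnonzero(np.abs(c.value) > 1e-4))  # ideally the true support [40, 100, 160]
```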

Popular approach: $\ell_1$-norm minimization
- Deconvolution: "Deconvolution with the l1 norm", Taylor et al. (1979)
- EEG: "Selective minimum-norm solution of the biomagnetic inverse problem", Matsuura and Okabe (1995)
- Direction of arrival in radar / sonar: "A sparse signal reconstruction perspective for source localization with sensor arrays", Malioutov et al. (2005)
- and many, many others...

Magnetic-resonance fingerprinting
Multicompartment magnetic resonance fingerprinting (Tang, F., Lannuzel, Bernstein, Lattanzi, Cloos, Knoll, Asslaender, 2018)
[Figure: left, data as a combination of φ(θ_1) and φ(θ_2) over time (s); right, ground truth vs. minimum-ℓ1 estimate over θ (s)]

Main question
Under what conditions can SNL problems be solved by $\ell_1$-norm minimization?

Continuous dictionary

The analysis should apply to arbitrarily fine grids, so we model the coefficients / parameter values as an atomic measure:

$$x := \sum_{j=1}^{s} c^{(j)} \delta_{\theta_{T_j}}$$

$$y = \sum_{j=1}^{s} c^{(j)} \varphi(\theta_{T_j}) = \int \varphi(\theta)\, x(\mathrm{d}\theta) = \Phi x$$

Intuitively, $\Phi$ is a continuous dictionary with $n$ rows.

Sparse recovery for SNL problems
Find a sparse $x$ such that $y = \int \varphi(\theta)\, x(\mathrm{d}\theta)$: an (extremely) underdetermined linear inverse problem with a sparsity prior.

Total-variation norm

The continuous counterpart of the $\ell_1$ norm (not the total variation of a piecewise-constant function):

$$\|c\|_1 = \sup_{\|v\|_\infty \le 1} \langle v, c \rangle \qquad \|x\|_{\mathrm{TV}} = \sup_{f \in C[0,1],\, \|f\|_\infty \le 1} \int_{[0,1]} f(t)\, x(\mathrm{d}t)$$

If $x = \sum_j c_j \delta_{\theta_j}$ then $\|x\|_{\mathrm{TV}} = \|c\|_1$.

Main question

For an SNL problem, when does

$$\text{minimize } \|x\|_{\mathrm{TV}} \quad \text{subject to } \int \varphi(\theta)\, x(\mathrm{d}\theta) = y$$

achieve exact recovery? Wait, isn't this just compressed sensing?

Compressed sensing
Recover an $s$-sparse vector $x$ of dimension $m$ from $n < m$ measurements $y = Ax$. Key assumption: $A$ is random, and hence satisfies restricted-isometry properties with high probability.

Restricted isometry property (Candès, Tao 2006)

An $n \times m$ matrix $A$ satisfies the restricted isometry property (RIP) if there exists $0 < \kappa < 1$ such that for any $s$-sparse vector $x$

$$(1 - \kappa)\, \|x\|_2 \le \|Ax\|_2 \le (1 + \kappa)\, \|x\|_2$$

The $2s$-RIP implies that for any $s$-sparse signals $x_1, x_2$

$$\|Ax_2 - Ax_1\|_2 = \|A(x_2 - x_1)\|_2 \ge (1 - \kappa)\, \|x_2 - x_1\|_2$$
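
A quick numerical sanity check of this property for a random matrix, a sketch (it samples random sparse vectors rather than certifying the uniform RIP bound; the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, s = 100, 400, 5

# Random Gaussian measurement matrix, scaled so that E||Ax||^2 = ||x||^2.
A = rng.standard_normal((n, m)) / np.sqrt(n)

# Empirically check (1 - kappa) <= ||Ax|| / ||x|| <= (1 + kappa) on random s-sparse x.
ratios = []
for _ in range(1000):
    x = np.zeros(m)
    support = rng.choice(m, size=s, replace=False)
    x[support] = rng.standard_normal(s)
    ratios.append(np.linalg.norm(A @ x) / np.linalg.norm(x))
print(min(ratios), max(ratios))  # concentrated around 1
```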

Separable nonlinear problems

If $\varphi$ is smooth, nearby columns in $\Phi_{\text{grid}} := [\, \varphi(\theta_1) \; \varphi(\theta_2) \; \cdots \; \varphi(\theta_N) \,]$ are highly correlated, so the RIP does not hold! There are sparse $x_1 \neq x_2$ such that $Ax_1 \approx Ax_2$. Sparsity is not enough; we need additional restrictions!

Super-resolution Joint work with Emmanuel Candès (Stanford)

Limits of resolution in imaging
"The resolving power of lenses, however perfect, is limited" (Lord Rayleigh). Diffraction imposes a fundamental limit on the resolution of optical systems.

Fluorescence microscopy
[Figure: data = point sources + low-pass blur; figures courtesy of V. Morgenshtern]

Sensing model for super-resolution
[Figure: data = point sources convolved with the point-spread function; in the frequency domain, the spectrum is truncated by a low-pass window]

Super-resolution

$s$ sources with locations $\theta_1, \ldots, \theta_s$, modeled as a superposition of spikes:

$$x = \sum_j c^{(j)} \delta_{\theta_j}, \qquad c_j \in \mathbb{C},\; \theta_j \in T \subset [0, 1]$$

We observe Fourier coefficients up to a cut-off frequency $f_c$:

$$y(k) = \int_0^1 \exp(-i 2\pi k t)\, x(\mathrm{d}t) = \sum_{j=1}^{s} c^{(j)} \exp(-i 2\pi k \theta_j), \qquad |k| \le f_c$$

An SNL problem where

$$\varphi(\theta_j) = \begin{bmatrix} \exp(-i 2\pi \theta_j (-f_c)) \\ \vdots \\ \exp(-i 2\pi \theta_j f_c) \end{bmatrix}$$
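
A short sketch of this measurement operator (spike locations and amplitudes are arbitrary illustrations):

```python
import numpy as np

f_c = 20                       # cut-off frequency
ks = np.arange(-f_c, f_c + 1)  # observed frequencies, n = 2 f_c + 1
thetas = np.array([0.3, 0.5, 0.74])         # spike locations in [0, 1]
coeffs = np.array([1.0, 0.8 + 0.2j, -0.6])  # complex amplitudes

# phi(theta) = (exp(-i 2 pi k theta))_{k = -f_c, ..., f_c}
def phi(theta):
    return np.exp(-2j * np.pi * ks * theta)

# Fourier coefficients y(k) = sum_j c_j exp(-i 2 pi k theta_j)
y = sum(c * phi(t) for c, t in zip(coeffs, thetas))
print(y.shape)  # (41,)
```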

Fundamental questions
1. Is the problem well posed?
2. Does TV-norm minimization work?

Is the problem well posed?
[Figure: the measurement operator maps the signal to low-pass samples with cut-off frequency f_c]

Is the problem well posed?
What is the effect of the measurement operator on sparse vectors?

Is the problem well posed?
The submatrix corresponding to a clustered support can be very ill conditioned!

Is the problem well posed?
If the support is spread out, there is hope.

Minimum separation

The minimum separation of the support of $x$ is

$$\Delta = \inf_{(\theta, \theta') \in \mathrm{support}(x)\,:\, \theta \neq \theta'} |\theta - \theta'|$$

Conditioning of the submatrix with respect to $\Delta$
- If $\Delta < 1/f_c$ the problem is ill posed
- If $\Delta > 1/f_c$ the problem becomes well posed
- Proved asymptotically by Slepian and non-asymptotically by Moitra
- $1/f_c$ is the diameter of the main lobe of the point-spread function (twice the Rayleigh distance)

Example: 25 spikes, $f_c = 10^3$, $\Delta = 0.8/f_c$
[Figure: Signal 1 and Signal 2 overlaid with their data (in signal space); the two data curves are nearly indistinguishable]

The difference is almost in the null space of the measurement operator.
[Figure: the difference signal and its spectrum]

Theoretical questions
1. Is the problem well posed?
2. Does TV-norm minimization work?

Super-resolution via TV-norm minimization

$$\text{minimize } \|x\|_{\mathrm{TV}} \quad \text{subject to } \int \varphi(\theta)\, x(\mathrm{d}\theta) = y$$

Dual certificate for TV-norm minimization

$v \in \mathbb{R}^n$ is a dual certificate associated to $x = \sum_j c_j \delta_{\theta_j}$, $c_j \in \mathbb{R}$, $\theta_j \in T$, if $Q(\theta) := v^T \varphi(\theta)$ satisfies

$$Q(\theta_j) = \mathrm{sign}(c_j) \;\text{ if } \theta_j \in T, \qquad |Q(\theta)| < 1 \;\text{ if } \theta \notin T$$

A dual variable guaranteeing that $\|x\|_{\mathrm{TV}}$ is optimal.

Dual certificate

For any $x + h$ such that $\int \varphi(\theta)\, h(\mathrm{d}\theta) = 0$,

$$\|x + h\|_{\mathrm{TV}} = \sup_{\|f\|_\infty \le 1} \int_{[0,1]} f(\theta)\, x(\mathrm{d}\theta) + \int_{[0,1]} f(\theta)\, h(\mathrm{d}\theta)$$

$$\ge \int_{[0,1]} Q(\theta)\, x(\mathrm{d}\theta) + \int_{[0,1]} Q(\theta)\, h(\mathrm{d}\theta) = \sum_{\theta_j \in T} Q(\theta_j)\, c_j + v^T \int_{[0,1]} \varphi(\theta)\, h(\mathrm{d}\theta) = \|x\|_{\mathrm{TV}}$$

since the second term vanishes and $Q(\theta_j)\, c_j = |c_j|$. The existence of $Q$ for any sign pattern implies that $x$ is the unique solution.

Dual certificate for super-resolution

$v \in \mathbb{C}^n$ is a dual certificate associated to $x = \sum_j c_j \delta_{\theta_j}$, $c_j \in \mathbb{C}$, $\theta_j \in T$, if

$$Q(\theta) := v^* \varphi(\theta) = \sum_{k = -f_c}^{f_c} v_k \exp(i 2\pi k \theta)$$

satisfies $Q(\theta_j) = \mathrm{sign}(c_j)$ if $\theta_j \in T$ and $|Q(\theta)| < 1$ if $\theta \notin T$: a linear combination of low-pass sinusoids.

Certificate for super-resolution
[Figure: sign pattern ±1 at the support points]
Aim: interpolate the sign pattern.

Certificate for super-resolution

Interpolate the sign pattern with a low-frequency, fast-decaying kernel $F$:

$$q(t) = \sum_{\theta_j \in T} \alpha_j F(t - \theta_j)$$

[Figure: interpolation of the sign pattern ±1 by shifted copies of F]

Certificate for super-resolution

Technical detail: the magnitude of the certificate locally exceeds 1. Solution: add a correction term and force the derivative to vanish on the support:

$$Q(\theta) = \sum_{\theta_j \in T} \alpha_j F(\theta - \theta_j) + \beta_j F'(\theta - \theta_j)$$
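
A sketch of this interpolation step: solve a 2s × 2s linear system for the α_j, β_j enforcing the value and derivative constraints. The published construction uses a squared Fejér kernel; a Gaussian stand-in for F keeps the sketch short, so this illustrates the linear algebra rather than the actual low-pass certificate:

```python
import numpy as np

thetas = np.array([0.3, 0.5, 0.74])
signs = np.array([1.0, 1.0, -1.0])
s = len(thetas)

# Stand-in for the fast-decaying kernel F and its derivatives (assumption: the
# construction in Candès & Fernandez-Granda uses a squared Fejér kernel instead).
sig = 0.05
F  = lambda t: np.exp(-t**2 / (2 * sig**2))
F1 = lambda t: -t / sig**2 * F(t)                    # F'
F2 = lambda t: (t**2 / sig**4 - 1 / sig**2) * F(t)   # F''

# 2s x 2s system enforcing Q(theta_i) = sign(c_i) and Q'(theta_i) = 0, where
# Q(theta) = sum_j alpha_j F(theta - theta_j) + beta_j F'(theta - theta_j).
D = thetas[:, None] - thetas[None, :]   # D[i, j] = theta_i - theta_j
A = np.block([[F(D),  F1(D)],
              [F1(D), F2(D)]])
ab = np.linalg.solve(A, np.concatenate([signs, np.zeros(s)]))
alpha, beta = ab[:s], ab[s:]

# Evaluate the certificate on a fine grid; for well-separated spikes |Q| <= 1,
# with equality attained exactly on the support.
grid = np.linspace(0, 1, 2000)
Q = sum(a * F(grid - t) + b * F1(grid - t)
        for a, b, t in zip(alpha, beta, thetas))
print(np.max(np.abs(Q)))  # should be about 1
```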

Guarantees for super-resolution

Theorem [Candès, F. 2012]: If the minimum separation of the signal support obeys $\Delta \ge 2/f_c$, then recovery via convex programming is exact.

Theorem [Candès, F. 2012]: In 2D, convex programming super-resolves point sources with a minimum separation of $2.38/f_c$, where $f_c$ is the cut-off frequency of the low-pass kernel.

Guarantees for super-resolution

Theorem [F. 2016]: If the minimum separation of the signal support obeys $\Delta \ge 1.26/f_c$, then recovery via convex programming is exact.

Theorem [Candès, F. 2012]: In 2D, convex programming super-resolves point sources with a minimum separation of $2.38/f_c$, where $f_c$ is the cut-off frequency of the low-pass kernel.

Numerical evaluation of the minimum separation
[Figure: phase-transition diagrams for f_c = 30, 40, 50, showing the fraction of successful recoveries as a function of the number of spikes and the minimum separation Δ_min f_c]
Numerically, TV-norm minimization succeeds if $\Delta \ge 1/f_c$.

Deconvolution Joint work with Brett Bernstein (Courant)

Seismology

Reflection seismology
[Figure: geological section, acoustic impedance, reflection coefficients]

Reflection seismology
[Figure: sensing setup with reflection coefficients, pulse, and resulting data]
The data are the convolution of the pulse with the reflection coefficients.

Model for the pulse: Ricker wavelet
[Figure: Ricker wavelet with width parameter σ]
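
A small sketch of the pulse model, using the standard Ricker formula, which is proportional to the negative second derivative of a Gaussian (the normalization to peak value 1 is the usual convention):

```python
import numpy as np

def ricker(t, sigma=1.0):
    # Ricker (Mexican-hat) wavelet: proportional to the negative second
    # derivative of a Gaussian with standard deviation sigma.
    u = (t / sigma) ** 2
    return (1 - u) * np.exp(-u / 2)

t = np.linspace(-5, 5, 1001)
pulse = ricker(t, sigma=1.0)
print(pulse.max(), pulse.min())  # peak 1 at t = 0, negative side lobes
```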

Toy model for reflection seismology
[Figure: animation of a spike train convolved with the pulse to produce the data]

Deconvolution

$s$ sources with locations $\theta_1, \ldots, \theta_s$, modeled as a superposition of spikes:

$$x = \sum_j c^{(j)} \delta_{\theta_j}, \qquad c_j \in \mathbb{R},\; \theta_j \in T \subset [0, 1]$$

We observe samples of the convolution with a kernel $K$:

$$y(k) = (K * x)(s_k) = \sum_{j=1}^{s} c^{(j)} K(s_k - \theta_j)$$

An SNL problem where

$$\varphi(\theta_j) = \begin{bmatrix} K(s_1 - \theta_j) \\ \vdots \\ K(s_n - \theta_j) \end{bmatrix}$$

Theoretical questions
1. Is the problem well posed?
2. Does TV-norm minimization work?

Minimum separation
Kernels are approximately low-pass, so the support cannot be too clustered.

Sampling proximity

We need two samples per spike: the convolution kernel decays, so at least two samples must be close to each spike. Samples $S$ and support $T$ have sample proximity $\gamma$ if for every $\theta_i \in T$ there exist $s_i, s_i' \in S$ such that $|\theta_i - s_i| \le \gamma$ and $|\theta_i - s_i'| \le \gamma$. We consider arbitrary non-uniform sampling patterns with fixed $\gamma$.

Sampling proximity
[Figure: spikes with two samples within distance γ of each spike]

Theoretical questions
1. Is the problem well posed?
2. Does TV-norm minimization work?

Deconvolution via TV-norm minimization

$$\text{minimize } \|x\|_{\mathrm{TV}} \quad \text{subject to } \int \varphi(\theta)\, x(\mathrm{d}\theta) = y$$

Dual certificate for SNL problems

$v \in \mathbb{R}^n$ is a dual certificate associated to $x = \sum_j c_j \delta_{\theta_j}$, $c_j \in \mathbb{R}$, $\theta_j \in T$, if

$$Q(\theta) := v^T \varphi(\theta) = \sum_{k=1}^{n} v_k K(s_k - \theta)$$

satisfies $Q(\theta_j) = \mathrm{sign}(c_j)$ if $\theta_j \in T$ and $|Q(\theta)| < 1$ if $\theta \notin T$: a linear combination of shifted copies of $K$ centered at the samples.

Certificate for deconvolution
[Figure: certificate interpolating ±1 at θ_1, θ_2, θ_3]

Certificate construction

Only use a subset of the samples containing 2 samples close to each spike:

$$Q(\theta) = \sum_{s_j \in \tilde{S}} v_j K(s_j - \theta)$$

Fit $v$ so that for all $\theta_i \in T$

$$Q(\theta_i) = \mathrm{sign}(c_i), \qquad Q'(\theta_i) = 0$$
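
A sketch of this fit for a Gaussian kernel: select the two samples nearest each spike and solve the resulting square linear system for v (sample and spike locations below are arbitrary illustrations):

```python
import numpy as np

sig = 0.05
K  = lambda t: np.exp(-t**2 / (2 * sig**2))   # Gaussian kernel
K1 = lambda t: -t / sig**2 * K(t)             # dK/dt

thetas = np.array([0.3, 0.5, 0.74])
signs  = np.array([1.0, 1.0, -1.0])
samples = np.sort(np.random.default_rng(1).uniform(0, 1, 40))

# Subset: the two samples closest to each spike (sample proximity condition).
sel = np.unique(np.concatenate(
    [np.argsort(np.abs(samples - t))[:2] for t in thetas]))
S = samples[sel]
assert len(S) == 2 * len(thetas)

# Q(theta) = sum_j v_j K(s_j - theta); enforce Q(theta_i) = sign(c_i), Q'(theta_i) = 0.
# Note that d/dtheta of K(s - theta) is -K'(s - theta).
A_top = K(S[None, :] - thetas[:, None])     # rows: Q(theta_i)
A_bot = -K1(S[None, :] - thetas[:, None])   # rows: Q'(theta_i)
A = np.vstack([A_top, A_bot])               # square (2s) x (2s) system
v = np.linalg.solve(A, np.concatenate([signs, np.zeros(len(thetas))]))

grid = np.linspace(0, 1, 2000)
Q = K(S[None, :] - grid[:, None]) @ v
print(np.max(np.abs(Q)))  # should be about 1 for well-separated spikes
```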

It works!
[Figure: certificates for the Gaussian kernel and the Ricker wavelet, interpolating ±1 at θ_1, θ_2, θ_3 with sample pairs s_{i,1}, s_{i,2}]

Certificate construction

Problem: the construction is difficult to analyze (the coefficients vary). Solution: reparametrize into bumps and waves:

$$Q(\theta) = \sum_{s_j \in \tilde{S}} v_j K(s_j - \theta) = \sum_{\theta_i \in T} \alpha_i B_{\theta_i}(\theta, s_{i,1}, s_{i,2}) + \beta_i W_{\theta_i}(\theta, s_{i,1}, s_{i,2})$$

Bump function (Gaussian kernel)
[Figure: bump function built from two shifted Gaussians centered at s_{i,1} and s_{i,2} around θ_i]

$$B_{\theta_i}(\theta, s_{i,1}, s_{i,2}) := b_{i,1} K(s_{i,1} - \theta) + b_{i,2} K(s_{i,2} - \theta)$$

$$B_{\theta_i}(\theta_i, s_{i,1}, s_{i,2}) = 1, \qquad \partial_\theta B_{\theta_i}(\theta_i, s_{i,1}, s_{i,2}) = 0$$

Wave function (Gaussian kernel)
[Figure: wave function built from two shifted Gaussians centered at s_{i,1} and s_{i,2} around θ_i]

$$W_{\theta_i}(\theta, s_{i,1}, s_{i,2}) := w_{i,1} K(s_{i,1} - \theta) + w_{i,2} K(s_{i,2} - \theta)$$

$$W_{\theta_i}(\theta_i, s_{i,1}, s_{i,2}) = 0, \qquad \partial_\theta W_{\theta_i}(\theta_i, s_{i,1}, s_{i,2}) = 1$$
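
Each bump and wave is determined by a 2 × 2 linear solve; a sketch with a Gaussian kernel (the locations are illustrative):

```python
import numpy as np

K  = lambda t: np.exp(-t**2 / 2)   # Gaussian kernel (unit width)
K1 = lambda t: -t * K(t)           # dK/dt

def bump_wave_coeffs(theta_i, s1, s2):
    # Rows: value and theta-derivative of the two shifted kernels at theta_i.
    # Note that d/dtheta of K(s - theta) is -K'(s - theta).
    M = np.array([[K(s1 - theta_i),   K(s2 - theta_i)],
                  [-K1(s1 - theta_i), -K1(s2 - theta_i)]])
    b = np.linalg.solve(M, [1.0, 0.0])  # bump: value 1, derivative 0 at theta_i
    w = np.linalg.solve(M, [0.0, 1.0])  # wave: value 0, derivative 1 at theta_i
    return b, w

b, w = bump_wave_coeffs(theta_i=0.0, s1=-0.4, s2=0.3)
print(b, w)
```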

Bump and wave functions (Ricker wavelet)
[Figure: the analogous bump and wave functions built from two shifted Ricker wavelets, satisfying the same interpolation conditions]

Certificate construction

The reparametrization decouples the coefficients, and the certificate is approximately a sum of sign-weighted bumps:

$$Q(\theta) = \sum_{s_j \in \tilde{S}} v_j K(s_j - \theta) = \sum_{\theta_i \in T} \alpha_i B_{\theta_i}(\theta, s_{i,1}, s_{i,2}) + \beta_i W_{\theta_i}(\theta, s_{i,1}, s_{i,2}) \approx \sum_{\theta_i \in T} \mathrm{sign}(c_i)\, B_{\theta_i}(\theta, s_{i,1}, s_{i,2})$$

Certificate for deconvolution (Gaussian kernel and Ricker wavelet)
[Figure: certificates interpolating ±1 at θ_1, θ_2, θ_3, shown both as combinations of shifted kernels and as bump/wave decompositions]

Exact recovery guarantees [Bernstein, F. 2017]
[Figure: Gaussian kernel; regions of numerical and exact recovery in the (spike separation, sample proximity) plane]

Exact recovery guarantees [Bernstein, F. 2017]
[Figure: Ricker wavelet; regions of numerical and exact recovery in the (spike separation, sample proximity) plane]

SNL problems Joint work with Brett Bernstein (Courant) and Sheng Liu (CDS, NYU)

General SNL problems
The function $\varphi$ may not be available explicitly, but can often be computed numerically by solving a differential equation:
- Source localization in EEG
- Direction of arrival in radar / sonar
- Magnetic-resonance fingerprinting

Mathematical model

Signal: a superposition of Dirac measures with support $T$:

$$x = \sum_j c_j \delta_{\theta_j}, \qquad c_j \in \mathbb{R},\; \theta_j \in T \subset [0, 1]$$

Data: $n$ measurements following the SNL model

$$y = \int \varphi(\theta)\, x(\mathrm{d}\theta)$$

Diffusion on a rod with varying conductivity
[Figure: animation frames showing heat sources appearing at θ_1, θ_2, θ_3 on the rod]

Diffusion on a rod with varying conductivity
$\varphi(\theta)$ can be computed by solving a differential equation.
[Figure: measurement locations s_0, ..., s_28 and source locations θ_0, θ_1, θ_2]

Time-frequency pulses
[Figure: animation frames showing pulses appearing at θ_1, θ_2, θ_3]

Time-frequency pulses

Gabor wavelets in 1D:

$$\varphi(\theta)_k = \exp\left(-\frac{(s_k - \theta)^2}{2\sigma^2}\right) \sin(150\, s_k (s_k - \theta)), \qquad \theta \in [0, 1]$$

[Figure: pulses at θ_0, θ_1, θ_2 and sample locations s_0, ..., s_22]
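
A sketch of this feature map, assuming a Gaussian envelope with variance σ² as written above (σ and the sample count are illustrative choices):

```python
import numpy as np

n = 23
s = np.linspace(0, 1, n)   # sample locations s_0, ..., s_22
sigma = 0.05               # pulse width (illustrative)

def phi(theta):
    # Gabor pulse: Gaussian envelope centered at theta times a chirp-like sinusoid.
    return np.exp(-(s - theta) ** 2 / (2 * sigma ** 2)) * np.sin(150 * s * (s - theta))

print(phi(0.4).shape)  # (23,)
```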

Sparse estimation for general SNL problems

Problem: sparse recovery requires RIP-like properties that do not hold for SNL problems with smooth $\varphi$ (even if we discretize). We cannot hope to recover all sparse signals. How about signals such that $\varphi(\theta_i)^T \varphi(\theta_j)$ is small for all $\theta_i \neq \theta_j$ in $T$?

Challenge: prove guarantees for general SNL problems that depend only on the correlation structure.

SNL problems with correlation decay

Aim: guarantees for signals under separation conditions with respect to the support-centered correlations $\rho_1, \ldots, \rho_s$:

$$\rho_i(\theta) := \varphi(\theta_i)^T \varphi(\theta)$$
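
These correlations are straightforward to compute once φ is available numerically; a sketch with a hypothetical Gaussian feature map (normalizing the columns so that ρ_i(θ_i) = 1 is an assumption for readability):

```python
import numpy as np

sample_locs = np.linspace(0, 1, 50)

def phi(theta, sigma=0.05):
    v = np.exp(-((sample_locs - theta) ** 2) / (2 * sigma ** 2))
    return v / np.linalg.norm(v)   # normalize so rho_i(theta_i) = 1

thetas = np.array([0.2, 0.5, 0.8])
grid = np.linspace(0, 1, 500)
Phi_grid = np.column_stack([phi(t) for t in grid])

# rho_i(theta) = phi(theta_i)^T phi(theta): one correlation function per spike.
rho = np.array([phi(t) @ Phi_grid for t in thetas])
print(rho.shape)        # (3, 500)
print(rho.max(axis=1))  # each close to 1, attained near theta_i
```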

Diffusion on a rod with varying conductivity
[Figure: measurement locations s_0, ..., s_28 and source locations θ_0, θ_1, θ_2]

Support-centered correlations
[Figure: correlation functions ρ_i centered at θ_0, θ_1, θ_2 for the diffusion example]

Time-frequency pulses

$$\varphi(\theta)_k = \exp\left(-\frac{(s_k - \theta)^2}{2\sigma^2}\right) \sin(150\, s_k (s_k - \theta)), \qquad \theta \in [0, 1]$$

[Figure: pulses at θ_0, θ_1, θ_2 and sample locations s_0, ..., s_22]

Support-centered correlations
[Figure: correlation functions ρ_i centered at θ_0, θ_1, θ_2 for the time-frequency example]

Correlation decay

Parametrized by $F_i^- < N_i^- < N_i^+ < F_i^+$ and $\sigma_i$; the parameters can be different at each $\theta_i$.
[Figure: decay regions around θ_i marked by F_i^- − 2σ_i, F_i^- − 1σ_i, F_i^-, N_i^-, N_i^+, F_i^+, F_i^+ + 1σ_i, F_i^+ + 2σ_i]

Correlation decay

- $\rho_i$ is concave in $[N_i^-, N_i^+]$: $\rho_i''(\theta) < -\gamma_0$
- $\rho_i$ is bounded outside $[N_i^-, N_i^+]$: $|\rho_i(\theta)| < \gamma_1$
- $\rho_i$ decays for $\theta < F_i^-$ and $\theta > F_i^+$:
  $|\rho_i(\theta)| < \gamma_2\, e^{-(F_i^- - \theta)/\sigma_i}$ for $\theta < F_i^-$, and
  $|\rho_i(\theta)| < \gamma_2\, e^{-(\theta - F_i^+)/\sigma_i}$ for $\theta > F_i^+$

The choice of exponential decay is arbitrary.

Correlation decay

Additional condition on the correlation derivatives

$$\rho_i^{(q,r)}(\theta) := \varphi^{(q)}(\theta_i)^T \varphi^{(r)}(\theta)$$

$\rho_i^{(q,r)}$ decays for $\theta < F_i^-$ and $\theta > F_i^+$, for $q = 0, 1$ and $r = 0, 1, 2$:

$$|\rho_i^{(q,r)}(\theta)| < \gamma_2\, e^{-(F_i^- - \theta)/\sigma_i} \;\text{ for } \theta < F_i^-, \qquad |\rho_i^{(q,r)}(\theta)| < \gamma_2\, e^{-(\theta - F_i^+)/\sigma_i} \;\text{ for } \theta > F_i^+$$

Open question: is this necessary?

Normalized distance

The normalized distance from $\theta$ to $\theta_j > \theta$ is

$$d_j(\theta) := \frac{F_j^- - \theta}{\sigma_j}$$

If $d_j(\theta)$ is large, $\varphi(\theta)$ and $\varphi(\theta_j)$ are not very correlated.
[Figure: normalized distances d_2(θ) and d_3(θ) from θ to θ_2 and θ_3]

Minimum separation conditions

The normalized distance between spikes is set to $\Delta$.
[Figure: spikes θ_1, θ_2, θ_3 with the normalized distances between their F^± boundaries illustrating the separation conditions]

Guarantees for SNL problems with decaying correlation

Theorem [Bernstein, F. 2018]: For any SNL problem with decaying correlation, TV-norm minimization achieves exact recovery under the separation conditions if $\Delta > C$ for a fixed constant $C$ depending on the decay bounds $\gamma_0, \gamma_1, \gamma_2$.

Dual certificate for SNL problems

$v \in \mathbb{R}^n$ is a dual certificate associated to $x = \sum_j c_j \delta_{\theta_j}$, $c_j \in \mathbb{R}$, $\theta_j \in T$, if $Q(\theta) := v^T \varphi(\theta)$ satisfies

$$Q(\theta_j) = \mathrm{sign}(c_j) \;\text{ if } \theta_j \in T, \qquad |Q(\theta)| < 1 \;\text{ if } \theta \notin T$$

Dual certificate construction

Use the support-centered correlations to interpolate the sign pattern:

$$Q(\theta) := \sum_{i=1}^{s} \alpha_i \rho_i(\theta) = \sum_{i=1}^{s} \alpha_i \varphi(\theta_i)^T \varphi(\theta) = v^T \varphi(\theta), \qquad v := \sum_{i=1}^{s} \alpha_i \varphi(\theta_i)$$

[Figure: certificate interpolating ±1 at the support points]

Technical detail: a correction term ensures that the derivative vanishes on the support.

Robustness to noise / outliers
- Variations of dual certificates establish robustness at small noise levels (Candès, F. 2013), (F. 2013), (Bernstein, F. 2017)
- Exact recovery with a constant number of outliers, up to log factors (F., Tang, Wang, Zheng 2017), (Bernstein, F. 2017)
- Open questions: analysis of higher noise levels and discretization error; robustness for positive amplitudes

Drawbacks
- Solving the convex program is computationally expensive
- The approach doesn't scale well to high dimensions
- In practice, reweighting is needed to obtain sparse solutions for noisy data
- Open question: analysis of other techniques (reweighting methods, descent methods on nonconvex cost functions)

Conclusion
- Previous works focus mostly on random operators
- For deterministic problems, sparsity is not enough!
- Under separation conditions:
  1. Sharp guarantees for super-resolution and deconvolution
  2. General guarantees for SNL problems with correlation decay

References
- Prolate spheroidal wave functions, Fourier analysis, and uncertainty V: The discrete case. D. Slepian. Bell System Technical Journal, 1978.
- Super-resolution, extremal functions and the condition number of Vandermonde matrices. A. Moitra. Symposium on Theory of Computing (STOC), 2015.
- The recoverability limit for superresolution via sparsity. L. Demanet, N. Nguyen.
- Robust recovery of stream of pulses using convex optimization. T. Bendory, S. Dekel, A. Feuer. J. of Math. Analysis and Applications, 2016.
- Super-resolution without separation. G. Schiebinger, E. Robeva, B. Recht. Information and Inference, 2016.

References
- Towards a mathematical theory of super-resolution. E. J. Candès, C. Fernandez-Granda. Comm. on Pure and Applied Math., 2013.
- Super-resolution from noisy data. E. J. Candès, C. Fernandez-Granda. J. of Fourier Analysis and Applications, 2014.
- Support detection in super-resolution. C. Fernandez-Granda. Proceedings of SampTA, 2013.
- Super-resolution of point sources via convex programming. C. Fernandez-Granda. Information and Inference, 2016.

References
- Demixing sines and spikes: robust spectral super-resolution in the presence of outliers. C. Fernandez-Granda, G. Tang, X. Wang, L. Zheng. Information and Inference, 2017.
- Multicompartment magnetic-resonance fingerprinting. S. Tang, C. Fernandez-Granda, S. Lannuzel, B. Bernstein, R. Lattanzi, M. Cloos, F. Knoll, J. Asslaender.
- Deconvolution of point sources: a sampling theorem and robustness guarantees. B. Bernstein, C. Fernandez-Granda.
- Sparse recovery beyond compressed sensing: separable nonlinear inverse problems. B. Bernstein, C. Fernandez-Granda, S. Liu.