Sparse Recovery Beyond Compressed Sensing


Sparse Recovery Beyond Compressed Sensing
Carlos Fernandez-Granda (www.cims.nyu.edu/~cfgranda)
Applied Math Colloquium, MIT, 4/30/2018

Acknowledgements: project funded by NSF award DMS-1616340.

Outline
- Separable Nonlinear Inverse Problems
- Sparse Recovery for SNL Problems
- Super-resolution
- Deconvolution
- SNL Problems with Correlation Decay

Separable nonlinear inverse (SNL) problems

Aim: estimate parameters $\theta_1, \ldots, \theta_s \in \mathbb{R}^d$ from data $y \in \mathbb{R}^n$. The relation between the data and each $\theta_j$ is governed by a nonlinear function $\varphi$; the contributions of $\theta_1, \ldots, \theta_s$ combine linearly with unknown coefficients $c \in \mathbb{R}^s$:

$$y = \sum_{j=1}^{s} c^{(j)} \varphi(\theta_j) = [\, \varphi(\theta_1) \; \varphi(\theta_2) \; \cdots \; \varphi(\theta_s) \,]\, c$$

Since $n > s$, the problem is easy if we know $\theta_1, \ldots, \theta_s$.
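
For concreteness, a minimal numerical sketch of this forward model, assuming a hypothetical Gaussian-shaped $\varphi$; the map, the sample locations, and all parameter values below are illustrative choices, not from the talk:

```python
import numpy as np

# Hypothetical smooth nonlinear map phi: parameter theta -> measurement vector in R^n.
# Here a Gaussian bump sampled at n fixed locations (an illustrative stand-in for the
# talk's examples: super-resolution, deconvolution, EEG, MR fingerprinting, ...).
n = 50
sample_locs = np.linspace(0, 1, n)

def phi(theta, sigma=0.05):
    return np.exp(-((sample_locs - theta) ** 2) / (2 * sigma ** 2))

# s = 3 sources with unknown parameters and coefficients.
thetas = np.array([0.2, 0.5, 0.8])
coeffs = np.array([1.0, -0.7, 0.5])

# Data: linear combination of the nonlinear features, y = sum_j c_j phi(theta_j).
y = sum(c * phi(t) for c, t in zip(coeffs, thetas))

# If the theta_j were known, recovering c would be ordinary least squares (n > s).
Phi = np.column_stack([phi(t) for t in thetas])
c_ls, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.allclose(c_ls, coeffs))  # True
```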

SNL problems
- Super-resolution
- Deconvolution
- Source localization in EEG
- Direction of arrival in radar / sonar
- Magnetic-resonance fingerprinting

Magnetic-resonance fingerprinting (Ma et al., 2013)
Goal: estimate magnetic relaxation-time constants of tissues in a voxel.
[Figure: fingerprint signals φ(θ) over time (s) for relaxation times θ = 120, 300, 720, 3200 ms]

Methods to tackle SNL problems
- Nonlinear least squares solved by descent methods. Drawback: local minima.
- Prony-based / finite-rate-of-innovation methods. Drawback: challenging to apply beyond super-resolution.
- Reformulation as a sparse-recovery problem (drawbacks discussed at the end).

Linearization

Linearize the problem by lifting to a higher-dimensional space. True parameters: $\theta_{T_1}, \ldots, \theta_{T_s}$. Grid of parameters: $\theta_1, \ldots, \theta_N$, with $N \gg n$:

$$y = [\, \varphi(\theta_1) \; \cdots \; \varphi(\theta_{T_1}) \; \cdots \; \varphi(\theta_{T_s}) \; \cdots \; \varphi(\theta_N) \,] \begin{bmatrix} 0 \\ \vdots \\ c^{(1)} \\ \vdots \\ c^{(s)} \\ \vdots \\ 0 \end{bmatrix} = \sum_{j=1}^{s} c^{(j)} \varphi(\theta_{T_j})$$

Sparse recovery for SNL problems
Find a sparse $c$ such that $y = \Phi_{\text{grid}}\, c$: an underdetermined linear inverse problem with a sparsity prior.

Popular approach: $\ell_1$-norm minimization

$$\text{minimize } \|c\|_1 \quad \text{subject to } \Phi_{\text{grid}}\, c = y$$
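
On a grid this is a convex program; a minimal sketch using cvxpy (the solver choice and the Gaussian dictionary are assumptions for illustration):

```python
import numpy as np
import cvxpy as cp

# Grid dictionary Phi_grid: one column phi(theta) per grid point (N >> n).
n, N = 50, 200
sample_locs = np.linspace(0, 1, n)
grid = np.linspace(0, 1, N)
sigma = 0.05
Phi_grid = np.exp(-((sample_locs[:, None] - grid[None, :]) ** 2) / (2 * sigma ** 2))

# Sparse ground truth supported on grid points, and the corresponding data.
c_true = np.zeros(N)
c_true[[40, 100, 160]] = [1.0, -0.7, 0.5]
y = Phi_grid @ c_true

# l1-norm minimization: minimize ||c||_1 subject to Phi_grid c = y.
c = cp.Variable(N)
prob = cp.Problem(cp.Minimize(cp.norm1(c)), [Phi_grid @ c == y])
prob.solve()
print(np.flatnonzero(np.abs(c.value) > 1e-4))  # ideally the true support [40, 100, 160]
```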

Popular approach: $\ell_1$-norm minimization
- Deconvolution: "Deconvolution with the l1 norm", Taylor et al. (1979)
- EEG: "Selective minimum-norm solution of the biomagnetic inverse problem", Matsuura and Okabe (1995)
- Direction of arrival in radar / sonar: "A sparse signal reconstruction perspective for source localization with sensor arrays", Malioutov et al. (2005)
- and many, many others...

Magnetic-resonance fingerprinting
Multicompartment magnetic resonance fingerprinting (Tang, F., Lannuzel, Bernstein, Lattanzi, Cloos, Knoll, Asslaender, 2018)
[Figure: left, data as a combination of φ(θ_1) and φ(θ_2) over time (s); right, ground truth vs. minimum-ℓ1 estimate over θ (s)]

Main question
Under what conditions can SNL problems be solved by $\ell_1$-norm minimization?

Continuous dictionary

The analysis should apply to arbitrarily fine grids, so we model the coefficients / parameter values as an atomic measure:

$$x := \sum_{j=1}^{s} c^{(j)} \delta_{\theta_{T_j}}$$

$$y = \sum_{j=1}^{s} c^{(j)} \varphi(\theta_{T_j}) = \int \varphi(\theta)\, x(\mathrm{d}\theta) = \Phi x$$

Intuitively, $\Phi$ is a continuous dictionary with $n$ rows.

Sparse recovery for SNL problems
Find a sparse $x$ such that $y = \int \varphi(\theta)\, x(\mathrm{d}\theta)$: an (extremely) underdetermined linear inverse problem with a sparsity prior.

Total-variation norm

The continuous counterpart of the $\ell_1$ norm (not the total variation of a piecewise-constant function):

$$\|c\|_1 = \sup_{\|v\|_\infty \le 1} \langle v, c \rangle \qquad \|x\|_{\mathrm{TV}} = \sup_{f \in C[0,1],\, \|f\|_\infty \le 1} \int_{[0,1]} f(t)\, x(\mathrm{d}t)$$

If $x = \sum_j c_j \delta_{\theta_j}$ then $\|x\|_{\mathrm{TV}} = \|c\|_1$.

Main question

For an SNL problem, when does

$$\text{minimize } \|x\|_{\mathrm{TV}} \quad \text{subject to } \int \varphi(\theta)\, x(\mathrm{d}\theta) = y$$

achieve exact recovery? Wait, isn't this just compressed sensing?

Compressed sensing
Recover an $s$-sparse vector $x$ of dimension $m$ from $n < m$ measurements $y = Ax$. Key assumption: $A$ is random, and hence satisfies restricted-isometry properties with high probability.

Restricted isometry property (Candès, Tao 2006)

An $n \times m$ matrix $A$ satisfies the restricted isometry property (RIP) if there exists $0 < \kappa < 1$ such that for any $s$-sparse vector $x$

$$(1 - \kappa)\, \|x\|_2 \le \|Ax\|_2 \le (1 + \kappa)\, \|x\|_2$$

The $2s$-RIP implies that for any $s$-sparse signals $x_1, x_2$

$$\|Ax_2 - Ax_1\|_2 = \|A(x_2 - x_1)\|_2 \ge (1 - \kappa)\, \|x_2 - x_1\|_2$$
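
A quick numerical sanity check of this property for a random matrix, a sketch (it samples random sparse vectors rather than certifying the uniform RIP bound; the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, s = 100, 400, 5

# Random Gaussian measurement matrix, scaled so that E||Ax||^2 = ||x||^2.
A = rng.standard_normal((n, m)) / np.sqrt(n)

# Empirically check (1 - kappa) <= ||Ax|| / ||x|| <= (1 + kappa) on random s-sparse x.
ratios = []
for _ in range(1000):
    x = np.zeros(m)
    support = rng.choice(m, size=s, replace=False)
    x[support] = rng.standard_normal(s)
    ratios.append(np.linalg.norm(A @ x) / np.linalg.norm(x))
print(min(ratios), max(ratios))  # concentrated around 1
```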

Separable nonlinear problems

If $\varphi$ is smooth, nearby columns in $\Phi_{\text{grid}} := [\, \varphi(\theta_1) \; \varphi(\theta_2) \; \cdots \; \varphi(\theta_N) \,]$ are highly correlated, so the RIP does not hold! There are sparse $x_1 \neq x_2$ such that $Ax_1 \approx Ax_2$. Sparsity is not enough; we need additional restrictions!

Super-resolution Joint work with Emmanuel Candès (Stanford)

Limits of resolution in imaging
"The resolving power of lenses, however perfect, is limited" (Lord Rayleigh). Diffraction imposes a fundamental limit on the resolution of optical systems.

Fluorescence microscopy
[Figure: data = point sources + low-pass blur; figures courtesy of V. Morgenshtern]

Sensing model for super-resolution
[Figure: data = point sources convolved with the point-spread function; in the frequency domain, the spectrum is truncated by a low-pass window]

Super-resolution

$s$ sources with locations $\theta_1, \ldots, \theta_s$, modeled as a superposition of spikes:

$$x = \sum_j c^{(j)} \delta_{\theta_j}, \qquad c_j \in \mathbb{C},\; \theta_j \in T \subset [0, 1]$$

We observe Fourier coefficients up to a cut-off frequency $f_c$:

$$y(k) = \int_0^1 \exp(-i 2\pi k t)\, x(\mathrm{d}t) = \sum_{j=1}^{s} c^{(j)} \exp(-i 2\pi k \theta_j), \qquad |k| \le f_c$$

An SNL problem where

$$\varphi(\theta_j) = \begin{bmatrix} \exp(-i 2\pi \theta_j (-f_c)) \\ \vdots \\ \exp(-i 2\pi \theta_j f_c) \end{bmatrix}$$
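
A short sketch of this measurement operator (spike locations and amplitudes are arbitrary illustrations):

```python
import numpy as np

f_c = 20                       # cut-off frequency
ks = np.arange(-f_c, f_c + 1)  # observed frequencies, n = 2 f_c + 1
thetas = np.array([0.3, 0.5, 0.74])         # spike locations in [0, 1]
coeffs = np.array([1.0, 0.8 + 0.2j, -0.6])  # complex amplitudes

# phi(theta) = (exp(-i 2 pi k theta))_{k = -f_c, ..., f_c}
def phi(theta):
    return np.exp(-2j * np.pi * ks * theta)

# Fourier coefficients y(k) = sum_j c_j exp(-i 2 pi k theta_j)
y = sum(c * phi(t) for c, t in zip(coeffs, thetas))
print(y.shape)  # (41,)
```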

Fundamental questions
1. Is the problem well posed?
2. Does TV-norm minimization work?

Is the problem well posed?
[Figure: the measurement operator maps the signal to low-pass samples with cut-off frequency f_c]

Is the problem well posed?
What is the effect of the measurement operator on sparse vectors?

Is the problem well posed?
The submatrix corresponding to a clustered support can be very ill conditioned!

Is the problem well posed?
If the support is spread out, there is hope.

Minimum separation

The minimum separation of the support of $x$ is

$$\Delta = \inf_{(\theta, \theta') \in \mathrm{support}(x)\,:\, \theta \neq \theta'} |\theta - \theta'|$$

Conditioning of the submatrix with respect to $\Delta$
- If $\Delta < 1/f_c$ the problem is ill posed
- If $\Delta > 1/f_c$ the problem becomes well posed
- Proved asymptotically by Slepian and non-asymptotically by Moitra
- $1/f_c$ is the diameter of the main lobe of the point-spread function (twice the Rayleigh distance)

Example: 25 spikes, $f_c = 10^3$, $\Delta = 0.8/f_c$
[Figure: Signal 1 and Signal 2 overlaid with their data (in signal space); the two data curves are nearly indistinguishable]

The difference is almost in the null space of the measurement operator.
[Figure: the difference signal and its spectrum]

Theoretical questions
1. Is the problem well posed?
2. Does TV-norm minimization work?

Super-resolution via TV-norm minimization

$$\text{minimize } \|x\|_{\mathrm{TV}} \quad \text{subject to } \int \varphi(\theta)\, x(\mathrm{d}\theta) = y$$

Dual certificate for TV-norm minimization

$v \in \mathbb{R}^n$ is a dual certificate associated to $x = \sum_j c_j \delta_{\theta_j}$, $c_j \in \mathbb{R}$, $\theta_j \in T$, if $Q(\theta) := v^T \varphi(\theta)$ satisfies

$$Q(\theta_j) = \mathrm{sign}(c_j) \;\text{ if } \theta_j \in T, \qquad |Q(\theta)| < 1 \;\text{ if } \theta \notin T$$

A dual variable guaranteeing that $\|x\|_{\mathrm{TV}}$ is optimal.

Dual certificate

For any $x + h$ such that $\int \varphi(\theta)\, h(\mathrm{d}\theta) = 0$,

$$\|x + h\|_{\mathrm{TV}} = \sup_{\|f\|_\infty \le 1} \int_{[0,1]} f(\theta)\, x(\mathrm{d}\theta) + \int_{[0,1]} f(\theta)\, h(\mathrm{d}\theta)$$

$$\ge \int_{[0,1]} Q(\theta)\, x(\mathrm{d}\theta) + \int_{[0,1]} Q(\theta)\, h(\mathrm{d}\theta) = \sum_{\theta_j \in T} Q(\theta_j)\, c_j + v^T \int_{[0,1]} \varphi(\theta)\, h(\mathrm{d}\theta) = \|x\|_{\mathrm{TV}}$$

since the second term vanishes and $Q(\theta_j)\, c_j = |c_j|$. The existence of $Q$ for any sign pattern implies that $x$ is the unique solution.

Dual certificate for super-resolution

$v \in \mathbb{C}^n$ is a dual certificate associated to $x = \sum_j c_j \delta_{\theta_j}$, $c_j \in \mathbb{C}$, $\theta_j \in T$, if

$$Q(\theta) := v^* \varphi(\theta) = \sum_{k = -f_c}^{f_c} v_k \exp(i 2\pi k \theta)$$

satisfies $Q(\theta_j) = \mathrm{sign}(c_j)$ if $\theta_j \in T$ and $|Q(\theta)| < 1$ if $\theta \notin T$: a linear combination of low-pass sinusoids.

Certificate for super-resolution
[Figure: sign pattern ±1 at the support points]
Aim: interpolate the sign pattern.

Certificate for super-resolution

Interpolate the sign pattern with a low-frequency, fast-decaying kernel $F$:

$$q(t) = \sum_{\theta_j \in T} \alpha_j F(t - \theta_j)$$

[Figure: interpolation of the sign pattern ±1 by shifted copies of F]

Certificate for super-resolution

Technical detail: the magnitude of the certificate locally exceeds 1. Solution: add a correction term and force the derivative to vanish on the support:

$$Q(\theta) = \sum_{\theta_j \in T} \alpha_j F(\theta - \theta_j) + \beta_j F'(\theta - \theta_j)$$
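
A sketch of this interpolation step: solve a 2s × 2s linear system for the α_j, β_j enforcing the value and derivative constraints. The published construction uses a squared Fejér kernel; a Gaussian stand-in for F keeps the sketch short, so this illustrates the linear algebra rather than the actual low-pass certificate:

```python
import numpy as np

thetas = np.array([0.3, 0.5, 0.74])
signs = np.array([1.0, 1.0, -1.0])
s = len(thetas)

# Stand-in for the fast-decaying kernel F and its derivatives (assumption: the
# construction in Candès & Fernandez-Granda uses a squared Fejér kernel instead).
sig = 0.05
F  = lambda t: np.exp(-t**2 / (2 * sig**2))
F1 = lambda t: -t / sig**2 * F(t)                    # F'
F2 = lambda t: (t**2 / sig**4 - 1 / sig**2) * F(t)   # F''

# 2s x 2s system enforcing Q(theta_i) = sign(c_i) and Q'(theta_i) = 0, where
# Q(theta) = sum_j alpha_j F(theta - theta_j) + beta_j F'(theta - theta_j).
D = thetas[:, None] - thetas[None, :]   # D[i, j] = theta_i - theta_j
A = np.block([[F(D),  F1(D)],
              [F1(D), F2(D)]])
ab = np.linalg.solve(A, np.concatenate([signs, np.zeros(s)]))
alpha, beta = ab[:s], ab[s:]

# Evaluate the certificate on a fine grid; for well-separated spikes |Q| <= 1,
# with equality attained exactly on the support.
grid = np.linspace(0, 1, 2000)
Q = sum(a * F(grid - t) + b * F1(grid - t)
        for a, b, t in zip(alpha, beta, thetas))
print(np.max(np.abs(Q)))  # should be about 1
```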

Guarantees for super-resolution

Theorem [Candès, F. 2012]: If the minimum separation of the signal support obeys $\Delta \ge 2/f_c$, then recovery via convex programming is exact.

Theorem [Candès, F. 2012]: In 2D, convex programming super-resolves point sources with a minimum separation of $2.38/f_c$, where $f_c$ is the cut-off frequency of the low-pass kernel.

Guarantees for super-resolution

Theorem [F. 2016]: If the minimum separation of the signal support obeys $\Delta \ge 1.26/f_c$, then recovery via convex programming is exact.

Theorem [Candès, F. 2012]: In 2D, convex programming super-resolves point sources with a minimum separation of $2.38/f_c$, where $f_c$ is the cut-off frequency of the low-pass kernel.

Numerical evaluation of the minimum separation
[Figure: phase-transition diagrams for f_c = 30, 40, 50, showing the fraction of successful recoveries as a function of the number of spikes and the minimum separation Δ_min f_c]
Numerically, TV-norm minimization succeeds if $\Delta \ge 1/f_c$.

Deconvolution Joint work with Brett Bernstein (Courant)

Seismology

Reflection seismology
[Figure: geological section, acoustic impedance, reflection coefficients]

Reflection seismology
[Figure: sensing setup with reflection coefficients, pulse, and resulting data]
The data are the convolution of the pulse with the reflection coefficients.

Model for the pulse: Ricker wavelet
[Figure: Ricker wavelet with width parameter σ]
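
A small sketch of the pulse model, using the standard Ricker formula, which is proportional to the negative second derivative of a Gaussian (the normalization to peak value 1 is the usual convention):

```python
import numpy as np

def ricker(t, sigma=1.0):
    # Ricker (Mexican-hat) wavelet: proportional to the negative second
    # derivative of a Gaussian with standard deviation sigma.
    u = (t / sigma) ** 2
    return (1 - u) * np.exp(-u / 2)

t = np.linspace(-5, 5, 1001)
pulse = ricker(t, sigma=1.0)
print(pulse.max(), pulse.min())  # peak 1 at t = 0, negative side lobes
```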

Toy model for reflection seismology
[Figure: animation of a spike train convolved with the pulse to produce the data]

Deconvolution

$s$ sources with locations $\theta_1, \ldots, \theta_s$, modeled as a superposition of spikes:

$$x = \sum_j c^{(j)} \delta_{\theta_j}, \qquad c_j \in \mathbb{R},\; \theta_j \in T \subset [0, 1]$$

We observe samples of the convolution with a kernel $K$:

$$y(k) = (K * x)(s_k) = \sum_{j=1}^{s} c^{(j)} K(s_k - \theta_j)$$

An SNL problem where

$$\varphi(\theta_j) = \begin{bmatrix} K(s_1 - \theta_j) \\ \vdots \\ K(s_n - \theta_j) \end{bmatrix}$$

Theoretical questions
1. Is the problem well posed?
2. Does TV-norm minimization work?

Minimum separation
Kernels are approximately low-pass, so the support cannot be too clustered.

Sampling proximity

We need two samples per spike: the convolution kernel decays, so at least two samples must be close to each spike. Samples $S$ and support $T$ have sample proximity $\gamma$ if for every $\theta_i \in T$ there exist $s_i, s_i' \in S$ such that $|\theta_i - s_i| \le \gamma$ and $|\theta_i - s_i'| \le \gamma$. We consider arbitrary non-uniform sampling patterns with fixed $\gamma$.

Sampling proximity
[Figure: spikes with two samples within distance γ of each spike]

Theoretical questions
1. Is the problem well posed?
2. Does TV-norm minimization work?

Deconvolution via TV-norm minimization

$$\text{minimize } \|x\|_{\mathrm{TV}} \quad \text{subject to } \int \varphi(\theta)\, x(\mathrm{d}\theta) = y$$

Dual certificate for SNL problems

$v \in \mathbb{R}^n$ is a dual certificate associated to $x = \sum_j c_j \delta_{\theta_j}$, $c_j \in \mathbb{R}$, $\theta_j \in T$, if

$$Q(\theta) := v^T \varphi(\theta) = \sum_{k=1}^{n} v_k K(s_k - \theta)$$

satisfies $Q(\theta_j) = \mathrm{sign}(c_j)$ if $\theta_j \in T$ and $|Q(\theta)| < 1$ if $\theta \notin T$: a linear combination of shifted copies of $K$ centered at the samples.

Certificate for deconvolution
[Figure: certificate interpolating ±1 at θ_1, θ_2, θ_3]

Certificate construction

Only use a subset of the samples containing 2 samples close to each spike:

$$Q(\theta) = \sum_{s_j \in \tilde{S}} v_j K(s_j - \theta)$$

Fit $v$ so that for all $\theta_i \in T$

$$Q(\theta_i) = \mathrm{sign}(c_i), \qquad Q'(\theta_i) = 0$$
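
A sketch of this fit for a Gaussian kernel: select the two samples nearest each spike and solve the resulting square linear system for v (sample and spike locations below are arbitrary illustrations):

```python
import numpy as np

sig = 0.05
K  = lambda t: np.exp(-t**2 / (2 * sig**2))   # Gaussian kernel
K1 = lambda t: -t / sig**2 * K(t)             # dK/dt

thetas = np.array([0.3, 0.5, 0.74])
signs  = np.array([1.0, 1.0, -1.0])
samples = np.sort(np.random.default_rng(1).uniform(0, 1, 40))

# Subset: the two samples closest to each spike (sample proximity condition).
sel = np.unique(np.concatenate(
    [np.argsort(np.abs(samples - t))[:2] for t in thetas]))
S = samples[sel]
assert len(S) == 2 * len(thetas)

# Q(theta) = sum_j v_j K(s_j - theta); enforce Q(theta_i) = sign(c_i), Q'(theta_i) = 0.
# Note that d/dtheta of K(s - theta) is -K'(s - theta).
A_top = K(S[None, :] - thetas[:, None])     # rows: Q(theta_i)
A_bot = -K1(S[None, :] - thetas[:, None])   # rows: Q'(theta_i)
A = np.vstack([A_top, A_bot])               # square (2s) x (2s) system
v = np.linalg.solve(A, np.concatenate([signs, np.zeros(len(thetas))]))

grid = np.linspace(0, 1, 2000)
Q = K(S[None, :] - grid[:, None]) @ v
print(np.max(np.abs(Q)))  # should be about 1 for well-separated spikes
```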

It works!
[Figure: certificates for the Gaussian kernel and the Ricker wavelet, interpolating ±1 at θ_1, θ_2, θ_3 with sample pairs s_{i,1}, s_{i,2}]

Certificate construction

Problem: the construction is difficult to analyze (the coefficients vary). Solution: reparametrize into bumps and waves:

$$Q(\theta) = \sum_{s_j \in \tilde{S}} v_j K(s_j - \theta) = \sum_{\theta_i \in T} \alpha_i B_{\theta_i}(\theta, s_{i,1}, s_{i,2}) + \beta_i W_{\theta_i}(\theta, s_{i,1}, s_{i,2})$$

Bump function (Gaussian kernel)
[Figure: bump function built from two shifted Gaussians centered at s_{i,1} and s_{i,2} around θ_i]

$$B_{\theta_i}(\theta, s_{i,1}, s_{i,2}) := b_{i,1} K(s_{i,1} - \theta) + b_{i,2} K(s_{i,2} - \theta)$$

$$B_{\theta_i}(\theta_i, s_{i,1}, s_{i,2}) = 1, \qquad \partial_\theta B_{\theta_i}(\theta_i, s_{i,1}, s_{i,2}) = 0$$

Wave function (Gaussian kernel)
[Figure: wave function built from two shifted Gaussians centered at s_{i,1} and s_{i,2} around θ_i]

$$W_{\theta_i}(\theta, s_{i,1}, s_{i,2}) := w_{i,1} K(s_{i,1} - \theta) + w_{i,2} K(s_{i,2} - \theta)$$

$$W_{\theta_i}(\theta_i, s_{i,1}, s_{i,2}) = 0, \qquad \partial_\theta W_{\theta_i}(\theta_i, s_{i,1}, s_{i,2}) = 1$$
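
Each bump and wave is determined by a 2 × 2 linear solve; a sketch with a Gaussian kernel (the locations are illustrative):

```python
import numpy as np

K  = lambda t: np.exp(-t**2 / 2)   # Gaussian kernel (unit width)
K1 = lambda t: -t * K(t)           # dK/dt

def bump_wave_coeffs(theta_i, s1, s2):
    # Rows: value and theta-derivative of the two shifted kernels at theta_i.
    # Note that d/dtheta of K(s - theta) is -K'(s - theta).
    M = np.array([[K(s1 - theta_i),   K(s2 - theta_i)],
                  [-K1(s1 - theta_i), -K1(s2 - theta_i)]])
    b = np.linalg.solve(M, [1.0, 0.0])  # bump: value 1, derivative 0 at theta_i
    w = np.linalg.solve(M, [0.0, 1.0])  # wave: value 0, derivative 1 at theta_i
    return b, w

b, w = bump_wave_coeffs(theta_i=0.0, s1=-0.4, s2=0.3)
print(b, w)
```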

Bump and wave functions (Ricker wavelet)
[Figure: the analogous bump and wave functions built from two shifted Ricker wavelets, satisfying the same interpolation conditions]

Certificate construction

The reparametrization decouples the coefficients, and the certificate is approximately a sum of sign-weighted bumps:

$$Q(\theta) = \sum_{s_j \in \tilde{S}} v_j K(s_j - \theta) = \sum_{\theta_i \in T} \alpha_i B_{\theta_i}(\theta, s_{i,1}, s_{i,2}) + \beta_i W_{\theta_i}(\theta, s_{i,1}, s_{i,2}) \approx \sum_{\theta_i \in T} \mathrm{sign}(c_i)\, B_{\theta_i}(\theta, s_{i,1}, s_{i,2})$$

Certificate for deconvolution (Gaussian kernel and Ricker wavelet)
[Figure: certificates interpolating ±1 at θ_1, θ_2, θ_3, shown both as combinations of shifted kernels and as bump/wave decompositions]

Exact recovery guarantees [Bernstein, F. 2017]
[Figure: Gaussian kernel; regions of numerical and exact recovery in the (spike separation, sample proximity) plane]

Exact recovery guarantees [Bernstein, F. 2017]
[Figure: Ricker wavelet; regions of numerical and exact recovery in the (spike separation, sample proximity) plane]

SNL problems Joint work with Brett Bernstein (Courant) and Sheng Liu (CDS, NYU)

General SNL problems
The function $\varphi$ may not be available explicitly, but can often be computed numerically by solving a differential equation:
- Source localization in EEG
- Direction of arrival in radar / sonar
- Magnetic-resonance fingerprinting

Mathematical model

Signal: a superposition of Dirac measures with support $T$:

$$x = \sum_j c_j \delta_{\theta_j}, \qquad c_j \in \mathbb{R},\; \theta_j \in T \subset [0, 1]$$

Data: $n$ measurements following the SNL model

$$y = \int \varphi(\theta)\, x(\mathrm{d}\theta)$$

Diffusion on a rod with varying conductivity
[Figure: animation frames showing heat sources appearing at θ_1, θ_2, θ_3 on the rod]

Diffusion on a rod with varying conductivity
$\varphi(\theta)$ can be computed by solving a differential equation.
[Figure: measurement locations s_0, ..., s_28 and source locations θ_0, θ_1, θ_2]

Time-frequency pulses
[Figure: animation frames showing pulses appearing at θ_1, θ_2, θ_3]

Time-frequency pulses

Gabor wavelets in 1D:

$$\varphi(\theta)_k = \exp\left(-\frac{(s_k - \theta)^2}{2\sigma^2}\right) \sin(150\, s_k (s_k - \theta)), \qquad \theta \in [0, 1]$$

[Figure: pulses at θ_0, θ_1, θ_2 and sample locations s_0, ..., s_22]
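
A sketch of this feature map, assuming a Gaussian envelope with variance σ² as written above (σ and the sample count are illustrative choices):

```python
import numpy as np

n = 23
s = np.linspace(0, 1, n)   # sample locations s_0, ..., s_22
sigma = 0.05               # pulse width (illustrative)

def phi(theta):
    # Gabor pulse: Gaussian envelope centered at theta times a chirp-like sinusoid.
    return np.exp(-(s - theta) ** 2 / (2 * sigma ** 2)) * np.sin(150 * s * (s - theta))

print(phi(0.4).shape)  # (23,)
```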

Sparse estimation for general SNL problems

Problem: sparse recovery requires RIP-like properties that do not hold for SNL problems with smooth $\varphi$ (even if we discretize). We cannot hope to recover all sparse signals. How about signals such that $\varphi(\theta_i)^T \varphi(\theta_j)$ is small for all $\theta_i \neq \theta_j$ in $T$?

Challenge: prove guarantees for general SNL problems that depend only on the correlation structure.

SNL problems with correlation decay

Aim: guarantees for signals under separation conditions with respect to the support-centered correlations $\rho_1, \ldots, \rho_s$:

$$\rho_i(\theta) := \varphi(\theta_i)^T \varphi(\theta)$$
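
These correlations are straightforward to compute once φ is available numerically; a sketch with a hypothetical Gaussian feature map (normalizing the columns so that ρ_i(θ_i) = 1 is an assumption for readability):

```python
import numpy as np

sample_locs = np.linspace(0, 1, 50)

def phi(theta, sigma=0.05):
    v = np.exp(-((sample_locs - theta) ** 2) / (2 * sigma ** 2))
    return v / np.linalg.norm(v)   # normalize so rho_i(theta_i) = 1

thetas = np.array([0.2, 0.5, 0.8])
grid = np.linspace(0, 1, 500)
Phi_grid = np.column_stack([phi(t) for t in grid])

# rho_i(theta) = phi(theta_i)^T phi(theta): one correlation function per spike.
rho = np.array([phi(t) @ Phi_grid for t in thetas])
print(rho.shape)        # (3, 500)
print(rho.max(axis=1))  # each close to 1, attained near theta_i
```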

Diffusion on a rod with varying conductivity
[Figure: measurement locations s_0, ..., s_28 and source locations θ_0, θ_1, θ_2]

Support-centered correlations
[Figure: correlation functions ρ_i centered at θ_0, θ_1, θ_2 for the diffusion example]

Time-frequency pulses

$$\varphi(\theta)_k = \exp\left(-\frac{(s_k - \theta)^2}{2\sigma^2}\right) \sin(150\, s_k (s_k - \theta)), \qquad \theta \in [0, 1]$$

[Figure: pulses at θ_0, θ_1, θ_2 and sample locations s_0, ..., s_22]

Support-centered correlations
[Figure: correlation functions ρ_i centered at θ_0, θ_1, θ_2 for the time-frequency example]

Correlation decay

Parametrized by $F_i^- < N_i^- < N_i^+ < F_i^+$ and $\sigma_i$; the parameters can be different at each $\theta_i$.
[Figure: decay regions around θ_i marked by F_i^- − 2σ_i, F_i^- − 1σ_i, F_i^-, N_i^-, N_i^+, F_i^+, F_i^+ + 1σ_i, F_i^+ + 2σ_i]

Correlation decay

- $\rho_i$ is concave in $[N_i^-, N_i^+]$: $\rho_i''(\theta) < -\gamma_0$
- $\rho_i$ is bounded outside $[N_i^-, N_i^+]$: $|\rho_i(\theta)| < \gamma_1$
- $\rho_i$ decays for $\theta < F_i^-$ and $\theta > F_i^+$:
  $|\rho_i(\theta)| < \gamma_2\, e^{-(F_i^- - \theta)/\sigma_i}$ for $\theta < F_i^-$, and
  $|\rho_i(\theta)| < \gamma_2\, e^{-(\theta - F_i^+)/\sigma_i}$ for $\theta > F_i^+$

The choice of exponential decay is arbitrary.

Correlation decay

Additional condition on the correlation derivatives

$$\rho_i^{(q,r)}(\theta) := \varphi^{(q)}(\theta_i)^T \varphi^{(r)}(\theta)$$

$\rho_i^{(q,r)}$ decays for $\theta < F_i^-$ and $\theta > F_i^+$, for $q = 0, 1$ and $r = 0, 1, 2$:

$$|\rho_i^{(q,r)}(\theta)| < \gamma_2\, e^{-(F_i^- - \theta)/\sigma_i} \;\text{ for } \theta < F_i^-, \qquad |\rho_i^{(q,r)}(\theta)| < \gamma_2\, e^{-(\theta - F_i^+)/\sigma_i} \;\text{ for } \theta > F_i^+$$

Open question: is this necessary?

Normalized distance

The normalized distance from $\theta$ to $\theta_j > \theta$ is

$$d_j(\theta) := \frac{F_j^- - \theta}{\sigma_j}$$

If $d_j(\theta)$ is large, $\varphi(\theta)$ and $\varphi(\theta_j)$ are not very correlated.
[Figure: normalized distances d_2(θ) and d_3(θ) from θ to θ_2 and θ_3]

Minimum separation conditions

The normalized distance between spikes is set to $\Delta$.
[Figure: spikes θ_1, θ_2, θ_3 with the normalized distances between their F^± boundaries illustrating the separation conditions]

Guarantees for SNL problems with decaying correlation

Theorem [Bernstein, F. 2018]: For any SNL problem with decaying correlation, TV-norm minimization achieves exact recovery under the separation conditions if $\Delta > C$ for a fixed constant $C$ depending on the decay bounds $\gamma_0, \gamma_1, \gamma_2$.

Dual certificate for SNL problems

$v \in \mathbb{R}^n$ is a dual certificate associated to $x = \sum_j c_j \delta_{\theta_j}$, $c_j \in \mathbb{R}$, $\theta_j \in T$, if $Q(\theta) := v^T \varphi(\theta)$ satisfies

$$Q(\theta_j) = \mathrm{sign}(c_j) \;\text{ if } \theta_j \in T, \qquad |Q(\theta)| < 1 \;\text{ if } \theta \notin T$$

Dual certificate construction

Use the support-centered correlations to interpolate the sign pattern:

$$Q(\theta) := \sum_{i=1}^{s} \alpha_i \rho_i(\theta) = \sum_{i=1}^{s} \alpha_i \varphi(\theta_i)^T \varphi(\theta) = v^T \varphi(\theta), \qquad v := \sum_{i=1}^{s} \alpha_i \varphi(\theta_i)$$

[Figure: certificate interpolating ±1 at the support points]

Technical detail: a correction term ensures that the derivative vanishes on the support.

Robustness to noise / outliers
- Variations of dual certificates establish robustness at small noise levels (Candès, F. 2013), (F. 2013), (Bernstein, F. 2017)
- Exact recovery with a constant number of outliers, up to log factors (F., Tang, Wang, Zheng 2017), (Bernstein, F. 2017)
- Open questions: analysis of higher noise levels and discretization error; robustness for positive amplitudes

Drawbacks
- Solving the convex program is computationally expensive
- The approach doesn't scale well to high dimensions
- In practice, reweighting is needed to obtain sparse solutions for noisy data
- Open question: analysis of other techniques (reweighting methods, descent methods on nonconvex cost functions)

Conclusion
- Previous works focus mostly on random operators
- For deterministic problems, sparsity is not enough!
- Under separation conditions:
  1. Sharp guarantees for super-resolution and deconvolution
  2. General guarantees for SNL problems with correlation decay

References
- Prolate spheroidal wave functions, Fourier analysis, and uncertainty V: The discrete case. D. Slepian. Bell System Technical Journal, 1978.
- Super-resolution, extremal functions and the condition number of Vandermonde matrices. A. Moitra. Symposium on Theory of Computing (STOC), 2015.
- The recoverability limit for superresolution via sparsity. L. Demanet, N. Nguyen.
- Robust recovery of stream of pulses using convex optimization. T. Bendory, S. Dekel, A. Feuer. J. of Math. Analysis and Applications, 2016.
- Super-resolution without separation. G. Schiebinger, E. Robeva, B. Recht. Information and Inference, 2016.

References
- Towards a mathematical theory of super-resolution. E. J. Candès, C. Fernandez-Granda. Comm. on Pure and Applied Math., 2013.
- Super-resolution from noisy data. E. J. Candès, C. Fernandez-Granda. J. of Fourier Analysis and Applications, 2014.
- Support detection in super-resolution. C. Fernandez-Granda. Proceedings of SampTA, 2013.
- Super-resolution of point sources via convex programming. C. Fernandez-Granda. Information and Inference, 2016.

References
- Demixing sines and spikes: robust spectral super-resolution in the presence of outliers. C. Fernandez-Granda, G. Tang, X. Wang, L. Zheng. Information and Inference, 2017.
- Multicompartment magnetic-resonance fingerprinting. S. Tang, C. Fernandez-Granda, S. Lannuzel, B. Bernstein, R. Lattanzi, M. Cloos, F. Knoll, J. Asslaender.
- Deconvolution of point sources: a sampling theorem and robustness guarantees. B. Bernstein, C. Fernandez-Granda.
- Sparse recovery beyond compressed sensing: separable nonlinear inverse problems. B. Bernstein, C. Fernandez-Granda, S. Liu.