Going off the grid. Benjamin Recht, University of California, Berkeley. Joint work with Badri Bhaskar, Parikshit Shah, and Gongguo Tang


imaging, astronomy, seismology, spectroscopy, DOA estimation, sampling, GPS, radar, ultrasound: $x(t) = \sum_{j=1}^{k} c_j e^{i2\pi f_j t}$

imaging, astronomy, seismology, spectroscopy, DOA estimation, sampling, GPS, radar, ultrasound: $x(t) = \sum_{j=1}^{k} c_j\, g(t - t_j)$

imaging, astronomy, seismology, spectroscopy, DOA estimation, sampling, GPS, radar, ultrasound: $x(t) = \sum_{j=1}^{k} c_j\, g(t - t_j)\, e^{i2\pi f_j t}$

imaging, astronomy, seismology, spectroscopy, DOA estimation, sampling, GPS, radar, ultrasound: $x(s) = \mathrm{PSF} * \left\{ \sum_{j=1}^{k} c_j\, \delta(s - s_j) \right\}$

imaging, astronomy, seismology, spectroscopy, DOA estimation, sampling, GPS, radar, ultrasound: $\hat{x}(\xi) = \widehat{\mathrm{PSF}}(\xi) \sum_{j=1}^{k} c_j e^{-i2\pi \xi s_j}$

Observe a sparse combination of sinusoids: $x_m = \sum_{k=1}^{s} c_k e^{i2\pi m u_k}$ for some $u_k \in [0,1)$. Spectrum estimation: find a combination of sinusoids agreeing with time series data. Classic (1795...): Prony's method. Assume the coefficients are positive for simplicity:
$\sum_{k=1}^{r} c_k \begin{bmatrix} 1 \\ e^{i2\pi u_k} \\ e^{i4\pi u_k} \\ e^{i6\pi u_k} \end{bmatrix} \begin{bmatrix} 1 \\ e^{i2\pi u_k} \\ e^{i4\pi u_k} \\ e^{i6\pi u_k} \end{bmatrix}^* = \begin{bmatrix} x_0 & x_1^* & x_2^* & x_3^* \\ x_1 & x_0 & x_1^* & x_2^* \\ x_2 & x_1 & x_0 & x_1^* \\ x_3 & x_2 & x_1 & x_0 \end{bmatrix} =: \mathrm{toep}(x)$
$\mathrm{toep}(x)$ is positive semidefinite, and any null vector corresponds to a polynomial that vanishes at the points $e^{i2\pi u_k}$. MUSIC, ESPRIT, Cadzow, etc.
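A minimal numerical sketch of this classical picture (assuming NumPy/SciPy; the two frequencies and positive coefficients below are made-up example values): form toep(x) from the samples of a two-tone signal and check that it is positive semidefinite with rank equal to the number of sinusoids.

```python
import numpy as np
from scipy.linalg import toeplitz

# Example values (assumptions, not from the slides): two tones with
# positive coefficients, observed at m = 0, ..., n-1.
u_true = np.array([0.12, 0.37])   # frequencies in [0, 1)
c_true = np.array([1.0, 0.6])     # positive coefficients
n = 8
m = np.arange(n)
x = np.exp(2j * np.pi * np.outer(m, u_true)) @ c_true   # x_m = sum_k c_k e^{i 2 pi m u_k}

# toep(x): Hermitian Toeplitz matrix with x down the first column.
T = toeplitz(x, x.conj())

eigvals = np.linalg.eigvalsh(T)
print("smallest eigenvalue (should be >= 0 up to roundoff):", eigvals.min())
print("numerical rank (should equal the number of sinusoids):",
      np.sum(eigvals > 1e-8 * eigvals.max()))
# Null vectors of T define polynomials that vanish at e^{i 2 pi u_k};
# MUSIC, ESPRIT, and Cadzow exploit exactly this structure.
```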

Observe a sparse combination of sinusoids: $x_m = \sum_{k=1}^{s} c_k e^{i2\pi m u_k}$ for some $u_k \in [0,1)$. Spectrum estimation: find a combination of sinusoids agreeing with time series data. Contemporary: $x \approx Fc$ with $c$ sparse, $F \in \mathbb{C}^{n \times N}$, $F_{ab} = \exp(i2\pi ab/N)$. Solve with the LASSO: minimize $\|x - Fc\|_2^2 + \mu \|c\|_1$.
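A minimal sketch of this gridded approach (assuming CVXPY is installed; the grid size, test signal, and regularization weight are example choices, not values from the slides): build the $n \times N$ Fourier dictionary and solve the complex LASSO.

```python
import numpy as np
import cvxpy as cp

n, N = 32, 256                       # n samples, N grid points (example values)
rng = np.random.default_rng(0)

# Example signal: two on-grid tones plus a little noise (assumptions for illustration).
k_true = np.array([17, 113])
x = np.exp(2j * np.pi * np.outer(np.arange(n), k_true) / N) @ np.array([1.0, 0.7])
x += 0.05 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

# Fourier dictionary restricted to the grid: F[a, b] = exp(i 2 pi a b / N).
F = np.exp(2j * np.pi * np.outer(np.arange(n), np.arange(N)) / N)

c = cp.Variable(N, complex=True)
mu = 0.5                              # regularization weight (example value)
prob = cp.Problem(cp.Minimize(cp.sum_squares(x - F @ c) + mu * cp.norm(c, 1)))
prob.solve()

support = np.flatnonzero(np.abs(c.value) > 1e-2)
print("grid frequencies with non-negligible coefficients:", support / N)
```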

Observe a sparse combination of sinusoids: $x_m = \sum_{k=1}^{s} c_k e^{i2\pi m u_k}$ for some $u_k \in [0,1)$. Spectrum estimation: find a combination of sinusoids agreeing with time series data.
Classic (SVD): grid free, but you need to know the model order, there is a lack of quantitative theory, and it is unstable in practice.
Contemporary (gridding + $\ell_1$ minimization): robust, with model selection and quantitative theory, but suffers from discretization error, basis mismatch, and numerical instability.
Can we bridge the gap?

Parsimonious Models: model = combination of weights and atoms. Search for the best linear combination of the fewest atoms; rank = the fewest atoms needed to describe the model.

Atomic Norms. Given a basic set of atoms $\mathcal{A}$, define the function $\|x\|_{\mathcal{A}} = \inf\{t > 0 : x \in t\,\mathrm{conv}(\mathcal{A})\}$. When $\mathcal{A}$ is centrosymmetric, we get a norm: $\|x\|_{\mathcal{A}} = \inf\{\sum_{a \in \mathcal{A}} c_a : x = \sum_{a \in \mathcal{A}} c_a a,\ c_a \geq 0\}$. IDEA: minimize $\|z\|_{\mathcal{A}}$ subject to $\Phi z = y$. When does this work?
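As a tiny sanity check on the definition (the vector and atom set below are assumed examples, not from the slides): with atoms $\pm e_i$, the atomic norm computed directly from the definition coincides with the $\ell_1$ norm, and the inverse problem above becomes basis pursuit.

```python
import numpy as np
from scipy.optimize import linprog

# Atoms: the signed standard basis vectors +/- e_i in R^3.
d = 3
A = np.hstack([np.eye(d), -np.eye(d)])      # columns are the atoms

x = np.array([1.5, -2.0, 0.25])             # example vector (assumption)

# ||x||_A = min sum(c)  subject to  A c = x,  c >= 0  (the definition on this slide).
res = linprog(c=np.ones(2 * d), A_eq=A, b_eq=x, bounds=(0, None))
print("atomic norm:", res.fun)               # 3.75
print("l1 norm    :", np.abs(x).sum())       # 3.75, they agree
```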

Atomic Norm Minimization. IDEA: minimize $\|z\|_{\mathcal{A}}$ subject to $\Phi z = y$. Generalizes existing, powerful methods. A rigorous formula for developing new analysis algorithms. Precise, tight bounds on the number of measurements needed for model recovery. One algorithm prototype for a myriad of data-analysis applications. [Chandrasekaran, Recht, Parrilo, and Willsky]

Spectrum Estimation. Observe a sparse combination of sinusoids: $x^\star_m = \sum_{k=1}^{s} c^\star_k e^{i2\pi m u^\star_k}$ for some $u^\star_k \in [0,1)$. Observe $y = x^\star + \omega$ (signal plus noise). Atomic set: $\mathcal{A} = \left\{ \left[ e^{i\phi},\, e^{i2\pi u + i\phi},\, e^{i4\pi u + i\phi},\, \ldots,\, e^{i2\pi n u + i\phi} \right]^T : \phi \in [0, 2\pi),\ u \in [0,1) \right\}$. Classical techniques (Prony, Matrix Pencil, MUSIC, ESPRIT, Cadzow) use the fact that noiseless moment matrices are low rank: $\sum_{k=1}^{r} c_k \begin{bmatrix} 1 \\ e^{i\theta_k} \\ e^{2i\theta_k} \\ e^{3i\theta_k} \end{bmatrix} \begin{bmatrix} 1 \\ e^{i\theta_k} \\ e^{2i\theta_k} \\ e^{3i\theta_k} \end{bmatrix}^* = \begin{bmatrix} \mu_0 & \mu_1^* & \mu_2^* & \mu_3^* \\ \mu_1 & \mu_0 & \mu_1^* & \mu_2^* \\ \mu_2 & \mu_1 & \mu_0 & \mu_1^* \\ \mu_3 & \mu_2 & \mu_1 & \mu_0 \end{bmatrix}$

Atomic Norm for Spectrum Estimation. IDEA: minimize $\|z\|_{\mathcal{A}}$ subject to $\|z - y\| \leq \delta$. Atomic set: $\mathcal{A} = \left\{ \left[ e^{i\phi},\, e^{i2\pi u + i\phi},\, e^{i4\pi u + i\phi},\, \ldots,\, e^{i2\pi n u + i\phi} \right]^T : \phi \in [0, 2\pi),\ u \in [0,1) \right\}$. How do we solve the optimization problem? Can we approximate the true signal from partial and noisy measurements? Can we estimate the frequencies from partial and noisy measurements?

Which atomic norm for sinusoids? $x_m = \sum_{k=1}^{s} c_k e^{i2\pi m u_k}$ for some $u_k \in [0,1)$. When the $c_k$ are positive, $\sum_{k=1}^{r} c_k \begin{bmatrix} 1 \\ e^{i2\pi u_k} \\ e^{i4\pi u_k} \\ e^{i6\pi u_k} \end{bmatrix} \begin{bmatrix} 1 \\ e^{i2\pi u_k} \\ e^{i4\pi u_k} \\ e^{i6\pi u_k} \end{bmatrix}^* = \begin{bmatrix} x_0 & x_1^* & x_2^* & x_3^* \\ x_1 & x_0 & x_1^* & x_2^* \\ x_2 & x_1 & x_0 & x_1^* \\ x_3 & x_2 & x_1 & x_0 \end{bmatrix}$. For general coefficients, the convex hull is characterized by linear matrix inequalities (Toeplitz positive semidefinite): $\|x\|_{\mathcal{A}} = \inf\left\{ \tfrac{1}{2} t + \tfrac{1}{2} w_1 : \begin{bmatrix} t & x^* \\ x & \mathrm{toep}(w) \end{bmatrix} \succeq 0 \right\}$. [Figure: the moment curve of $[t, t^2, t^3, t^4, \ldots]$, $t \in S^1$]

A A [ { A} conv(a) conv(a [ { A})

[Figure: the superposition $\cos(2\pi u_1 k)/2 + \cos(2\pi u_2 k)/2$ of two closely spaced tones $u_1$ and $u_2$, shown together with its two components]

Nearly optimal rates. Observe a sparse combination of sinusoids: $x^\star_m = \sum_{k=1}^{s} c^\star_k e^{i2\pi m u^\star_k}$ for some $u^\star_k \in [0,1)$. Observe $y = x^\star + \omega$ (signal plus noise). Assume the frequencies are far apart: $\min_{p \neq q} d(u_p, u_q) \geq 4/n$. Solve: $\hat{x} = \arg\min_x \tfrac{1}{2}\|x - y\|_2^2 + \mu\|x\|_{\mathcal{A}}$.
Error rate: $\frac{1}{n}\|\hat{x} - x^\star\|_2^2 \leq C \sigma^2 \frac{s \log(n)}{n}$.
No algorithm can do better than $\mathbb{E}\left[\frac{1}{n}\|\hat{x} - x^\star\|_2^2\right] \geq C \sigma^2 \frac{s \log(n/s)}{n}$, even if the frequencies are well separated.
No algorithm can do better than $\frac{1}{n}\|\hat{x} - x^\star\|_2^2 \geq C \sigma^2 \frac{s}{n}$, even if we knew all of the frequencies ($u_k^\star$).
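A compact CVXPY sketch of this denoising step (atomic norm soft thresholding) using the Toeplitz LMI from the previous slide. The solver choice, test signal, and value of $\mu$ are assumptions for illustration, and the Toeplitz structure is imposed through explicit equality constraints rather than any dedicated atom.

```python
import numpy as np
import cvxpy as cp

def ast_denoise(y, mu):
    """Sketch of  minimize 0.5*||x - y||^2 + mu*||x||_A  via the LMI
    ||x||_A = inf { t/2 + w_1/2 : [[t, x*], [x, toep(w)]] >= 0 }."""
    n = y.shape[0]
    x = cp.Variable(n, complex=True)
    Z = cp.Variable((n + 1, n + 1), hermitian=True)   # plays the role of [[t, x*], [x, T]]
    T = Z[1:, 1:]
    constraints = [Z >> 0, Z[1:, 0] == x]
    # Force T to be Toeplitz: entries along each diagonal are equal.
    for i in range(n - 1):
        for j in range(n - 1):
            constraints.append(T[i, j] == T[i + 1, j + 1])
    atomic_norm = 0.5 * cp.real(Z[0, 0]) + 0.5 * cp.real(T[0, 0])
    objective = 0.5 * cp.sum_squares(x - y) + mu * atomic_norm
    cp.Problem(cp.Minimize(objective), constraints).solve(solver=cp.SCS)
    return x.value, (y - x.value) / mu      # signal estimate and scaled residual (dual vector)

# Example data (assumptions): two well-separated tones plus noise.
n, u, c = 32, np.array([0.1, 0.4]), np.array([1.0, 0.8])
rng = np.random.default_rng(1)
y = np.exp(2j * np.pi * np.outer(np.arange(n), u)) @ c
y += 0.1 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
x_hat, v = ast_denoise(y, mu=2.0)
```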

Mean Square Error Performance Profile $P(\beta)$. Frequencies generated at random with $1/n$ separation; random phases, fading amplitudes. Cadzow and MUSIC are provided the model order; AST (Atomic norm Soft Thresholding) and LASSO estimate the noise power. Lower MSE is better. Performance profile over all parameter values and settings, for algorithm $s$: $P_s(\beta) = \dfrac{\#\{p \in P : \mathrm{MSE}_s(p) \leq \beta\, \min_{s'} \mathrm{MSE}_{s'}(p)\}}{\#(P)}$. Higher $P_s(\beta)$ is better.
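The performance profile itself is simple to compute; here is a small sketch (the dict layout mapping algorithm names to per-problem MSE arrays is an assumed data format, not from the slides).

```python
import numpy as np

def performance_profile(mse, betas):
    """P_s(beta) = fraction of problems p with MSE_s(p) <= beta * min_s' MSE_s'(p).
    `mse` maps algorithm name -> array of MSEs over the same problem instances."""
    all_mse = np.vstack(list(mse.values()))
    best = all_mse.min(axis=0)                    # best MSE achieved on each problem
    return {name: [(np.asarray(vals) <= b * best).mean() for b in betas]
            for name, vals in mse.items()}

# Example with made-up numbers: AST dominates at small beta.
profiles = performance_profile(
    {"AST": np.array([1.0, 2.0, 1.5]), "MUSIC": np.array([1.2, 4.0, 1.5])},
    betas=np.linspace(1, 5, 9))
```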

Extracting the decomposition. How do you extract the frequencies? Look at the dual norm: $\|v\|_{\mathcal{A}}^* = \max_{a \in \mathcal{A}} \langle a, v \rangle = \max_{u \in [0,1)} \left| \sum_{k=0}^{n-1} v_k e^{-i2\pi k u} \right|$. The dual norm is the maximum modulus of a trigonometric polynomial. At optimality, the maximum modulus is attained at the support of the signal; this works far better than Prony interpolation in practice.
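A few-line sketch of this extraction step (assuming NumPy; `v` is a dual vector such as the scaled residual $(y - \hat{x})/\mu$ from the AST sketch a couple of slides back, and the grid resolution and peak tolerance are example choices): evaluate the dual polynomial on a fine grid and keep local maxima whose modulus is numerically 1.

```python
import numpy as np

def extract_frequencies(v, grid_size=2 ** 14, tol=1e-3):
    """Locate frequencies where the dual polynomial |sum_k v_k e^{-i 2 pi k u}|
    attains its maximum modulus (equal to 1 at optimality)."""
    n = v.shape[0]
    u = np.arange(grid_size) / grid_size
    Q = np.abs(np.exp(-2j * np.pi * np.outer(u, np.arange(n))) @ v)
    # local maxima of |Q| on the circular grid that come within tol of 1
    is_peak = (Q >= np.roll(Q, 1)) & (Q >= np.roll(Q, -1)) & (Q >= 1 - tol)
    return u[is_peak]

# e.g. extract_frequencies((y - x_hat) / mu) after running the AST sketch above
```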

Aside: the proof. How do you prove any compressed sensing-esque result? Look at the dual norm: $\|v\|_{\mathcal{A}}^* = \max_{a \in \mathcal{A}} \langle a, v \rangle = \max_{u \in [0,1)} \left| \sum_{k=0}^{n-1} v_k e^{-i2\pi k u} \right|$. Generic proof: construct a polynomial such that the atoms you want maximize the polynomial, then prove that the polynomial is bounded everywhere else.
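In the noiseless exact-recovery setting the certificate conditions take the following standard form (a sketch; the sign/phase normalization follows the convention used above rather than any particular paper's statement):

```latex
% Dual certificate for recovering x* = sum_k c_k a(u_k):
% exhibit a vector v (equivalently, a trigonometric polynomial Q) with
\[
  Q(u) \;=\; \sum_{m=0}^{n-1} v_m\, e^{-i 2\pi m u}, \qquad
  Q(u_k) \;=\; \frac{c_k}{|c_k|} \ \ \text{for every true frequency } u_k, \qquad
  |Q(u)| \;<\; 1 \ \ \text{for all } u \notin \{u_1,\dots,u_s\}.
\]
% Such a Q certifies that x* is the unique atomic-norm minimizer
% consistent with the observations.
```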

Localization Guarantees. [Figure: dual polynomial magnitude versus frequency, marking true, estimated, and spurious frequencies]
Spurious amplitudes: $\sum_{l:\, \hat{f}_l \in F} |\hat{c}_l| \leq C_1\, \dfrac{k^2(n)}{n}$.
Weighted frequency deviation: $\sum_{l:\, \hat{f}_l \in N_j} |\hat{c}_l|\, d(f_j, \hat{f}_l)^2 \leq C_2\, \dfrac{k^2(n)}{n}$.
Near region approximation: $\left| c_j - \sum_{l:\, \hat{f}_l \in N_j} \hat{c}_l \right| \leq C_3\, \dfrac{k^2(n)}{n}$.

Frequency Localization. Performance profiles for the three localization metrics from the previous slide: spurious amplitudes, weighted frequency deviation, and near-region approximation. [Figure: performance profiles $P_s(\beta)$ and the corresponding error metrics versus SNR (dB) at $k = 32$, comparing AST, MUSIC, and Cadzow]

Incomplete Data / Random Sampling. Observe a random subset of samples $T \subset \{0, 1, \ldots, n-1\}$. On the grid: Candès, Romberg & Tao (2004). Off the grid: new; compressed sensing extended to the continuous domain. Recover the missing part by solving: minimize $\|z\|_{\mathcal{A}}$ subject to $z_T = x_T$. Extract the frequencies from the dual optimal solution. [Figure: full observation vs. random sampling]

Exact Recovery (off the grid). $x_m = \sum_{k=1}^{s} c_k \exp(i2\pi m u_k)$; we WANT $\{u_k, c_k\}$, with minimum frequency separation $\Delta$.
Theorem (Candès and Fernandez-Granda, 2012): A line spectrum with minimum frequency separation $\Delta > 4/s$ can be recovered from the first $2s$ Fourier coefficients via atomic norm minimization.
Theorem (Tang, Bhaskar, Shah, and Recht, 2012): A line spectrum with minimum frequency separation $\Delta > 4/n$ can be recovered from most subsets of the first $n$ Fourier coefficients of size at least $O(s \log(s) \log(n))$.
$s$ random samples are better than $s$ equispaced samples. On a grid, this is just compressed sensing; off the grid, this is new. See [BCGLS12] for a sketching approach to a similar problem. No balancing of coherence as $n$ grows.
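The random-sampling variant only changes the data-fidelity constraint. Below is a self-contained CVXPY sketch (the test signal, sample budget, and solver are assumptions): minimize the atomic norm, via the same Toeplitz LMI as in the AST sketch, subject to agreeing with the observed entries $z_T = x_T$.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
n, s = 32, 3
u_true = np.array([0.1, 0.4, 0.75])                  # separated by more than 4/n (example)
c_true = rng.standard_normal(s) + 1j * rng.standard_normal(s)
x_true = np.exp(2j * np.pi * np.outer(np.arange(n), u_true)) @ c_true

T_obs = np.sort(rng.choice(n, size=20, replace=False))   # observed sample indices (example)

z = cp.Variable(n, complex=True)
Z = cp.Variable((n + 1, n + 1), hermitian=True)      # plays the role of [[t, z*], [z, toep(w)]]
W = Z[1:, 1:]
constraints = [Z >> 0, Z[1:, 0] == z, z[T_obs] == x_true[T_obs]]
for i in range(n - 1):                                # impose Toeplitz structure on W
    for j in range(n - 1):
        constraints.append(W[i, j] == W[i + 1, j + 1])
atomic_norm = 0.5 * cp.real(Z[0, 0]) + 0.5 * cp.real(W[0, 0])
cp.Problem(cp.Minimize(atomic_norm), constraints).solve(solver=cp.SCS)
print("max abs reconstruction error:", np.max(np.abs(z.value - x_true)))
```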

Mean Square Error Performance Profile $P(\beta)$. [Figure: performance profiles comparing the exact SDP formulation with basis pursuit on discretized grids at 4x and 64x oversampling (SDP, BP:4x, BP:64x)]

Discretization. Discretize the parameter space to get a finite dictionary: $x(t) \approx \sum_{l=1}^{N} c_l e^{i2\pi f_l t}$, $f_l \in$ grid. Wakin, Becker, et al. (2012); Fannjiang, Strohmer & Yan (2010); Bajwa, Haupt, Sayeed & Nowak (2010); Tropp, Laska, Duarte, Romberg & Baraniuk (2010); Herman & Strohmer (2009); Malioutov, Cetin & Willsky (2005); Candès, Romberg & Tao (2004). Basis mismatch: Chi, Scharf, Pezeshki & Calderbank (2011); Herman & Strohmer (2010).

Approximation through discretization. Relaxations: $\mathcal{A}_1 \subseteq \mathcal{A}_2 \implies \|x\|_{\mathcal{A}_2} \leq \|x\|_{\mathcal{A}_1}$. Let $T$ be a finite net and let $\Phi_T$ be a matrix whose columns are the atoms in $T$: $\|x\|_{\mathcal{A}_T} = \inf\left\{ \sum_k c_k : x = \Phi_T c \right\}$ + extra equality constraints. When polynomials of bounded degree are Lipschitz on $T$, there is a $\lambda$ in $(0,1]$ such that $\lambda \|x\|_{\mathcal{A}_T} \leq \|x\|_{\mathcal{A}} \leq \|x\|_{\mathcal{A}_T}$.


Discretization for sinusoids. Discretized atomic set: $\mathcal{A}_N = \left\{ \left[ e^{i\phi},\, e^{i2\pi u + i\phi},\, e^{i4\pi u + i\phi},\, \ldots,\, e^{i2\pi n u + i\phi} \right]^T : u = k/N,\ k = 0, \ldots, N-1,\ \phi \in [0, 2\pi) \right\}$. $\|x\|_{\mathcal{A}_N} = \inf\left\{ \sum_{k=0}^{N-1} |c_k| : x = F_{[N,n]} c \right\}$, where $F_{[N,n]}$ consists of the first $n$ rows of the $N \times N$ DFT matrix, and $\left(1 - \frac{2\pi n}{N}\right) \|x\|_{\mathcal{A}_N} \leq \|x\|_{\mathcal{A}} \leq \|x\|_{\mathcal{A}_N}$. The approximation is fast to compute using first-order methods and only improves as $N$ grows. It lets you estimate the signal, but you cannot recover the exact frequencies.
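A sketch of the discretized norm (assuming CVXPY; the grid size and the off-grid test tone are example choices): build $F_{[N,n]}$ from the first $n$ rows of the $N$-point DFT-type matrix and solve the basis-pursuit form of $\|x\|_{\mathcal{A}_N}$.

```python
import numpy as np
import cvxpy as cp

n, N = 16, 512                      # n samples, N-point frequency grid (example values)
m, k = np.arange(n), np.arange(N)
F = np.exp(2j * np.pi * np.outer(m, k) / N)     # first n rows of the N x N DFT-type matrix

# Example off-grid signal (assumption): a single tone falling between grid points.
x = np.exp(2j * np.pi * m * 0.2013)

c = cp.Variable(N, complex=True)
prob = cp.Problem(cp.Minimize(cp.norm(c, 1)), [F @ c == x])
prob.solve()
print("||x||_{A_N} (an upper bound on ||x||_A):", prob.value)
# Refining the grid (larger N) tightens the bound but never identifies the
# off-grid frequency exactly, which is the point made on this slide.
```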

Discretization. Discretize the parameter space to get a finite number of grid points $\theta_1, \ldots, \theta_m$. Enforce a finite number of constraints: $|\langle z, a(\theta_j) \rangle| \leq 1$, $j = 1, \ldots, m$. Equivalently, in the primal, replace the atomic norm with a discrete one. What happens to the solutions as the grid is refined?

Convergence in Dual. Assumption: there exist parameters $\theta_1, \ldots, \theta_n$ such that $a(\theta_1), \ldots, a(\theta_n)$ are linearly independent. Enforce finite constraints in the dual: $|\langle z, a(\theta_j) \rangle| \leq 1$, $j = 1, \ldots, m$. Theorem: the discretized optimal objectives converge to the original optimal objective, and any solution sequence $\{\hat{z}_m\}$ of the discretized problems has a subsequence that converges to the solution set of the original problem. For the LASSO dual, the convergence speed is $O(1/m)$. [Figure: $\log_2(|f_m - f^\star|)$ and $\log_2(\|\hat{z}_m - \hat{z}^\star\|)$ versus $\log_2(m)$, with dual iterates shown for $m = 32$, $128$, and $512$]

Convergence in Primal. The original primal solution is associated with a measure $\hat{\mu}$: $\hat{x} = \int a(\theta)\, \hat{\mu}(d\theta)$, $\|\hat{x}\|_{\mathcal{A}} = \|\hat{\mu}\|_{TV}$. The discretized primal solutions $\hat{x}_m$ are associated with measures $\hat{\mu}_m$ that are supported only on the grid. Theorem: the discretized optimal measures converge in distribution to an original optimal measure. Fix a neighborhood of the support of an original optimal measure; the supports of the discretized optimal measures eventually fall within that neighborhood. [Figure: discretized optimal measures for $m = 64$, $256$, and $1024$ concentrating on the support of the original solution $\hat{x}$]

Single Molecule Imaging Courtesy of Zhuang Research Lab

Single Molecule Imaging. Bundles of 8 tubes of 30 nm diameter. Sparse density: 849 molecules on 2 frames. Resolution: 64x64 pixels. Pixel size: 100 nm x 100 nm. Field of view: 6400 nm x 6400 nm. Target resolution: 10 nm x 10 nm. Discretize the FOV into 64x64 pixels: $I(x, y) = \sum_j c_j\, \mathrm{PSF}(x - x_j, y - y_j)$, with molecule locations $(x_j, y_j) \in [0, 6400]^2$ and pixel centers $(x, y) \in \{50, 150, \ldots, 6350\}^2$.
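The forward model on this slide is easy to simulate. The sketch below assumes an isotropic Gaussian PSF; its width, the molecule positions, and the brightness values are made-up examples rather than challenge parameters.

```python
import numpy as np

def render_frame(positions, brightness, psf_sigma=150.0,
                 n_pixels=64, pixel_size=100.0):
    """I(x, y) = sum_j c_j PSF(x - x_j, y - y_j), sampled at the camera pixel
    centers (50, 150, ..., 6350) nm; the Gaussian PSF is an assumption."""
    centers = pixel_size * (np.arange(n_pixels) + 0.5)          # 50, 150, ..., 6350
    X, Y = np.meshgrid(centers, centers, indexing="xy")
    frame = np.zeros((n_pixels, n_pixels))
    for (xj, yj), cj in zip(positions, brightness):
        frame += cj * np.exp(-((X - xj) ** 2 + (Y - yj) ** 2) / (2 * psf_sigma ** 2))
    return frame

# Example: three molecules at made-up off-grid locations (nm) in the 6400 nm FOV.
frame = render_frame(positions=[(1234.5, 678.9), (1300.2, 690.0), (5000.0, 3200.0)],
                     brightness=[1.0, 0.8, 1.2])
```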

Single Molecule Imaging

Single Molecule Imaging.8.8 Precision.6.4 Recall.6.4.2 Sparse CoG quickpalm 2 Radius 3 4 5.2 Sparse CoG quickpalm 2 Radius 3 4 5 8.8 Jaccard 6 4 F score.6.4 2 Sparse CoG quickpalm 2 3 4 5 Radius.2 Sparse CoG quickpalm 2 3 4 5 Radius

Summary and Future Work. A unified approach for continuous problems in estimation that obviates incoherence and basis mismatch. New insight into bounds on estimation error and exact recovery through convex duality and algebraic geometry. State-of-the-art results in the classic fields of spectrum estimation and system identification. Future work: recovery of general sparse signal trains and more general moment problems in signal processing.

Acknowledgements. Work developed with Venkat Chandrasekaran, Babak Hassibi (Caltech), Weiyu Xu (Iowa), Pablo A. Parrilo, Alan Willsky (MIT), Maryam Fazel (Washington), Badri Bhaskar, Rob Nowak, Nikhil Rao, and Gongguo Tang (Wisconsin). For all references, see: http://pages.cs.wisc.edu/~brecht/publications.html

References
Atomic norm denoising with applications to line spectral estimation. Badri Narayan Bhaskar, Gongguo Tang, and Benjamin Recht. IEEE Transactions on Signal Processing, vol. 61, no. 23, pages 5987-5999, 2013.
Compressed sensing off the grid. Gongguo Tang, Badri Bhaskar, Parikshit Shah, and Benjamin Recht. IEEE Transactions on Information Theory, vol. 59, no. 11, pages 7465-7490, 2013.
Near Minimax Line Spectral Estimation. Gongguo Tang, Badri Narayan Bhaskar, and Benjamin Recht. Submitted to IEEE Transactions on Information Theory, 2013.
The convex geometry of linear inverse problems. Venkat Chandrasekaran, Benjamin Recht, Pablo A. Parrilo, and Alan Willsky. Foundations of Computational Mathematics, vol. 12, no. 6, pages 805-849, 2012.
All references can be found at http://pages.cs.wisc.edu/~brecht/publications.html