Going off the Grid
Benjamin Recht, University of California, Berkeley
Joint work with Badri Bhaskar, Parikshit Shah, and Gongguo Tang
Applications: imaging, astronomy, seismology, spectroscopy, DOA estimation, sampling, GPS, radar, ultrasound. Common signal models:
- Line spectra: x(t) = Σ_{j=1}^k c_j e^{i2π f_j t}
- Pulse trains: x(t) = Σ_{j=1}^k c_j g(t − t_j)
- Modulated pulses: x(t) = Σ_{j=1}^k c_j g(t − t_j) e^{i2π f_j t}
- Point sources through a point spread function: x(s) = PSF ∗ {Σ_{j=1}^k c_j δ(s − s_j)}, i.e., x̂(ξ) = P̂SF(ξ) Σ_{j=1}^k c_j e^{−i2π ξ s_j} in the Fourier domain
Observe a sparse combination of sinusoids: x_m = Σ_{k=1}^s c_k e^{i2π m u_k} for some u_k ∈ [0,1).
Spectrum estimation: find a combination of sinusoids agreeing with time-series data.
Classic (Prony, 1795): assume the coefficients are positive for simplicity. Then
Σ_{k=1}^r c_k [1, e^{i2πu_k}, e^{i4πu_k}, e^{i6πu_k}]^T [1, e^{i2πu_k}, e^{i4πu_k}, e^{i6πu_k}]^H =
[ x_0  x̄_1  x̄_2  x̄_3
  x_1  x_0  x̄_1  x̄_2
  x_2  x_1  x_0  x̄_1
  x_3  x_2  x_1  x_0 ] =: toep(x)
toep(x) is positive semidefinite, and any null vector corresponds to a polynomial that vanishes at the e^{i2πu_k}. MUSIC, ESPRIT, Cadzow, etc. all build on this observation.
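As a numerical sanity check (a sketch only, not Prony's full pipeline; the frequencies and amplitudes are made up), one can verify that the Toeplitz matrix of noiseless moments with positive coefficients is PSD with rank equal to the number of sinusoids:

```python
import numpy as np

def moments(freqs, amps, n):
    # x_m = sum_k c_k e^{i 2 pi m u_k}, m = 0, ..., n-1
    m = np.arange(n)
    return sum(c * np.exp(2j * np.pi * m * u) for u, c in zip(freqs, amps))

def toep(x):
    # Hermitian Toeplitz matrix whose first column is x
    n = len(x)
    return np.array([[x[i - j] if i >= j else x[j - i].conj()
                      for j in range(n)] for i in range(n)])

freqs, amps = [0.1, 0.37], [1.0, 2.0]   # hypothetical example
x = moments(freqs, amps, 6)
T = toep(x)
eigs = np.linalg.eigvalsh(T)
# positive coefficients => toep(x) is PSD and rank(T) = number of sinusoids
psd = eigs.min() > -1e-8
rank = int((eigs > 1e-8).sum())
```

A null vector of T (when it exists) gives the coefficients of the polynomial that vanishes at the e^{i2πu_k}, which is exactly how the subspace methods proceed.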
Observe a sparse combination of sinusoids: x_m = Σ_{k=1}^s c_k e^{i2π m u_k} for some u_k ∈ [0,1).
Spectrum estimation: find a sparse combination of sinusoids agreeing with time-series data.
Contemporary: x ≈ Fc with c sparse, where F ∈ C^{n×N} has entries F_ab = exp(i2π ab/N).
Solve with the LASSO: minimize ‖x − Fc‖₂² + μ‖c‖₁.
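A minimal sketch of this gridded approach (the problem sizes and the ISTA solver are illustrative choices, not from the talk): build the n×N partial DFT dictionary and run proximal gradient on the LASSO objective.

```python
import numpy as np

n, N = 32, 128                                   # n samples, N grid frequencies
F = np.exp(2j * np.pi * np.outer(np.arange(n), np.arange(N)) / N)
c_true = np.zeros(N, dtype=complex)
c_true[[10, 40]] = [1.0, 0.8]                    # two well-separated on-grid sinusoids
x = F @ c_true

def lasso_ista(F, x, mu, iters=2000):
    # proximal gradient (ISTA) for min_c 0.5*||x - F c||^2 + mu*||c||_1
    L = np.linalg.norm(F, 2) ** 2                # Lipschitz constant of the gradient
    c = np.zeros(F.shape[1], dtype=complex)
    for _ in range(iters):
        g = c - F.conj().T @ (F @ c - x) / L     # gradient step
        mag = np.abs(g)
        shrink = np.maximum(mag - mu / L, 0.0)   # complex soft-thresholding
        c = np.where(mag > 0, g * shrink / np.maximum(mag, 1e-15), 0.0)
    return c

c_hat = lasso_ista(F, x, mu=1e-3)
rel_resid = np.linalg.norm(F @ c_hat - x) / np.linalg.norm(x)
```

When the true frequencies fall on the grid, this works well; the basis-mismatch issue discussed later arises precisely when they do not.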
Observe a sparse combination of sinusoids: x_m = Σ_{k=1}^s c_k e^{i2π m u_k} for some u_k ∈ [0,1).
Spectrum estimation: find a combination of sinusoids agreeing with time-series data.
Classic:
- grid free
- need to know the model order
- lack of quantitative theory
- unstable in practice
Contemporary (gridding + ℓ₁ minimization):
- robust model selection
- quantitative theory
- discretization error, basis mismatch
- numerical instability
Can we bridge the gap?
Parsimonious Models: model = Σ (weights × atoms). Search for the best linear combination of the fewest atoms; "rank" = the fewest atoms needed to describe the model.
Atomic Norms. Given a basic set of atoms A, define the function
‖x‖_A = inf{ t > 0 : x ∈ t·conv(A) }.
When A is centrosymmetric, we get a norm:
‖x‖_A = inf{ Σ_{a∈A} c_a : x = Σ_{a∈A} c_a a, c_a ≥ 0 }.
IDEA: minimize ‖z‖_A subject to Φz = y. When does this work?
Atomic Norm Minimization. IDEA: minimize ‖z‖_A subject to Φz = y.
- Generalizes existing, powerful methods
- Rigorous formula for developing new analysis algorithms
- Precise, tight bounds on the number of measurements needed for model recovery
- One algorithm prototype for a myriad of data-analysis applications
[Chandrasekaran, R., Parrilo, and Willsky]
Spectrum Estimation. Observe a sparse combination of sinusoids: x*_m = Σ_{k=1}^s c*_k e^{i2π m u*_k} for some u*_k ∈ [0,1). Observe y = x* + ω (signal plus noise).
Atomic set:
A = { [e^{iφ}, e^{i2πu+iφ}, e^{i4πu+iφ}, …, e^{i2π(n−1)u+iφ}]^T : φ ∈ [0,2π), u ∈ [0,1) }.
Classical techniques (Prony, Matrix Pencil, MUSIC, ESPRIT, Cadzow) use the fact that noiseless moment matrices are low rank:
Σ_{k=1}^r λ_k [1, e^{iθ_k}, e^{2iθ_k}, e^{3iθ_k}]^T [1, e^{iθ_k}, e^{2iθ_k}, e^{3iθ_k}]^H =
[ μ_0  μ̄_1  μ̄_2  μ̄_3
  μ_1  μ_0  μ̄_1  μ̄_2
  μ_2  μ_1  μ_0  μ̄_1
  μ_3  μ_2  μ_1  μ_0 ]
Atomic Norm for Spectrum Estimation. IDEA: minimize ‖z‖_A subject to ‖z − y‖ ≤ δ.
Atomic set:
A = { [e^{iφ}, e^{i2πu+iφ}, e^{i4πu+iφ}, …, e^{i2π(n−1)u+iφ}]^T : φ ∈ [0,2π), u ∈ [0,1) }.
Questions:
- How do we solve the optimization problem?
- Can we approximate the true signal from partial and noisy measurements?
- Can we estimate the frequencies from partial and noisy measurements?
Which atomic norm for sinusoids? x_m = Σ_{k=1}^s c_k e^{i2π m u_k} for some u_k ∈ [0,1).
When the c_k are positive,
Σ_{k=1}^r c_k [1, e^{i2πu_k}, e^{i4πu_k}, e^{i6πu_k}]^T [1, e^{i2πu_k}, e^{i4πu_k}, e^{i6πu_k}]^H = toep(x).
For general coefficients, the convex hull is characterized by linear matrix inequalities (Toeplitz positive semidefinite):
‖x‖_A = inf{ ½(t + w₁) : [ t  x^H ; x  toep(w) ] ⪰ 0 },
where w₁ is the first entry of w. (The atoms sweep out a moment curve, cf. [t, t², t³, t⁴, …], t ∈ S¹.)
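A small numerical check of this characterization (the size, frequencies, and amplitudes are made-up): for a positive combination of atoms, the choice w = x, t = Σ_k c_k is feasible for the semidefinite program and achieves objective value Σ_k c_k, the atomic norm of x.

```python
import numpy as np

n = 8
freqs, amps = [0.12, 0.53], [1.5, 0.7]           # hypothetical signal
a = lambda u: np.exp(2j * np.pi * u * np.arange(n))
x = sum(c * a(u) for u, c in zip(freqs, amps))

# feasible point for the SDP: w holds the moments (= x here), t = sum of amplitudes
w, t = x, sum(amps)
T = np.array([[w[i - j] if i >= j else w[j - i].conj()
               for j in range(n)] for i in range(n)])
M = np.block([[np.array([[t + 0j]]), x.conj()[None, :]],
              [x[:, None], T]])
objective = 0.5 * (t + w[0].real)                # equals sum(amps) here
min_eig = np.linalg.eigvalsh(M).min()            # >= 0 up to round-off
```

The block matrix M decomposes as Σ_k c_k [1; a(u_k)][1; a(u_k)]^H, which is why it is PSD; solving the SDP in general searches over all such certificates.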
[Figure: an atom set A, its symmetrization A ∪ {−A}, and the convex hulls conv(A) and conv(A ∪ {−A}).]
[Figure: the superposition cos(2πu₁k)/2 + cos(2πu₂k)/2 and its components cos(2πu₁k)/2 and cos(2πu₂k)/2 for two nearby frequencies, u₁ = 0.43 and u₂ ≈ 0.4.]
Nearly optimal rates. Observe a sparse combination of sinusoids x*_m = Σ_{k=1}^s c*_k e^{i2π m u*_k} for some u*_k ∈ [0,1); observe y = x* + ω (signal plus noise). Assume the frequencies are far apart: min_{p≠q} d(u_p, u_q) ≥ 4/n.
Solve: x̂ = arg min_x ½‖x − y‖₂² + μ‖x‖_A.
Error rate: (1/n)‖x̂ − x*‖₂² ≤ C σ² s log(n)/n.
No algorithm can do better than E[(1/n)‖x̂ − x*‖₂²] ≥ C σ² s log(n/s)/n, even if the frequencies are well separated.
No algorithm can do better than (1/n)‖x̂ − x*‖₂² ≥ C σ² s/n, even if we knew all of the frequencies u*_k.
Mean-square-error performance profiles P(β). Frequencies generated at random with 1/n separation; random phases, fading amplitudes. Cadzow and MUSIC are provided the model order; AST (Atomic norm Soft Thresholding) and LASSO estimate the noise power. Lower MSE is better. The performance profile, over all parameter values and settings, for algorithm s is
P_s(β) = #{p ∈ P : MSE_s(p) ≤ β · min_s MSE_s(p)} / #(P);
higher is better.
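The profile itself is simple to compute; a sketch with made-up MSE numbers:

```python
import numpy as np

def performance_profile(mse, betas):
    # mse[s, p]: error of algorithm s on problem instance p
    # P_s(beta) = fraction of problems where MSE_s(p) <= beta * min_s MSE_s(p)
    best = mse.min(axis=0)
    return np.array([[(mse[s] <= b * best).mean() for b in betas]
                     for s in range(mse.shape[0])])

mse = np.array([[1.0, 2.0, 1.0],     # algorithm A on three instances
                [2.0, 1.0, 4.0]])    # algorithm B on the same instances
profile = performance_profile(mse, betas=[1.0, 2.0, 4.0])
```

P_s(1) is the fraction of instances on which algorithm s is the best; every curve eventually reaches 1 as β grows.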
Extracting the decomposition. How do you extract the frequencies? Look at the dual norm:
‖v‖*_A = max_{a∈A} ⟨a, v⟩ = max_{u∈[0,1)} |Σ_{k=1}^n v_k e^{−2πiku}|.
The dual norm is the maximum modulus of a trigonometric polynomial. At optimality, the maximum modulus is attained at the support of the signal. This works far better than Prony interpolation in practice.
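A toy illustration of frequency extraction (single sinusoid; the dual vector here is chosen by hand rather than obtained from the optimization): the dual polynomial's modulus reaches 1 exactly at the signal frequency.

```python
import numpy as np

n, u0 = 16, 0.3                                   # one sinusoid at frequency u0
v = np.exp(2j * np.pi * np.arange(n) * u0) / n    # hand-built dual vector
grid = np.arange(1000) / 1000.0

# q(u) = sum_k v_k e^{-2 pi i k u}; |q(u0)| = 1 and |q(u)| < 1 elsewhere
q = np.exp(-2j * np.pi * np.outer(grid, np.arange(n))) @ v
u_hat = grid[np.argmax(np.abs(q))]
```

In the general case the dual optimal v comes out of the solver, and the frequencies are read off as the points where |q(u)| touches 1.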
Aside: the proof. How do you prove any compressed-sensing-esque result? Look at the dual norm:
‖v‖*_A = max_{a∈A} ⟨a, v⟩ = max_{u∈[0,1)} |Σ_{k=1}^n v_k e^{−2πiku}|.
Generic proof: construct a polynomial such that the atoms you want maximize the polynomial; then prove that the polynomial is strictly bounded everywhere else.
Localization Guarantees. [Figure: dual polynomial magnitude versus frequency, with estimated frequencies clustered near the true frequencies and only tiny spurious frequencies elsewhere.] Each of the following is bounded by a constant (C₁, C₂, C₃) times a common factor depending on the sparsity k, the noise level, and n:
- Spurious amplitudes: Σ_{l: f̂_l ∈ F} |ĉ_l| (amplitude assigned far from every true frequency)
- Weighted frequency deviation: Σ_{l: f̂_l ∈ N_j} |ĉ_l| d(f_j, f̂_l)²
- Near-region approximation: |c_j − Σ_{l: f̂_l ∈ N_j} ĉ_l|
Frequency Localization. [Figure: performance profiles P_s(β) and metric-versus-SNR curves (k = 32, SNR in dB) for the three localization metrics (spurious amplitudes, weighted frequency deviation, near-region approximation), comparing AST, MUSIC, and Cadzow.]
Incomplete Data / Random Sampling. Observe a random subset T ⊆ {0, 1, …, n−1} of the samples. On the grid: Candès, Romberg & Tao (2004). Off the grid: new; compressed sensing extended to the continuous domain. Recover the missing part by solving
minimize_z ‖z‖_A subject to z_T = x_T,
then extract the frequencies from the dual optimal solution. [Figure: full observation versus random sampling.]
Exact Recovery (off the grid). x_m = Σ_{k=1}^s c_k exp(2πi m u_k); want {u_k, c_k}, with minimum separation Δ = min_{p≠q} d(u_p, u_q).
Theorem (Candès and Fernandez-Granda, 2012): a line spectrum with minimum frequency separation Δ > 4/s can be recovered from the first 2s Fourier coefficients via atomic norm minimization.
Theorem (Tang, Bhaskar, Shah, and R., 2012): a line spectrum with minimum frequency separation Δ > 4/n can be recovered from most subsets of the first n Fourier coefficients of size at least O(s log(s) log(n)).
- s random samples are better than s equispaced samples.
- On a grid, this is just compressed sensing; off the grid, this is new.
- See [BCGLS2] for a sketching approach to a similar problem.
- No balancing of coherence as n grows.
[Figure: mean-square-error performance profiles P(β) comparing the exact SDP with basis pursuit on 4× and 64× oversampled grids (BP:4x, BP:64x).]
Discretization. Discretize the parameter space to get a finite dictionary:
x(t) ≈ Σ_{l=1}^N c_l e^{i2π f_l t}, f_l ∈ a finite grid.
Grids: Wakin, Becker, et al. (2012); Fannjiang, Strohmer & Yan (2010); Bajwa, Haupt, Sayeed & Nowak (2010); Tropp, Laska, Duarte, Romberg & Baraniuk (2010); Herman & Strohmer (2009); Malioutov, Cetin & Willsky (2005); Candès, Romberg & Tao (2004).
Basis mismatch: Chi, Scharf, Pezeshki & Calderbank (2011); Herman & Strohmer (2010).
Approximation through discretization. Relaxations: A₁ ⊆ A₂ ⟹ ‖x‖_{A₂} ≤ ‖x‖_{A₁}.
Let T̃ ⊂ T be a finite net, and let Φ_T̃ be a matrix whose columns are the atoms {a(τ) : τ ∈ T̃}. Then
‖x‖_{A_T̃} = inf{ Σ_k |c_k| : x = Φ_T̃ c } (an ℓ₁ problem, plus extra equality constraints).
When polynomials of bounded degree are Lipschitz on T, there is a λ ∈ (0,1] such that
λ‖x‖_{A_T̃} ≤ ‖x‖_{A_T} ≤ ‖x‖_{A_T̃}.
Discretization for sinusoids. Discretized atomic set:
A_N = { [e^{iφ}, e^{i2πu+iφ}, e^{i4πu+iφ}, …, e^{i2π(n−1)u+iφ}]^T : u = k/N, k = 0, …, N−1, φ ∈ [0,2π) }.
‖x‖_{A_N} = inf{ Σ_{k=1}^N |c_k| : x = F_{[N,n]} c }, where F_{[N,n]} consists of the first n rows of the N×N DFT matrix.
(1 − 2πn/N) ‖x‖_{A_N} ≤ ‖x‖_A ≤ ‖x‖_{A_N}.
- The approximation is fast to compute using first-order methods.
- The approximation only improves as N increases.
- It allows you to estimate the signal, but you cannot recover the exact frequencies.
Discretization. Discretize the parameter space to get a finite number of grid points ω_1, …, ω_m. Enforce a finite number of dual constraints: |⟨z, a(ω_j)⟩| ≤ 1 for j ∈ [m]. Equivalently, in the primal, replace the atomic norm with a discrete one. What happens to the solutions as the grid is refined?
Convergence in Dual. Assumption: there exist parameters ω_1, …, ω_m such that a(ω_1), …, a(ω_m) are linearly independent. Enforce finite constraints in the dual: |⟨z, a(ω_j)⟩| ≤ 1 for j ∈ [m].
Theorem: the discretized optimal objectives converge to the original objective, and any solution sequence {ẑ_m} of the discretized problems has a subsequence that converges to the solution set of the original problem. For the LASSO dual, the convergence speed is O(1/m).
[Figure: log₂|f_m − f*| and log₂‖ẑ_m − ẑ*‖ versus log₂(m), shown for m = 32, 128, 512.]
Convergence in Primal. The original primal solution is associated with a measure μ̂: x̂ = ∫ a(ω) μ̂(dω), ‖x̂‖_A = ‖μ̂‖_TV. The discretized primal solutions x̂_m are associated with measures μ̂_m supported only on the m grid points.
Theorem: the discretized optimal measures converge in distribution to an original optimal measure. Fixing a neighborhood of the support of an original optimal measure, the supports of the discretized optimal measures eventually fall inside that neighborhood.
[Figure: discretized optimal measures for m = 64, 256, 1024, concentrating on the support of an original optimal measure.]
Single Molecule Imaging Courtesy of Zhuang Research Lab
Single Molecule Imaging
- Bundles of 8 tubes of 30 nm diameter
- Sparse density: 81049 molecules on 12000 frames
- Resolution: 64×64 pixels; pixel size 100 nm × 100 nm
- Field of view: 6400 nm × 6400 nm
- Target resolution: 10 nm × 10 nm
Discretize the FOV into a 64×64 grid:
I(x, y) = Σ_j c_j PSF(x − x_j, y − y_j), (x_j, y_j) ∈ [0, 6400]², (x, y) ∈ {50, 150, …, 6350}².
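A toy sketch of the discretized image-formation model above (the Gaussian PSF and all sizes here are hypothetical stand-ins; the real data uses a measured microscope PSF and the physical units above):

```python
import numpy as np

def psf(dx, dy, sigma=1.0):
    # hypothetical isotropic Gaussian point spread function
    return np.exp(-(dx ** 2 + dy ** 2) / (2 * sigma ** 2))

pix = np.arange(8) + 0.5                             # pixel centers (arbitrary units)
molecules = [((2.5, 3.5), 1.0), ((5.5, 1.5), 0.6)]   # (center, brightness)

# I(x, y) = sum_j c_j PSF(x - x_j, y - y_j), sampled on the pixel grid
I = np.zeros((8, 8))
for (cx, cy), c in molecules:
    I += c * psf(pix[:, None] - cx, pix[None, :] - cy)

peak = np.unravel_index(np.argmax(I), I.shape)       # pixel nearest brightest molecule
```

Stacking the PSFs centered at all candidate grid locations as columns of a dictionary turns localization into exactly the sparse-recovery problem of the earlier slides, with the molecule positions playing the role of the frequencies.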
Single Molecule Imaging: accuracy. [Figure: precision, recall, Jaccard index, and F-score versus tolerance radius for the sparse (atomic-norm) method, center-of-gravity (CoG), and quickPALM.]
Summary and Future Work
- A unified approach for continuous problems in estimation; obviates incoherence and basis mismatch
- New insight into bounds on estimation error and exact recovery through convex duality and algebraic geometry
- State-of-the-art results in the classic fields of spectrum estimation and system identification
Future work:
- Recovery of general sparse signal trains
- More general moment problems in signal processing
Acknowledgements. Work developed with Venkat Chandrasekaran, Babak Hassibi (Caltech); Weiyu Xu (Iowa); Pablo A. Parrilo, Alan Willsky (MIT); Maryam Fazel (Washington); and Badri Bhaskar, Rob Nowak, Nikhil Rao, Gongguo Tang (Wisconsin). For all references, see: http://pages.cs.wisc.edu/~brecht/publications.html
References
- Atomic norm denoising with applications to line spectral estimation. Badri Narayan Bhaskar, Gongguo Tang, and Benjamin Recht. IEEE Transactions on Signal Processing, vol. 61, no. 23, pages 5987–5999, 2013.
- Compressed sensing off the grid. Gongguo Tang, Badri Bhaskar, Parikshit Shah, and Benjamin Recht. IEEE Transactions on Information Theory, vol. 59, no. 11, pages 7465–7490, 2013.
- Near minimax line spectral estimation. Gongguo Tang, Badri Narayan Bhaskar, and Benjamin Recht. Submitted to IEEE Transactions on Information Theory, 2013.
- The convex geometry of inverse problems. Venkat Chandrasekaran, Benjamin Recht, Pablo Parrilo, and Alan Willsky. Foundations of Computational Mathematics, vol. 12, no. 6, pages 805–849, 2012.
All references can be found at http://pages.cs.wisc.edu/~brecht/publications.html