UIUC CSL Seminar, Mar. 24

Combining Sparsity with Physically-Meaningful Constraints in Sparse Parameter Estimation

Yuejie Chi, Department of ECE and BMI, Ohio State University
Joint work with Yuxin Chen (Stanford).
Sparse Fourier Analysis

In many sensing applications, one is interested in identifying a parametric signal:

    x(t) = Σ_{i=1}^{r} d_i e^{j2π⟨t, f_i⟩},  t ∈ {0, ..., n_1−1} × ... × {0, ..., n_K−1}

(f_i ∈ [0, 1)^K frequencies, d_i amplitudes, r model order)

Occam's razor: the number of modes r is small.

Sensing with minimal cost: how can we identify the parametric signal model from a small subset of the samples of x(t)?

This problem has many (classical) applications in communications, remote sensing, and array signal processing.
Applications in Communications and Sensing

Multipath channels: a (relatively) small number of strong paths.

Radar target identification: a (relatively) small number of strong scatterers.
Applications in Imaging Swap time and frequency: r z (t) = diδ(t ti) i= Applications in microscopy imaging and astrophysics. Resolution is limited by the point spread function of the imaging system Page 4
x (t τ ) = Something Old: Parametric Estimation Exploring Physically-meaningful Constraints: shift invariance of the harmonic structure r d i e j2π t τ,f i = i= r d i e j2π τ,f i e j2π t,f i i= Prony s method: root-finding. SVD based approaches: ESPRIT [RoyKailath 989], MUSIC [Schmidt 986], matrix pencil [HuaSarkar 99, Hua 992]. spectrum blind sampling [Bresler 996], finite rate of innovation [Vitterli 2], Xampling [Eldar 2]. Pros: perfect recovery from (equi-spaced) O(r) samples Cons: sensitive to noise and outliers, usually require prior knowledge on the model order. Page 5
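As a concrete illustration of the shift-invariance idea, here is a minimal 1-D ESPRIT-style sketch in NumPy. It is not the talk's algorithm; the function name, sample size n = 32, and the choice of frequencies are illustrative, and it assumes noiseless, equi-spaced samples:

```python
import numpy as np

def esprit_1d(x, r, L=None):
    """Estimate r continuous-valued frequencies from equi-spaced samples
    x[k] = sum_i d_i exp(j*2*pi*f_i*k), using the shift invariance of the
    signal subspace (ESPRIT-style; noiseless sketch)."""
    n = len(x)
    L = L if L is not None else n // 2
    # Hankel data matrix: its column space is spanned by Vandermonde vectors
    H = np.array([x[i:i + n - L + 1] for i in range(L)])
    U = np.linalg.svd(H)[0][:, :r]               # signal subspace
    # shift invariance: U[1:] ~ U[:-1] @ Phi, with eig(Phi) = exp(j*2*pi*f_i)
    Phi = np.linalg.lstsq(U[:-1], U[1:], rcond=None)[0]
    z = np.linalg.eigvals(Phi)
    return np.sort(np.mod(np.angle(z) / (2 * np.pi), 1.0))

# noiseless example with three off-grid frequencies
f_true = np.array([0.123, 0.357, 0.789])
k = np.arange(32)
x = sum(np.exp(2j * np.pi * f * k) for f in f_true)
f_hat = esprit_1d(x, r=3)
```

In the noiseless case the recovery is exact to machine precision, matching the "pros" above; with noise or a misspecified model order r, the subspace estimate degrades, which is exactly the "cons".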
Something New: Compressed Sensing

Exploiting sparsity: compressed sensing [Candès and Tao 2006, Donoho 2006] captures the attributes (sparsity) of signals from a small number of samples.

Discretize the frequency and assume a sparse representation over the discretized basis:

    f_i ∈ F = {0, 1/n_1, ..., (n_1−1)/n_1} × {0, 1/n_2, ..., (n_2−1)/n_2} × ...

Pros: perfect recovery from O(r log n) random samples; robust against noise and outliers.
Cons: sensitive to gridding error.
Sensitivity to Basis Mismatch

A toy example: x(t) = e^{j2π f_0 t}. The success of CS relies on sparsity in the DFT basis.

Basis mismatch: physics places f_0 off the grid by a frequency offset. Basis mismatch translates a sparse signal into an incompressible signal.

[Figure: recovered spectra for mismatch θ = 0.1π/n, 0.5π/n, and π/n, with normalized recovery errors 0.86, 0.346, and 0.873.]

A finer grid does not help, and one never estimates the true continuous-valued frequencies! [Chi, Scharf, Pezeshki, Calderbank 2011]
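The incompressibility claim is easy to verify numerically: take a single tone half a DFT bin off the grid and measure how much of its energy the largest DFT coefficient captures (the setup n = 32, bin k0 = 5 below is illustrative):

```python
import numpy as np

n = 32
k0 = 5
# on-grid tone: all DFT energy lands in one coefficient
on_grid = np.exp(2j * np.pi * (k0 / n) * np.arange(n))
# off-grid tone: frequency offset by half a DFT bin
off_grid = np.exp(2j * np.pi * ((k0 + 0.5) / n) * np.arange(n))

def peak_energy_fraction(x):
    """Fraction of total DFT energy captured by the largest coefficient."""
    X = np.fft.fft(x)
    return np.max(np.abs(X) ** 2) / np.sum(np.abs(X) ** 2)

frac_on = peak_energy_fraction(on_grid)    # = 1.0: perfectly sparse
frac_off = peak_energy_fraction(off_grid)  # ~ 0.4: energy leaks everywhere
```

A half-bin offset leaves the dominant DFT coefficient with only about 40% of the signal energy; the rest spreads across all bins, so no sparse model over the DFT grid fits.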
Our Approach

Conventional approaches enforce physically-meaningful constraints, but not sparsity; compressed sensing enforces sparsity, but not physically-meaningful constraints.

Approach: we combine sparsity with physically-meaningful constraints, so that we can stably estimate the continuous-valued frequencies from a minimal number of time-domain samples.

- revisit the matrix pencil proposed for array signal processing
- revitalize the matrix pencil by combining it with convex optimization
Two-Dimensional Frequency Model

Stack the signal x(t) = Σ_{i=1}^{r} d_i e^{j2π⟨t, f_i⟩} into a matrix X ∈ C^{n_1 × n_2}. The matrix X has the following Vandermonde decomposition:

    X = Y D Z^T,

where D = diag{d_1, ..., d_r} is diagonal, Y is the n_1 × r Vandermonde matrix with entries Y_{k,i} = y_i^k (0 ≤ k ≤ n_1 − 1), and Z is the n_2 × r Vandermonde matrix with entries Z_{k,i} = z_i^k (0 ≤ k ≤ n_2 − 1), with

    y_i = exp(j2π f_{1i}),  z_i = exp(j2π f_{2i}),  f_i = (f_{1i}, f_{2i}).

Goal: we observe a random subset of the entries of X, and wish to recover the missing entries.
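A quick NumPy check of the Vandermonde decomposition (random frequencies; the sizes n_1 = n_2 = 16, r = 4 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r = 16, 16, 4

# random 2-D frequencies f_i = (f_{1i}, f_{2i}) in [0,1)^2 and amplitudes d_i
f1, f2 = rng.random(r), rng.random(r)
d = rng.standard_normal(r) + 1j * rng.standard_normal(r)

# Vandermonde factors Y (n1 x r) and Z (n2 x r): entries y_i^k and z_i^k
Y = np.exp(2j * np.pi * np.outer(np.arange(n1), f1))
Z = np.exp(2j * np.pi * np.outer(np.arange(n2), f2))
X = Y @ np.diag(d) @ Z.T              # X = Y D Z^T

rank_X = np.linalg.matrix_rank(X)     # = r (generically, when r <= min(n1, n2))
```

For generic frequencies with r ≤ min(n_1, n_2), both Vandermonde factors have full column rank, so rank(X) = r.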
Matrix Completion

Recall that X = Y D Z^T (Vandermonde × diagonal × Vandermonde), where D = diag{d_1, ..., d_r}.

Quick observation: X can be a low-rank matrix, with rank(X) ≤ r.

Question: can we apply matrix completion algorithms directly on X?
- Yes, but it yields sub-optimal performance: it requires at least r · max{n_1, n_2} samples.
- No, if r > min(n_1, n_2): X is then no longer low-rank. Note that r can be as large as n_1 n_2.
Revisiting Matrix Pencil: Matrix Enhancement

Given a data matrix X, Hua proposed the following matrix enhancement for two-dimensional frequency models [Hua 1992]:

Choose two pencil parameters k_1 and k_2. The enhanced form X_e is a k_1 × (n_1 − k_1 + 1) block Hankel matrix:

    X_e = [ X_0        X_1      ...  X_{n_1−k_1}
            X_1        X_2      ...  X_{n_1−k_1+1}
            ...
            X_{k_1−1}  X_{k_1}  ...  X_{n_1−1} ],

where each block X_l is a k_2 × (n_2 − k_2 + 1) Hankel matrix:

    X_l = [ x_{l,0}      x_{l,1}    ...  x_{l,n_2−k_2}
            x_{l,1}      x_{l,2}    ...  x_{l,n_2−k_2+1}
            ...
            x_{l,k_2−1}  x_{l,k_2}  ...  x_{l,n_2−1} ].
Low Rankness of the Enhanced Matrix

Choosing pencil parameters k_1 = Θ(n_1) and k_2 = Θ(n_2), the dimensionality of X_e is proportional to n_1 n_2 × n_1 n_2.

The enhanced matrix can be decomposed as follows [Hua 1992]:

    X_e = [ Z_L;  Z_L Y_d;  ...;  Z_L Y_d^{k_1−1} ] · D · [ Z_R,  Y_d Z_R,  ...,  Y_d^{n_1−k_1} Z_R ],

where Z_L and Z_R are Vandermonde matrices specified by z_1, ..., z_r, and Y_d = diag[y_1, y_2, ..., y_r].

The enhanced form X_e is low-rank: rank(X_e) ≤ r. Spectral sparsity implies low rankness, and low rankness holds even with damping modes.
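The contrast with plain matrix completion can be verified numerically: with r > min(n_1, n_2) the data matrix X is full rank, yet Hua's enhanced matrix still has rank r. A sketch (sizes n_1 = n_2 = 8, r = 10 and pencil parameters k_1 = k_2 = 4 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, r = 8, 8, 10           # note r > min(n1, n2): X itself is full rank
k1, k2 = 4, 4                  # pencil parameters

f1, f2 = rng.random(r), rng.random(r)
d = rng.standard_normal(r) + 1j * rng.standard_normal(r)
X = (np.exp(2j * np.pi * np.outer(np.arange(n1), f1)) * d) @ \
    np.exp(2j * np.pi * np.outer(np.arange(n2), f2)).T      # X = Y D Z^T

def enhance(X, k1, k2):
    """Form the k1 x (n1-k1+1) block Hankel matrix whose blocks are the
    k2 x (n2-k2+1) Hankel matrices built from the rows of X."""
    n1, n2 = X.shape
    block = lambda l: np.array([X[l, j:j + n2 - k2 + 1] for j in range(k2)])
    return np.block([[block(i + j) for j in range(n1 - k1 + 1)]
                     for i in range(k1)])

Xe = enhance(X, k1, k2)              # 16 x 25 enhanced matrix
rank_X = np.linalg.matrix_rank(X)    # 8 = min(n1, n2): low rank is lost
rank_Xe = np.linalg.matrix_rank(Xe)  # 10 = r: enhancement restores it
```

Here X is 8 × 8 with rank 8 (no low-rank structure left to exploit), while the 16 × 25 enhanced matrix has rank exactly r = 10, matching the decomposition above.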
Enhanced Matrix Completion (EMaC)

The natural algorithm is to find the enhanced matrix with the minimal rank satisfying the measurements:

    minimize_{M ∈ C^{n_1×n_2}}  rank(M_e)   subject to  M_{i,j} = X_{i,j},  (i,j) ∈ Ω,

where Ω denotes the sampling set. Motivated by matrix completion, we solve its convex relaxation:

    (EMaC)  minimize_{M ∈ C^{n_1×n_2}}  ||M_e||_*   subject to  M_{i,j} = X_{i,j},  (i,j) ∈ Ω,

where ||·||_* denotes the nuclear norm. The algorithm is referred to as Enhanced Matrix Completion (EMaC).
Enhanced Matrix Completion (EMaC)

    (EMaC)  minimize_{M ∈ C^{n_1×n_2}}  ||M_e||_*   subject to  M_{i,j} = X_{i,j},  (i,j) ∈ Ω.

Existing MC results won't apply: they would require at least O(nr) samples.

Question: how many samples do we need?
Introduce Coherence Measure

Define the 2-D Dirichlet kernel:

    K(k_1, k_2, f_1, f_2) = (1/(k_1 k_2)) · ((1 − e^{j2π k_1 f_1}) / (1 − e^{j2π f_1})) · ((1 − e^{j2π k_2 f_2}) / (1 − e^{j2π f_2})).

Define G_L and G_R as r × r Gram matrices such that

    (G_L)_{i,l} = K(k_1, k_2, f_{1i} − f_{1l}, f_{2i} − f_{2l}),
    (G_R)_{i,l} = K(n_1 − k_1 + 1, n_2 − k_2 + 1, f_{1i} − f_{1l}, f_{2i} − f_{2l}).

[Figure: magnitude of the Dirichlet kernel versus separation on the x and y axes.]
Introduce Incoherence Measure

The incoherence condition holds w.r.t. µ if

    σ_min(G_L) ≥ 1/µ,  σ_min(G_R) ≥ 1/µ.

Examples: µ = Θ(1) under many scenarios:
- randomly generated frequencies;
- (mild) perturbations of grid points;
- in 1-D, let k_1 ≈ n/2: well-separated frequencies (Liao and Fannjiang, 2014): Δ = min_{i≠j} |f_i − f_j| ≥ 2/n, which is about 2 times the Rayleigh limit.
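The Gram matrices are straightforward to compute directly from the kernel definition. The sketch below (frequencies and pencil sizes are illustrative, not from the talk) checks that well-separated frequencies give σ_min(G_L) close to 1, i.e., µ = Θ(1):

```python
import numpy as np

def dirichlet_ratio(k, f):
    """(1 - e^{j2*pi*k*f}) / (k (1 - e^{j2*pi*f})), with the limit 1 at f = 0."""
    f = np.mod(f + 0.5, 1.0) - 0.5        # wrap separation to (-1/2, 1/2]
    if abs(f) < 1e-12:
        return 1.0
    return (1 - np.exp(2j * np.pi * k * f)) / (k * (1 - np.exp(2j * np.pi * f)))

def gram(freqs, k1, k2):
    """r x r Gram matrix (G)_{i,l} = K(k1, k2, f1i - f1l, f2i - f2l)."""
    r = len(freqs)
    G = np.empty((r, r), dtype=complex)
    for i in range(r):
        for l in range(r):
            df1 = freqs[i][0] - freqs[l][0]
            df2 = freqs[i][1] - freqs[l][1]
            G[i, l] = dirichlet_ratio(k1, df1) * dirichlet_ratio(k2, df2)
    return G

# well-separated frequencies -> sigma_min(G_L) bounded away from zero
freqs = [(0.1, 0.2), (0.4, 0.6), (0.7, 0.9)]
GL = gram(freqs, k1=8, k2=8)
sigma_min = np.linalg.svd(GL, compute_uv=False)[-1]
```

With these well-separated frequencies, the off-diagonal kernel values are small, so G_L is a small perturbation of the identity and σ_min(G_L) ≈ 1.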
Theoretical Guarantees for Noiseless Case

Theorem 1 [Chen and Chi, 2013] (Noiseless Samples). Let n = n_1 n_2. If all measurements are noiseless, then EMaC recovers X perfectly with high probability if

    m > C µ r log^4 n,

where C is some universal constant.

Implications:
- deterministic signal model, random observation;
- the coherence condition µ depends only on the frequencies, not the amplitudes;
- near-optimal within logarithmic factors: Θ(r polylog n);
- general theoretical guarantees for Hankel (Toeplitz) matrix completion — see applications in control, MRI, natural language processing, etc.
Phase Transition

[Figure 1: Phase transition diagram (sparsity level r versus number of samples m) where spike locations are randomly generated, for the case n_1 = n_2 = 15.]
Robustness to Bounded Noise

Assume the samples are noisy: X_o = X + N, where N is bounded noise:

    (EMaC-Noisy)  minimize_{M ∈ C^{n_1×n_2}}  ||M_e||_*   subject to  ||P_Ω(M − X)||_F ≤ δ.

Theorem 2 [Chen and Chi, 2013] (Noisy Samples). Suppose X_o is a noisy copy of X that satisfies ||P_Ω(X − X_o)||_F ≤ δ. Under the conditions of Theorem 1, the solution to EMaC-Noisy satisfies

    ||X̂_e − X_e||_F ≤ {2√n + 8n + (8√2 n^2)/m} δ

with probability exceeding 1 − n^{−2}.

Implications: the average entry inaccuracy is bounded above by O((n/m) δ). In practice, EMaC-Noisy usually yields better estimates.
Singular Value Thresholding (Noisy Case)

Several optimized solvers for Hankel matrix completion exist, for example [Fazel et al. 2013; Liu and Vandenberghe 2009].

Algorithm 1: Singular Value Thresholding for EMaC
1: initialize: set M_0 = X_e and t = 0.
2: repeat
3:   Q_t ← D_{τ_t}(M_t)   (singular-value thresholding)
4:   M_{t+1} ← Hankel_X(Q_t)   (projection onto a Hankel matrix consistent with the observations)
5:   t ← t + 1
6: until convergence

[Figure 2: true versus reconstructed signal (amplitude against vectorized time); r = 3, m/(n_1 n_2) = 5.8%, SNR = 10 dB.]
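A minimal 1-D analogue of Algorithm 1 in NumPy (a sketch, not the talk's solver: it uses a fixed threshold τ, a plain Hankel matrix instead of the 2-D enhanced matrix, and illustrative parameters):

```python
import numpy as np

def hankel_matrix(x, k):
    """k x (n-k+1) Hankel matrix built from a length-n sequence."""
    n = len(x)
    return np.array([x[i:i + n - k + 1] for i in range(k)])

def hankel_project(H):
    """Map a k x (n-k+1) matrix back to a length-n sequence by averaging
    along anti-diagonals (least-squares projection onto Hankel matrices)."""
    k, m = H.shape
    n = k + m - 1
    x = np.zeros(n, dtype=complex)
    for s in range(n):
        i = np.arange(max(0, s - m + 1), min(k, s + 1))
        x[s] = H[i, s - i].mean()
    return x

def svt_emac_1d(y, mask, k, tau=0.5, iters=300):
    """Alternate singular-value soft thresholding D_tau with projection onto
    Hankel matrices consistent with the observed entries (1-D sketch)."""
    x = np.where(mask, y, 0)
    for _ in range(iters):
        U, s, Vh = np.linalg.svd(hankel_matrix(x, k), full_matrices=False)
        Q = (U * np.maximum(s - tau, 0)) @ Vh    # Q_t = D_tau(M_t)
        x = hankel_project(Q)
        x[mask] = y[mask]                        # keep the observed samples
    return x

rng = np.random.default_rng(2)
n, k = 32, 16
f = np.array([0.15, 0.62])                       # r = 2 modes
y = sum(np.exp(2j * np.pi * fi * np.arange(n)) for fi in f)
mask = rng.random(n) < 0.7                       # observe ~70% of the samples
x_hat = svt_emac_1d(y, mask, k)
```

By construction the iterate always matches the observed samples and stays consistent with a Hankel matrix; the thresholding step drives the Hankel matrix toward low rank, filling in the missing entries.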
Robustness to Sparse Outliers

What if a constant portion of the measurements are arbitrarily corrupted?

[Figure: clean signal versus corrupted noisy samples (real part against data index).]

Robust PCA approach [Candès et al. 2011; Chandrasekaran et al. 2011]: solve

    (RobustEMaC)  minimize_{M,S ∈ C^{n_1×n_2}}  ||M_e||_* + λ ||S_e||_1   subject to  (M + S)_{i,l} = X^{corrupted}_{i,l},  (i,l) ∈ Ω.
Theoretical Guarantees for Robust Recovery

Assume the fraction of corrupted entries is s.

Theorem 3 [Chen and Chi, 2013] (Sparse Outliers). Set n = n_1 n_2 and λ = 1/√(m log n). Let s ≤ 2%. Then RobustEMaC recovers X with high probability if

    m > C µ r^2 log^3 n,

where C is some universal constant.

Implications:
- sample complexity: m ~ Θ(r^2 log^3 n), a slight loss compared with the previous case;
- robust to a constant portion of outliers: the number of corrupted entries can be Θ(m).
Robustness to Sparse Corruptions

[Figure 3: Robustness to sparse corruptions: (a) clean signal and its corrupted subsampled samples; (b) recovered signal and the sparse corruptions.]
Phase Transition for Line Spectrum Estimation

Fix the amount of corruption at 10% of the total number of samples:

[Figure 4: Phase transition diagram (sparsity level r versus number of samples m) where spike locations are randomly generated, for the case n = 125.]
Related Work

Cadzow's denoising: a non-convex heuristic to denoise line spectrum data based on the Hankel form.

Atomic norm minimization: recently proposed by [Tang et al., 2013] for compressive line spectrum estimation off the grid:

    min_s ||s||_A   subject to  P_Ω(s) = P_Ω(x),

where the atomic norm is defined as

    ||x||_A = inf { Σ_i |d_i| : x(t) = Σ_i d_i e^{j2π f_i t} }.

- Random signal model: if the frequencies are separated by 4 times the Rayleigh limit and the phases are random, then perfect recovery with O(r log r log n) samples;
- no stability results with random observations or sparse corruptions;
- extendable to multi-dimensional frequencies [Chi and Chen, 2013], but the SDP characterization is more complicated [Xu et al. 2013].
Comparison with Atomic Norm Minimization

Phase transition for line spectrum estimation: numerically, the EMaC approach seems less sensitive to the separation condition.

[Figure: phase transition diagrams (r versus m) for EMaC and atomic norm minimization, with and without the separation condition.]
Super Resolution (2-D)

Super resolution: EMaC can also be applied to super resolution, where one observes the low-end spectrum and extrapolates to higher frequencies [Candès and Fernandez-Granda, 2012]. A theoretical guarantee remains an open problem.

[Figure: (a) ground truth; (b) low-resolution image; (c) super-resolution via EMaC.]
Final Remarks

- Connect spectral compressed sensing with matrix completion: decompose the enhanced matrix into a low-rank factorization.
- Connect the traditional approach (parametric harmonic retrieval) with recent advances (matrix completion).
- Future work: performance guarantees for super resolution.
Q&A

Preprint available on arXiv:
Robust Spectral Compressed Sensing via Structured Matrix Completion, http://arxiv.org/abs/1304.8126

Thank You! Questions?