Sparse Parameter Estimation: Compressed Sensing Meets Matrix Pencil
Yuejie Chi
Departments of ECE and BMI, The Ohio State University
Colorado School of Mines, December 9, 2014
Acknowledgement
Joint work with Yuxin Chen (Stanford) and Yuanxin Li (Ohio State).
Y. Chen and Y. Chi, Robust Spectral Compressed Sensing via Structured Matrix Completion, IEEE Trans. Information Theory, http://arxiv.org/abs/1304.8126
Y. Li and Y. Chi, Off-the-Grid Line Spectrum Denoising and Estimation with Multiple Measurement Vectors, submitted, http://arxiv.org/abs/1408.2242
Sparse Fourier Analysis
In many sensing applications, one is interested in identifying a parametric signal:
x(t) = Σ_{i=1}^r d_i e^{j2π⟨t, f_i⟩},  t ∈ {0, ..., n_1−1} × ... × {0, ..., n_K−1}
(f_i ∈ [0, 1)^K: frequencies, d_i: amplitudes, r: model order)
Occam's razor: the number of modes r is small.
Sensing with minimal cost: how can the parametric signal model be identified from a small subset of the entries of x(t)?
This problem has many (classical) applications in communications, remote sensing, and array signal processing.
Applications in Communications and Sensing
Multipath channels: a (relatively) small number of strong paths.
Radar target identification: a (relatively) small number of strong scatterers.
Applications in Imaging
Swap time and frequency:
z(t) = Σ_{i=1}^r d_i δ(t − t_i)
Applications in microscopy imaging and astrophysics. Resolution is limited by the point spread function of the imaging system.
Something Old: Parametric Estimation
Exploiting physically meaningful constraints: shift invariance of the harmonic structure
x(t − τ) = Σ_{i=1}^r d_i e^{j2π⟨t−τ, f_i⟩} = Σ_{i=1}^r (d_i e^{−j2π⟨τ, f_i⟩}) e^{j2π⟨t, f_i⟩}
Prony's method: root-finding.
SVD-based approaches: ESPRIT [Roy and Kailath 1989], MUSIC [Schmidt 1986], matrix pencil [Hua and Sarkar 1990, Hua 1992].
Also: spectrum-blind sampling [Bresler 1996], finite rate of innovation [Vetterli 2002], Xampling [Eldar 2011].
Pros: perfect recovery from (equispaced) O(r) samples.
Cons: sensitive to noise and outliers; usually requires prior knowledge of the model order.
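As a concrete illustration, here is a minimal 1D matrix-pencil sketch (assuming noiseless, equispaced samples and a known model order r; the pencil parameter choice k = n//2 and the function name are illustrative, not the exact algorithm of [Hua and Sarkar 1990]):

```python
import numpy as np

def matrix_pencil_1d(x, r, k=None):
    """Estimate r frequencies from equispaced samples x(0), ..., x(n-1) of
    x(t) = sum_i d_i e^{j 2 pi f_i t}, exploiting shift invariance."""
    n = len(x)
    k = k or n // 2                              # pencil parameter (illustrative choice)
    # Hankel matrix of the data; Y1 is the one-sample shift of Y0
    H = np.array([x[i:i + n - k] for i in range(k + 1)])
    Y0, Y1 = H[:-1], H[1:]
    # Shift invariance: the r nonzero eigenvalues of pinv(Y0) @ Y1 are e^{j 2 pi f_i}
    vals = np.linalg.eigvals(np.linalg.pinv(Y0) @ Y1)
    vals = vals[np.argsort(-np.abs(vals))][:r]   # keep the r dominant modes
    return np.sort(np.mod(np.angle(vals) / (2 * np.pi), 1.0))

t = np.arange(32)
x = 1.0 * np.exp(2j * np.pi * 0.10 * t) + 0.5 * np.exp(2j * np.pi * 0.37 * t)
f_est = matrix_pencil_1d(x, r=2)                 # close to [0.10, 0.37]
```

Note that the samples here are equispaced and noiseless; with O(r) such samples the modes are recovered exactly, which is the "pro" above, while noise perturbs the eigenvalues, which is the "con".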
Something New: Compressed Sensing
Exploiting sparsity: compressed sensing [Candes and Tao 2006, Donoho 2006] captures the attributes (sparsity) of signals from a small number of samples.
Discretize the frequencies and assume a sparse representation over the discretized basis:
f_i ∈ F = {0, 1/n_1, ..., (n_1−1)/n_1} × {0, 1/n_2, ..., (n_2−1)/n_2} × ...
Pros: perfect recovery from O(r log n) random samples; robust against noise and outliers.
Cons: sensitive to gridding error.
Sensitivity to Basis Mismatch
A toy example: x(t) = e^{j2πf_0 t}. The success of CS relies on sparsity in the DFT basis.
Basis mismatch: physics places f_0 off the grid by a frequency offset.
Basis mismatch translates a sparse signal into an incompressible signal.
[Figure: recovered spectra under mismatch θ = 0.1π/n, 0.5π/n, and π/n, with normalized recovery errors 0.086, 0.346, and 0.873.]
A finer grid does not help, and one never estimates the true continuous-valued frequencies! [Chi, Scharf, Pezeshki, Calderbank 2011]
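The leakage is easy to quantify numerically (a toy check; the signal length and the half-bin offset are arbitrary choices):

```python
import numpy as np

n = 64
t = np.arange(n)
on_grid  = np.exp(2j * np.pi * (4.0 / n) * t)    # frequency exactly on the DFT grid
off_grid = np.exp(2j * np.pi * (4.5 / n) * t)    # off the grid by half a bin

def energy_in_largest(coeffs, k):
    """Fraction of signal energy captured by the k largest DFT coefficients."""
    mags = np.sort(np.abs(coeffs))[::-1]
    return np.sum(mags[:k] ** 2) / np.sum(mags ** 2)

c_on = np.fft.fft(on_grid) / n
c_off = np.fft.fft(off_grid) / n
# on-grid tone: one coefficient carries essentially all the energy;
# off-grid tone: the energy leaks across many bins (incompressible)
```

Running this shows the on-grid tone is 1-sparse while the half-bin-off tone needs many coefficients to capture its energy, which is exactly the mismatch effect above.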
Our Approach
Conventional approaches enforce physically meaningful constraints, but not sparsity; compressed sensing enforces sparsity, but not physically meaningful constraints.
Approach: combine sparsity with physically meaningful constraints, so that the continuous-valued frequencies can be stably estimated from a minimal number of time-domain samples:
- revisit the matrix pencil method proposed for array signal processing;
- revitalize the matrix pencil by combining it with convex optimization.
Two-Dimensional Frequency Model
Stack the signal x(t) = Σ_{i=1}^r d_i e^{j2π⟨t, f_i⟩} into a matrix X ∈ C^{n_1×n_2}. The matrix X has the Vandermonde decomposition
X = Y D Z^T,
where D = diag{d_1, ..., d_r} is diagonal and
Y = [1, 1, ..., 1; y_1, y_2, ..., y_r; ...; y_1^{n_1−1}, y_2^{n_1−1}, ..., y_r^{n_1−1}],
Z = [1, 1, ..., 1; z_1, z_2, ..., z_r; ...; z_1^{n_2−1}, z_2^{n_2−1}, ..., z_r^{n_2−1}]
are Vandermonde matrices, with y_i = exp(j2πf_{1i}), z_i = exp(j2πf_{2i}), f_i = (f_{1i}, f_{2i}).
Goal: we observe a random subset of the entries of X and wish to recover the missing entries.
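A quick numeric check of this decomposition (random sizes and frequencies, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r = 16, 16, 4
f1, f2 = rng.random(r), rng.random(r)                  # f_i = (f_{1i}, f_{2i})
d = rng.standard_normal(r) + 1j * rng.standard_normal(r)

Y = np.exp(2j * np.pi * np.outer(np.arange(n1), f1))   # n1 x r Vandermonde, y_i = e^{j2pi f_{1i}}
Z = np.exp(2j * np.pi * np.outer(np.arange(n2), f2))   # n2 x r Vandermonde, z_i = e^{j2pi f_{2i}}
X = Y @ np.diag(d) @ Z.T                               # the stacked signal matrix

# with r < min(n1, n2), X is low-rank with rank(X) = r
```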
Matrix Completion?
Recall that X = Y D Z^T (Vandermonde × diagonal × Vandermonde), where D = diag{d_1, ..., d_r}.
Quick observation: X can be a low-rank matrix, with rank(X) ≤ r.
Question: can we apply matrix completion algorithms directly to X?
- Yes, but it yields sub-optimal performance: it requires at least r max{n_1, n_2} samples.
- Worse, X is no longer low-rank if r > min(n_1, n_2); note that r can be as large as n_1 n_2.
Revisiting Matrix Pencil: Matrix Enhancement
Given a data matrix X, Hua proposed the following matrix enhancement for two-dimensional frequency models [Hua 1992]:
Choose two pencil parameters k_1 and k_2. The enhanced form X_e is a k_1 × (n_1 − k_1 + 1) block Hankel matrix
X_e = [X_0, X_1, ..., X_{n_1−k_1}; X_1, X_2, ..., X_{n_1−k_1+1}; ...; X_{k_1−1}, X_{k_1}, ..., X_{n_1−1}],
where each block X_l is a k_2 × (n_2 − k_2 + 1) Hankel matrix
X_l = [x_{l,0}, x_{l,1}, ..., x_{l,n_2−k_2}; x_{l,1}, x_{l,2}, ..., x_{l,n_2−k_2+1}; ...; x_{l,k_2−1}, x_{l,k_2}, ..., x_{l,n_2−1}].
Low-Rankness of the Enhanced Matrix
Choosing pencil parameters k_1 = Θ(n_1) and k_2 = Θ(n_2), the dimensions of X_e are proportional to n_1 n_2 × n_1 n_2. The enhanced matrix can be decomposed as [Hua 1992]
X_e = [Z_L; Z_L Y_d; ...; Z_L Y_d^{k_1−1}] · D · [Z_R, Y_d Z_R, ..., Y_d^{n_1−k_1} Z_R],
where Z_L and Z_R are Vandermonde matrices specified by z_1, ..., z_r, and Y_d = diag[y_1, y_2, ..., y_r].
The enhanced form X_e is low-rank: rank(X_e) ≤ r. Spectral sparsity gives low-rankness, and the low-rankness holds even with damping modes.
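A sketch of the enhancement in code (toy sizes with k1 = k2 = 4; the `enhance` helper is an illustrative implementation of the block-Hankel construction):

```python
import numpy as np

def enhance(X, k1, k2):
    """Hua's enhancement: a k1 x (n1-k1+1) block Hankel matrix whose (i, j)
    block is the k2 x (n2-k2+1) Hankel matrix built from row i+j of X."""
    n1, n2 = X.shape
    def hankel_row(l):
        return np.array([X[l, j:j + n2 - k2 + 1] for j in range(k2)])
    blocks = [hankel_row(l) for l in range(n1)]
    return np.block([[blocks[i + j] for j in range(n1 - k1 + 1)]
                     for i in range(k1)])

rng = np.random.default_rng(1)
n1, n2, r = 8, 8, 12                          # r > min(n1, n2): X itself is full-rank
f1, f2 = rng.random(r), rng.random(r)
Y = np.exp(2j * np.pi * np.outer(np.arange(n1), f1))
Z = np.exp(2j * np.pi * np.outer(np.arange(n2), f2))
X = Y @ Z.T                                   # unit amplitudes for simplicity

Xe = enhance(X, k1=4, k2=4)                   # 16 x 25 enhanced matrix
# ...yet the enhanced matrix is low-rank: rank(Xe) <= r
```

With r = 12 > min(n1, n2) = 8 the raw matrix X is generically full-rank, while rank(X_e) = 12 = r, which is exactly what makes the enhancement useful.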
Enhanced Matrix Completion (EMaC)
The natural algorithm finds the enhanced matrix with minimal rank satisfying the measurements:
minimize_{M ∈ C^{n_1×n_2}} rank(M_e) subject to M_{i,j} = X_{i,j}, (i, j) ∈ Ω,
where Ω denotes the sampling set. Motivated by matrix completion, we solve its convex relaxation:
(EMaC) minimize_{M ∈ C^{n_1×n_2}} ||M_e||_* subject to M_{i,j} = X_{i,j}, (i, j) ∈ Ω,
where ||·||_* denotes the nuclear norm. The algorithm is referred to as Enhanced Matrix Completion (EMaC).
Enhanced Matrix Completion (EMaC)
(EMaC) minimize_{M ∈ C^{n_1×n_2}} ||M_e||_* subject to M_{i,j} = X_{i,j}, (i, j) ∈ Ω
Existing matrix completion results do not apply here: they would require at least O(nr) samples.
Question: how many samples do we need?
Introduce Coherence Measure
Define the 2-D Dirichlet kernel
D(k_1, k_2, f_1, f_2) = (1/(k_1 k_2)) · ((1 − e^{j2πk_1 f_1})/(1 − e^{j2πf_1})) · ((1 − e^{j2πk_2 f_2})/(1 − e^{j2πf_2})),
and define G_L and G_R as r × r Gram matrices such that
(G_L)_{i,l} = D(k_1, k_2, f_{1i} − f_{1l}, f_{2i} − f_{2l}),
(G_R)_{i,l} = D(n_1 − k_1 + 1, n_2 − k_2 + 1, f_{1i} − f_{1l}, f_{2i} − f_{2l}).
[Figure: magnitude of the Dirichlet kernel against the separation on the x and y axes.]
Introduce Incoherence Measure
The incoherence condition holds w.r.t. µ if
σ_min(G_L) ≥ 1/µ, σ_min(G_R) ≥ 1/µ.
Examples: µ = Θ(1) under many scenarios:
- randomly generated frequencies;
- (mild) perturbations of grid points;
- in 1D, with k_1 ≈ n/2: well-separated frequencies (Liao and Fannjiang, 2014), Δ = min_{i≠j} |f_i − f_j| ≥ 2/n, which is about 2 times the Rayleigh limit.
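These quantities are straightforward to compute (a sketch with arbitrary, well-separated 2D frequencies; the scalar `dirichlet` helper handles the f = 0 limit explicitly):

```python
import numpy as np

def dirichlet(k, f):
    """(1/k) (1 - e^{j2 pi k f}) / (1 - e^{j2 pi f}), with limiting value 1 at f = 0."""
    f = np.mod(f, 1.0)
    if np.isclose(f, 0.0) or np.isclose(f, 1.0):
        return 1.0
    return (1 - np.exp(2j * np.pi * k * f)) / (k * (1 - np.exp(2j * np.pi * f)))

def gram(freqs, k1, k2):
    """r x r Gram matrix of the 2-D Dirichlet kernel at frequency differences."""
    return np.array([[dirichlet(k1, fi[0] - fl[0]) * dirichlet(k2, fi[1] - fl[1])
                      for fl in freqs] for fi in freqs])

freqs = [(0.10, 0.20), (0.35, 0.60), (0.60, 0.05), (0.85, 0.80)]
n1 = n2 = 16
k1 = k2 = 8
GL = gram(freqs, k1, k2)
GR = gram(freqs, n1 - k1 + 1, n2 - k2 + 1)
mu = 1.0 / min(np.linalg.eigvalsh(GL).min(), np.linalg.eigvalsh(GR).min())
# well-separated frequencies give mu = Theta(1), i.e., close to 1
```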
Theoretical Guarantees for the Noiseless Case
Theorem [Chen and Chi, 2013] (Noiseless Samples). Let n = n_1 n_2. If all measurements are noiseless, then EMaC recovers X perfectly with high probability if
m > C µ r log^4 n,
where C is some universal constant.
Implications:
- deterministic signal model, random observations;
- the coherence µ depends only on the frequencies, not on the amplitudes;
- near-optimal within logarithmic factors: Θ(r polylog n);
- general theoretical guarantees for Hankel (Toeplitz) matrix completion; see applications in control, MRI, natural language processing, etc.
Phase Transition r: sparsity level 22 2 8 6 4 2 8 6 4 2.9.8.7.6.5.4.3.2. 2 4 6 8 2 4 6 8 2 m: number of samples Figure : Phase transition diagrams where spike locations are randomly generated. The results are shown for the case where n = n 2 = 5. Page 9
Robustness to Bounded Noise
Assume the samples are noisy: X^o = X + N, where N is bounded noise.
(EMaC-Noisy) minimize_{M ∈ C^{n_1×n_2}} ||M_e||_* subject to ||P_Ω(M − X^o)||_F ≤ δ
Theorem [Chen and Chi, 2013] (Noisy Samples). Suppose X^o is a noisy copy of X that satisfies ||P_Ω(X − X^o)||_F ≤ δ. Under the conditions of the noiseless theorem, the solution to EMaC-Noisy satisfies
||X̂_e − X_e||_F ≤ {2√n + 8n + (8√2 n²)/m} δ
with probability exceeding 1 − n^{−2}.
Implications: the average entry inaccuracy is bounded above by O((n/m) δ). In practice, EMaC-Noisy usually yields better estimates.
Singular Value Thresholding (Noisy Case)
Several optimized solvers for Hankel matrix completion exist, for example [Fazel et al. 2013, Liu and Vandenberghe 2009].
Algorithm 1: Singular Value Thresholding for EMaC
1: Initialize: set M_0 = X_e and t = 0.
2: repeat
3:   Q_t ← D_{τ_t}(M_t) (singular value thresholding)
4:   M_t ← Hankel_X(Q_t) (projection onto a Hankel matrix consistent with the observations)
5:   t ← t + 1
6: until convergence
[Figure 2: true versus reconstructed signal, amplitude against time (vectorized); noisy samples, r = 3, m/(n_1 n_2) = 5.8%.]
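In 1D the same iteration can be sketched as follows (a toy Hankel-completion variant: a hard rank-r truncation stands in for the soft-thresholding operator D_τ, the model order r is assumed known, and missing samples are initialized to zero):

```python
import numpy as np

def hankel(x, k):
    """k x (n-k+1) Hankel matrix H[i, j] = x[i + j]."""
    n = len(x)
    return np.array([x[i:i + n - k + 1] for i in range(k)])

def dehankel(Q, n):
    """Adjoint step: average Q over its anti-diagonals back to a length-n signal."""
    k, w = Q.shape
    x = np.zeros(n, dtype=complex)
    cnt = np.zeros(n)
    for i in range(k):
        for j in range(w):
            x[i + j] += Q[i, j]
            cnt[i + j] += 1
    return x / cnt

n, r, k = 32, 2, 16
t = np.arange(n)
x_true = np.exp(2j * np.pi * 0.10 * t) + np.exp(2j * np.pi * 0.33 * t)
mask = np.ones(n, dtype=bool)
mask[[5, 12, 20, 27]] = False                 # four missing samples

x = np.where(mask, x_true, 0)                 # missing entries start at zero
for _ in range(1000):
    U, s, Vh = np.linalg.svd(hankel(x, k), full_matrices=False)
    Q = (U[:, :r] * s[:r]) @ Vh[:r]           # rank-r truncation (in place of D_tau)
    x = dehankel(Q, n)                        # project back to Hankel structure
    x[mask] = x_true[mask]                    # re-impose the observed samples

rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
```

On this easy instance (28 of 32 samples observed, well-separated modes) the alternating projections converge to the true signal; the convex EMaC formulation avoids the need to know r.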
Robustness to Sparse Outliers
What if a constant portion of the measurements is arbitrarily corrupted?
[Figure: real part of the clean signal and the noisy samples against the data index.]
Robust PCA approach [Candes et al. 2011, Chandrasekaran et al. 2011]: solve
(RobustEMaC) minimize_{M,S ∈ C^{n_1×n_2}} ||M_e||_* + λ||S_e||_1 subject to (M + S)_{i,l} = X^corrupted_{i,l}, (i, l) ∈ Ω.
Theoretical Guarantees for Robust Recovery
Theorem [Chen and Chi, 2013] (Sparse Outliers). Set n = n_1 n_2 and λ = 1/√(m log n). Let the fraction of corrupted entries, selected uniformly at random, satisfy s ≤ 2%. Then RobustEMaC recovers X with high probability if
m > C µ r² log³ n,
where C is some universal constant.
Implications:
- slightly more samples: m ≍ Θ(r² log³ n);
- robust to a constant portion of outliers: s ≍ Θ(m).
In summary, EMaC achieves robust recovery, with respect to both dense and sparse errors, from a near-optimal number of samples.
[Figure: real part of the clean signal, the recovered signal, and the recovered corruptions against the data index.]
Phase Transition for Line Spectrum Estimation
Fix the amount of corruption at a constant fraction of the total number of samples:
[Figure 3: phase transition diagram (sparsity level r against number of samples m) where spike locations are randomly generated; n = 125.]
Related Work
Cadzow's denoising with full observation: a non-convex heuristic to denoise line spectrum data based on the Hankel form.
Atomic norm minimization with random observations: recently proposed by [Tang et al., 2013] for compressive line spectrum estimation off the grid:
min_s ||s||_A subject to P_Ω(s) = P_Ω(x),
where the atomic norm is defined as ||x||_A = inf { Σ_i |d_i| : x(t) = Σ_i d_i e^{j2πf_i t} }.
Random signal model: if the frequencies are separated by 4 times the Rayleigh limit and the phases are random, then perfect recovery from O(r log r log n) samples;
- no stability result with random observations;
- extendable to multi-dimensional frequencies [Chi and Chen, 2013], but the SDP characterization is more complicated [Xu et al. 2013].
(Numerical) Comparison with Atomic Norm Minimization
Phase transition for 1D spectrum estimation: the phase transition for atomic norm minimization is very sensitive to the separation condition; EMaC, in contrast, is insensitive to the separation.
[Figure: phase transitions (sparsity level r against number of samples m) for atomic norm minimization without separation (a) and with separation (b), and for EMaC without separation (c). Enforcing separation does not change the phase transition of EMaC.]
(Numerical) Comparison with Atomic Norm Minimization
Phase transition for 2D spectrum estimation: again, the phase transition for atomic norm minimization is very sensitive to the separation condition, while EMaC is insensitive to it. Here the problem dimension n_1 = n_2 = 8 is relatively small, and the atomic norm minimization approach seems favorable regardless of separation.
[Figure: phase transitions (sparsity level against number of samples) for atomic norm minimization without separation (a) and with separation (b), and for EMaC without separation (c).]
Extension to the Multiple Measurement Vector Model
When multiple snapshots are available, it is possible to exploit the covariance structure to reduce the number of sensors. Without loss of generality, consider 1D:
x_l(t) = Σ_{i=1}^r d_{i,l} e^{j2πt f_i}, t ∈ {0, 1, ..., n−1},
where x_l = [x_l(0), x_l(1), ..., x_l(n−1)]^T, l = 1, ..., L. Assuming the coefficients d_{i,l} ~ CN(0, σ_i²), the covariance matrix
Σ = E[x_l x_l^H] = toep(u)
is a PSD Toeplitz matrix with rank(Σ) = r.
The frequencies can be estimated without separation from u using ℓ_1 minimization with nonnegativity constraints [Donoho and Tanner 2005].
Observation with a Sparse Ruler
Observation pattern: instead of random observations, we assume a deterministic observation pattern Ω over a (minimum) sparse ruler for all snapshots:
y_l = x_{Ω,l} = {x_l(t), t ∈ Ω}, l = 1, ..., L.
Sparse ruler in 1D: for Ω ⊆ {0, ..., n−1}, define the difference set Δ = {|i − j| : i, j ∈ Ω}. Ω is called a length-n sparse ruler if Δ = {0, ..., n−1}.
Examples:
- when n = 21, Ω = {0, 1, 2, 6, 7, 8, 17, 20};
- nested arrays, co-prime arrays [Pal and Vaidyanathan 2010, 2011];
- minimum sparse rulers [Redei and Renyi, 1949], of size roughly |Ω| = O(√n).
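The ruler property is easy to verify numerically (a small helper; the third set below is deliberately not a ruler):

```python
def is_sparse_ruler(omega, n):
    """Check that the pairwise differences of omega cover every lag 0, ..., n-1."""
    diffs = {abs(a - b) for a in omega for b in omega}
    return diffs == set(range(n))

print(is_sparse_ruler({0, 1, 2, 6, 7, 8, 17, 20}, 21))   # True: length-21 ruler
print(is_sparse_ruler({0, 1, 2, 5, 8}, 9))               # True: length-9 ruler
print(is_sparse_ruler({0, 1, 2, 5}, 9))                  # False: lags 6, 7, 8 missing
```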
Covariance Estimation on Sparse Ruler Entries
Consider the observation on Ω = {0, 1, 2, 5, 8}:
E[y_l y_l^H] = E{ [x_l(0); x_l(1); x_l(2); x_l(5); x_l(8)] [x_l(0)^H, x_l(1)^H, x_l(2)^H, x_l(5)^H, x_l(8)^H] }
= [ u_0, u_1^H, u_2^H, u_5^H, u_8^H ;
    u_1, u_0,   u_1^H, u_4^H, u_7^H ;
    u_2, u_1,   u_0,   u_3^H, u_6^H ;
    u_5, u_4,   u_3,   u_0,   u_3^H ;
    u_8, u_7,   u_6,   u_3,   u_0 ],
which gives the exact full covariance matrix Σ = toep(u) in the absence of noise and with an infinite number of snapshots.
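A numeric sketch of this identity with finitely many snapshots (a plain average over equal lags stands in for the expectation; the frequencies, powers, and L below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, L, r = 9, 2000, 3
omega = np.array([0, 1, 2, 5, 8])                   # a length-9 sparse ruler
freqs = np.array([0.11, 0.40, 0.73])
power = np.array([1.0, 0.8, 0.6])                   # sigma_i^2

A = np.exp(2j * np.pi * np.outer(np.arange(n), freqs))   # steering matrix
u_true = A @ power                                       # u_k = sum_i sigma_i^2 e^{j2pi k f_i}

# snapshots x_l = A d_l with d_{i,l} ~ CN(0, sigma_i^2), observed on omega
D = np.sqrt(power / 2)[:, None] * (rng.standard_normal((r, L))
                                   + 1j * rng.standard_normal((r, L)))
Y = (A @ D)[omega]                                  # |omega| x L observed snapshots
S = (Y @ Y.conj().T) / L                            # sample covariance on the ruler

# every lag 0..n-1 appears as some omega[a] - omega[b]; average repeated lags
u_hat = np.zeros(n, dtype=complex)
for k in range(n):
    vals = [S[a, b] for a in range(len(omega)) for b in range(len(omega))
            if omega[a] - omega[b] == k]
    u_hat[k] = np.mean(vals)

rel_err = np.linalg.norm(u_hat - u_true) / np.linalg.norm(u_true)  # shrinks as L grows
```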
Two-Step Structured Covariance Estimation
In practice, measurements are noisy and only a finite number of snapshots is available:
y_l = x_{Ω,l} + w_l, l = 1, ..., L, where w_l ~ CN(0, σ²I).
Two-step covariance estimation:
1. Form the sample covariance matrix of y_l: Σ_{Ω,L} = (1/L) Σ_{l=1}^L y_l y_l^H.
2. Determine the Toeplitz covariance matrix via an SDP:
û = argmin_{u : toep(u) ⪰ 0} (1/2) ||P_Ω(toep(u)) − Σ_{Ω,L}||_F² + λ tr(toep(u)),
where λ is some regularization parameter.
Theoretical Guarantee
Theorem [Li and Chi]. Let u be the ground truth, and set
λ ≥ C max{ √(r log(Ln)/L), r log(Ln)/L } ||Σ_Ω||,
with Σ_Ω = E[y_l y_l^H], for some constant C. Then, with probability at least 1 − 1/L, the solution satisfies
(1/√n) ||û − u||_F ≤ 6λ√r
if Ω is a complete sparse ruler.
Remarks:
- (1/√n) ||û − u||_F is small as soon as L ≥ O(r² log n);
- since the rank r (as large as n) can be larger than |Ω| (as small as O(√n)), this allows frequency estimation even when the snapshots themselves cannot be recovered.
Numerical Simulations
The algorithm also applies to other observation patterns, e.g., random.
Setting: n = 64, L = 4, |Ω| = 5, and r = 6.
[Figure: recovered spectra from CS with the DFT frame, correlation-aware estimation with the DFT frame, atomic norm minimization, and structured covariance estimation, against the ground truth.]
Final Remarks
- Sparse parameter estimation is possible by leveraging the shift-invariance structure embedded in the matrix pencil together with recent matrix completion techniques.
- Fundamental performance is determined by the proximity of the frequencies, as measured by the condition number of the Gram matrix formed by sampling the Dirichlet kernel.
- Recovering more lines than the number of sensors is made possible by exploiting second-order statistics.
- Future work: how do conventional algorithms compare with those based on convex optimization?
Q&A
Publications available on arXiv:
Robust Spectral Compressed Sensing via Structured Matrix Completion, IEEE Trans. Information Theory, http://arxiv.org/abs/1304.8126
Off-the-Grid Line Spectrum Denoising and Estimation with Multiple Measurement Vectors, submitted, http://arxiv.org/abs/1408.2242
Thank You! Questions?