Compressive Sensing under Matrix Uncertainties: An Approximate Message Passing Approach

Asilomar 2011

Jason T. Parker (AFRL/RYAP), Philip Schniter (OSU), Volkan Cevher (EPFL)
Problem Statement

Traditional Compressive Sensing (CS) addresses the underdetermined linear regression

  y = Ax + w,  with  y, w \in \mathbb{C}^M,  A \in \mathbb{C}^{M \times N},  x \in \mathbb{C}^N,  M < N

More generally, consider an unknown matrix perturbation E \in \mathbb{C}^{M \times N}:

  y = (\hat{A} + E)x + w,  where A = \hat{A} + E is unknown

We characterize A = \hat{A} + E with entry-wise means and variances:

  \hat{A}_{mn} = E\{A_{mn}\},  \nu^A_{mn} = \mathrm{var}\{A_{mn}\}

Distribution A, Approved for Public Release as 88 ABW-11-5962
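The perturbed measurement model can be simulated in a few lines. This is a minimal sketch (our own illustration, not from the slides): the problem sizes follow the later examples (M = 103, N = 256, K = 20), and the i.i.d. Gaussian choices for \hat{A}, E, and w are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 103, 256, 20        # measurements, signal length, sparsity
nu_A, nu_w = 0.05, 0.01       # entrywise matrix-error and noise variances

# Known matrix estimate Ahat and unknown perturbation E with variance nu_A
Ahat = rng.standard_normal((M, N)) / np.sqrt(M)
E = np.sqrt(nu_A) * rng.standard_normal((M, N))

# K-sparse signal x and additive noise w
x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
w = np.sqrt(nu_w) * rng.standard_normal(M)

# The solver observes only y and Ahat; A = Ahat + E is unknown
y = (Ahat + E) @ x + w
```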
Previous Work

Notice that

  y = (\hat{A} + E)x + w = \hat{A}x + \underbrace{(Ex + w)}_{\text{signal-dependent noise}}

- Standard CS performance analysis for bounded E [1; 2]
- LASSO: Sparsity-Cognizant Total Least Squares (S-TLS) [3]

  \{\hat{x}_{\text{S-TLS}}, \hat{E}_{\text{S-TLS}}\} = \arg\min_{x,E} \|(\hat{A} + E)x - y\|_2^2 + \lambda_E \|E\|_F^2 + \lambda \|x\|_1

- Dantzig Selector: Matrix Uncertain Selector [4]

  \hat{x}_{\text{MU-Selector}} = \arg\min \|x\|_1  subject to  \|\hat{A}^H (y - \hat{A}x)\|_\infty \le \lambda \|x\|_1 + \epsilon
Generalized Approximate Message Passing

Approximate Message Passing (AMP) [5] is derived from (approximate) belief propagation:

  \hat{x}^{k+1} = \eta_s(\hat{x}^k + A^H z^k; \beta^k \theta^k),  z^k = y - A\hat{x}^k + b^k z^{k-1}

- With \eta_s as soft-thresholding: near-minimax performance (robust)
- With distribution-specific \eta_s: approximate MMSE inference

Generalized AMP (GAMP) [6; 7]:
- MMSE or MAP estimates of x \in \mathbb{C}^N with separable prior p(x) = \prod_n p_X(x_n)
- Arbitrary separable output channel from the noiseless measurements z = Ax \in \mathbb{C}^M to y: p(y|z) = \prod_m p_{Y|Z}(y_m|z_m)
- Handles variable A_{mn}
- Provides approximate posteriors
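The AMP iteration above can be sketched in a few lines of NumPy. This is a minimal real-valued sketch under assumptions of our own: the threshold schedule (a multiple of the residual RMS) is one common heuristic, not the paper's exact tuning, and the Onsager coefficient b^k is the standard ||x̂||_0 / M choice for the soft-threshold denoiser.

```python
import numpy as np

def soft(v, tau):
    """Soft-threshold denoiser eta_s(v; tau)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def amp(y, A, lam=2.0, n_iter=50):
    """Sketch of soft-threshold AMP [5]: xhat^{k+1} = eta_s(xhat^k + A^T z^k),
    z^k = y - A xhat^k + b^k z^{k-1}.  Threshold = lam * residual RMS
    (a heuristic schedule, assumed here for illustration)."""
    M, N = A.shape
    xhat = np.zeros(N)
    z = y.copy()
    for _ in range(n_iter):
        tau = lam * np.linalg.norm(z) / np.sqrt(M)   # per-iteration threshold
        xhat = soft(xhat + A.T @ z, tau)
        b = np.count_nonzero(xhat) / M               # Onsager coefficient b^k
        z = y - A @ xhat + b * z                     # corrected residual
    return xhat
```

The Onsager term b^k z^{k-1} is what distinguishes AMP from plain iterative soft thresholding; it decouples the effective noise across iterations and is the source of AMP's fast convergence on i.i.d. matrices.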
Matrix Uncertain GAMP

Recall the noise-free measurements z = Ax. For large N, the Central Limit Theorem motivates treating z_m | x_n as Gaussian. Using the zero-mean quantities \tilde{A}_{mn} \triangleq A_{mn} - \hat{A}_{mn} and \tilde{x}_n \triangleq x_n - \hat{x}_n, we can write

  z_m = (\hat{A}_{mn} + \tilde{A}_{mn}) x_n + \sum_{r \neq n} (\hat{A}_{mr}\hat{x}_r + \hat{A}_{mr}\tilde{x}_r + \tilde{A}_{mr}\hat{x}_r + \tilde{A}_{mr}\tilde{x}_r)

from which we can conclude

  E\{z_m | x_n\} = \hat{A}_{mn} x_n + \sum_{r \neq n} \hat{A}_{mr} \hat{x}_r
  \mathrm{var}\{z_m | x_n\} = \nu^A_{mn} |x_n|^2 + \sum_{r \neq n} \left( |\hat{A}_{mr}|^2 \nu^x_r + \nu^A_{mr} |\hat{x}_r|^2 + \nu^A_{mr} \nu^x_r \right)

The \nu^A terms (highlighted in red on the original slide) modify the original GAMP variance calculation.
MU-GAMP Algorithm Summary

for t = 1, 2, 3, ...
  \forall m:  \hat{z}_m(t) = \sum_{n=1}^N \hat{A}_{mn} \hat{x}_n(t)                                  (R1)
  \forall m:  \nu^z_m(t) = \sum_{n=1}^N |\hat{A}_{mn}|^2 \nu^x_n(t)                                  (R2a)
  \forall m:  \nu^p_m(t) = \nu^z_m(t) + \sum_{n=1}^N \nu^A_{mn} (\nu^x_n(t) + |\hat{x}_n(t)|^2)      (R2b)
  \forall m:  \hat{p}_m(t) = \hat{z}_m(t) - \nu^z_m(t) \hat{u}_m(t-1)                                (R3)
  \forall m:  \hat{u}_m(t) = g_{\text{out}}(y_m, \hat{p}_m(t), \nu^p_m(t))                           (R4)
  \forall m:  \nu^u_m(t) = -g'_{\text{out}}(y_m, \hat{p}_m(t), \nu^p_m(t))                           (R5)
  \forall n:  \nu^r_n(t) = \left( \sum_{m=1}^M |\hat{A}_{mn}|^2 \nu^u_m(t) \right)^{-1}              (R6)
  \forall n:  \hat{r}_n(t) = \hat{x}_n(t) + \nu^r_n(t) \sum_{m=1}^M \hat{A}^*_{mn} \hat{u}_m(t)      (R7)
  \forall n:  \nu^x_n(t+1) = \nu^r_n(t) g'_{\text{in}}(\hat{r}_n(t), \nu^r_n(t))                     (R8)
  \forall n:  \hat{x}_n(t+1) = g_{\text{in}}(\hat{r}_n(t), \nu^r_n(t))                               (R9)
end
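The recursion (R1)-(R9) can be sketched directly in NumPy. This is a real-valued illustration under assumptions of our own: an AWGN output channel, for which g_out(y, p̂, ν^p) = (y − p̂)/(ν^p + ν^w) and −g'_out = 1/(ν^p + ν^w), and a soft-threshold input denoiser g_in (Laplacian-MAP) with a fixed weight lam; the paper's channels and tuning may differ.

```python
import numpy as np

def mu_gamp(y, Ahat, nuA, nu_w, lam=0.05, n_iter=100):
    """Sketch of MU-GAMP (R1)-(R9), real-valued, AWGN output channel,
    soft-threshold input denoiser.  Ahat and nuA are the M x N entrywise
    mean and variance of the uncertain matrix A."""
    M, N = Ahat.shape
    A2 = np.abs(Ahat) ** 2
    xhat, nu_x = np.zeros(N), np.ones(N)
    uhat = np.zeros(M)
    for _ in range(n_iter):
        zhat = Ahat @ xhat                          # (R1)
        nu_z = A2 @ nu_x                            # (R2a)
        nu_p = nu_z + nuA @ (nu_x + xhat ** 2)      # (R2b): uncertainty terms
        phat = zhat - nu_z * uhat                   # (R3): Onsager correction
        uhat = (y - phat) / (nu_p + nu_w)           # (R4): AWGN g_out
        nu_u = 1.0 / (nu_p + nu_w)                  # (R5): -g_out'
        nu_r = 1.0 / (A2.T @ nu_u)                  # (R6)
        rhat = xhat + nu_r * (Ahat.T @ uhat)        # (R7)
        tau = lam * nu_r                            # soft-threshold level
        nu_x = nu_r * (np.abs(rhat) > tau)          # (R8): nu_r * g_in'
        xhat = np.sign(rhat) * np.maximum(np.abs(rhat) - tau, 0.0)  # (R9)
    return xhat
```

With nuA = 0 the (R2b) correction vanishes and the iteration reduces to standard GAMP, which is exactly the relationship the derivation slide describes.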
Independent Identically Distributed Matrix Errors

Consider i.i.d. matrix errors with \nu^A_{mn} = \nu^A. For additive noise, a CLT argument suggests that, for large N, we can well approximate

  p(y | x) \approx \mathcal{N}\!\left( \hat{A}x, \; (\nu^A \|x\|_2^2 + \nu^w) I \right)

By the law of large numbers, \|x\|_2^2 is essentially constant for large N.

Conclusion: i.i.d. matrix uncertainty can be addressed by tuning standard algorithms for large N.
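The tuning rule implied by this approximation is simply to inflate the noise variance handed to a standard solver. A small sketch (our own, with a Monte Carlo check that the residual y − \hat{A}x really has the predicted variance):

```python
import numpy as np

def effective_noise_var(nu_A, x_energy, nu_w):
    """Effective AWGN variance seen by a standard CS solver when the matrix
    has i.i.d. entrywise error variance nu_A and the signal has energy
    x_energy = ||x||_2^2 (approximately constant for large N)."""
    return nu_A * x_energy + nu_w
```

Empirically, the residual y − \hat{A}x = Ex + w behaves like white noise of exactly this variance, which is why genie-aided GAMP with this tuning performs well in the i.i.d.-error phase-transition experiment.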
Phase Transition: i.i.d. Matrix Errors

[Figure: empirical phase-transition curves, K/M versus M/N, for GAMP, MU-GAMP, S-TLS, MU-Selector, and SpaRSA.]

- N = 256; \hat{A} is i.i.d. N(0, 1); \nu^A = 0.05
- x is Bernoulli-Rademacher (±1 non-zero entries)
- Gaussian additive noise at 20 dB SNR; the effective SNR is about 12 dB
- LASSO (via SpaRSA), S-TLS, and MU-Selector parameters use genie-aided tuning
- GAMP uses genie-aided computation of the effective noise variance
- Curves show −15 dB NMSE contours based on the median over 100 trials
NMSE vs. M/N for Sparse Matrix Errors

[Figure: NMSE (dB) versus M/N for GAMP, MU-GAMP, S-TLS, MU-Selector, and SpaRSA.]

- Same setup, except that the entries of E are now Bernoulli-Rademacher with 99% zeros and \nu^A = 5 for the non-zeros.
- MU-GAMP is given the true entries \nu^A_{mn}, while GAMP is given only the true effective noise variance.
- The solid lines are linear estimates given the true support of x, using \hat{A} (blue) and \hat{A} + E (black).
- Naive versions of S-TLS and MU-Selector are used with genie-aided tuning. The parametric S-TLS or compensated MU-Selector would likely show improved performance.
Parametric MU-GAMP

Parametric model: consider an unknown parameter vector \Theta \in \mathbb{C}^P with

  y = A(\Theta)x + w

We employ a first-order Taylor series expansion, similar to parametric S-TLS [3]:

  y \approx \left( A(\hat\Theta) + \sum_p (\Theta_p - \hat\Theta_p) E_p(\hat\Theta) \right) x + w,
  \quad E_p(\hat\Theta) \triangleq \left. \frac{\partial A(\alpha)}{\partial \alpha_p} \right|_{\alpha = \hat\Theta}

This suggests alternating between two MU-GAMP problems:
- Given (\hat\Theta, \hat\nu^\Theta), estimate x using the dictionary A(\hat\Theta)
- Given (\hat{x}, \hat\nu^x), estimate \Theta using the dictionaries E_p \hat{x}, p = 1, ..., P
Parametric MU-GAMP: Compute \hat{x}

Data model:

  y \approx \left( A(\hat\Theta) + \sum_p (\Theta_p - \hat\Theta_p) E_p(\hat\Theta) \right) x + w,
  \quad E_p(\hat\Theta) \triangleq \left. \frac{\partial A(\alpha)}{\partial \alpha_p} \right|_{\alpha = \hat\Theta}

First, assume we have an estimate of the parameter as (\hat\Theta, \nu^\Theta). We can immediately write

  y = Cx + w,  C \triangleq A(\hat\Theta) + \sum_p (\Theta_p - \hat\Theta_p) E_p(\hat\Theta)

  \hat{c} \triangleq E\{C\} = A(\hat\Theta),  \nu^c \triangleq \mathrm{var}\{C\} = \sum_p \nu^\Theta_p E_p^2

where squares on matrix terms are understood as element-wise squared magnitudes, and the mean and variance of the matrix are interpreted element-wise. We can use MU-GAMP to compute an estimate (\hat{x}, \nu^x) from this model.
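The entrywise moments of C feed straight into MU-GAMP as (\hat{A}, \nu^A). A minimal sketch (names and the (P, M, N) stacking convention are our own):

```python
import numpy as np

def matrix_moments(A_that, E_stack, nu_theta):
    """Entrywise mean and variance of
        C = A(That) + sum_p (Theta_p - That_p) E_p(That),
    given the derivative matrices E_stack (shape P x M x N) and the
    per-parameter variances nu_theta (length P)."""
    chat = A_that                                            # E{C} = A(That)
    # var{C}_{mn} = sum_p nu_theta[p] * |E_p|^2_{mn}, element-wise
    nu_c = np.tensordot(nu_theta, np.abs(E_stack) ** 2, axes=1)
    return chat, nu_c
```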
Parametric MU-GAMP: Compute \hat\Theta

Alternate form:

  \left( A(\hat\Theta) + \sum_p (\Theta_p - \hat\Theta_p) E_p(\hat\Theta) \right) x
  = \underbrace{\left( \sum_p \Theta_p E_p(\hat\Theta) \right) x}_{\text{linear in } \Theta}
  + \underbrace{\left( A(\hat\Theta) - \sum_p \hat\Theta_p E_p(\hat\Theta) \right) \hat{x}}_{\text{known constant}}
  + \underbrace{\left( A(\hat\Theta) - \sum_p \hat\Theta_p E_p(\hat\Theta) \right) \tilde{x}}_{\text{zero-mean}}

where \tilde{x} \triangleq x - \hat{x}. We can leverage this expression to obtain a linear model for \Theta with a known dictionary B:

  u = B\Theta + n

with

  u \triangleq y - \left( A(\hat\Theta) - \sum_p \hat\Theta_p E_p(\hat\Theta) \right) \hat{x},
  \quad n \triangleq w + \left( A(\hat\Theta) - \sum_p \hat\Theta_p E_p(\hat\Theta) \right) \tilde{x}

  E\{n\} = 0,  \mathrm{var}\{n\} = \nu^w + \left| A(\hat\Theta) - \sum_p \hat\Theta_p E_p(\hat\Theta) \right|^2 \nu^x

  B \triangleq \left[ E_1(\hat\Theta)x \;\; E_2(\hat\Theta)x \;\; \cdots \;\; E_P(\hat\Theta)x \right]
  \hat{b} \triangleq E\{B\} = \left[ E_1(\hat\Theta)\hat{x} \;\; \cdots \;\; E_P(\hat\Theta)\hat{x} \right],
  \quad \nu^b \triangleq \mathrm{var}\{B\} = \left[ |E_1(\hat\Theta)|^2 \nu^x \;\; \cdots \;\; |E_P(\hat\Theta)|^2 \nu^x \right]

We can estimate \Theta from this model using MU-GAMP!
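The mean and variance of the dictionary B are again simple entrywise expressions. A minimal sketch (our own names; E_stack uses the same (P, M, N) convention as before):

```python
import numpy as np

def theta_dictionary_moments(E_stack, xhat, nu_x):
    """Entrywise mean and variance of B = [E_1 x, ..., E_P x] when x is
    known only through its posterior mean xhat and variance nu_x."""
    # Column p of bhat is E_p @ xhat; column p of nu_b is |E_p|^2 @ nu_x
    bhat = np.stack([Ep @ xhat for Ep in E_stack], axis=1)          # M x P
    nu_b = np.stack([np.abs(Ep) ** 2 @ nu_x for Ep in E_stack], axis=1)
    return bhat, nu_b
```

These (\hat{b}, \nu^b), together with the effective noise variance above, are exactly the inputs the MU-GAMP recursion needs to estimate \Theta.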
Example 1: NMSE vs. Parameter Dimension

[Figure: NMSE (dB) versus P for MU-GAMP and parametric MU-GAMP.]

- In this toy example (N = 256, M = 103), A = A_0 + \sum_p \theta_p E_p.
- A_0 and all the E_p have entries drawn i.i.d. Gaussian. As we vary P, the entries of the E_p matrices are scaled to keep E\{\nu^A_{mn}\} = \nu^A for all m, n.
- MU-GAMP is given the true element-wise variances, but is unaware of the underlying \Theta structure.
- Parametric MU-GAMP leverages this underlying structure to improve performance for small P.
Example 2: Joint Calibration and Recovery

[Figure: NMSE (dB) versus iteration for x estimation (parametric MU-GAMP, MU-GAMP, GENIE, GAMP with true A) and for \theta estimation (parametric MU-GAMP, GENIE).]

- N = 256; M = 103; K = 20; P = 10
- Each E_p contains ones for M/P of the rows, and zeros elsewhere. This model is a surrogate for channel calibration errors in a measurement system.
- The unknown x is Bernoulli-Gaussian, while \Theta is Gaussian. In this example, all signals are complex-valued.
- The GENIE result for x assumes perfect knowledge of \Theta, and vice versa. The GENIE also knows the signal support when applicable.
Example 3: Blind Deconvolution

[Figure: NMSE (dB) versus iteration for x estimation (parametric MU-GAMP, MU-GAMP, GENIE, GAMP with true A) and for \theta estimation (parametric MU-GAMP, GENIE).]

We consider here the model

  Y = \Psi A(\Theta) X

- A(\Theta) \in \mathbb{C}^{N \times N} is circulant, and \Theta \in \mathbb{C}^N represents perturbations to its first column.
- Y \in \mathbb{C}^{M \times S}; X \in \mathbb{C}^{N \times S}; \Psi \in \mathbb{C}^{M \times N} is a mixing matrix.
- N = 256; M = 103; K = 20; S = 8
- Each E_p represents the change to the system response for a given coefficient of the impulse response. This model is a surrogate for learning a system impulse response from S snapshots, where each signal is K-sparse.
- The unknown x is complex Bernoulli-Gaussian, while \Theta is complex Gaussian.
- GENIE estimators are the same as in the previous example.
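For this circulant model the derivative dictionaries are particularly simple: A(\Theta) is linear in its first column, so E_p is the circulant matrix built from the p-th standard basis vector and the Taylor expansion is exact. A small sketch of that construction (our own, real-valued for brevity):

```python
import numpy as np

def circulant(c):
    """Circulant matrix whose first column is c (column k is c cyclically
    shifted down by k)."""
    c = np.asarray(c)
    return np.stack([np.roll(c, k) for k in range(len(c))], axis=1)

def circulant_derivs(n):
    """E_p = dA(Theta)/dTheta_p for circulant A(Theta): the circulant
    matrix generated by the p-th standard basis vector."""
    I = np.eye(n)
    return np.stack([circulant(I[:, p]) for p in range(n)])
```

Because A(\Theta) = \sum_p \Theta_p E_p exactly, the parametric MU-GAMP linearization incurs no model error here; only the statistical uncertainty in (\hat{x}, \hat\Theta) remains.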
Conclusions and Future Work

We have developed a Matrix Uncertain version of GAMP that:
- Adaptively adjusts the noise power for i.i.d. matrix errors
- Incorporates element-wise variances for independent, non-identically distributed errors
- Offers an iterative approach for parametric matrix uncertainty

Future work:
- MU-GAMP for spectral estimation
- Parametric MU-GAMP for dictionary learning and matrix completion
- Extension of the rigorous GAMP performance analysis to the MU-GAMP case
- Incorporation of MU-GAMP into an EM tuning approach to learn hyper-parameters from data
Bibliography

[1] Y. Chi, A. Pezeshki, L. Scharf, and R. Calderbank, "Sensitivity to basis mismatch in compressed sensing," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), pp. 3930-3933, 2010.
[2] M. A. Herman and T. Strohmer, "General deviants: An analysis of perturbations in compressed sensing," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 2, pp. 342-349, 2010.
[3] H. Zhu, G. Leus, and G. Giannakis, "Sparsity-cognizant total least-squares for perturbed compressive sampling," IEEE Transactions on Signal Processing, vol. 59, pp. 2002-2016, May 2011.
[4] M. Rosenbaum and A. Tsybakov, "Sparse recovery under matrix uncertainty," The Annals of Statistics, vol. 38, no. 5, pp. 2620-2651, 2010.
[5] D. Donoho, A. Maleki, and A. Montanari, "Message-passing algorithms for compressed sensing," Proceedings of the National Academy of Sciences, vol. 106, no. 45, pp. 18914-18919, 2009.
[6] S. Rangan, "Generalized approximate message passing for estimation with random linear mixing," arXiv preprint arXiv:1010.5141, 2010.
[7] P. Schniter, "Turbo reconstruction of structured sparse signals," in Proc. ICASSP, 2010.
Questions?