Approximate Message Passing with Built-in Parameter Estimation for Sparse Signal Recovery

arXiv:1606.00901v1 [cs.IT] 2 Jun 2016

Shuai Huang, Trac D. Tran
Department of Electrical and Computer Engineering
Johns Hopkins University, Baltimore, MD, USA
{shuaihuang, trac}@jhu.edu

(The authors would like to thank NSF-CCF-1117545, NSF-CCF-14995 and NSF-EECS-1443936 for funding.)

Abstract

The approximate message passing (AMP) algorithm shows an advantage over conventional convex optimization methods in recovering under-sampled sparse signals: it is analytically tractable and has much lower complexity. However, it requires that the true parameters of the input and output channels be known. In this paper, we propose an AMP algorithm with built-in parameter estimation that jointly estimates the sparse signal and the channel parameters by treating the latter as unknown random variables with simple priors. Specifically, a maximum a posteriori (MAP) parameter estimation scheme is presented and shown to produce estimates that converge to the true parameter values. Experiments on sparse signal recovery show that the performance of the proposed approach matches that of the oracle AMP algorithm in which the true parameter values are known.

1 Introduction

Sparse signal recovery (SSR) is a key topic in compressive sensing (CS) [1-4]; it lays the foundation for applications such as dictionary learning [5] and sparse representation-based classification [6]. Specifically, SSR tries to recover a sparse signal x ∈ R^N given an M × N sensing matrix A and a measurement vector y = Ax + w ∈ R^M, where M < N and w ∈ R^M is the unknown noise introduced in the measurement process. Although the problem itself is ill-posed, perfect recovery is still possible provided that x is sufficiently sparse and A is sufficiently incoherent [1]. Lasso [7], a.k.a. l1-minimization, is one of the most popular approaches to this problem:

    arg min_x  ‖y − Ax‖_2^2 + γ ‖x‖_1,    (1)

where ‖y − Ax‖_2^2 is the data-fidelity term, ‖x‖_1 is the sparsity-promoting term, and γ balances the trade-off between them.
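To make the Lasso baseline (1) concrete, the following minimal Python sketch builds a small instance of y = Ax + w and solves the l1-regularized least-squares problem with plain ISTA (iterative soft-thresholding). It is an illustration only, not the AMP algorithm developed in this paper; the problem sizes, the value of γ and the step-size choice are ours.

    import numpy as np

    def soft_threshold(v, t):
        # proximal operator of t * ||.||_1
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def ista_lasso(A, y, gamma, n_iter=500):
        # minimizes 0.5*||y - A x||_2^2 + gamma*||x||_1 (same as (1) up to rescaling gamma)
        L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
        x = np.zeros(A.shape[1])
        for _ in range(n_iter):
            x = soft_threshold(x - A.T @ (A @ x - y) / L, gamma / L)
        return x

    # toy instance (sizes are illustrative, not the paper's)
    rng = np.random.default_rng(0)
    M, N, S = 50, 200, 10
    A = rng.standard_normal((M, N)) / np.sqrt(M)
    x_true = np.zeros(N)
    x_true[rng.choice(N, S, replace=False)] = rng.standard_normal(S)
    y = A @ x_true + 0.01 * rng.standard_normal(M)
    x_hat = ista_lasso(A, y, gamma=0.01)
    print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))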

Figure 1: A probabilistic view of sparse signal recovery [8]. (The figure shows a separable input channel p(x_j | λ) generating x ∈ R^N, the M × N linear transformation matrix A producing z ∈ R^M, and a separable output channel p(y_i | z_i, θ) producing y ∈ R^M.) The signal x is estimated given the output vector y, the channel transition probability functions p(x_j | λ), p(y_i | z_i, θ) and the transformation matrix A. {λ, θ} denote the parameters of the probability models and are usually unknown.

From a probabilistic view, Lasso is equivalent to a maximum likelihood (ML) estimation of the signal x under the assumption that the entries of x are i.i.d. following the Laplace distribution p(x_j) ∝ exp(−λ|x_j|), and those of w are i.i.d. following the Gaussian distribution p(w_i) ∝ exp(−w_i² / (2θ)). Let z = Ax; we then have p(y_i | x) ∝ exp(−(y_i − z_i)² / (2θ)). The ML estimation is arg max_x p(x, y); since −log p(x, y) = ‖y − Ax‖_2^2 / (2θ) + λ ‖x‖_1 + const, this is essentially the same as (1), with γ = 2θλ. In general, SSR can be described by the Bayesian model from [8], as shown in Fig. 1. Under the Bayesian setting it is possible to design efficient iterative algorithms to compute either the maximum a posteriori (MAP) or the minimum mean square error (MMSE) estimate of the signal x. Most notable among them is approximate message passing (AMP) [9-11]. The AMP algorithm performs probabilistic inference on the corresponding factor graph using Gaussian and quadratic approximations of loopy belief propagation (loopy BP), a.k.a. message passing [12]. Depending on the inference task, loopy BP has two variants: sum-product message passing for the MMSE estimate of x, and max-sum message passing for the MAP estimate of x [8].

Although AMP is analytically tractable and has low complexity [9-11], it requires that the parameters {λ, θ} of the input and output channels be known exactly, which is often not the case in practice. Various methods have been proposed to add parameter estimation to the AMP algorithm, using expectation-maximization [13, 14] and adaptive methods [15]. In this paper, we propose an extension of generalized approximate message passing (GAMP) [8] that treats the parameters {λ, θ} as unknown random variables with prior distributions and estimates them jointly with the signal x. Using sum-product GAMP as an example, we give the message passing updates between the factor nodes and the variable nodes, which serve as the basis for writing the state evolution equations of the GAMP algorithm. The MAP parameter estimates can then be derived by maximizing the approximated posterior marginals p(λ | y) and p(θ | y). Following the analysis in [15], we can show that the estimated parameter values from the MAP estimation also converge to the true parameter values if they are computed exactly. To compare the proposed GAMP with built-in parameter estimation (PE-GAMP) against oracle-GAMP, where the true parameters are known, we run sparse signal recovery experiments with a Bernoulli-Gaussian (BG) input channel and an additive white Gaussian noise (AWGN) output channel. The experiments show that the performance of PE-GAMP matches that of oracle-GAMP.
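To make the Bayesian model of Fig. 1 concrete, here is a small Python sketch (ours) that draws a signal from a Bernoulli-Gaussian input channel, applies a random transformation matrix A, and passes the result through an AWGN output channel. The parameter names follow the conventions used later in Section 3.2 (λ_1: sparsity rate, λ_2: nonzero mean, λ_3: nonzero variance, θ_1: noise variance); the sizes are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)
    N, M = 200, 100
    lam1, lam2, lam3 = 0.1, 0.0, 1.0       # sparsity rate, nonzero mean, nonzero variance
    theta1 = 0.01                          # AWGN variance

    # separable input channel: p(x_j | lambda) = (1 - lam1) delta(x_j) + lam1 N(x_j; lam2, lam3)
    support = rng.random(N) < lam1
    x = np.where(support, lam2 + np.sqrt(lam3) * rng.standard_normal(N), 0.0)

    # linear transformation and separable output channel p(y_i | z_i, theta)
    A = rng.standard_normal((M, N)) / np.sqrt(M)
    z = A @ x
    y = z + np.sqrt(theta1) * rng.standard_normal(M)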

2 GAMP with Built-in Parameter Estimation

The generalized factor graph for the proposed PE-GAMP, which treats the parameters as random variables, is shown in Fig. 2. Here we adopt the same notation as [8]. Take the messages passed between the factor node Φ_m and the variable node x_n for example: Δ_{Φ_m→x_n} is the message from Φ_m to x_n, and Δ_{Φ_m←x_n} is the message from x_n to Φ_m. Both can be viewed as functions of x_n. In Sections 2.1 and 2.2 we give the messages passed on the generalized factor graph, in the log domain, for the sum-product and the max-sum message passing algorithms respectively.

Figure 2: The factor graph for the proposed PE-GAMP, with factor nodes {Λ_1, ..., Λ_L, Ω_1, ..., Ω_N, Φ_1, ..., Φ_M, Θ_1, ..., Θ_K} and variable nodes {λ_1, ..., λ_L, x_1, ..., x_N, θ_1, ..., θ_K}. λ = {λ_1, ..., λ_L} and θ = {θ_1, ..., θ_K} are the parameters, and x = [x_1, ..., x_N]^T is the sparse signal.

2.1 Sum-product Message Passing

We first present the sum-product message passing updates in the (t+1)-th iteration:

    Δ_{Φ_m→x_n}(x_n) = const + log ∫_{x\x_n, θ} Φ_m(y_m, x, θ) exp( Σ_{j≠n} Δ^{(t)}_{Φ_m←x_j}(x_j) + Σ_v Δ^{(t)}_{Φ_m←θ_v}(θ_v) )    (2)

    Δ_{Φ_m←x_n}(x_n) = const + Δ^{(t)}_{Ω_n→x_n}(x_n) + Σ_{i≠m} Δ^{(t)}_{Φ_i→x_n}(x_n)    (3)

    Δ_{Ω_n→x_n}(x_n) = const + log ∫_λ Ω_n(x_n, λ) exp( Σ_u Δ^{(t)}_{Ω_n←λ_u}(λ_u) )    (4)

    Δ_{Ω_n←x_n}(x_n) = const + Σ_i Δ^{(t)}_{Φ_i→x_n}(x_n),    (5)

where Φ_m(y_m, x, θ) = p(y_m | x, θ) and Ω_n(x_n, λ) = p(x_n | λ). Let Γ(x_n) denote the factor nodes in the neighborhood of x_n. The posterior marginal of x_n is:

    p(x_n | y) ∝ exp( Δ^{(t)}_{x_n} ) = exp( Δ^{(t)}_{Ω_n→x_n}(x_n) + Σ_{Φ_m ∈ Γ(x_n)} Δ^{(t)}_{Φ_m→x_n}(x_n) ).    (6)

Using p(x_n | y), the MMSE estimate of x can then be computed:

    x̂_n = E[x_n | y] = ∫_{x_n} x_n p(x_n | y).    (7)

Similarly, we can write the message passing updates involving λ, θ as follows:

    Δ_{Ω_n→λ_l}(λ_l) = const + log ∫_{x_n, λ\λ_l} Ω_n(x_n, λ) exp( Δ^{(t)}_{Ω_n←x_n}(x_n) + Σ_{u≠l} Δ^{(t)}_{Ω_n←λ_u}(λ_u) )    (8)

    Δ_{Ω_n←λ_l}(λ_l) = const + Σ_{j≠n} Δ^{(t)}_{Ω_j→λ_l}(λ_l) + log p(λ_l)    (9)

    Δ_{Φ_m→θ_k}(θ_k) = const + log ∫_{x, θ\θ_k} Φ_m(y_m, x, θ) exp( Σ_j Δ^{(t)}_{Φ_m←x_j}(x_j) + Σ_{v≠k} Δ^{(t)}_{Φ_m←θ_v}(θ_v) )    (10)

    Δ_{Φ_m←θ_k}(θ_k) = const + Σ_{i≠m} Δ^{(t)}_{Φ_i→θ_k}(θ_k) + log p(θ_k),    (11)

where p(λ_l), p(θ_k) are the pre-specified priors on the parameters. In general, if we do not have any knowledge about how the parameters are distributed, we can fairly assume a uniform prior and treat p(λ_l), p(θ_k) as constants.

Additionally, we have the following posterior marginals of λ_l, θ_k:

    p(λ_l | y) ∝ exp( Δ^{(t)}_{λ_l} ) = exp( log p(λ_l) + Σ_{Ω_n ∈ Γ(λ_l)} Δ^{(t)}_{Ω_n→λ_l}(λ_l) )    (12)

    p(θ_k | y) ∝ exp( Δ^{(t)}_{θ_k} ) = exp( log p(θ_k) + Σ_{Φ_m ∈ Γ(θ_k)} Δ^{(t)}_{Φ_m→θ_k}(θ_k) ).    (13)

2.2 Max-sum Message Passing

For max-sum message passing, the message updates from the variable nodes to the factor nodes are the same as in the sum-product case, i.e. (3), (5), (9), (11). We only need to change the message updates from the factor nodes to the variable nodes by replacing the integration with maximization:

    Δ_{Φ_m→x_n}(x_n) = max_{x\x_n, θ} [ log Φ_m(y_m, x, θ) + Σ_{j≠n} Δ^{(t)}_{Φ_m←x_j}(x_j) + Σ_v Δ^{(t)}_{Φ_m←θ_v}(θ_v) ]    (14)

    Δ_{Ω_n→x_n}(x_n) = max_λ [ log Ω_n(x_n, λ) + Σ_u Δ^{(t)}_{Ω_n←λ_u}(λ_u) ]    (15)

    Δ_{Ω_n→λ_l}(λ_l) = max_{x_n, λ\λ_l} [ log Ω_n(x_n, λ) + Δ^{(t)}_{Ω_n←x_n}(x_n) + Σ_{u≠l} Δ^{(t)}_{Ω_n←λ_u}(λ_u) ]    (16)

    Δ_{Φ_m→θ_k}(θ_k) = max_{x, θ\θ_k} [ log Φ_m(y_m, x, θ) + Σ_j Δ^{(t)}_{Φ_m←x_j}(x_j) + Σ_{v≠k} Δ^{(t)}_{Φ_m←θ_v}(θ_v) ].    (17)

The MAP estimate of x is then:

    x̂_n = arg max_{x_n} Δ^{(t)}_{x_n} = arg max_{x_n} [ Δ^{(t)}_{Ω_n→x_n}(x_n) + Σ_{Φ_m ∈ Γ(x_n)} Δ^{(t)}_{Φ_m→x_n}(x_n) ].    (18)
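To illustrate the difference between the two variants, consider the scalar problem that (18) reduces to once the sum of incoming messages (5) is collapsed, as GAMP does, to a quadratic −(x_n − r)² / (2τ) (see (25) in Section 3.2.1). For the Laplace input channel of Section 3.2.2 the max-sum update then has a closed form, the familiar soft-thresholding rule; the sketch below is ours, with r and tau standing for the collapsed message mean and variance.

    import numpy as np

    def map_scalar_laplace(r, tau, lam1):
        # arg max_x [ -lam1*|x| - (x - r)^2 / (2*tau) ]  =  soft-thresholding of r at lam1*tau
        return np.sign(r) * np.maximum(np.abs(r) - lam1 * tau, 0.0)

    # unlike the sum-product/MMSE estimate, the max-sum/MAP estimate returns exact zeros
    print(map_scalar_laplace(np.array([-0.3, 0.05, 0.8]), tau=0.1, lam1=2.0))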

3 MAP Parameter Estimation for Sum-product Message Passing

In this paper we focus on using the sum-product message passing algorithm for sparse signal recovery. Take (2) in sum-product message passing for example: it contains integrations with respect to the parameters θ, and these might be difficult to compute. However, the posterior marginals of θ can be directly estimated via (13). Instead of doing a complete integration with respect to θ_k, we can use the following scalar MAP estimate of θ_k to simplify the message updates:

    θ̂^{(t)}_k = arg max_{θ_k} p(θ_k | y) = arg max_{θ_k} Δ^{(t)}_{θ_k}.    (19)

Similarly, we also have the MAP estimate of λ_l:

    λ̂^{(t)}_l = arg max_{λ_l} p(λ_l | y) = arg max_{λ_l} Δ^{(t)}_{λ_l}.    (20)

The GAMP algorithm with MAP parameter estimation is summarized in Algorithm 1.

Algorithm 1 Sum-product GAMP with MAP parameter estimation
Require: y, A, p(λ), p(x | λ), p(θ), p(w | θ)
 1: Initialize λ^{(0)}, θ^{(0)}
 2: for t = 1, 2, ... do
 3:     Perform GAMP [8];
 4:     Compute λ̂^{(t)}_l for l = 1, ..., L;
 5:     Compute θ̂^{(t)}_k for k = 1, ..., K;
 6:     Compute the MMSE estimate x̂^{(t)} = E[x | y];
 7:     if x̂^{(t)} reaches convergence then
 8:         x̂ = x̂^{(t)};
 9:         break;
10:     end if
11: end for
12: return x̂;

Discussion: EM-GAMP [13, 14] and adaptive-GAMP [15] are both ML estimations. Specifically, EM-GAMP tries to maximize E[ log p(x, w; λ, θ) | y, λ^{(t)}, θ^{(t)} ] iteratively using the expectation-maximization (EM) algorithm [16]; adaptive-GAMP tries to maximize the log-likelihoods of two new random variables r, p introduced in the original GAMP: log p(r | λ) and log p(y, p | θ). For the proposed MAP parameter estimation, we have the following by Bayes' rule:

    p(λ | y) ∝ p(y | λ) p(λ)    (21a)

    p(θ | y) ∝ p(y | θ) p(θ).    (21b)

If the priors p(λ), p(θ) are chosen to be uniform distributions, the MAP and ML estimations have the same solutions, since both maximize p(y | λ) and p(y | θ).
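The overall loop of Algorithm 1 can be sketched end-to-end in a few dozen lines. The Python sketch below is a simplified stand-in, not the paper's PE-GAMP: it runs plain AMP (rather than GAMP) for a Bernoulli-Gaussian input channel with an AWGN output channel, estimates the effective noise variance empirically from the residual, and re-estimates the input-channel parameters with closed-form EM-style updates in place of the Newton-based MAP updates of Section 3.1 (with uniform priors the two are in the same spirit). All names, initializations and problem sizes are ours.

    import numpy as np

    def bg_posterior(r, tau, lam1, lam2, lam3):
        # Scalar posterior moments for p(x | r) ∝ p(x | λ) N(r; x, tau) under the BG prior (24).
        # Returns (mean, var, pi, m, v): pi is the posterior support probability,
        # (m, v) are the mean/variance of the "active" Gaussian component.
        g0 = np.exp(-r**2 / (2 * tau)) / np.sqrt(2 * np.pi * tau)
        g1 = np.exp(-(r - lam2)**2 / (2 * (lam3 + tau))) / np.sqrt(2 * np.pi * (lam3 + tau))
        pi = lam1 * g1 / ((1 - lam1) * g0 + lam1 * g1 + 1e-300)
        m = (lam3 * r + tau * lam2) / (lam3 + tau)
        v = lam3 * tau / (lam3 + tau)
        mean = pi * m
        var = pi * (v + m**2) - mean**2
        return mean, var, pi, m, v

    def pe_amp_bg(y, A, n_iter=50):
        # Jointly estimate x and (lam1, lam2, lam3) from y = A x + w.
        M, N = A.shape
        lam1, lam2 = 0.1, 0.0                                 # crude initial guesses
        lam3 = np.var(y) * M / (N * lam1)
        x, z = np.zeros(N), y.copy()
        for _ in range(n_iter):
            tau = np.mean(z**2) + 1e-12                       # effective noise variance (empirical)
            r = x + A.T @ z                                   # pseudo-data, r ≈ x + N(0, tau)
            x, var, pi, m, v = bg_posterior(r, tau, lam1, lam2, lam3)
            eta_prime = var / tau                             # derivative of the posterior mean w.r.t. r
            z = y - A @ x + (N / M) * np.mean(eta_prime) * z  # residual with Onsager correction
            # EM-style re-estimation of the input-channel parameters (uniform priors)
            lam1 = np.mean(pi)
            lam2 = np.sum(pi * m) / (np.sum(pi) + 1e-12)
            lam3 = np.sum(pi * ((m - lam2)**2 + v)) / (np.sum(pi) + 1e-12)
        return x, (lam1, lam2, lam3)

    # demo on a small instance (sizes ours, not the paper's)
    rng = np.random.default_rng(3)
    M, N, S = 100, 200, 10
    A = rng.standard_normal((M, N)) / np.sqrt(M)
    x_true = np.zeros(N)
    x_true[rng.choice(N, S, replace=False)] = rng.standard_normal(S)
    y = A @ x_true + 0.01 * rng.standard_normal(M)
    x_hat, params = pe_amp_bg(y, A)
    print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
    print("estimated (lam1, lam2, lam3):", params)

For instances well inside the recovery region, such as the demo above, the estimated parameters typically settle near the values used to generate the data while x̂ converges.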

For the state evolution analysis, we make the same assumptions as [8, 11, 15] about the PE-GAMP algorithm. In general, the estimation problem is indexed by the dimensionality N of x, and the output dimensionality M = M(N) satisfies lim_{N→∞} N/M = β for some β > 0. Specifically, (19), (20) can be viewed as different adaptation functions as defined in [15]. The empirical convergence of x, λ, θ can then be obtained directly from the conclusion of the state evolution theorem in [15]. Following the analysis in [15], we also assume the MAP adaptation functions arg max_λ p(λ | y), arg max_θ p(θ | y) satisfy the weak pseudo-Lipschitz continuity property [11] over y. Using Theorem 3 in [15], we can see that the MAP estimates converge to the true parameters as N → ∞ when they are computed exactly.

3.1 MAP Estimation of λ, θ

We next show how to compute the MAP estimates in (19), (20). For clarity, we drop the superscript (t) from the notation. Starting with some initial solutions λ_l(0), θ_k(0), we iteratively maximize quadratic approximations of Δ_{λ_l}, Δ_{θ_k}. Take Δ_{λ_l} for example; it is approximated as follows in the (h+1)-th iteration:

    Δ_{λ_l} ≈ Δ_{λ_l}(h) + Δ'_{λ_l}(h) (λ_l − λ_l(h)) + (1/2) Δ''_{λ_l}(h) (λ_l − λ_l(h))²,    (22)

where λ_l(h) is the solution in the h-th iteration and Δ'_{λ_l}(h), Δ''_{λ_l}(h) are the first and second derivatives of Δ_{λ_l} evaluated at λ_l(h). If Δ''_{λ_l}(h) < 0, we can get λ_l(h+1) using Newton's method:

    λ_l(h+1) = λ_l(h) − Δ'_{λ_l}(h) / Δ''_{λ_l}(h).    (23)

If Δ''_{λ_l}(h) ≥ 0, Newton's method would move toward a local minimum. In this case we use a line search instead. This iterative approach is summarized in Algorithm 2. The MAP estimate of θ_k can be computed in the same way.

Algorithm 2 MAP parameter estimation
Require: Δ_{λ_l}
 1: Initialize λ_l(0)
 2: for h = 0, 1, ... do
 3:     Compute Δ'_{λ_l}(h), Δ''_{λ_l}(h);
 4:     if Δ''_{λ_l}(h) < 0 then λ_l(h+1) = λ_l(h) − Δ'_{λ_l}(h) / Δ''_{λ_l}(h) end if;
 5:     if Δ''_{λ_l}(h) ≥ 0 then perform a line search for λ_l(h+1) end if;
 6:     if λ_l(h+1) reaches convergence then
 7:         λ̂_l = λ_l(h+1);
 8:         break;
 9:     end if
10: end for
11: return λ̂_l;
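A minimal sketch of the scalar update loop in Algorithm 2, assuming the objective (playing the role of Δ_{λ_l}) and its first two derivatives can be evaluated as Python callables. The backtracking line search used when the curvature is non-negative is one reasonable choice among several; names, step sizes and tolerances are ours.

    import numpy as np

    def map_scalar_update(L, dL, d2L, val, max_iter=50, tol=1e-8):
        # Maximize a scalar objective L starting from val, following Algorithm 2:
        # a Newton step when the curvature is negative, a backtracking line search
        # along the gradient direction otherwise.
        for _ in range(max_iter):
            g, h = dL(val), d2L(val)
            if h < 0:
                new = val - g / h                      # Newton step toward the maximum
            else:
                step = 1.0                             # ascend along the gradient
                while L(val + step * g) < L(val) and step > 1e-12:
                    step *= 0.5
                new = val + step * g
            if abs(new - val) < tol:
                return new
            val = new
        return val

    # toy objective with a known maximizer at 2.0 (illustrative only)
    L   = lambda t: -(t - 2.0) ** 4
    dL  = lambda t: -4.0 * (t - 2.0) ** 3
    d2L = lambda t: -12.0 * (t - 2.0) ** 2
    print(map_scalar_update(L, dL, d2L, val=0.0))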

3.2 MAP Estimation for Special Input and Output Channels

We next give three examples of how to perform MAP estimation of the parameters in the input and output channels.

3.2.1 Bernoulli-Gaussian Input Channel

In this case the sparse signal x can be modeled with a mixture of a Bernoulli and a Gaussian distribution:

    p(x_j | λ) = (1 − λ_1) δ(x_j) + λ_1 N(x_j; λ_2, λ_3),    (24)

where δ(·) is the Dirac delta function, λ_1 ∈ [0, 1] is the sparsity rate, λ_2 ∈ R is the nonzero-coefficient mean and λ_3 ∈ R_+ is the nonzero-coefficient variance. The AMP algorithm [8, 9] uses a quadratic approximation of loopy BP. In the (t+1)-th iteration, we have the following from [8]:

    Δ_{Ω_n←x_n}(x_n) ≈ − (x_n − x̂^{(t)}_{Ω_n←x_n})² / (2 τ_{Ω_n←x_n}) + const,    (25)

where τ_{Ω_n←x_n}, x̂^{(t)}_{Ω_n←x_n} correspond to the variance and mean of x_n in the t-th iteration respectively. Take λ_1 for example; Δ_{Ω_n→λ_1} can be computed as follows:

    c(x_n) = exp( − (x̂^{(t)}_{Ω_n←x_n})² / (2 τ_{Ω_n←x_n}) )    (26)

    d(x_n) = √( τ_{Ω_n←x_n} / (λ^{(t)}_3 + τ_{Ω_n←x_n}) ) · exp( − (λ^{(t)}_2 − x̂^{(t)}_{Ω_n←x_n})² / (2 (λ^{(t)}_3 + τ_{Ω_n←x_n})) )    (27)

    Δ_{Ω_n→λ_1}(λ_1) = log( (1 − λ_1) c(x_n) + λ_1 d(x_n) ) + const,    (28)

where λ^{(t)}_2, λ^{(t)}_3 are the MAP estimates from the t-th iteration. Δ_{λ_1} and its derivatives can then be computed accordingly for the MAP estimation of λ^{(t+1)}_1 in the (t+1)-th iteration. λ^{(t+1)}_2, λ^{(t+1)}_3 can be updated similarly.
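Putting (26)-(28) together with Algorithm 2: summing (28) over the neighborhood of λ_1 (cf. (9) and (12) with a uniform prior) gives a scalar objective in λ_1 whose first and second derivatives are available in closed form, and the second derivative is never positive, so the Newton branch of Algorithm 2 always applies. The sketch below is ours, with r and tau denoting the vectors of message means x̂_{Ω_n←x_n} and variances τ_{Ω_n←x_n}; in practice the update should also be clipped to λ_1 ∈ [0, 1].

    import numpy as np

    def bg_lam1_terms(r, tau, lam2, lam3):
        # c_n and d_n from (26)-(27), vectorized over n
        c = np.exp(-r**2 / (2 * tau))
        d = np.sqrt(tau / (lam3 + tau)) * np.exp(-(lam2 - r)**2 / (2 * (lam3 + tau)))
        return c, d

    def bg_lam1_objective(lam1, c, d):
        # sum of (28) over n (uniform prior on lam1) and its first two derivatives
        s = (1 - lam1) * c + lam1 * d
        L   = np.sum(np.log(s))
        dL  = np.sum((d - c) / s)
        d2L = -np.sum(((d - c) / s) ** 2)
        return L, dL, d2L

These quantities plug directly into the scalar maximizer sketched after Algorithm 2.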

3.2.2 Laplace Input Channel

The Laplace input channel assumes the signal follows the Laplace distribution:

    p(x_j | λ) = (λ_1 / 2) exp( − λ_1 |x_j| ),    (29)

where λ_1 ∈ (0, ∞). We can compute Δ_{Ω_n→λ_1} similarly, using the quadratic approximation (25) of Δ_{Ω_n←x_n} from [8] and splitting the resulting integral at x_n = 0:

    c(x_n) = (λ_1 / 2) √( π τ_{Ω_n←x_n} / 2 ) · exp( λ_1² τ_{Ω_n←x_n} / 2 + λ_1 x̂^{(t)}_{Ω_n←x_n} ) · ( 1 − erf( (x̂^{(t)}_{Ω_n←x_n} + λ_1 τ_{Ω_n←x_n}) / √(2 τ_{Ω_n←x_n}) ) )    (30)

    d(x_n) = (λ_1 / 2) √( π τ_{Ω_n←x_n} / 2 ) · exp( λ_1² τ_{Ω_n←x_n} / 2 − λ_1 x̂^{(t)}_{Ω_n←x_n} ) · ( 1 + erf( (x̂^{(t)}_{Ω_n←x_n} − λ_1 τ_{Ω_n←x_n}) / √(2 τ_{Ω_n←x_n}) ) )    (31)

    Δ_{Ω_n→λ_1}(λ_1) = log( c(x_n) + d(x_n) ) + const,    (32)

where erf(·) is the error function. Δ_{λ_1} and its derivatives can then be computed accordingly for the MAP estimation of λ^{(t+1)}_1 in the (t+1)-th iteration.

3.2.3 Additive White Gaussian Noise Output Channel

The AWGN output channel assumes the noise w is white Gaussian:

    p(w_i | θ) = N(w_i; 0, θ_1),    (33)

where θ_1 ∈ R_+ is the noise variance. Using the quadratic approximation from [8], we can get:

    Δ_{Φ_m→θ_1}(θ_1) = − (1/2) log( θ_1 + τ^{(t)}_{p, Φ_m} ) − (y_m − p̂^{(t)}_{Φ_m})² / ( 2 (θ_1 + τ^{(t)}_{p, Φ_m}) ) + const,    (34)

where τ^{(t)}_{p, Φ_m}, p̂^{(t)}_{Φ_m} are from [8] and correspond to the variance and mean of z_m in the t-th iteration respectively. Δ_{θ_1} and its derivatives can then be computed accordingly for the MAP estimation of θ^{(t+1)}_1 in the (t+1)-th iteration.
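Likewise, summing (34) over the measurements (cf. (11) and (13) with a uniform prior) gives a scalar objective in θ_1 whose derivatives feed Algorithm 2. A sketch with our own names, where p_hat and tau_p stand for the output-side means p̂_{Φ_m} and variances τ_{p, Φ_m}; in practice the update should be kept positive.

    import numpy as np

    def awgn_theta1_objective(theta1, y, p_hat, tau_p):
        # sum of (34) over m, up to a constant, and its first two derivatives w.r.t. theta1
        s = theta1 + tau_p
        r2 = (y - p_hat) ** 2
        L   = np.sum(-0.5 * np.log(s) - r2 / (2.0 * s))
        dL  = np.sum(-0.5 / s + r2 / (2.0 * s ** 2))
        d2L = np.sum(0.5 / s ** 2 - r2 / s ** 3)
        return L, dL, d2L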

4 Numerical Experiments

In this section, we use the proposed GAMP algorithm with built-in parameter estimation (PE-GAMP) to recover sparse signals from the under-sampled measurements y = Ax + w. Specifically, we use the BG input channel and the AWGN output channel to generate x, y. Since we do not have any knowledge about the priors of λ, θ, we fairly choose a uniform prior for each parameter. The recovered signal x̂ is computed using the MMSE estimator E[x | y] from sum-product message passing, and we compare the recovery performance with that of oracle-GAMP, which knows the true parameters.

4.1 Noiseless Sparse Signal Recovery

We first perform noiseless sparse signal recovery experiments and draw the empirical phase transition curves (PTC) of PE-GAMP and oracle-GAMP. We fix N = 1000 and vary the over-sampling ratio σ = M/N ∈ {0.05, 0.1, 0.15, ..., 0.95} and the under-sampling ratio ρ = S/M ∈ {0.05, 0.1, 0.15, ..., 0.95}, where S is the sparsity of the signal x, i.e. the number of nonzero coefficients. For each combination of σ and ρ, we randomly generate 100 pairs of {x, A}. The entries of A and the nonzero entries of x are i.i.d. Gaussian N(0, 1). Given the measurement vector y = Ax and the sensing matrix A, we try to recover the sparse signal x. If ε = ‖x̂ − x‖_2 / ‖x‖_2 < 10^{-3}, the recovery is considered a success. Based on the 100 trials, we compute the success rate for each combination of σ and ρ.

Figure 3: (a) The absolute difference of the success rates between PE-GAMP and oracle-GAMP; (b) the phase transition curves of PE-GAMP, oracle-GAMP and the theoretical Lasso.

The absolute differences of the success rates between PE-GAMP and oracle-GAMP are shown in Fig. 3(a). The PTC is the contour corresponding to a 0.5 success rate in the domain (σ, ρ) ∈ (0, 1)²; it divides the domain into a success phase (lower right) and a failure phase (upper left). The PTCs of the two GAMP methods, along with that of the theoretical Lasso [9], are shown in Fig. 3(b). We can see from Fig. 3 that the performance of PE-GAMP generally matches that of oracle-GAMP. PE-GAMP is able to estimate the parameters fairly well while recovering the sparse signals.

4.2 Noisy Sparse Signal Recovery

We next try to recover the sparse signal x from a noisy measurement vector y. In this case, we would like to see how the two algorithms behave when an increasing amount of noise is added to the measurements. Specifically, S = 100, M = 500, N = 1000 are fixed, and y is generated as follows:

    y = Ax + ν w,    (35)

where ν ∈ {0.1, 0.2, ..., 1}. For each ν, we randomly generate 100 triples of {x, A, w}. The entries of A, w and the nonzero entries of x are i.i.d. Gaussian N(0, 1). ε = ‖x̂ − x‖_2 / ‖x‖_2 is used to evaluate the performance of the algorithms. The mean ± standard deviation of the ε values from the 100 trials are shown in Fig. 4. We can see that the proposed PE-GAMP performs as well as oracle-GAMP in recovering noisy sparse signals.

Figure 4: Comparison of the recovery of noisy sparse signals at different noise levels ν; the error bars represent the mean ± standard deviation of ε.
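A scaled-down harness in the spirit of these experiments, reusing the pe_amp_bg sketch from Section 3 (and therefore inheriting all of its simplifications). The grid, trial count and problem size are far smaller than the paper's N = 1000, 100-trial setup so that it runs quickly; A is drawn with variance 1/M entries, which only rescales the measurements and does not change the recovery problem.

    import numpy as np
    # assumes pe_amp_bg from the sketch in Section 3 is defined in the same session

    def success_rate(sigma, rho, n_trials=10, N=200, noise=0.0, rng=np.random.default_rng(0)):
        M = max(2, int(round(sigma * N)))
        S = max(1, int(round(rho * M)))
        successes = 0
        for _ in range(n_trials):
            A = rng.standard_normal((M, N)) / np.sqrt(M)
            x = np.zeros(N)
            x[rng.choice(N, S, replace=False)] = rng.standard_normal(S)
            y = A @ x + noise * rng.standard_normal(M)
            x_hat, _ = pe_amp_bg(y, A, n_iter=100)
            successes += np.linalg.norm(x_hat - x) / np.linalg.norm(x) < 1e-3
        return successes / n_trials

    for sigma in (0.3, 0.5, 0.7):
        for rho in (0.1, 0.2, 0.3):
            print(sigma, rho, success_rate(sigma, rho))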

5 Conclusion and Future Work

In this paper we proposed a GAMP algorithm with built-in parameter estimation to recover under-sampled sparse signals. The parameters are treated as random variables with pre-specified priors, and their posterior marginals can then be directly approximated by loopy belief propagation. This allows us to perform MAP estimation of the parameters and updates of the recovered signal jointly. Under the same assumptions as the original GAMP, the proposed PE-GAMP empirically converges to the true parameter values, as evidenced by a series of noiseless and noisy sparse signal recovery experiments. We have mainly focused on the GAMP algorithm based on sum-product message passing. In the future we would like to include max-sum message passing and analyze its convergence and asymptotic consistency behavior.

References

[1] E. Candès and T. Tao, "Decoding by linear programming," IEEE Trans. on Information Theory, vol. 51, no. 12, pp. 4203-4215, 2005.

[2] E. J. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. on Information Theory, vol. 52, no. 2, pp. 489-509, 2006.

[3] D. L. Donoho, "Compressed sensing," IEEE Trans. on Information Theory, vol. 52, no. 4, pp. 1289-1306, 2006.

[4] E. J. Candès and T. Tao, "Near-optimal signal recovery from random projections: Universal encoding strategies?," IEEE Trans. on Information Theory, vol. 52, no. 12, pp. 5406-5425, 2006.

[5] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311-4322, 2006.

[6] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, 2009.

[7] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, Series B, vol. 58, pp. 267-288, 1996.

[8] S. Rangan, "Generalized approximate message passing for estimation with random linear mixing," in Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on, July 2011, pp. 2168-2172.

[9] D. L. Donoho, A. Maleki, and A. Montanari, "Message-passing algorithms for compressed sensing," Proc. Nat. Acad. Sci., vol. 106, no. 45, pp. 18914-18919, 2009.

[10] D. L. Donoho, A. Maleki, and A. Montanari, "Message-passing algorithms for compressed sensing: I. Motivation and construction," Proc. Inform. Theory Workshop, Jan 2010.

[11] M. Bayati and A. Montanari, "The dynamics of message passing on dense graphs, with applications to compressed sensing," IEEE Transactions on Information Theory, vol. 57, no. 2, pp. 764-785, Feb 2011.

[12] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Found. Trends Mach. Learn., vol. 1, no. 1-2, Jan 2008.

[13] J. Vila and P. Schniter, "Expectation-maximization Bernoulli-Gaussian approximate message passing," in Conf. Rec. 45th Asilomar Conf. Signals, Syst. Comput., Nov 2011, pp. 799-803.

[14] J. P. Vila and P. Schniter, "Expectation-maximization Gaussian-mixture approximate message passing," IEEE Transactions on Signal Processing, vol. 61, no. 19, pp. 4658-4672, Oct 2013.

[15] U. S. Kamilov, S. Rangan, A. K. Fletcher, and M. Unser, "Approximate message passing with consistent parameter estimation and applications to sparse learning," IEEE Transactions on Information Theory, vol. 60, no. 5, pp. 2969-2985, May 2014.

[16] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1-38, 1977.