Convergence Analysis of the Variance in Gaussian Belief Propagation


Convergence Analysis of the Variance in Gaussian Belief Propagation

Qinliang Su and Yik-Chung Wu

Abstract—It is known that Gaussian belief propagation (BP) is a low-complexity algorithm for (approximately) computing the marginal distribution of a high-dimensional Gaussian distribution. However, in a loopy factor graph, it is important to determine whether Gaussian BP converges. In general, the convergence conditions for Gaussian BP variances and means are not necessarily the same, and this paper focuses on the convergence condition of the Gaussian BP variances. In particular, by describing the message-passing process of Gaussian BP as a set of updating functions, the necessary and sufficient convergence conditions of the Gaussian BP variances are derived under both synchronous and asynchronous schedulings, with the converged variances proved to be independent of the initialization as long as it is chosen from the proposed set. The necessary and sufficient convergence condition is further expressed in the form of a semi-definite programming (SDP) optimization problem, and thus can be verified more efficiently than the existing convergence condition based on the computation tree. The relationship between the proposed convergence condition and the existing one based on the computation tree is also established analytically. Numerical examples are presented to corroborate the established theories.

Index Terms—Gaussian belief propagation, loopy belief propagation, graphical model, factor graph, message passing, sum-product algorithm, convergence.

Copyright (c) 2014 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org. This work was supported in part by the HKU Seed Funding Programme. The authors are with the Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong (e-mail: qlsu@eee.hku.hk; ycwu@eee.hku.hk).

I. INTRODUCTION

In signal processing and machine learning, many problems eventually come to the issue of computing the marginal mean and variance of an individual random variable from a high-dimensional joint Gaussian probability density function (PDF). A direct way of marginalization involves the computation of the inverse of the precision matrix in the joint Gaussian PDF. The inverse operation is known to be computationally expensive for a high-dimensional matrix, and would introduce heavy communication overhead when carried out in distributed scenarios. By representing the joint PDF with a factor graph [1], Gaussian BP provides an alternative to (approximately) calculate the marginal mean and variance for each individual random variable by passing messages between neighboring nodes in the factor graph [2]–[5].

It is known that if the factor graph is of tree structure, the belief means and variances calculated with Gaussian BP converge to the true marginal means and variances, respectively [1]. For a factor graph with loops, if the belief means and variances in Gaussian BP converge, the true marginal means and approximate marginal variances are obtained [6]. Recently, a novel message-passing algorithm has been proposed in [7] for inference in Gaussian graphical models by choosing a special set of nodes to break the loops in the graph. Although this algorithm computes the true marginal means and improves the accuracy of the variances, it still needs to use Gaussian BP as an underlying inference algorithm. Thus, in this paper, we focus on the message-passing algorithm of Gaussian BP only.

With the ability to provide the true marginal means upon convergence, Gaussian BP has been successfully applied in low-complexity detection and estimation problems arising in communication systems [8]–[10], fast solvers for large sparse linear systems [11], [12], sparse Bayesian learning [13], estimation in Gaussian graphical models [14], etc. In addition, the distributed property inherited from message-passing algorithms is particularly attractive for applications requiring distributed implementation, such as distributed beamforming [15], inter-cell interference mitigation [16], distributed synchronization and localization in wireless sensor networks [17]–[19], distributed energy-efficient self-deployment in mobile sensor networks [20], distributed rate control in ad hoc networks [21], and distributed network utility maximization [22]. Moreover, Gaussian BP is also exploited to provide approximate marginal variances for large-scale sparse Bayesian learning in a computationally efficient way [23].

However, Gaussian BP only works under the prerequisite that the belief means and variances

calculated from the updating messages do converge. So far, several sufficient convergence conditions have been proposed, which can guarantee that the means and variances converge simultaneously [6], [24], [25]. However, in general, the belief means and variances do not necessarily converge under the same condition. It is reported in [24] that if the variances converge, the convergence of the means can always be observed when suitable damping is imposed. Thus, it is important to ensure that the variances of Gaussian BP converge in the first place.

In the pioneering work [24], the message-passing procedure of Gaussian BP in a loopy factor graph is equivalently translated into that in a loop-free computation tree [26]. Based on the computation tree, an almost necessary and sufficient convergence condition of the variances is derived. It is proved that if the spectral radius of the infinite-dimensional precision matrix of the computation tree is smaller than one, the variances converge; and if this spectral radius is larger than one, Gaussian BP becomes ill-posed. However, this condition is only almost necessary and sufficient since it does not cover the scenario when the spectral radius is equal to one. More crucially, this convergence condition requires the evaluation of the spectral radius of an infinite-dimensional matrix, which is almost impossible to calculate in practice.

In this paper, with the fact that the messages in Gaussian BP can be represented by linear and quadratic parameters, the message-passing process of Gaussian BP is described as a set of updating functions of parameters. Based on the monotonically non-decreasing and concave properties of the updating functions, the necessary and sufficient convergence condition of the messages' quadratic parameters is derived first. Then, with the relation between the BP messages' quadratic parameters and the belief variances, the necessary and sufficient convergence condition of the belief variances is developed under both synchronous and asynchronous schedulings. The initialization set under which the variances are guaranteed to converge is proposed as well. The convergence condition derived in this paper is proved to be equivalent to a semi-definite programming (SDP) problem, and thus can be easily verified. Furthermore, the relationship between the proposed convergence condition and the existing one based on the computation tree is also established. Our result fills in the missing part of the convergence condition in [24] for the case when the spectral radius of the infinite-dimensional matrix is equal to one.

The rest of this paper is organized as follows. Gaussian BP is reviewed in Section II. Section III analyzes the updating process of Gaussian BP messages, followed by the necessary and sufficient convergence condition of the quadratic parameters in the messages in Section IV. The derivation of the necessary and sufficient convergence condition of the belief variances is presented in Section V. The relationship between the condition proposed in this paper and the existing one based on the computation tree is established in Section VI. Numerical examples are presented in Section VII, followed by conclusions in Section VIII.

The following notations are used throughout this paper. For two vectors, $\mathbf{x}_1 \geq \mathbf{x}_2$ and $\mathbf{x}_1 > \mathbf{x}_2$ mean that the inequalities hold in all corresponding elements. Notations $\lambda(\mathbf{G})$ and $\lambda_{\max}(\mathbf{G})$ represent any eigenvalue and the maximal eigenvalue of a matrix $\mathbf{G}$, respectively. Notation $\rho(\mathbf{G})$ means the spectral radius of $\mathbf{G}$. Notations $[\cdot]_i$ and $[\cdot]_{ij}$ represent the $i$-th element of a vector and the $(i,j)$-th element of a matrix, respectively.

II. GAUSSIAN BELIEF PROPAGATION

In general, a Gaussian PDF can be written as

$$f(\mathbf{x}) \propto \exp\Big\{-\tfrac{1}{2}\mathbf{x}^T\mathbf{P}\mathbf{x} + \mathbf{h}^T\mathbf{x}\Big\}, \qquad (1)$$

where $\mathbf{x} = [x_1, x_2, \ldots, x_N]^T$ with $N$ being the number of random variables; $\mathbf{P} \succ 0$ is the precision matrix with $p_{ij}$ being its $(i,j)$-th element; and $\mathbf{h} = [h_1, h_2, \ldots, h_N]^T$ is the linear coefficient vector. In many signal processing applications, we want to find the marginalized PDF $p(x_i) = \int f(\mathbf{x})\, dx_1 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_N$. For a Gaussian $f(\mathbf{x})$, this can be done by completing the square inside the exponent of (1) as $f(\mathbf{x}) \propto \exp\{-\tfrac{1}{2}(\mathbf{x}-\mathbf{P}^{-1}\mathbf{h})^T\mathbf{P}(\mathbf{x}-\mathbf{P}^{-1}\mathbf{h})\}$, and then the marginalized PDF is obtained as

$$p(x_i) \propto \exp\Big\{-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}\Big\}, \qquad (2)$$

with the mean $\mu_i = [\mathbf{P}^{-1}\mathbf{h}]_i$ and the variance $\sigma_i^2 = [\mathbf{P}^{-1}]_{ii}$. However, the required matrix inverse is computationally expensive for a large-dimensional $\mathbf{P}$, and introduces heavy communication overhead to the network when the information $p_{ij}$ is located distributedly.

On the other hand, the Gaussian PDF in (1) can be expanded as

$$f(\mathbf{x}) \propto \prod_{i=1}^N f_i(x_i) \prod_{j=1}^N\prod_{k=j+1}^N f_{jk}(x_j,x_k),$$

where $f_i(x_i) = \exp\{-\tfrac{p_{ii}}{2}x_i^2 + h_i x_i\}$ and $f_{jk}(x_j,x_k) = \exp\{-p_{jk}x_jx_k\}$. Based on this expansion, a factor graph¹ can be constructed by connecting each variable $x_i$ with its associated factors $f_i(x_i)$ and $f_{ij}(x_i,x_j)$.

¹ The factor graph is assumed to be connected in this paper.
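As a point of reference for the message-passing scheme developed below, the following minimal sketch (assuming NumPy; the matrix and vector values are illustrative only and not from the paper) computes the exact marginals of (1) by the direct matrix inversion described above.

```python
import numpy as np

def direct_marginals(P, h):
    """Exact marginals of f(x) ∝ exp(-x^T P x / 2 + h^T x) via matrix inversion, cf. (2)."""
    Sigma = np.linalg.inv(P)   # covariance matrix P^{-1}
    mu = Sigma @ h             # marginal means  mu_i = [P^{-1} h]_i
    var = np.diag(Sigma)       # marginal variances  sigma_i^2 = [P^{-1}]_{ii}
    return mu, var

# Toy 3x3 example (illustrative values).
P = np.array([[1.0, 0.3, 0.2],
              [0.3, 1.0, 0.4],
              [0.2, 0.4, 1.0]])
h = np.array([0.5, -1.0, 2.0])
print(direct_marginals(P, h))
```

This $O(N^3)$ baseline is what Gaussian BP tries to avoid by exchanging only local messages.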

It is known that Gaussian BP can be applied on this factor graph by passing messages among neighboring nodes to obtain the true marginal mean and an approximate marginal variance for each variable. In Gaussian BP, the departing and arriving messages corresponding to variables $i$ and $j$ are updated as

$$m^d_{i\to j}(x_i,t) \propto f_i(x_i)\prod_{k\in\mathcal N(i)\setminus j} m^a_{k\to i}(x_i,t), \qquad (3)$$

$$m^a_{i\to j}(x_j,t+1) \propto \int f_{ij}(x_i,x_j)\, m^d_{i\to j}(x_i,t)\, dx_i, \qquad (4)$$

where $(i,j)\in\mathcal E$ with $\mathcal E \triangleq \{(i,j)\mid p_{ij}\neq 0, \text{ for } i,j\in\mathcal V\}$ being the set of index pairs of any two connected variables and $\mathcal V \triangleq \{1,2,\ldots,N\}$ being the set of indices of all variables; $t$ is the time index; $\mathcal N(i)$ is the set of indices of the neighboring variable nodes of node $i$, and $\mathcal N(i)\setminus j$ is the set $\mathcal N(i)$ excluding node $j$. After obtaining the messages $m^a_{k\to i}(x_i,t)$, the belief at variable node $i$ is equal to

$$b_i(x_i,t) \propto f_i(x_i)\prod_{k\in\mathcal N(i)} m^a_{k\to i}(x_i,t). \qquad (5)$$

It has been proved that $b_i(x_i,t)$ always converges to $p(x_i)$ in (2) exactly if the factor graph is of tree structure [1], [6]. For a loopy factor graph, if $b_i(x_i,t)$ converges, the mean of $b_i(x_i,t)$ will converge to the true mean $\mu_i$, while the converged variance is an approximation of $\sigma_i^2$ [6].

Assume the arriving message is in the Gaussian form $m^a_{i\to j}(x_j,t) \propto \exp\{-\tfrac{v^a_{i\to j}(t)}{2}x_j^2 + \beta^a_{i\to j}(t)x_j\}$, where $v^a_{i\to j}(t)$ and $\beta^a_{i\to j}(t)$ are the arriving precision and arriving linear coefficient, respectively. Inserting it into (3), we obtain $m^d_{i\to j}(x_i,t) \propto \exp\{-\tfrac{v^d_{i\to j}(t)}{2}x_i^2 + \beta^d_{i\to j}(t)x_i\}$, where

$$v^d_{i\to j}(t) = p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} v^a_{k\to i}(t), \qquad (6)$$

$$\beta^d_{i\to j}(t) = h_i + \sum_{k\in\mathcal N(i)\setminus j} \beta^a_{k\to i}(t) \qquad (7)$$

are the departing precision and linear coefficient, respectively. Furthermore, substituting the departing message $m^d_{i\to j}(x_i,t)$ into (4), we obtain

$$m^a_{i\to j}(x_j,t+1) \propto \exp\Big\{\frac{p_{ij}^2}{2v^d_{i\to j}(t)}x_j^2 - \frac{p_{ij}\beta^d_{i\to j}(t)}{v^d_{i\to j}(t)}x_j\Big\}\int \exp\Big\{-\frac{v^d_{i\to j}(t)}{2}\Big(x_i - \frac{\beta^d_{i\to j}(t)-p_{ij}x_j}{v^d_{i\to j}(t)}\Big)^2\Big\}\, dx_i. \qquad (8)$$

If $v^d_{i\to j}(t) > 0$, the integration equals a constant, and thus $m^a_{i\to j}(x_j,t+1) \propto \exp\{\tfrac{p_{ij}^2}{2v^d_{i\to j}(t)}x_j^2 - \tfrac{p_{ij}\beta^d_{i\to j}(t)}{v^d_{i\to j}(t)}x_j\}$. Therefore, $v^a_{i\to j}(t+1)$ and $\beta^a_{i\to j}(t+1)$ are updated as

$$v^a_{i\to j}(t+1) = \begin{cases} -\dfrac{p_{ij}^2}{v^d_{i\to j}(t)}, & \text{if } v^d_{i\to j}(t) > 0\\ \text{not defined}, & \text{otherwise}, \end{cases} \qquad (9)$$

$$\beta^a_{i\to j}(t+1) = \begin{cases} -\dfrac{p_{ij}\beta^d_{i\to j}(t)}{v^d_{i\to j}(t)}, & \text{if } v^d_{i\to j}(t) > 0\\ \text{not defined}, & \text{otherwise}, \end{cases} \qquad (10)$$

where "not defined" arises since, in the case of $v^d_{i\to j}(t) \leq 0$, the integration in (8) becomes infinite and the message $m^a_{i\to j}(x_j,t+1)$ loses its Gaussian form. After obtaining $v^a_{k\to i}(t)$, the variance of the belief at each iteration is computed as

$$\sigma_i^2(t) = \frac{1}{p_{ii} + \sum_{k\in\mathcal N(i)} v^a_{k\to i}(t)}. \qquad (11)$$

From (9) and (10), it can be seen that the messages of Gaussian BP maintain the Gaussian form only when $v^d_{i\to j}(t) > 0$ for all $t \geq 0$. Therefore, we have the following lemma.

Lemma 1. The messages of Gaussian BP are always in Gaussian form if and only if $v^d_{i\to j}(t) > 0$ for all $t \geq 0$.

III. ANALYSIS OF THE MESSAGE-PASSING PROCESS

In this section, we first analyze the updating process of $v^a_{i\to j}(t)$, and then derive the condition that guarantees the BP messages are maintained in Gaussian form.

A. Updating Function of $v^a_{i\to j}(t)$ and Its Properties

Under the Gaussian form of messages, substituting (6) into (9) gives

$$v^a_{i\to j}(t+1) = -\frac{p_{ij}^2}{p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} v^a_{k\to i}(t)}. \qquad (12)$$

By writing (12) into a vector form, we obtain

$$\mathbf{v}^a(t+1) = \mathbf{g}(\mathbf{v}^a(t)), \qquad (13)$$

where $\mathbf{g}(\cdot)$ is a vector-valued function containing components $g_{ij}(\cdot)$ with $(i,j)\in\mathcal E$ arranged in ascending order first on $j$ and then on $i$, and $g_{ij}(\cdot)$ is defined as

$$g_{ij}(\mathbf{w}) \triangleq -\frac{p_{ij}^2}{p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} w_{ki}}; \qquad (14)$$

$\mathbf{v}^a(t)$ and $\mathbf{w}$ are vectors containing elements $v^a_{i\to j}(t)$ and $w_{ij}$, respectively, both with $(i,j)\in\mathcal E$ arranged in ascending order first on $j$ and then on $i$.
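A minimal sketch of the synchronous update (12)–(13) and of the belief variance (11) is given below. It assumes NumPy and stores the arriving precisions in a dictionary keyed by directed edges; the data structures and helper names are ours, not the paper's.

```python
import numpy as np

def neighbor_lists(P):
    """Neighbor sets N(i) read off the nonzero off-diagonal entries of P."""
    N = P.shape[0]
    return {i: [k for k in range(N) if k != i and P[i, k] != 0] for i in range(N)}

def bp_variance_iteration(P, v_a):
    """One synchronous sweep of (12)-(13).  v_a maps directed edge (i, j) to v^a_{i->j}(t).
    Returns the updated dictionary, or None if some v^d_{i->j}(t) <= 0, in which case
    the messages would lose their Gaussian form (cf. Lemma 1)."""
    nbrs = neighbor_lists(P)
    v_new = {}
    for (i, j) in v_a:
        v_d = P[i, i] + sum(v_a[(k, i)] for k in nbrs[i] if k != j)   # departing precision (6)
        if v_d <= 0:
            return None
        v_new[(i, j)] = -P[i, j] ** 2 / v_d                            # arriving precision (12)
    return v_new

def belief_variances(P, v_a):
    """Belief variances sigma_i^2(t) from the arriving precisions, cf. (11)."""
    nbrs = neighbor_lists(P)
    return [1.0 / (P[i, i] + sum(v_a[(k, i)] for k in nbrs[i])) for i in range(P.shape[0])]
```

A typical usage is to initialize `v_a = {(i, j): 0.0 for every directed edge}` and call `bp_variance_iteration` repeatedly; this initialization choice is revisited in Section IV.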

Moreover, define the set

$$\mathcal W \triangleq \Big\{\mathbf{w}\ \Big|\ p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} w_{ki} > 0,\ \forall (i,j)\in\mathcal E\Big\}. \qquad (15)$$

Now, we have the following proposition about $\mathbf{g}(\cdot)$ and $\mathcal W$.

Proposition 1. The following claims hold:
P1) For any $\mathbf{w}_1\in\mathcal W$, if $\mathbf{w}_2 \geq \mathbf{w}_1$, then $\mathbf{w}_2\in\mathcal W$;
P2) For any $\mathbf{w}_1,\mathbf{w}_2\in\mathcal W$, if $\mathbf{w}_1 \geq \mathbf{w}_2$, then $\mathbf{g}(\mathbf{w}_1) \geq \mathbf{g}(\mathbf{w}_2)$;
P3) $g_{ij}(\mathbf{w})$ is a concave function with respect to $\mathbf{w}\in\mathcal W$.

Proof: Consider two vectors $\mathbf{w}$ and $\hat{\mathbf{w}}$ in $\mathcal W$, which contain elements $w_{ki}$ and $\hat w_{ki}$ with $(k,i)\in\mathcal E$ arranged in ascending order first on $k$ and then on $i$. For any $\mathbf{w}\in\mathcal W$, according to the definition of $\mathcal W$ in (15), we have $p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} w_{ki} > 0$. Then, if $\hat{\mathbf{w}} \geq \mathbf{w}$, it can be easily seen that $p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} \hat w_{ki} > 0$ as well. Thus, we have $\hat{\mathbf{w}}\in\mathcal W$.

The first-order derivative of $g_{ij}(\mathbf{w})$ with respect to $w_{ki}$ for $k\in\mathcal N(i)\setminus j$ is computed to be

$$\frac{\partial g_{ij}}{\partial w_{ki}} = \frac{p_{ij}^2}{\big(p_{ii} + \sum_{k'\in\mathcal N(i)\setminus j} w_{k'i}\big)^2} > 0. \qquad (16)$$

Thus, $g_{ij}(\mathbf{w})$ is a continuous and strictly increasing function with respect to the components $w_{ki}$ for $k\in\mathcal N(i)\setminus j$ with $\mathbf{w}\in\mathcal W$. Hence, if $\mathbf{w}_1 \geq \mathbf{w}_2$, then $\mathbf{g}(\mathbf{w}_1) \geq \mathbf{g}(\mathbf{w}_2)$.

To prove the third property, consider first the simple function $-\tfrac{p_{ij}^2}{p_{ii}+r}$ on the domain $r > -p_{ii}$. It can be easily derived that its second-order derivative is $-\tfrac{2p_{ij}^2}{(p_{ii}+r)^3} < 0$, where the inequality holds due to $p_{ii} + r > 0$. Thus, $-\tfrac{p_{ij}^2}{p_{ii}+r}$ is a concave function with respect to $r > -p_{ii}$. Since $g_{ij}(\mathbf{w}) = -\tfrac{p_{ij}^2}{p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} w_{ki}}$ is the composition of the concave function $-\tfrac{p_{ij}^2}{p_{ii}+r}$ and the affine mapping $r = \sum_{k\in\mathcal N(i)\setminus j} w_{ki}$, the composite function $g_{ij}(\mathbf{w})$ is also concave with respect to $\sum_{k\in\mathcal N(i)\setminus j} w_{ki} > -p_{ii}$ [28, p. 79]. Due to $\mathcal W \triangleq \{\mathbf{w} \mid p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} w_{ki} > 0,\ \forall(i,j)\in\mathcal E\}$ as defined in (15), $\mathbf{w}\in\mathcal W$ implies that $\sum_{k\in\mathcal N(i)\setminus j} w_{ki} > -p_{ii}$. Thus, we can infer that $g_{ij}(\mathbf{w})$ is a concave function with respect to $\mathbf{w}\in\mathcal W$.

Notice that convexity is also exploited in [29] to derive the convergence condition for a general min-sum message-passing algorithm, which reduces to Gaussian BP when applied to the particular quadratic function $\tfrac{1}{2}\mathbf{x}^T\mathbf{P}\mathbf{x} - \mathbf{h}^T\mathbf{x}$. The convexity in [29] means that the quadratic function can be decomposed into a summation of convex functions, which is different from the concavity of the updating function $g_{ij}(\mathbf{w})$ here.

B. Condition to Maintain the Gaussian Form of Messages

First, define the following set²

$$\mathcal S' \triangleq \{\mathbf{w}\in\mathcal W \mid \mathbf{w} \leq \mathbf{g}(\mathbf{w})\}. \qquad (17)$$

² The simpler symbol $\mathcal S$ is reserved for later use.

With the notations $\mathbf{g}^{(t)}(\mathbf{w}) \triangleq \mathbf{g}\big(\mathbf{g}^{(t-1)}(\mathbf{w})\big)$ and $\mathbf{g}^{(0)}(\mathbf{w}) \triangleq \mathbf{w}$, the following proposition can be established.

Proposition 2. The set $\mathcal S'$ has the following properties:
P4) $\mathcal S'$ is a convex set;
P5) If $\mathbf{s}\in\mathcal S'$, then $\mathbf{s} < 0$;
P6) If $\mathbf{s}\in\mathcal S'$, then $\mathbf{g}^{(t)}(\mathbf{s})\in\mathcal S'$ and $\mathbf{g}^{(t)}(\mathbf{s}) \leq \mathbf{g}^{(t+1)}(\mathbf{s})$ for all $t \geq 0$.

Proof: As $g_{ij}(\mathbf{w})$ is a concave function for $\mathbf{w}\in\mathcal W$ according to P3), $g_{ij}(\mathbf{w}) - w_{ij}$ is also concave for $\mathbf{w}\in\mathcal W$. The set of $\mathbf{w}\in\mathcal W$ which satisfies the condition $g_{ij}(\mathbf{w}) - w_{ij} \geq 0$ is a convex set [28]. Thus, $\mathcal S' = \{\mathbf{w} \mid \mathbf{w}\leq\mathbf{g}(\mathbf{w}),\ \mathbf{w}\in\mathcal W\} = \{\mathbf{w}\in\mathcal W \mid g_{ij}(\mathbf{w}) - w_{ij}\geq 0 \text{ for all } (i,j)\in\mathcal E\}$ is also a convex set.

If $\mathbf{s}\in\mathcal S'$, we have $\mathbf{s}\leq\mathbf{g}(\mathbf{s})$ and $\mathbf{s}\in\mathcal W$. According to the definition of $\mathcal W$ in (15), $p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} s_{ki} > 0$. Putting this fact into the definition of $g_{ij}(\cdot)$ in (14), it is obvious that $\mathbf{g}(\mathbf{s}) < 0$. Therefore, we have the relation $\mathbf{s}\leq\mathbf{g}(\mathbf{s}) < 0$ for all $\mathbf{s}\in\mathcal S'$.

Finally, if $\mathbf{s}\in\mathcal S'$, we have $\mathbf{s}\leq\mathbf{g}(\mathbf{s})$ and $\mathbf{s}\in\mathcal W$. Hence, $\mathbf{g}(\mathbf{s})\in\mathcal W$ according to P1). Applying $\mathbf{g}(\cdot)$ on both sides of $\mathbf{s}\leq\mathbf{g}(\mathbf{s})$ and using P2), we obtain $\mathbf{g}(\mathbf{s})\leq\mathbf{g}^{(2)}(\mathbf{s})$. Furthermore, since $\mathbf{g}(\mathbf{s})\in\mathcal W$ and $\mathbf{g}(\mathbf{s})\leq\mathbf{g}^{(2)}(\mathbf{s})$, we also have $\mathbf{g}(\mathbf{s})\in\mathcal S'$. By induction, we can prove in general that $\mathbf{g}^{(t)}(\mathbf{s})\in\mathcal S'$ and $\mathbf{g}^{(t)}(\mathbf{s})\leq\mathbf{g}^{(t+1)}(\mathbf{s})$ for all $t\geq 0$.

Now, we give the following theorem.

Theorem 1. There exists at least one initialization $\mathbf{v}^a(0)$ such that the messages of Gaussian BP are maintained in Gaussian form for all $t\geq 0$ if and only if $\mathcal S' \neq \emptyset$.
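The membership tests behind Theorem 1 are easy to state computationally. The following sketch (NumPy assumed; helper names and the tolerance are ours) checks whether a candidate vector $\mathbf{w}$ lies in $\mathcal W$ and in $\mathcal S'$, reusing the `neighbor_lists` helper sketched after (14).

```python
def g_map(P, w):
    """Evaluate g(w) componentwise as in (14); returns None if w is outside W of (15)."""
    nbrs = neighbor_lists(P)
    g = {}
    for (i, j) in w:
        denom = P[i, i] + sum(w[(k, i)] for k in nbrs[i] if k != j)
        if denom <= 0:          # the defining inequality of W is violated
            return None
        g[(i, j)] = -P[i, j] ** 2 / denom
    return g

def in_S_prime(P, w, tol=0.0):
    """Definition (17): w is in S' iff w is in W and w <= g(w) holds elementwise."""
    g = g_map(P, w)
    return g is not None and all(w[e] <= g[e] + tol for e in w)
```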

Proof: Sufficient Condition: If $\mathcal S' \neq \emptyset$, by choosing an initialization $\mathbf{v}^a(0)\in\mathcal S'$ and applying P6), it can be inferred that $\mathbf{v}^a(t)\in\mathcal S'$ for all $t\geq 0$. Moreover, according to the definitions of $\mathcal W$ in (15) and $\mathcal S'$ in (17), we have $\mathcal S'\subseteq\mathcal W$, and thereby $\mathbf{v}^a(t)\in\mathcal W$ for all $t\geq 0$, or equivalently $p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} v^a_{k\to i}(t) > 0$ for all $(i,j)\in\mathcal E$ according to the definition of the set $\mathcal W$ in (15). Due to $v^d_{i\to j}(t) = p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} v^a_{k\to i}(t)$ as given in (6), we obtain $v^d_{i\to j}(t) > 0$ for all $(i,j)\in\mathcal E$ and $t\geq 0$. According to Lemma 1, it can be inferred that the messages are maintained in Gaussian form for all $t\geq 0$.

Necessary Condition: We prove the necessary condition by contradiction. When $\mathcal S' = \emptyset$, suppose that there exists a $\bar{\mathbf{v}}^a(0)$ such that the messages are maintained in Gaussian form for all $t\geq 0$. According to Lemma 1, we can infer that $\bar v^d_{i\to j}(t) = p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} \bar v^a_{k\to i}(t) > 0$ for all $(i,j)\in\mathcal E$, or equivalently $\bar{\mathbf{v}}^a(t) \triangleq \mathbf{g}^{(t)}(\bar{\mathbf{v}}^a(0))\in\mathcal W$ for all $t\geq 0$ from the definition of $\mathcal W$ in (15). Now, choose another initialization $\mathbf{v}^a(0)$ satisfying both $\mathbf{v}^a(0)\geq\bar{\mathbf{v}}^a(0)$ and $\mathbf{v}^a(0)\geq 0$. Due to $\bar{\mathbf{v}}^a(0)\in\mathcal W$ and $\mathbf{v}^a(0)\geq\bar{\mathbf{v}}^a(0)$, by using P2), we have $\mathbf{g}(\mathbf{v}^a(0))\geq\mathbf{g}(\bar{\mathbf{v}}^a(0))$, that is, $\mathbf{v}^a(1)\geq\bar{\mathbf{v}}^a(1)$. Furthermore, substituting $\mathbf{v}^a(0)\geq 0$ into (14) gives $\mathbf{v}^a(1) < 0$, and thereby $\mathbf{v}^a(1)\leq\mathbf{v}^a(0)$. Combining with $\mathbf{v}^a(1)\geq\bar{\mathbf{v}}^a(1)$ leads to $\bar{\mathbf{v}}^a(1)\leq\mathbf{v}^a(1)\leq\mathbf{v}^a(0)$. Due to the assumption $\bar{\mathbf{v}}^a(1)\in\mathcal W$, applying P2) to $\bar{\mathbf{v}}^a(1)\leq\mathbf{v}^a(1)\leq\mathbf{v}^a(0)$ gives $\bar{\mathbf{v}}^a(2)\leq\mathbf{v}^a(2)\leq\mathbf{v}^a(1)$. Combining with the fact $\mathbf{v}^a(1)\leq 0$ proved above, we have $\bar{\mathbf{v}}^a(2)\leq\mathbf{v}^a(2)\leq\mathbf{v}^a(1)\leq 0$. By induction, it can be derived that

$$\bar{\mathbf{v}}^a(t+1) \leq \mathbf{v}^a(t+1) \leq \mathbf{v}^a(t) \leq 0, \quad \forall t\geq 1. \qquad (18)$$

Then, from the definition of $\mathcal W$ in (15), the assumption $\bar{\mathbf{v}}^a(t)\in\mathcal W$ is equivalent to $p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} \bar v^a_{k\to i}(t) > 0$ for all $(i,j)\in\mathcal E$, where $\bar v^a_{k\to i}(t)$ are the elements of $\bar{\mathbf{v}}^a(t)$ with $(k,i)\in\mathcal E$ arranged in the same order as $\mathbf{v}^a(t)$. By rearranging the terms in $p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} \bar v^a_{k\to i}(t) > 0$, we obtain, for any $\gamma\in\mathcal N(i)\setminus j$, $\bar v^a_{\gamma\to i}(t) > -p_{ii} - \sum_{k\in\mathcal N(i)\setminus\{j,\gamma\}} \bar v^a_{k\to i}(t)$. Applying $\bar{\mathbf{v}}^a(t)\leq 0$ shown in (18) to this inequality, it can be inferred that $\bar v^a_{\gamma\to i}(t) > -p_{ii} - \sum_{k\in\mathcal N(i)\setminus\{j,\gamma\}} \bar v^a_{k\to i}(t) \geq -p_{ii}$, that is, $\bar v^a_{\gamma\to i}(t) > -p_{ii}$ for all $(\gamma,i)\in\mathcal E$. Combining with $\mathbf{v}^a(t)\geq\bar{\mathbf{v}}^a(t)$ shown in (18), for all $(i,j)\in\mathcal E$, we have

$$v^a_{i\to j}(t) > -p_{jj}. \qquad (19)$$

It can be seen from (18) and (19) that $\mathbf{v}^a(t)$ is a monotonically non-increasing and lower-bounded sequence. Thus, $\mathbf{v}^a(t)$ must converge to a vector $\mathbf{v}^{a*}$ satisfying $\mathbf{v}^{a*} = \mathbf{g}(\mathbf{v}^{a*})$. On the other hand, substituting (12) into (19) gives $-\tfrac{p_{ij}^2}{p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} v^a_{k\to i}(t)} > -p_{jj}$. Due to $\bar{\mathbf{v}}^a(t)\in\mathcal W$, it can be inferred from P1) and (18) that $\mathbf{v}^a(t)\in\mathcal W$, or equivalently $p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} v^a_{k\to i}(t) > 0$. Together with the fact $p_{jj} > 0$, we obtain $p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} v^a_{k\to i}(t) > \tfrac{p_{ij}^2}{p_{jj}}$. Since $\mathbf{v}^a(t)$ converges to $\mathbf{v}^{a*}$, taking the limit on both sides of the inequality gives $p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} v^{a*}_{k\to i} \geq \tfrac{p_{ij}^2}{p_{jj}}$. Hence, $p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} v^{a*}_{k\to i} > 0$. From the definition of $\mathcal W$ in (15), we have $\mathbf{v}^{a*}\in\mathcal W$. Combining with $\mathbf{v}^{a*} = \mathbf{g}(\mathbf{v}^{a*})$, according to the definition of $\mathcal S'$ in (17), it is clear that $\mathbf{v}^{a*}\in\mathcal S'$. This contradicts the prerequisite $\mathcal S' = \emptyset$. Therefore, $\mathcal S'$ cannot be empty.

IV. NECESSARY AND SUFFICIENT CONVERGENCE CONDITION OF THE MESSAGE PARAMETERS $\mathbf{v}^a(t)$

According to Theorem 1, the Gaussian form of the messages may only be maintained for some $\mathbf{v}^a(0)$. Thus, the choice of the initialization $\mathbf{v}^a(0)$ also plays an important role in the convergence of $\mathbf{v}^a(t)$.

A. Synchronous Scheduling

In this section, we will investigate the convergence condition of $\mathbf{v}^a(t)$ under synchronous scheduling, which means that $\mathbf{v}^a(t)$ is updated as $\mathbf{v}^a(t+1) = \mathbf{g}(\mathbf{v}^a(t))$. The necessary and sufficient convergence condition under a nonnegative initialization set is derived first. Then, the convergence condition is proved to hold under a more general initialization set.

Lemma 2. $\mathbf{v}^a(t)$ converges to the same point $\mathbf{v}^{a*}\in\mathcal S'$ for all $\mathbf{v}^a(0)\geq 0$ if and only if $\mathcal S'\neq\emptyset$.

Proof: See Appendix A.

Obviously, the initialization $\mathbf{v}^a(0) = 0$ used by the convergence condition in [24] is implied by the proposed initialization set $\mathbf{v}^a(0)\geq 0$ in Lemma 2. In the following, we show that the initialization set can be further expanded to the set

$$\mathcal A \triangleq \{\mathbf{w}\geq 0\} \cup \{\mathbf{w}\geq\mathbf{w}_0 \mid \mathbf{w}_0\in\operatorname{int}(\mathcal S')\} \cup \Big\{\mathbf{w}\geq\mathbf{w}_0\ \Big|\ \mathbf{w}_0\in\mathcal S' \text{ and } \mathbf{w}_0 = \lim_{t\to\infty}\mathbf{g}^{(t)}(0)\Big\}, \qquad (20)$$

where $\operatorname{int}(\mathcal S')$ means the interior of $\mathcal S'$. Notice that the set $\mathcal A$ consists of the union of three parts. The first part $\{\mathbf{w}\geq 0\}$ corresponds to the initialization set given in Lemma 2. For the case $\mathcal S'\neq\emptyset$ but $\operatorname{int}(\mathcal S') = \emptyset$, the second part is empty. Also, according to Lemma 2, we have $\lim_{t\to\infty}\mathbf{g}^{(t)}(0) = \mathbf{v}^{a*}\in\mathcal S'$. Thus, the third part of the set $\mathcal A$ reduces to $\{\mathbf{w}\geq\mathbf{v}^{a*}\}$. Due to the fact that $\mathbf{v}^{a*}\in\mathcal S'$ implies $\mathbf{v}^{a*} < 0$ given by P5), we have $\{\mathbf{w}\geq 0\}\subseteq\{\mathbf{w}\geq\mathbf{v}^{a*}\}$. For the case $\operatorname{int}(\mathcal S')\neq\emptyset$, according to (63), we have $\mathbf{w}_0\leq\mathbf{v}^{a*}$ for any $\mathbf{w}_0\in\operatorname{int}(\mathcal S')$. This implies $\{\mathbf{w}\geq\mathbf{v}^{a*}\}\subseteq\{\mathbf{w}\geq\mathbf{w}_0 \mid \mathbf{w}_0\in\operatorname{int}(\mathcal S')\}$. Obviously, in this case, we also have $\{\mathbf{w}\geq 0\}\subseteq\{\mathbf{w}\geq\mathbf{w}_0 \mid \mathbf{w}_0\in\operatorname{int}(\mathcal S')\}$. Therefore, if $\mathcal S'\neq\emptyset$, the set $\mathcal A$ is always larger than the set $\{\mathbf{w}\geq 0\}$.
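In practice, the fixed point $\mathbf{v}^{a*} = \lim_{t\to\infty}\mathbf{g}^{(t)}(0)$ appearing in (20) can be approximated by simply iterating from the all-zero initialization, which lies in the first part of $\mathcal A$. A minimal sketch is shown below, reusing the `bp_variance_iteration` helper sketched after (14); the iteration count and tolerance are our own illustrative choices.

```python
def iterate_from_zero(P, num_iters=200, tol=1e-10):
    """Run v^a(t+1) = g(v^a(t)) from v^a(0) = 0.  By Lemma 2 the iteration converges
    to v^{a*} in S' whenever S' is nonempty; if some departing precision becomes
    nonpositive, the Gaussian form is lost and None is returned."""
    N = P.shape[0]
    edges = [(i, j) for i in range(N) for j in range(N) if i != j and P[i, j] != 0]
    v = {e: 0.0 for e in edges}
    for _ in range(num_iters):
        v_next = bp_variance_iteration(P, v)
        if v_next is None:
            return None                                   # messages no longer Gaussian
        if max(abs(v_next[e] - v[e]) for e in edges) < tol:
            return v_next                                 # converged to (approx.) v^{a*}
        v = v_next
    return v
```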

Theorem 2. $\mathbf{v}^a(t)$ converges to the same point $\mathbf{v}^{a*}$ for all $\mathbf{v}^a(0)\in\mathcal A$ if and only if $\mathcal S'\neq\emptyset$.

Proof: The first part $\{\mathbf{w}\geq 0\}$ of $\mathcal A$ corresponds to the case covered in Lemma 2, and thus has already been established. To prove the sufficiency, we consider the case $\operatorname{int}(\mathcal S')\neq\emptyset$ and the case $\operatorname{int}(\mathcal S') = \emptyset$ but $\mathcal S'\neq\emptyset$, respectively.

For the case $\operatorname{int}(\mathcal S')\neq\emptyset$, it is known from (20) that $\mathcal A = \{\mathbf{w}\geq\mathbf{w}_0 \mid \mathbf{w}_0\in\operatorname{int}(\mathcal S')\}$ in this case. To prove the theorem, we will first prove that $\breve{\mathbf{v}}^a(t)$ converges to the same point $\mathbf{v}^{a*}$ for all $\breve{\mathbf{v}}^a(0)\in\operatorname{int}(\mathcal S')$. Then, notice that any $\mathbf{v}^a(0)\in\{\mathbf{w}\geq\mathbf{w}_0 \mid \mathbf{w}_0\in\operatorname{int}(\mathcal S')\}$ can be upper bounded by a point from $\{\mathbf{w}\geq 0\}$ and lower bounded by a point from $\operatorname{int}(\mathcal S')$. Because both the upper bound and the lower bound converge to $\mathbf{v}^{a*}$, we can easily extend the convergence to the much larger set $\mathbf{v}^a(0)\in\{\mathbf{w}\geq\mathbf{w}_0 \mid \mathbf{w}_0\in\operatorname{int}(\mathcal S')\}$.

First, we prove that $\breve{\mathbf{v}}^a(t)$ converges to $\mathbf{v}^{a*}$ for all $\breve{\mathbf{v}}^a(0)\in\operatorname{int}(\mathcal S')$. To do this, we establish the following three facts first.

1) For any $\breve{\mathbf{v}}^a(0)\in\operatorname{int}(\mathcal S')$, according to P6), we have $\breve{\mathbf{v}}^a(t)\leq\breve{\mathbf{v}}^a(t+1)$ and $\breve{\mathbf{v}}^a(t)\in\mathcal S'$. With P5), we also have $\breve{\mathbf{v}}^a(t) < 0$. It can be seen that $\breve{\mathbf{v}}^a(t)$ is a monotonically non-decreasing and upper-bounded sequence, thus $\breve{\mathbf{v}}^a(t)$ converges to a fixed point, which is denoted as $\breve{\mathbf{v}}^{a*}$. On the other hand, let $\mathbf{v}^a(0)\geq 0$ correspond to a point in the first part of $\mathcal A$; due to $\breve{\mathbf{v}}^a(0) < 0$, we have $\breve{\mathbf{v}}^a(0) < \mathbf{v}^a(0)$. By using P2), applying $\mathbf{g}(\cdot)$ to $\breve{\mathbf{v}}^a(0) < \mathbf{v}^a(0)$ for $t$ times gives $\breve{\mathbf{v}}^a(t)\leq\mathbf{v}^a(t)$. Since $\breve{\mathbf{v}}^a(t)$ and $\mathbf{v}^a(t)$ converge to $\breve{\mathbf{v}}^{a*}$ and $\mathbf{v}^{a*}$, respectively, taking the limit on both sides of $\breve{\mathbf{v}}^a(t)\leq\mathbf{v}^a(t)$ gives $\breve{\mathbf{v}}^{a*}\leq\mathbf{v}^{a*}$. Due to $\breve{\mathbf{v}}^a(0)\in\operatorname{int}(\mathcal S')$, the definition of $\mathcal S'$ in (17) implies that $\breve{\mathbf{v}}^a(0) < \mathbf{g}(\breve{\mathbf{v}}^a(0))$, that is, $\breve{\mathbf{v}}^a(0) < \breve{\mathbf{v}}^a(1)$. Combining with $\breve{\mathbf{v}}^a(t)\leq\breve{\mathbf{v}}^a(t+1)$ established above, we obtain $\breve{\mathbf{v}}^a(0) < \breve{\mathbf{v}}^{a*}$. Together with the fact $\breve{\mathbf{v}}^{a*}\leq\mathbf{v}^{a*}$, we obtain

$$\breve{\mathbf{v}}^a(0) < \breve{\mathbf{v}}^{a*} \leq \mathbf{v}^{a*}. \qquad (21)$$

2) For a $\breve{\mathbf{v}}^a(0)\in\operatorname{int}(\mathcal S')$, construct a vector

$$\hat{\mathbf{v}}^a(0) = \lambda\breve{\mathbf{v}}^a(0) + (1-\lambda)\mathbf{v}^{a*}, \qquad (22)$$

where $\lambda\in(0,1)$. Since $g_{ij}(\mathbf{w})$ is a concave function over $\mathbf{w}\in\mathcal S'$ and $\{\breve{\mathbf{v}}^a(0), \mathbf{v}^{a*}\}\subset\mathcal S'$, we have $\mathbf{g}(\lambda\breve{\mathbf{v}}^a(0) + (1-\lambda)\mathbf{v}^{a*}) \geq \lambda\mathbf{g}(\breve{\mathbf{v}}^a(0)) + (1-\lambda)\mathbf{g}(\mathbf{v}^{a*})$, or equivalently $\mathbf{g}(\lambda\breve{\mathbf{v}}^a(0) + (1-\lambda)\mathbf{v}^{a*}) \geq \lambda\breve{\mathbf{v}}^a(1) + (1-\lambda)\mathbf{v}^{a*}$. According to P2), applying $\mathbf{g}(\cdot)$ to this inequality gives

$\mathbf{g}^{(2)}(\lambda\breve{\mathbf{v}}^a(0) + (1-\lambda)\mathbf{v}^{a*}) \geq \mathbf{g}(\lambda\breve{\mathbf{v}}^a(1) + (1-\lambda)\mathbf{v}^{a*})$. Because of the concave property of $g_{ij}(\mathbf{w})$, it is known that $\mathbf{g}(\lambda\breve{\mathbf{v}}^a(1) + (1-\lambda)\mathbf{v}^{a*}) \geq \lambda\breve{\mathbf{v}}^a(2) + (1-\lambda)\mathbf{v}^{a*}$. Thus, we have $\mathbf{g}^{(2)}(\lambda\breve{\mathbf{v}}^a(0) + (1-\lambda)\mathbf{v}^{a*}) \geq \lambda\breve{\mathbf{v}}^a(2) + (1-\lambda)\mathbf{v}^{a*}$. By induction, it can be inferred that $\mathbf{g}^{(t)}(\lambda\breve{\mathbf{v}}^a(0) + (1-\lambda)\mathbf{v}^{a*}) \geq \lambda\breve{\mathbf{v}}^a(t) + (1-\lambda)\mathbf{v}^{a*}$ for all $t\geq 0$. With $\hat{\mathbf{v}}^a(0) = \lambda\breve{\mathbf{v}}^a(0) + (1-\lambda)\mathbf{v}^{a*}$ given by (22), we obtain

$$\hat{\mathbf{v}}^a(t) \geq \lambda\breve{\mathbf{v}}^a(t) + (1-\lambda)\mathbf{v}^{a*}, \qquad (23)$$

where $\hat{\mathbf{v}}^a(t) \triangleq \mathbf{g}^{(t)}(\hat{\mathbf{v}}^a(0))$. Since $\{\breve{\mathbf{v}}^a(0), \mathbf{v}^{a*}\}\subset\mathcal S'$ and $\mathcal S'$ is a convex set as indicated in P4), we have $\hat{\mathbf{v}}^a(0) = \lambda\breve{\mathbf{v}}^a(0) + (1-\lambda)\mathbf{v}^{a*}\in\mathcal S'$. According to P6), $\hat{\mathbf{v}}^a(t) \triangleq \mathbf{g}^{(t)}(\hat{\mathbf{v}}^a(0))$ is a monotonically non-decreasing sequence. Furthermore, according to P5), $\hat{\mathbf{v}}^a(t)$ is also upper bounded by 0. Thus, $\hat{\mathbf{v}}^a(t)$ converges to a fixed point, which is denoted as $\hat{\mathbf{v}}^{a*}$. Taking limits on both sides of (23) gives

$$\hat{\mathbf{v}}^{a*} \geq \lambda\breve{\mathbf{v}}^{a*} + (1-\lambda)\mathbf{v}^{a*} = \breve{\mathbf{v}}^{a*} + (1-\lambda)(\mathbf{v}^{a*} - \breve{\mathbf{v}}^{a*}). \qquad (24)$$

3) From $\breve{\mathbf{v}}^a(0) < \breve{\mathbf{v}}^{a*} \leq \mathbf{v}^{a*}$ given by (21), we have $\mathbf{v}^{a*} - \breve{\mathbf{v}}^a(0) > 0$ and $\breve{\mathbf{v}}^{a*} - \breve{\mathbf{v}}^a(0) > 0$. If $\lambda\in(0,1)$ and is close to 1, the relation $(1-\lambda)(\mathbf{v}^{a*} - \breve{\mathbf{v}}^a(0)) < \breve{\mathbf{v}}^{a*} - \breve{\mathbf{v}}^a(0)$ always holds, that is,

$$\exists\ \lambda\in(0,1), \text{ s.t. } (1-\lambda)(\mathbf{v}^{a*} - \breve{\mathbf{v}}^a(0)) < \breve{\mathbf{v}}^{a*} - \breve{\mathbf{v}}^a(0). \qquad (25)$$

Rearranging the terms in (25) gives $\exists\ \lambda\in(0,1)$, s.t. $\lambda\breve{\mathbf{v}}^a(0) + (1-\lambda)\mathbf{v}^{a*} < \breve{\mathbf{v}}^{a*}$. Due to $\lambda\breve{\mathbf{v}}^a(0) + (1-\lambda)\mathbf{v}^{a*} = \hat{\mathbf{v}}^a(0)$ given in (22), (25) implies that there always exists a $\lambda\in(0,1)$ such that $\hat{\mathbf{v}}^a(0) < \breve{\mathbf{v}}^{a*}$. Now, applying $\mathbf{g}(\cdot)$ to $\hat{\mathbf{v}}^a(0) < \breve{\mathbf{v}}^{a*}$ for $t$ times, from P2), we obtain $\mathbf{g}^{(t)}(\hat{\mathbf{v}}^a(0)) \leq \mathbf{g}^{(t)}(\breve{\mathbf{v}}^{a*})$, or equivalently $\hat{\mathbf{v}}^a(t) \leq \breve{\mathbf{v}}^{a*}$. Taking the limit on both sides of $\hat{\mathbf{v}}^a(t)\leq\breve{\mathbf{v}}^{a*}$, it can be inferred that

$$\exists\ \lambda\in(0,1), \text{ s.t. } \hat{\mathbf{v}}^{a*} \leq \breve{\mathbf{v}}^{a*}. \qquad (26)$$

Now, we make use of (21), (24) and (26) to prove that the convergence limit $\breve{\mathbf{v}}^{a*}$ is identical to the convergence limit $\mathbf{v}^{a*}$. Combining (21) and (26) gives

$$\exists\ \lambda\in(0,1), \text{ s.t. } \hat{\mathbf{v}}^{a*} \leq \breve{\mathbf{v}}^{a*} \leq \mathbf{v}^{a*}. \qquad (27)$$

From the second inequality of (27), we have $(1-\lambda)(\mathbf{v}^{a*} - \breve{\mathbf{v}}^{a*}) \geq 0$. Substituting this relation into (24), we can infer that $\hat{\mathbf{v}}^{a*} \geq \breve{\mathbf{v}}^{a*}$. Comparing this result to the first inequality of (27), we

obtain $\hat{\mathbf{v}}^{a*} = \breve{\mathbf{v}}^{a*}$. Due to $\hat{\mathbf{v}}^{a*} = \breve{\mathbf{v}}^{a*}$, (24) becomes $(1-\lambda)(\mathbf{v}^{a*} - \breve{\mathbf{v}}^{a*}) \leq 0$. For $\lambda\in(0,1)$, this relation is equivalent to $\mathbf{v}^{a*} \leq \breve{\mathbf{v}}^{a*}$. Combining it with the second inequality in (27), we obtain

$$\breve{\mathbf{v}}^{a*} = \mathbf{v}^{a*}. \qquad (28)$$

Next, we will extend the initialization set to the whole set $\mathcal A = \{\mathbf{w}\geq\mathbf{w}_0 \mid \mathbf{w}_0\in\operatorname{int}(\mathcal S')\}$. For any $\mathbf{v}^a(0)\in\mathcal A$, we can always find a $\mathbf{v}^a_l(0)\in\operatorname{int}(\mathcal S')$ and a $\mathbf{v}^a_u(0)\geq 0$ such that $\mathbf{v}^a_l(0)\leq\mathbf{v}^a(0)\leq\mathbf{v}^a_u(0)$. According to P2), applying $\mathbf{g}(\cdot)$ to this inequality repeatedly gives $\mathbf{g}^{(t)}(\mathbf{v}^a_l(0)) \leq \mathbf{g}^{(t)}(\mathbf{v}^a(0)) \leq \mathbf{g}^{(t)}(\mathbf{v}^a_u(0))$. Since $\mathbf{v}^a_l(t) = \mathbf{g}^{(t)}(\mathbf{v}^a_l(0))$ and $\mathbf{v}^a_u(t) = \mathbf{g}^{(t)}(\mathbf{v}^a_u(0))$ both converge to $\mathbf{v}^{a*}$, $\mathbf{v}^a(t)$ converges to $\mathbf{v}^{a*}$, too.

Now, consider the case $\operatorname{int}(\mathcal S') = \emptyset$ but $\mathcal S'\neq\emptyset$. In this case, according to the discussion after (20), $\mathcal A = \{\mathbf{w}\geq\mathbf{v}^{a*}\}$. For any $\mathbf{v}^a(0)\in\mathcal A$, we can always find a $\mathbf{v}^a_u(0)\in\{\mathbf{w}\geq 0\}$ such that $\mathbf{v}^{a*}\leq\mathbf{v}^a(0)\leq\mathbf{v}^a_u(0)$. Applying $\mathbf{g}(\cdot)$ to this inequality for $t$ times, from P2), we obtain $\mathbf{v}^{a*}\leq\mathbf{g}^{(t)}(\mathbf{v}^a(0))\leq\mathbf{g}^{(t)}(\mathbf{v}^a_u(0))$. Since $\mathbf{v}^a_u(t)$ converges to $\mathbf{v}^{a*}$, $\mathbf{v}^a(t)$ also converges to $\mathbf{v}^{a*}$.

On the other hand, if $\mathcal S' = \emptyset$, according to Theorem 1, the messages of Gaussian BP passed in the factor graph lose the Gaussian form, thus the parameters of the messages $\mathbf{v}^a(t)$ cannot converge in this case.

B. Asynchronous Scheduling

In this section, convergence conditions under asynchronous scheduling schemes will be investigated. Modified from the synchronous updating equation in (12), the arriving precision in asynchronous scheduling is updated as

$$v^a_{i\to j}(t+1) = -\frac{p_{ij}^2}{p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} v^a_{k\to i}(\tau^a_{k\to i}(t))}, \quad t\in\mathcal T_{i\to j}, \qquad (29)$$

where $\mathcal T_{i\to j}$ is the set of time instants at which $v^a_{i\to j}(t)$ is updated; $0\leq\tau^a_{k\to i}(t)\leq t$ is the last updated time instant of $v^a_{k\to i}(\cdot)$ at time $t$. In this paper, we only consider the totally asynchronous scheduling defined as follows.

Definition 1 (Totally Asynchronous Scheduling) [27]: The sets $\mathcal T_{i\to j}$ are infinite, and $\tau^a_{i\to j}(t)$ satisfies $0\leq\tau^a_{i\to j}(t)\leq t$ and $\lim_{t\to\infty}\tau^a_{i\to j}(t) = \infty$ for all $(i,j)\in\mathcal E$.
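A toy simulation of one particular totally asynchronous scheduling is sketched below (NumPy precision matrix assumed). At each time step, every directed edge is refreshed only with probability $1-p_{\text{skip}}$ and always uses the latest available values of the other edges; the skipping probability and helper names are our own illustrative assumptions, not part of Definition 1.

```python
import random

def async_bp_variances(P, num_iters=500, p_skip=0.3, seed=0):
    """Asynchronous version of (29): in-place updates with randomly skipped edges."""
    rng = random.Random(seed)
    N = P.shape[0]
    nbrs = {i: [k for k in range(N) if k != i and P[i, k] != 0] for i in range(N)}
    edges = [(i, j) for i in range(N) for j in nbrs[i]]
    v = {e: 0.0 for e in edges}
    for _ in range(num_iters):
        for (i, j) in edges:
            if rng.random() < p_skip:
                continue                     # t is not in T_{i->j}; keep the old value
            v_d = P[i, i] + sum(v[(k, i)] for k in nbrs[i] if k != j)
            if v_d <= 0:
                return None                  # Gaussian form lost
            v[(i, j)] = -P[i, j] ** 2 / v_d  # uses the most recent neighbor values
    return v
```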

In order to prove the convergence of $\mathbf{v}^a(t)$ under the asynchronous scheduling given by Definition 1, we need to find a sequence of nonempty sets $\{\mathcal U(m)\}$ for $m\geq 0$ such that the following two sufficient conditions are satisfied [27]:

Synchronous Convergence Condition: the sequence of sets satisfies 1) $\mathcal U(m+1)\subseteq\mathcal U(m)$; 2) $\mathbf{g}(\mathbf{w})\in\mathcal U(m+1)$ for $\mathbf{w}\in\mathcal U(m)$; 3) $\mathcal U(m)$ converges to a fixed point;

Box Condition: $\mathcal U(m)$ can be written as a product of subsets of individual components $\mathcal U_{i\to j}(m)$ for all $(i,j)\in\mathcal E$.

While the formal proof of asynchronous convergence under these conditions is provided in [27], we give a brief intuitive explanation below. Suppose that when $t\geq t_m$, $v^a_{i\to j}(t)\in\mathcal U_{i\to j}(m)$ for all $(i,j)\in\mathcal E$, or, expressed using the Box Condition, $\mathbf{v}^a(t)\in\mathcal U(m)$ for $t\geq t_m$. Now, consider the updating of an individual component $v^a_{i\to j}(t)$. Due to $\lim_{t\to\infty}\tau^a_{k\to i}(t) = \infty$, we can always find a $t^{(i\to j)}_{m+1} > t_m$ such that when $t\geq t^{(i\to j)}_{m+1}$, we have $\tau^a_{k\to i}(t)\geq t_m$ and thereby $v^a_{k\to i}(\tau^a_{k\to i}(t))\in\mathcal U_{k\to i}(m)$ for all $k\in\mathcal N(i)\setminus j$. Putting $v^a_{k\to i}(\tau^a_{k\to i}(t))$ into the function $g_{ij}(\cdot)$ in (29) and applying condition 2) of the Synchronous Convergence Condition, we have $v^a_{i\to j}(t+1)\in\mathcal U_{i\to j}(m+1)$ for $t\geq t^{(i\to j)}_{m+1}$. Thus, by choosing $t_{m+1} = \max_{(i,j)\in\mathcal E} t^{(i\to j)}_{m+1}$, we have that when $t\geq t_{m+1}$, $v^a_{i\to j}(t)\in\mathcal U_{i\to j}(m+1)$ for all $(i,j)\in\mathcal E$. Further, due to conditions 1) and 3) of the Synchronous Convergence Condition, we know that $\mathbf{v}^a(t)$ will gradually converge to a fixed point.

Theorem 3. $\mathbf{v}^a(t)$ converges for any $\mathbf{v}^a(0)\in\mathcal A$ and all choices of schedulings if and only if $\mathcal S'\neq\emptyset$.

Proof: We first prove the sufficient condition. Since $\mathcal S'\neq\emptyset$, as given in the proof of Theorem 2, for any $\mathbf{v}^a(0)\in\mathcal A$, there always exist an element $\breve{\mathbf{v}}^a(0)\in\operatorname{int}(\mathcal S')\cup\{\mathbf{v}^{a*}\}$ and a $\hat{\mathbf{v}}^a(0)\geq 0$ such that $\breve{\mathbf{v}}^a(0)\leq\mathbf{v}^a(0)\leq\hat{\mathbf{v}}^a(0)$. Construct a sequence of sets as

$$\mathcal U(m) = \{\mathbf{w} \mid \breve{\mathbf{v}}^a(m)\leq\mathbf{w}\leq\hat{\mathbf{v}}^a(m)\}. \qquad (30)$$

Since $\breve{\mathbf{v}}^a(0)\in\operatorname{int}(\mathcal S')\cup\{\mathbf{v}^{a*}\}$, we have $\breve{\mathbf{v}}^a(m)\leq\breve{\mathbf{v}}^a(m+1)$ according to P6). Further, for any $\hat{\mathbf{v}}^a(0)\geq 0$, putting $\hat{\mathbf{v}}^a(0)$ into (14), we have $\hat{\mathbf{v}}^a(1) < 0$, and thereby $\hat{\mathbf{v}}^a(1) < \hat{\mathbf{v}}^a(0)$. Since $\hat{\mathbf{v}}^a(m)$ converges according to Theorem 2, $\hat{\mathbf{v}}^a(m)\in\mathcal W$ for all $m\geq 0$. Thus, according to P2), by applying $\mathbf{g}(\cdot)$ to $\hat{\mathbf{v}}^a(1) < \hat{\mathbf{v}}^a(0)$ and by induction, we have $\hat{\mathbf{v}}^a(m+1)\leq\hat{\mathbf{v}}^a(m)$. By exploiting $\breve{\mathbf{v}}^a(m)\leq\breve{\mathbf{v}}^a(m+1)$, $\hat{\mathbf{v}}^a(m+1)\leq\hat{\mathbf{v}}^a(m)$ and the definition of $\mathcal U(m)$ in (30), we have

$$\mathcal U(m+1)\subseteq\mathcal U(m). \qquad (31)$$

Now, for any $\mathbf{w}\in\mathcal U(m)$, by applying $\mathbf{g}(\cdot)$ to $\breve{\mathbf{v}}^a(m)\leq\mathbf{w}\leq\hat{\mathbf{v}}^a(m)$, we obtain $\breve{\mathbf{v}}^a(m+1)\leq\mathbf{g}(\mathbf{w})\leq\hat{\mathbf{v}}^a(m+1)$. Thus, from the definition of $\mathcal U(m)$ in (30), we have the relation

$$\mathbf{g}(\mathbf{w})\in\mathcal U(m+1), \quad \forall\ \mathbf{w}\in\mathcal U(m). \qquad (32)$$

Furthermore, according to Theorem 2, we have $\lim_{m\to\infty}\breve{\mathbf{v}}^a(m) = \lim_{m\to\infty}\hat{\mathbf{v}}^a(m) = \mathbf{v}^{a*}$, hence $\mathcal U(m)$ will converge to the fixed point $\mathbf{v}^{a*}$. Thus, the set $\mathcal U(m)$ satisfies the Synchronous Convergence Condition [27]. Furthermore, according to the definition of $\mathcal U(m)$ in (30), $\mathbf{v}^a(m)\in\mathcal U(m)$ means $\breve v^a_{i\to j}(m)\leq v^a_{i\to j}(m)\leq\hat v^a_{i\to j}(m)$ for all $(i,j)\in\mathcal E$, that is, $v^a_{i\to j}(m)\in\mathcal U_{i\to j}(m)$ with $\mathcal U_{i\to j}(m) = \{w_{ij} \mid \breve v^a_{i\to j}(m)\leq w_{ij}\leq\hat v^a_{i\to j}(m)\}$. Thus, $\mathcal U(m)$ can be represented in the form of a product of subsets of individual components $\mathcal U_{i\to j}(m)$ for all $(i,j)\in\mathcal E$, and thereby the Box Condition is satisfied. Hence, the sufficient condition is proved. On the other hand, if $\mathcal S' = \emptyset$, according to Theorem 1, the messages of Gaussian BP passed in the factor graph lose the Gaussian form, thus the parameters of the messages $\mathbf{v}^a(t)$ cannot converge in this case.

From Theorems 2 and 3, it can be seen that the convergence condition of $\mathbf{v}^a(t)$ under asynchronous scheduling is the same as that under synchronous scheduling.

V. NECESSARY AND SUFFICIENT CONVERGENCE CONDITION OF BELIEF VARIANCE $\sigma_i^2(t)$

Although the convergence conditions for $\mathbf{v}^a(t)$ were derived in the last section, what we are really interested in is the convergence condition for the belief variance $\sigma_i^2(t)$. According to (11), the variance $\sigma_i^2(t)$ and $\mathbf{v}^a(t)$ are related by $\sigma_i^2(t) = \tfrac{1}{p_{ii} + \sum_{k\in\mathcal N(i)} v^a_{k\to i}(t)}$, or equivalently $\tfrac{1}{\sigma_i^2(t)} = p_{ii} + \sum_{k\in\mathcal N(i)} v^a_{k\to i}(t)$.

A. Convergence Condition

In order to investigate the convergence condition of $\sigma_i^2(t)$, we present the following lemma first.

Lemma 3. If $\mathcal S'\neq\emptyset$ and $\mathbf{v}^a(0)\in\mathcal A$, then $\lim_{t\to\infty}\tfrac{1}{\sigma_i^2(t)} > 0$ for all $i\in\mathcal V$ or $\lim_{t\to\infty}\tfrac{1}{\sigma_i^2(t)} = 0$ for all $i\in\mathcal V$.

Proof: In the proof, we first consider the special case of a tree-structured factor graph as well as the impact of initializations. Then, for the case of a loopy factor graph, we prove that there exists at least one node $\gamma\in\mathcal V$ such that $\lim_{t\to\infty}\tfrac{1}{\sigma_\gamma^2(t)}\neq 0$. At last, we demonstrate that either $\lim_{t\to\infty}\tfrac{1}{\sigma_\gamma^2(t)} > 0$ for all $\gamma\in\mathcal V$ or $\lim_{t\to\infty}\tfrac{1}{\sigma_\gamma^2(t)} = 0$ for all $\gamma\in\mathcal V$ will hold.

First, if the factor graph is of tree structure, according to the properties of BP in a tree-structured factor graph, it is known that $\lim_{t\to\infty}\sigma_\gamma^2(t) = [\mathbf{P}^{-1}]_{\gamma\gamma}$ [1]. Due to $\mathbf{P}\succ 0$, we have $\lim_{t\to\infty}\sigma_\gamma^2(t) = [\mathbf{P}^{-1}]_{\gamma\gamma} > 0$, and thereby $\lim_{t\to\infty}\tfrac{1}{\sigma_\gamma^2(t)} > 0$ for all $\gamma\in\mathcal V$. In the following, we focus on the factor graph containing loops.

If $\mathcal S'\neq\emptyset$, according to Theorem 2, $\mathbf{v}^a(t)$ converges to the same point for any initialization $\mathbf{v}^a(0)\in\mathcal A$. Due to $\tfrac{1}{\sigma_i^2(t)} = p_{ii} + \sum_{k\in\mathcal N(i)} v^a_{k\to i}(t)$, the limit $\lim_{t\to\infty}\tfrac{1}{\sigma_i^2(t)}$ always exists and is identical for all $\mathbf{v}^a(0)\in\mathcal A$. Therefore, we only need to consider a specific initialization, e.g., $\mathbf{v}^a(0) = 0$.

Second, we prove $\lim_{t\to\infty}\tfrac{1}{\sigma_\gamma^2(t)}\neq 0$ for at least one node $\gamma\in\mathcal V$. Due to the factor graph containing loops, we can always find a node $\gamma$ such that $|\mathcal N(\gamma)|\geq 2$, where $|\cdot|$ means the cardinality of a set. Denote nodes $i,j$ as two neighbors of node $\gamma$, that is, $i,j\in\mathcal N(\gamma)$. Since $\mathbf{v}^a(0) = 0$, from (63), we have $\mathbf{v}^{a*}\leq\mathbf{v}^a(t)$. Applying this relation to $\tfrac{1}{\sigma_\gamma^2(t)} = p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus i} v^a_{k\to\gamma}(t) + v^a_{i\to\gamma}(t)$ gives

$$\frac{1}{\sigma_\gamma^2(t)} \geq p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus i} v^a_{k\to\gamma}(t) + v^{a*}_{i\to\gamma}. \qquad (33)$$

By further using $\mathbf{v}^{a*}\leq\mathbf{v}^a(t)$, we can easily obtain

$$p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus i} v^a_{k\to\gamma}(t) + v^{a*}_{i\to\gamma} = p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus\{i,j\}} v^a_{k\to\gamma}(t) + v^{a*}_{i\to\gamma} + v^a_{j\to\gamma}(t) \geq p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus j} v^{a*}_{k\to\gamma} + v^a_{j\to\gamma}(t). \qquad (34)$$

It can be further shown that

$$p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus j} v^{a*}_{k\to\gamma} + v^a_{j\to\gamma}(t) \overset{(a)}{=} \frac{\big(p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus j} v^{a*}_{k\to\gamma}\big)\big(p_{jj} + \sum_{k\in\mathcal N(j)\setminus\gamma} v^a_{k\to j}(t-1)\big) - p_{\gamma j}^2}{p_{jj} + \sum_{k\in\mathcal N(j)\setminus\gamma} v^a_{k\to j}(t-1)} \overset{(b)}{=} \frac{\big(p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus j} v^{a*}_{k\to\gamma}\big)\big(p_{jj} + \sum_{k\in\mathcal N(j)\setminus\gamma} v^a_{k\to j}(t-1) + v^{a*}_{\gamma\to j}\big)}{p_{jj} + \sum_{k\in\mathcal N(j)\setminus\gamma} v^a_{k\to j}(t-1)}, \qquad (35)$$

where the equality $(a)$ holds because $v^a_{j\to\gamma}(t) = -\tfrac{p_{\gamma j}^2}{p_{jj} + \sum_{k\in\mathcal N(j)\setminus\gamma} v^a_{k\to j}(t-1)}$ from (12), and $(b)$ is obtained by using the limiting form of (12), $v^{a*}_{\gamma\to j} = -\tfrac{p_{\gamma j}^2}{p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus j} v^{a*}_{k\to\gamma}}$.

Combining (34) and (35) gives

$$p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus i} v^a_{k\to\gamma}(t) + v^{a*}_{i\to\gamma} \geq \frac{p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus j} v^{a*}_{k\to\gamma}}{p_{jj} + \sum_{k\in\mathcal N(j)\setminus\gamma} v^a_{k\to j}(t-1)}\Big(p_{jj} + \sum_{k\in\mathcal N(j)\setminus\gamma} v^a_{k\to j}(t-1) + v^{a*}_{\gamma\to j}\Big). \qquad (36)$$

Due to $\mathbf{v}^{a*}\in\mathcal S'$ from Lemma 2, we have $\mathbf{v}^{a*}\in\mathcal W$. From the definition of $\mathcal W$ in (15), for all $(\nu_0,\nu_1)\in\mathcal E$, we have

$$p_{\nu_0\nu_0} + \sum_{k\in\mathcal N(\nu_0)\setminus\nu_1} v^{a*}_{k\to\nu_0} > 0. \qquad (37)$$

By using (37) and $\mathbf{v}^a(t)\geq\mathbf{v}^{a*}$ implied by (63), we can infer $p_{\nu_0\nu_0} + \sum_{k\in\mathcal N(\nu_0)\setminus\nu_1} v^a_{k\to\nu_0}(t) > 0$ for all $(\nu_0,\nu_1)\in\mathcal E$, and thereby

$$\frac{p_{\nu_0\nu_0} + \sum_{k\in\mathcal N(\nu_0)\setminus\nu_1} v^{a*}_{k\to\nu_0}}{p_{\nu_1\nu_1} + \sum_{k\in\mathcal N(\nu_1)\setminus\nu_0} v^a_{k\to\nu_1}(t)} > 0. \qquad (38)$$

Since the factor graph contains loops, there always exists a walk $(j_0, j_1, \ldots, j_{t-1}, j_t, i)$ with $j_t = \gamma$ such that all $(j_l, j_{l+1})\in\mathcal E$ and $|\mathcal N(j_l)|\geq 2$ for $l = 0, 1, \ldots, t$. Then, using (36) repeatedly on such a walk gives

$$p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus i} v^a_{k\to\gamma}(t) + v^{a*}_{i\to\gamma} \geq \Bigg(\prod_{l=0}^{t-1}\frac{p_{j_{l+1}j_{l+1}} + \sum_{k\in\mathcal N(j_{l+1})\setminus j_l} v^{a*}_{k\to j_{l+1}}}{p_{j_lj_l} + \sum_{k\in\mathcal N(j_l)\setminus j_{l+1}} v^a_{k\to j_l}(l)}\Bigg)\Big(p_{j_0j_0} + \sum_{k\in\mathcal N(j_0)\setminus j_1} v^a_{k\to j_0}(0) + v^{a*}_{j_1\to j_0}\Big) = \Bigg(\prod_{l=0}^{t-1}\frac{p_{j_{l+1}j_{l+1}} + \sum_{k\in\mathcal N(j_{l+1})\setminus j_l} v^{a*}_{k\to j_{l+1}}}{p_{j_lj_l} + \sum_{k\in\mathcal N(j_l)\setminus j_{l+1}} v^a_{k\to j_l}(l)}\Bigg)\big(p_{j_0j_0} + v^{a*}_{j_1\to j_0}\big), \qquad (39)$$

where the last equality holds due to $\mathbf{v}^a(0) = 0$. Due to $|\mathcal N(j_0)|\geq 2$, there exists a node $\nu$ such that $j_1\in\mathcal N(j_0)\setminus\nu$. By exploiting the fact that $\mathbf{v}^{a*} < 0$ due to $\mathbf{v}^{a*}\in\mathcal S'$, we obtain $p_{j_0j_0} + v^{a*}_{j_1\to j_0} \geq p_{j_0j_0} + \sum_{k\in\mathcal N(j_0)\setminus\nu} v^{a*}_{k\to j_0}$. Combining with (37), we can infer that

$$p_{j_0j_0} + v^{a*}_{j_1\to j_0} > 0. \qquad (40)$$

Putting (38) and (40) into (39), we obtain $p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus i} v^a_{k\to\gamma}(t) + v^{a*}_{i\to\gamma} > 0$. Applying this result to (33) gives $\tfrac{1}{\sigma_\gamma^2(t)} > 0$ for all $t\geq 0$, and thereby $\lim_{t\to\infty}\tfrac{1}{\sigma_\gamma^2(t)}\neq 0$.

At last, we will prove that $\lim_{t\to\infty}\tfrac{1}{\sigma_\gamma^2(t)} > 0$ for all $\gamma\in\mathcal V$ or $\lim_{t\to\infty}\tfrac{1}{\sigma_\gamma^2(t)} = 0$ for all $\gamma\in\mathcal V$. Notice that the relation in (35) holds for any $(\gamma,j)\in\mathcal E$. Thus, taking the limit on both sides of (35) and recognizing that $\lim_{t\to\infty}\tfrac{1}{\sigma_\gamma^2(t)} = p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)} v^{a*}_{k\to\gamma}$ and $\lim_{t\to\infty}\tfrac{1}{\sigma_j^2(t)} = p_{jj} + \sum_{k\in\mathcal N(j)} v^{a*}_{k\to j}$, we obtain

$$\lim_{t\to\infty}\frac{1}{\sigma_\gamma^2(t)} = \frac{p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus j} v^{a*}_{k\to\gamma}}{p_{jj} + \sum_{k\in\mathcal N(j)\setminus\gamma} v^{a*}_{k\to j}}\lim_{t\to\infty}\frac{1}{\sigma_j^2(t)}. \qquad (41)$$

Since $p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus j} v^{a*}_{k\to\gamma} > 0$ for all $(\gamma,j)\in\mathcal E$ as given in (37), according to (41), if $\lim_{t\to\infty}\tfrac{1}{\sigma_\gamma^2(t)} > 0$, then $\lim_{t\to\infty}\tfrac{1}{\sigma_j^2(t)} > 0$ for all nodes $j\in\mathcal N(\gamma)$, and vice versa. Similarly, if $\lim_{t\to\infty}\tfrac{1}{\sigma_\gamma^2(t)} = 0$, then $\lim_{t\to\infty}\tfrac{1}{\sigma_j^2(t)} = 0$ for all nodes $j\in\mathcal N(\gamma)$, and vice versa. Recall that we have proved there exists a node $\gamma$ with $\lim_{t\to\infty}\tfrac{1}{\sigma_\gamma^2(t)}\neq 0$. Therefore, for a connected factor graph, either $\lim_{t\to\infty}\tfrac{1}{\sigma_i^2(t)} > 0$ for all $i\in\mathcal V$ or $\lim_{t\to\infty}\tfrac{1}{\sigma_i^2(t)} = 0$ for all $i\in\mathcal V$.

Obviously, if $\lim_{t\to\infty}\tfrac{1}{\sigma_i^2(t)}\neq 0$, then $\lim_{t\to\infty}\sigma_i^2(t)$ exists, and hence the convergence condition of $\sigma_i^2(t)$ is the same as that of $\tfrac{1}{\sigma_i^2(t)}$. However, Lemma 3 reveals that the scenario $\lim_{t\to\infty}\tfrac{1}{\sigma_i^2(t)} = 0$ cannot be excluded. Thus, we have the following theorem.

Theorem 4. $\sigma_i^2(t)$ with $i\in\mathcal V$ converges to the same positive value for all $\mathbf{v}^a(0)\in\mathcal A$ under synchronous and asynchronous schedulings if and only if $\mathcal S\neq\emptyset$, where

$$\mathcal S \triangleq \Big\{\mathbf{w}\ \Big|\ \mathbf{w}\in\mathcal S' \text{ and } p_{ii} + \sum_{k\in\mathcal N(i)} w_{ki} > 0,\ \forall\ i\in\mathcal V\Big\}. \qquad (42)$$

Proof: First, we prove that if $\mathcal S\neq\emptyset$, then $\sigma_i^2(t)$ converges for any $\mathbf{v}^a(0)\in\mathcal A$. From the definition of $\mathcal S$ in (42), if $\mathcal S\neq\emptyset$, then $\mathcal S'\neq\emptyset$. According to Theorems 2 and 3, $\mathbf{v}^a(t)$ converges to $\mathbf{v}^{a*}$ for all $\mathbf{v}^a(0)\in\mathcal A$ under both synchronous and asynchronous schedulings. Thus, $\tfrac{1}{\sigma_i^2(t)} = p_{ii} + \sum_{k\in\mathcal N(i)} v^a_{k\to i}(t)$ will converge to $p_{ii} + \sum_{k\in\mathcal N(i)} v^{a*}_{k\to i}$. Due to $\mathcal S\neq\emptyset$, there exists a $\mathbf{w}$ such that $\mathbf{w}\in\mathcal S'$ and $p_{ii} + \sum_{k\in\mathcal N(i)} w_{ki} > 0$ for all $i\in\mathcal V$. Due to $\mathbf{w}\in\mathcal S'$, it can be inferred from (63) that $\mathbf{w}\leq\mathbf{v}^{a*}$. Putting this result into $p_{ii} + \sum_{k\in\mathcal N(i)} w_{ki} > 0$, we obtain $p_{ii} + \sum_{k\in\mathcal N(i)} v^{a*}_{k\to i} > 0$, and its inverse exists. Thus, if $\mathcal S\neq\emptyset$, the variance $\sigma_i^2(t) = \tfrac{1}{p_{ii} + \sum_{k\in\mathcal N(i)} v^a_{k\to i}(t)}$ converges to $\tfrac{1}{p_{ii} + \sum_{k\in\mathcal N(i)} v^{a*}_{k\to i}} > 0$.

Next, we prove by contradiction that if $\sigma_i^2(t)$ converges for $\mathbf{v}^a(0)\in\mathcal A$, then $\mathcal S\neq\emptyset$. Suppose that $\mathcal S = \emptyset$. This assumption contains two cases: 1) $\mathcal S' = \emptyset$; 2) $\mathcal S'\neq\emptyset$, but for any $\mathbf{w}\in\mathcal S'$, $p_{ii} + \sum_{k\in\mathcal N(i)} w_{ki} \leq 0$ for some $i\in\mathcal V$. First, consider the case $\mathcal S' = \emptyset$. Due to $\mathcal S' = \emptyset$, according to Theorem 1, the messages of Gaussian BP passed in the factor graph cannot maintain the Gaussian form, hence $\mathbf{v}^a(t)$ becomes undefined. Therefore, the variance $\sigma_i^2(t)$ cannot converge in this case, which contradicts the prerequisite that $\sigma_i^2(t)$ converges. Second, consider the case $\mathcal S'\neq\emptyset$. Due to $\mathcal S'\neq\emptyset$, according to Theorems 2 and 3, $\mathbf{v}^a(t)$ converges to $\mathbf{v}^{a*}\in\mathcal S'$ for any $\mathbf{v}^a(0)\in\mathcal A$. According to Lemma 3, we further have $\lim_{t\to\infty}\tfrac{1}{\sigma_i^2(t)} = p_{ii} + \sum_{k\in\mathcal N(i)} v^{a*}_{k\to i} > 0$ for all $i\in\mathcal V$, or $\lim_{t\to\infty}\tfrac{1}{\sigma_i^2(t)} = p_{ii} + \sum_{k\in\mathcal N(i)} v^{a*}_{k\to i} = 0$ for all $i\in\mathcal V$. If $p_{ii} + \sum_{k\in\mathcal N(i)} v^{a*}_{k\to i} > 0$ for all $i\in\mathcal V$, combining with $\mathbf{v}^{a*}\in\mathcal S'$, we can infer from the definition of $\mathcal S$ in (42) that $\mathbf{v}^{a*}\in\mathcal S$, which

contradicts the assumption $\mathcal S = \emptyset$. On the other hand, if $p_{ii} + \sum_{k\in\mathcal N(i)} v^{a*}_{k\to i} = 0$ for all $i\in\mathcal V$, obviously, the variance $\sigma_i^2(t) = \tfrac{1}{p_{ii} + \sum_{k\in\mathcal N(i)} v^a_{k\to i}(t)}$ cannot converge, which contradicts the prerequisite that $\sigma_i^2(t)$ converges. In summary, we have proved that if $\sigma_i^2(t)$ converges for all $\mathbf{v}^a(0)\in\mathcal A$, then $\mathcal S = \emptyset$ cannot hold, or equivalently, we must have $\mathcal S\neq\emptyset$.

Remark 1: Due to the belief mean $\mu_i(t) = \sigma_i^2(t)\big(h_i + \sum_{k\in\mathcal N(i)}\beta^a_{k\to i}(t)\big)$, if the convergence condition of the belief variance $\sigma_i^2(t)$ is already known, the convergence of $\mu_i(t)$ is determined by that of the message parameters $\beta^a_{k\to i}(t)$. As pointed out in [24], $\beta^a_{k\to j}(t)$ is updated in accordance with a set of linear equations upon the convergence of the belief variances. Thus, after deriving the convergence condition of $\sigma_i^2(t)$, we can then apply the theory of linear equations to analyze the convergence condition of the belief mean $\mu_i(t)$. However, the convergence condition of the belief mean $\mu_i(t)$ would be lengthy and will be investigated in the future.

B. Verification Using Semi-Definite Programming [30]

Next, we show that whether $\mathcal S\neq\emptyset$ can be determined by solving the following SDP problem:

$$\min_{\mathbf{w},\alpha}\ \alpha \quad \text{s.t.}\quad \begin{bmatrix} p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} w_{ki} & p_{ij}\\ p_{ij} & -w_{ij}\end{bmatrix}\succeq 0,\ \forall(i,j)\in\mathcal E;\qquad \alpha + p_{ii} + \sum_{k\in\mathcal N(i)} w_{ki}\geq 0,\ \forall i\in\mathcal V. \qquad (43)$$

The SDP problem (43) can be solved efficiently by existing software, such as CVX [31] and SeDuMi [32], etc.

Theorem 5. $\mathcal S\neq\emptyset$ if and only if the optimal solution $\alpha^*$ of (43) satisfies $\alpha^* < 0$.

Proof: First, notice that the SDP problem in (43) is equivalent to the following optimization problem:

$$\min_{\mathbf{w},\alpha}\ \alpha \quad \text{s.t.}\quad w_{ij} - g_{ij}(\mathbf{w})\leq 0,\ \forall(i,j)\in\mathcal E;\quad -p_{ii} - \sum_{k\in\mathcal N(i)\setminus j} w_{ki}\leq 0,\ \forall(i,j)\in\mathcal E;\quad -p_{ii} - \sum_{k\in\mathcal N(i)} w_{ki}\leq\alpha,\ \forall i\in\mathcal V. \qquad (44)$$

If $\mathcal S\neq\emptyset$, according to the definition of $\mathcal S$ in (42), there must exist a $\mathbf{w}$ such that $\mathbf{w}\leq\mathbf{g}(\mathbf{w})$, $\mathbf{w}\in\mathcal W$, and $p_{ii} + \sum_{k\in\mathcal N(i)} w_{ki} > 0$ for all $i\in\mathcal V$. Obviously, these three conditions are equivalent to $w_{ij} - g_{ij}(\mathbf{w})\leq 0$ and $-p_{ii} - \sum_{k\in\mathcal N(i)\setminus j} w_{ki} < 0$ for all $(i,j)\in\mathcal E$, and $-p_{ii} - \sum_{k\in\mathcal N(i)} w_{ki} < 0$ for all $i\in\mathcal V$. Thus, by defining $\alpha = \max_{i\in\mathcal V}\big(-p_{ii} - \sum_{k\in\mathcal N(i)} w_{ki}\big)$, it can be seen that $(\mathbf{w},\alpha)$ satisfies the constraints in (44). Due to $-p_{ii} - \sum_{k\in\mathcal N(i)} w_{ki} < 0$, we have $\alpha < 0$. Since $\alpha$ is a feasible solution of the minimization problem in (44), the optimal solution of (44) cannot be greater than $\alpha$, hence it can be inferred that $\alpha^* < 0$.

Next, we prove the reverse also holds. If $(\mathbf{w}^*,\alpha^*)$ is the optimal solution of (44) with $\alpha^* < 0$, then $(\mathbf{w}^*,\alpha^*)$ must satisfy the constraints in (44), thus the following three conditions hold: 1) $w^*_{ij} - g_{ij}(\mathbf{w}^*)\leq 0$; 2) $-p_{ii} - \sum_{k\in\mathcal N(i)\setminus j} w^*_{ki}\leq 0$; 3) $-p_{ii} - \sum_{k\in\mathcal N(i)} w^*_{ki}\leq\alpha^*$. For the second constraint, if $-p_{ii} - \sum_{k\in\mathcal N(i)\setminus j} w^*_{ki} = 0$, the function $g_{ij}(\mathbf{w}^*) = -\tfrac{p_{ij}^2}{p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} w^*_{ki}}$ becomes undefined, thus $-p_{ii} - \sum_{k\in\mathcal N(i)\setminus j} w^*_{ki} = 0$ will never happen. Hence, we always have $-p_{ii} - \sum_{k\in\mathcal N(i)\setminus j} w^*_{ki} < 0$. For the third constraint, due to $\alpha^* < 0$, it can be inferred that $-p_{ii} - \sum_{k\in\mathcal N(i)} w^*_{ki} < 0$. Now, comparing $w^*_{ij} - g_{ij}(\mathbf{w}^*)\leq 0$, $-p_{ii} - \sum_{k\in\mathcal N(i)\setminus j} w^*_{ki} < 0$ and $-p_{ii} - \sum_{k\in\mathcal N(i)} w^*_{ki} < 0$ with the definition of the set $\mathcal S$ in (42), we have $\mathbf{w}^*\in\mathcal S$, and hence $\mathcal S\neq\emptyset$.
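The following sketch formulates (43) with CVXPY as the modeling layer; the paper itself points to CVX and SeDuMi, so CVXPY here is only an assumed alternative, and the function and variable names are ours. It returns the verdict of Theorem 5.

```python
import cvxpy as cp

def check_S_nonempty(P, tol=1e-8):
    """Solve the SDP (43) and apply Theorem 5: S is nonempty iff alpha* < 0."""
    N = P.shape[0]
    nbrs = {i: [k for k in range(N) if k != i and P[i, k] != 0] for i in range(N)}
    edges = [(i, j) for i in range(N) for j in nbrs[i]]
    w = {e: cp.Variable() for e in edges}
    alpha = cp.Variable()
    cons = []
    for (i, j) in edges:
        top_left = P[i, i] + sum(w[(k, i)] for k in nbrs[i] if k != j)
        # 2x2 linear matrix inequality of (43); by the Schur complement it enforces
        # w_ij <= -p_ij^2 / (p_ii + sum_{k in N(i)\j} w_ki) = g_ij(w).
        cons.append(cp.bmat([[top_left, P[i, j]],
                             [P[i, j], -w[(i, j)]]]) >> 0)
    for i in range(N):
        cons.append(alpha + P[i, i] + sum(w[(k, i)] for k in nbrs[i]) >= 0)
    prob = cp.Problem(cp.Minimize(alpha), cons)
    prob.solve()
    if prob.status not in ("optimal", "optimal_inaccurate"):
        return False                       # infeasible problem: S must be empty
    return alpha.value < -tol
```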

Remark 2: Using the alternating direction method of multipliers (ADMM) [27], the SDP problem in (43) can be reformulated into $N$ locally connected low-dimensional SDP subproblems, and thus can be solved distributively. Not only does this avoid the gathering of information at a central processing unit, the complexity is also reduced from $O\big((\sum_{i=1}^N |\mathcal N(i)|)^4\big)$ of directly solving the SDP to $O\big(\sum_{i=1}^N |\mathcal N(i)|^4\big)$ per iteration when using the ADMM technique. Furthermore, since what we are really interested in is to know whether $\mathcal S\neq\emptyset$, ADMM can stop its updating immediately if an intermediary vector is found to be within $\mathcal S$. Thus, the required number of iterations can be reduced significantly. Because the derivation of ADMM is well documented in [27], [33], we do not give the details here. It should be emphasized that, despite the low complexity of ADMM at each iteration, due to the unknown number of required iterations, it cannot be proved that the overall complexity is always lower than that of the direct matrix inverse, $O(N^3)$.

Remark 3: As discussed after (20), we know that $\mathcal A = \{\mathbf{w}\geq\mathbf{w}_0 \mid \mathbf{w}_0\in\operatorname{int}(\mathcal S')\}$ if $\operatorname{int}(\mathcal S')\neq\emptyset$. In particular, with a proof similar to that of Theorem 5, it can be shown that if the SDP

$$\min_{\mathbf{w},\alpha}\ \alpha \quad \text{s.t.}\quad \begin{bmatrix} p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} w_{ki} & p_{ij}\\ p_{ij} & -\alpha - w_{ij}\end{bmatrix}\succeq 0,\ \forall(i,j)\in\mathcal E \qquad (45)$$

has an optimal solution $\alpha^* < 0$, then the optimal $\mathbf{w}^*$ must be a point in $\operatorname{int}(\mathcal S')$.

VI. RELATIONSHIP WITH THE CONDITION BASED ON COMPUTATION TREE

In this section, we will establish the relation between the proposed convergence condition and the one proposed in [24], which is the best currently available. For the convergence condition in [24], the original PDF in (1) is first transformed into the normalized form

$$f(\tilde{\mathbf{x}}) \propto \exp\Big\{-\tfrac{1}{2}\tilde{\mathbf{x}}^T\tilde{\mathbf{P}}\tilde{\mathbf{x}} + \tilde{\mathbf{h}}^T\tilde{\mathbf{x}}\Big\}, \qquad (46)$$

where $\tilde{\mathbf{x}} = \operatorname{diag}^{1/2}(\mathbf{P})\mathbf{x}$, $\tilde{\mathbf{P}} = \operatorname{diag}^{-1/2}(\mathbf{P})\,\mathbf{P}\,\operatorname{diag}^{-1/2}(\mathbf{P})$ and $\tilde{\mathbf{h}} = \operatorname{diag}^{-1/2}(\mathbf{P})\mathbf{h}$, with $\operatorname{diag}(\mathbf{P})$ denoting the diagonal matrix obtained by retaining only the diagonal elements of $\mathbf{P}$. Then, Gaussian BP is carried out with $\tilde{\mathbf{P}}$ and $\tilde{\mathbf{h}}$ using the updating equations in (6), (7), (9) and (10). Under the normalized $\tilde{\mathbf{P}}$ and $\tilde{\mathbf{h}}$, we denote the corresponding message parameters as $\tilde{\mathbf{v}}^a(t)$ and $\tilde{\boldsymbol\beta}^a(t)$. With $\tilde{\mathbf{v}}^a(t)$, the corresponding belief variance can be calculated using (11) as $\tilde\sigma_i^2(t) = \tfrac{1}{\tilde p_{ii} + \sum_{k\in\mathcal N(i)} \tilde v^a_{k\to i}(t)}$. The variance of the original PDF $f(\mathbf{x})$ can then be recovered as $\tilde\sigma_i^2(t)/p_{ii}$. Moreover, under the normalized $\tilde{\mathbf{P}}$, the sets $\tilde{\mathcal W}$, $\tilde{\mathcal S}'$, $\tilde{\mathcal A}$ and $\tilde{\mathcal S}$ can also be defined similarly to (15), (17), (20) and (42), respectively.
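A minimal sketch of the normalization step in (46) (NumPy assumed) is given below; the original variances are then recovered by dividing the normalized ones by $p_{ii}$, as stated above.

```python
import numpy as np

def normalize_model(P, h):
    """Form the normalized model (46): P_tilde has unit diagonal."""
    d = np.sqrt(np.diag(P))
    D_inv = np.diag(1.0 / d)
    P_tilde = D_inv @ P @ D_inv
    h_tilde = D_inv @ h
    return P_tilde, h_tilde
```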

By noticing that the message-passing process of Gaussian BP in the factor graph can be translated exactly into the message passing in a computation tree, [24] proposed a convergence condition for the belief variance $\tilde\sigma_i^2(t)$ in terms of the computation tree. A computation tree $\mathcal T(\gamma,\ell)$ is constructed by choosing one of the variable nodes $\gamma$ as the root node and $\mathcal N(\gamma)$ as its direct descendants, connected through the factor nodes $\tilde f_{\gamma\nu}(\tilde x_\gamma, \tilde x_\nu) = \exp\{-\tilde p_{\gamma\nu}\tilde x_\gamma\tilde x_\nu\}$, where $\nu$ is the index of any descendant. For each descendant $\nu$, the neighbors of $\nu$ excluding its parent node are connected as next-level descendants, also connected through the corresponding factor nodes. The process is repeated until the computation tree $\mathcal T(\gamma,\ell)$ has $\ell + 1$ layers of variable nodes (including the root node). The computation tree is completed by further connecting the factor node $\tilde f_i(\tilde x_i) = \exp\{-\tfrac{1}{2}\tilde x_i^2 + \tilde h_i\tilde x_i\}$ to variable node $\tilde x_i$ for $i\in\mathcal V$. Furthermore, a computation sub-tree $\mathcal T_s(\gamma,\nu,\ell)$ can be obtained by cutting the branch of $\tilde x_\nu$ in the first layer from $\mathcal T(\gamma,\ell)$. An example of the computation tree $\mathcal T(1,4)$, as well as the computation sub-tree $\mathcal T_s(1,4,4)$, corresponding to a fully connected factor graph with four variables, is illustrated in Fig. 1.

Fig. 1: Illustration of computation tree $\mathcal T(1,4)$ and computation sub-tree $\mathcal T_s(1,4,4)$ within the dashed line.

By assigning each variable node a new index $i = 1, 2, \ldots, |\mathcal T(\gamma,\ell)|$, $\mathcal T(\gamma,\ell)$ can be viewed as a factor graph for a new PDF, where $|\mathcal T(\gamma,\ell)|$ denotes the number of variable nodes in $\mathcal T(\gamma,\ell)$. The new PDF represented by $\mathcal T(\gamma,\ell)$ is

$$\tilde f_{\gamma,\ell}(\tilde{\mathbf{x}}_{\gamma,\ell}) \propto \exp\Big\{-\tfrac{1}{2}\tilde{\mathbf{x}}_{\gamma,\ell}^T\tilde{\mathbf{P}}_{\gamma,\ell}\tilde{\mathbf{x}}_{\gamma,\ell} + \tilde{\mathbf{h}}_{\gamma,\ell}^T\tilde{\mathbf{x}}_{\gamma,\ell}\Big\}, \qquad (47)$$

where $\tilde{\mathbf{x}}_{\gamma,\ell}$ is a $|\mathcal T(\gamma,\ell)|\times 1$ vector; $\tilde{\mathbf{P}}_{\gamma,\ell}$ and $\tilde{\mathbf{h}}_{\gamma,\ell}$ are the corresponding precision matrix and linear coefficient vector, respectively. According to the equivalence between the message-passing process in the original factor graph and that in the computation tree [6], [24], under the prerequisite of $\tilde{\mathbf{P}}_{\gamma,\ell}\succ 0$ and $\tilde{\mathbf{v}}^a(0) = 0$, it is known that

$$\tilde\sigma_\gamma^2(\ell) = \big[\tilde{\mathbf{P}}_{\gamma,\ell}^{-1}\big]_{11}. \qquad (48)$$
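The unrolling of the loopy graph into a computation tree can be sketched programmatically as follows (NumPy assumed; names are ours, and 0-based indexing is used, so the $[\cdot]_{11}$ entry of (48) corresponds to index $(0,0)$ here). Note that the tree grows exponentially with the depth for dense graphs, so this is only practical for small examples.

```python
import numpy as np

def computation_tree_precision(P, gamma, depth):
    """Precision matrix of the computation tree rooted at variable gamma with `depth`
    layers of descendants: breadth-first unrolling, each tree node carrying the diagonal
    entry of its underlying variable and one edge to its parent."""
    N = P.shape[0]
    nbrs = {i: [k for k in range(N) if k != i and P[i, k] != 0] for i in range(N)}
    nodes = [(gamma, None)]            # (underlying variable, parent tree-node index)
    frontier = [(0, gamma, None)]      # (tree index, variable, parent variable)
    for _ in range(depth):
        new_frontier = []
        for (idx, var, parent_var) in frontier:
            for k in nbrs[var]:
                if k == parent_var:
                    continue
                nodes.append((k, idx))
                new_frontier.append((len(nodes) - 1, k, var))
        frontier = new_frontier
    T = np.zeros((len(nodes), len(nodes)))
    for t_idx, (var, parent) in enumerate(nodes):
        T[t_idx, t_idx] = P[var, var]
        if parent is not None:
            T[t_idx, parent] = T[parent, t_idx] = P[var, nodes[parent][0]]
    return T

# Sanity check of (48): with v^a(0) = 0, the BP belief variance at node gamma after
# `depth` synchronous iterations on the normalized model should equal
# np.linalg.inv(computation_tree_precision(P_tilde, gamma, depth))[0, 0].
```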

Similarly, the computation sub-tree $\mathcal T_s(\gamma,\nu,\ell)$ can also be viewed as a factor graph, with $\tilde{\mathbf{P}}_{\gamma\setminus\nu,\ell}$ being the corresponding symmetric precision matrix in the PDF. Furthermore, if (48) holds, we also have

$$\tilde\sigma_{\gamma\setminus\nu}^2(\ell) = \big[\tilde{\mathbf{P}}_{\gamma\setminus\nu,\ell}^{-1}\big]_{11}, \qquad (49)$$

where $\tilde\sigma_{\gamma\setminus\nu}^2(\ell) \triangleq \tfrac{1}{\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus\nu}\tilde v^a_{k\to\gamma}(\ell)}$. In [24, Proposition 25], it is proved that if $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) < 1$, the variance $\tilde\sigma_\gamma^2(t)$ converges for $\tilde{\mathbf{v}}^a(0) = 0$; but if $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) > 1$, Gaussian BP becomes ill-posed. Notice that the limit $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I})$ always exists and is identical for all $\gamma\in\mathcal V$ [24, Lemma 24]. The following lemma and theorem reveal the relation between the proposed convergence condition and that based on the computation tree.

Lemma 4. If $\tilde{\mathbf{P}}_{\gamma,\ell}\succ 0$ for all $\gamma\in\mathcal V$ and $\ell\geq 0$, then $\tilde{\mathbf{v}}^a(t)$ converges to $\tilde{\mathbf{v}}^{a*}\in\tilde{\mathcal S}'$ for $\tilde{\mathbf{v}}^a(0) = 0$.

Proof: See Appendix B.

Now, we give the following theorem.

Theorem 6. $\tilde{\mathcal S}\neq\emptyset$ if and only if either of the following two conditions holds: 1) $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) < 1$; 2) $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) = 1$, $\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^{a*}_{k\to\gamma}\neq 0$ with $\tilde{\mathbf{v}}^a(0) = 0$, and $\tilde{\mathbf{P}}_{\nu,\ell}\succ 0$ for all $\nu\in\mathcal V$ and $\ell\geq 0$.

Proof: Sufficient Condition: First, we prove that if $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) < 1$, then $\tilde{\mathcal S}\neq\emptyset$. With the monotonically increasing property of $\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I})$ [24, Lemma 23] and the assumption that its limit is smaller than 1, we have $\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I})\leq\rho_u$ with $\rho_u \triangleq \lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I})\in(0,1)$. Expressing this in terms of eigenvalues, we have $-\rho_u\leq\lambda(\tilde{\mathbf{P}}_{\gamma,\ell}) - 1\leq\rho_u$, or equivalently

$$1 - \rho_u \leq \lambda(\tilde{\mathbf{P}}_{\gamma,\ell}) \leq 1 + \rho_u. \qquad (50)$$

For a symmetric $\tilde{\mathbf{P}}_{\gamma,\ell}$, the eigenvalues of $\tilde{\mathbf{P}}_{\gamma,\ell}^{-1}$ are $\{1/\lambda(\tilde{\mathbf{P}}_{\gamma,\ell})\}$, and (50) is equivalent to

$$\frac{1}{1 + \rho_u} \leq \lambda(\tilde{\mathbf{P}}_{\gamma,\ell}^{-1}) \leq \frac{1}{1 - \rho_u}. \qquad (51)$$

Due to $0 < \rho_u < 1$, it can be obtained from (51) that $\lambda(\tilde{\mathbf{P}}_{\gamma,\ell}^{-1}) > 0$, that is,

$$\tilde{\mathbf{P}}_{\gamma,\ell}\succ 0, \quad \forall\ \gamma\in\mathcal V \text{ and } \ell\geq 0. \qquad (52)$$

Notice that a diagonal element of a symmetric matrix is always smaller than or equal to the matrix's maximum eigenvalue [34], that is, $[\tilde{\mathbf{P}}_{\gamma,\ell}^{-1}]_{ii}\leq\lambda_{\max}(\tilde{\mathbf{P}}_{\gamma,\ell}^{-1})$. Combining with the fact that, for a positive definite matrix, the diagonal elements are positive, we obtain $0 < [\tilde{\mathbf{P}}_{\gamma,\ell}^{-1}]_{ii}\leq\lambda_{\max}(\tilde{\mathbf{P}}_{\gamma,\ell}^{-1})$. Since the upper bound of (51) applies to all eigenvalues, we obtain

$$0 < \big[\tilde{\mathbf{P}}_{\gamma,\ell}^{-1}\big]_{ii} \leq \frac{1}{1 - \rho_u}. \qquad (53)$$

Due to $[\tilde{\mathbf{P}}_{\gamma,\ell}^{-1}]_{11} = \tilde\sigma_\gamma^2(\ell)$ from (48) and $\tilde\sigma_\gamma^2(\ell) = \tfrac{1}{\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^a_{k\to\gamma}(\ell)}$, it can be inferred from (53) that $\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^a_{k\to\gamma}(\ell)\geq 1 - \rho_u$ for all $\ell\geq 0$. Due to $\tilde{\mathbf{P}}_{\gamma,\ell}\succ 0$ from (52) for all $\gamma\in\mathcal V$ and $\ell\geq 0$, according to Lemma 4, $\tilde{\mathbf{v}}^a(t)$ always converges to $\tilde{\mathbf{v}}^{a*}\in\tilde{\mathcal S}'$. Thus, it can be inferred that $\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^{a*}_{k\to\gamma}\geq 1 - \rho_u$. With $\rho_u < 1$, we obtain

$$\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^{a*}_{k\to\gamma} > 0, \quad \forall\ \gamma\in\mathcal V. \qquad (54)$$

Combining (54) with $\tilde{\mathbf{v}}^{a*}\in\tilde{\mathcal S}'$, it can be inferred that $\tilde{\mathbf{v}}^{a*}\in\tilde{\mathcal S}$ by the definition, thus $\tilde{\mathcal S}\neq\emptyset$.

Next, we prove the second scenario. Due to $\tilde{\mathbf{P}}_{\gamma,\ell}\succ 0$, according to Lemma 4, $\tilde{\mathbf{v}}^a(t)$ converges to $\tilde{\mathbf{v}}^{a*}\in\tilde{\mathcal S}'$, and thereby $\tilde{\mathcal S}'\neq\emptyset$ and $\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^{a*}_{k\to\gamma}$ exists. Due to $\tilde{\mathcal S}'\neq\emptyset$, according to Lemma 3, it is known that $\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^{a*}_{k\to\gamma} > 0$ for all $\gamma\in\mathcal V$ or $\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^{a*}_{k\to\gamma} = 0$ for all $\gamma\in\mathcal V$. From the prerequisite $\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^{a*}_{k\to\gamma}\neq 0$, we can infer that $\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^{a*}_{k\to\gamma} > 0$ holds for all $\gamma\in\mathcal V$. Combining with $\tilde{\mathbf{v}}^{a*}\in\tilde{\mathcal S}'$, we can see that $\tilde{\mathbf{v}}^{a*}\in\tilde{\mathcal S}$, thus $\tilde{\mathcal S}\neq\emptyset$.

Necessary Condition: In the following, we will first prove that the precision matrix of a computation tree can always be written into a special structure by re-indexing the nodes in the tree. Based on the special structure, the necessity is proved by the method of induction.

First, notice that changing indexing schemes in the computation tree does not affect the positive definiteness of the corresponding precision matrix $\tilde{\mathbf{P}}_{\gamma,\ell}$. So, we consider an indexing scheme

such that the precision matrix $\tilde{\mathbf{P}}_{\gamma,\ell+1}$ for $\ell\geq 0$ can be represented in the form

$$\tilde{\mathbf{P}}_{\gamma,\ell+1} = \begin{bmatrix} \tilde p_{\gamma\gamma} & \tilde{\mathbf{a}}^T_{k_1\setminus\gamma,\ell} & \tilde{\mathbf{a}}^T_{k_2\setminus\gamma,\ell} & \cdots & \tilde{\mathbf{a}}^T_{k_{|\mathcal N(\gamma)|}\setminus\gamma,\ell}\\ \tilde{\mathbf{a}}_{k_1\setminus\gamma,\ell} & \tilde{\mathbf{P}}_{k_1\setminus\gamma,\ell} & \mathbf 0 & \cdots & \mathbf 0\\ \tilde{\mathbf{a}}_{k_2\setminus\gamma,\ell} & \mathbf 0 & \tilde{\mathbf{P}}_{k_2\setminus\gamma,\ell} & \cdots & \mathbf 0\\ \vdots & \vdots & & \ddots & \vdots\\ \tilde{\mathbf{a}}_{k_{|\mathcal N(\gamma)|}\setminus\gamma,\ell} & \mathbf 0 & \cdots & \mathbf 0 & \tilde{\mathbf{P}}_{k_{|\mathcal N(\gamma)|}\setminus\gamma,\ell}\end{bmatrix}, \qquad (55)$$

where $k_i\in\mathcal N(\gamma)$ for $i = 1, 2, \ldots, |\mathcal N(\gamma)|$; $\tilde{\mathbf{a}}_{k_i\setminus\gamma,\ell} = [\tilde p_{k_i\gamma}, 0, \ldots, 0]^T$ is a vector of length $|\mathcal T_s(k_i,\gamma,\ell)|$. Notice that the $(\ell+1)$-order computation tree $\mathcal T(\gamma,\ell+1)$ consists of the root node $[\tilde{\mathbf{x}}]_\gamma$ and a set of $\ell$-order computation sub-trees $\mathcal T_s(k_i,\gamma,\ell)$ with the corresponding root nodes $k_i$. The lower-right block-diagonal structure of (55) can be easily obtained by assigning consecutive indices to the nodes inside each sub-tree $\mathcal T_s(k_i,\gamma,\ell)$. Moreover, since there is only one connection from node $\gamma$ to the root node $k_i$ of each sub-tree $\mathcal T_s(k_i,\gamma,\ell)$, $\tilde{\mathbf{a}}_{k_i\setminus\gamma,\ell}$ contains only one nonzero element $\tilde p_{k_i\gamma}$. Then, by assigning the smallest index to the root node in each $\mathcal T_s(k_i,\gamma,\ell)$, the only nonzero element in $\tilde{\mathbf{a}}_{k_i\setminus\gamma,\ell}$ must be located at the first position. Therefore, the precision matrix $\tilde{\mathbf{P}}_{\gamma,\ell+1}$ can be represented in the form of (55).

When $\ell = 0$, obviously, $\tilde{\mathbf{P}}_{\gamma,0} = \tilde p_{\gamma\gamma} > 0$. Suppose $\tilde{\mathbf{P}}_{\gamma,\ell}\succ 0$ for all $\gamma\in\mathcal V$ and some $\ell\geq 0$; we need to prove $\tilde{\mathbf{P}}_{\gamma,\ell+1}\succ 0$. From (55), it is clear that $\tilde{\mathbf{P}}_{\gamma,\ell+1}\succ 0$ if and only if the following two conditions are satisfied [34, p. 472]:

$$\operatorname{Bdiag}\big(\tilde{\mathbf{P}}_{k_1\setminus\gamma,\ell}, \ldots, \tilde{\mathbf{P}}_{k_{|\mathcal N(\gamma)|}\setminus\gamma,\ell}\big)\succ 0, \qquad (56)$$

$$\tilde p_{\gamma\gamma} - \sum_{k\in\mathcal N(\gamma)}\tilde{\mathbf{a}}^T_{k\setminus\gamma,\ell}\tilde{\mathbf{P}}_{k\setminus\gamma,\ell}^{-1}\tilde{\mathbf{a}}_{k\setminus\gamma,\ell} > 0, \qquad (57)$$

where $\operatorname{Bdiag}(\cdot)$ denotes a block-diagonal matrix with the elements located along the main diagonal. Due to $\tilde{\mathbf{P}}_{k,\ell}\succ 0$ for all $k\in\mathcal V$ by assumption, its sub-matrices satisfy $\tilde{\mathbf{P}}_{k\setminus\gamma,\ell}\succ 0$ for all $(k,\gamma)\in\mathcal E$, thus the first condition (56) holds. On the other hand, for the second condition (57), we write

$$\tilde p_{\gamma\gamma} - \sum_{k\in\mathcal N(\gamma)}\tilde{\mathbf{a}}^T_{k\setminus\gamma,\ell}\tilde{\mathbf{P}}_{k\setminus\gamma,\ell}^{-1}\tilde{\mathbf{a}}_{k\setminus\gamma,\ell} \overset{(a)}{=} \tilde p_{\gamma\gamma} - \sum_{k\in\mathcal N(\gamma)}\tilde p_{k\gamma}^2\,\tilde\sigma_{k\setminus\gamma}^2(\ell) \overset{(b)}{=} \tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^a_{k\to\gamma}(\ell+1), \qquad (58)$$

where the equality $(a)$ holds since $\tilde{\mathbf{a}}_{k\setminus\gamma,\ell}$ has only one nonzero element, $\tilde p_{k\gamma}$, in the first position, and $[\tilde{\mathbf{P}}_{k\setminus\gamma,\ell}^{-1}]_{11} = \tilde\sigma_{k\setminus\gamma}^2(\ell)$ from (49); and $(b)$ holds due to $-\tilde p_{k\gamma}^2\,\tilde\sigma_{k\setminus\gamma}^2(\ell) = -\tfrac{\tilde p_{k\gamma}^2}{\tilde p_{kk} + \sum_{k'\in\mathcal N(k)\setminus\gamma}\tilde v^a_{k'\to k}(\ell)} = \tilde v^a_{k\to\gamma}(\ell+1)$ as given in (12).

If $\tilde{\mathcal S}\neq\emptyset$, according to Theorem 4, we have

$$\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^{a*}_{k\to\gamma} > 0. \qquad (59)$$

Due to $\tilde{\mathcal S}\subseteq\tilde{\mathcal S}'$, $\tilde{\mathcal S}\neq\emptyset$ also implies $\tilde{\mathcal S}'\neq\emptyset$. According to (63), it can be inferred that $\tilde{\mathbf{v}}^a(t)\geq\tilde{\mathbf{v}}^{a*}$. Combining with (59), we obtain $\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^a_{k\to\gamma}(t) > 0$. Substituting this result into (58), it can be inferred that the second condition (57) holds as well. Thus, we have $\tilde{\mathbf{P}}_{\gamma,\ell}\succ 0$ for all $\gamma\in\mathcal V$ and $\ell\geq 0$. Furthermore, from (59), it is obvious that $\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^{a*}_{k\to\gamma}\neq 0$.

From $\tilde{\mathbf{P}}_{\gamma,\ell}\succ 0$, we obtain $\mathbf{I} - (\mathbf{I} - \tilde{\mathbf{P}}_{\gamma,\ell})\succ 0$, and hence $1 - \lambda_{\max}(\mathbf{I} - \tilde{\mathbf{P}}_{\gamma,\ell}) > 0$. Since $\tilde{\mathbf{P}}_{\gamma,\ell}$ represents a tree-structured factor graph, it is proved in [24, Proposition 5] that $\lambda_{\max}(\mathbf{I} - \tilde{\mathbf{P}}_{\gamma,\ell}) = \rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I})$. Therefore, it can be obtained that $\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) < 1$, and thereby

$$\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) \leq 1. \qquad (60)$$

Finally, if $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) < 1$, then $\tilde{\mathbf{P}}_{\gamma,\ell}\succ 0$ from (52). Together with (54), it can be inferred that under the prerequisite of $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) < 1$, the conditions $\tilde{\mathbf{P}}_{\gamma,\ell}\succ 0$ and $\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^{a*}_{k\to\gamma}\neq 0$ are automatically satisfied. Therefore, if $\tilde{\mathcal S}\neq\emptyset$, we have either $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) < 1$, or $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) = 1$ together with $\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^{a*}_{k\to\gamma}\neq 0$ and $\tilde{\mathbf{P}}_{\gamma,\ell}\succ 0$.

From Theorems 4 and 6, it can be obtained that the variance $\tilde\sigma_\gamma^2(t)$ converges when $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) < 1$, and diverges when $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) > 1$, which is consistent with the results proposed in [24]. Moreover, it can be seen from Theorem 6 that it is not sufficient to determine the convergence of the variance $\tilde\sigma_\gamma^2(t)$ by using $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) = 1$ only. This fills in the gap of [24] in the scenario of $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) = 1$. Albeit with similar conclusions to [24], we need to emphasize that the criterion $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) < 1$ proposed in [24] is not easy to check in practice due to the infinite dimension involved, while our condition $\tilde{\mathcal S}\neq\emptyset$ can be verified by solving the SDP problem given in Theorem 5. Moreover, the initialization is expanded from the single choice $\tilde{\mathbf{v}}^a(0) = 0$ in [24] to a much larger set $\tilde{\mathbf{v}}^a(0)\in\tilde{\mathcal A}$ in this paper. The flexibility in the choice of initialization is useful to accelerate the convergence of the variance $\tilde\sigma_i^2(t)$ if the initialization is chosen close to the convergent point.

VII. NUMERICAL EXAMPLES

In this section, numerical experiments are presented to corroborate the theories in this paper. The example is based on precision matrices $P$ constructed as

$$
p_{ij} =
\begin{cases}
1, & \text{if } i = j, \\
\zeta\, \theta_{\mathrm{mod}(i+j,10)+1}, & \text{if } i \neq j,
\end{cases}
\qquad (61)
$$

where $\zeta$ is a coefficient indicating the correlation strength among the variables, and $\theta_k$ is the $k$-th element of the vector $\theta = [0.3, 0.0, 0.7, 0.05, 0, 0.2, 0.07, 0., 0.02, 0.03]^T$. Varying the correlation strength $\zeta$ induces a series of matrices, and the positive definite constraint $P \succ 0$ required by a valid PDF is guaranteed when $\zeta$ is smaller than a limiting value.

Fig. 2 illustrates how the optimal solution $\alpha$ of (43) varies with the correlation strength $\zeta$. It can be seen that the optimal solution $\alpha$ always exists and the condition $\alpha < 0$ holds for all $\zeta$ up to a critical value, while no feasible solution exists in the SDP problem (43) when $\zeta$ exceeds this critical value. According to Theorem 4, this means that below the critical value of $\zeta$, the variance $\sigma^2_i(t)$ with $i \in \mathcal{V}$ converges to the same point for all initializations $v^a(0) \in \mathcal{A}$ under both synchronous and asynchronous schedulings. On the other hand, beyond the critical value of $\zeta$, the variance $\sigma^2_i(t)$ cannot converge.

To verify the convergence of the belief variances below the critical value, Fig. 3 shows how the variance $\sigma^2_1(t)$ of the 1-st variable evolves as a function of $t$ for a $\zeta$ slightly smaller than the critical value. It can be observed that the variance $\sigma^2_1(t)$ converges to the same value under both synchronous and asynchronous schedulings and under the different initializations $v^a(0) = 0$ and $v^a(0) = w$, where $w$ is the optimal solution of (45). For the asynchronous case, a scheduling with a 30% chance of not updating the messages at each iteration is considered. On the other hand, Fig. 4 verifies the divergence of the variance $\sigma^2_1(t)$ for a $\zeta$ slightly larger than the critical value. In this figure, synchronous scheduling and the same initializations as those of Fig. 3 are used. It can be seen that $\sigma^2_1(t)$ fluctuates as the iterations proceed and shows no sign of convergence.

VIII. CONCLUSIONS

In this paper, the necessary and sufficient convergence condition for the variances of Gaussian BP was developed for synchronous and asynchronous schedulings. The initialization set of the proposed condition is much larger than the usual choice of a single point of zero. It is proved that

the convergence condition can be verified efficiently by solving an SDP problem. Furthermore, the relationship between the convergence condition proposed in this paper and the one based on the computation tree was established. This relationship fills in a missing piece of the result in [24] for the case where the spectral radius of the computation tree is equal to one. Numerical examples were further presented to verify the proposed convergence conditions.

Fig. 2: The value of $\alpha$ under different correlation strengths $\zeta$ (plot of the convergence metric $\alpha$ versus the correlation strength $\zeta$).

Fig. 3: Illustration of the convergence of the variance $\sigma^2_1(t)$ under different schedulings (synchronous and asynchronous) and initializations $v^a(0) = 0$ and $v^a(0) = w$, with $w$ being the optimal solution of (45) (plot of $\sigma^2_1(t)$ versus the number of iterations $t$).
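To make the experiment behind Figs. 2-4 concrete, the sketch below builds a precision matrix with the structure of (61) and runs the synchronous Gaussian BP variance recursion started from all-zero messages. It assumes the message update takes the standard Gaussian BP form $\tilde{v}^a_{\nu\to\gamma}(t+1) = -p^2_{\nu\gamma}\big/\big(p_{\nu\nu} + \sum_{k\in\mathcal{N}(\nu)\setminus\gamma}\tilde{v}^a_{k\to\nu}(t)\big)$, which is consistent with the relations used in (58) and (64). The dimension $n$, the correlation strength $\zeta$, and the $\theta$ entries below are placeholders rather than the exact values used to produce the figures.

```python
import numpy as np

def build_precision(n, zeta, theta):
    """Precision matrix with the structure of (61): unit diagonal,
    zeta * theta[(i + j) mod 10] off the diagonal (theta indexed from 0 here)."""
    P = np.eye(n)
    for i in range(n):
        for j in range(n):
            if i != j:
                P[i, j] = zeta * theta[(i + j) % 10]
    return P

def bp_variances(P, num_iters=200):
    """Synchronous Gaussian BP variance recursion started from v(0) = 0.
    v[i, j] plays the role of v^a_{i->j}; returns the belief variances per iteration."""
    n = P.shape[0]
    v = np.zeros((n, n))                       # v^a_{i->j}(0) = 0
    history = []
    for _ in range(num_iters):
        col_sum = v.sum(axis=0)                # sum_k v^a_{k->i} for each node i
        new_v = np.zeros_like(v)
        for i in range(n):
            for j in range(n):
                if i != j and P[i, j] != 0.0:
                    denom = P[i, i] + col_sum[i] - v[j, i]   # exclude the message from j
                    new_v[i, j] = -P[i, j] ** 2 / denom
        v = new_v
        belief_prec = np.diag(P) + v.sum(axis=0)
        history.append(1.0 / belief_prec)      # sigma_i^2(t) for every variable
    return np.array(history)

# Placeholder parameters: the exact theta, zeta, and dimension of the paper's example
# are not reproduced here.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 0.3, size=10)
P = build_precision(n=20, zeta=0.1, theta=theta)
assert np.all(np.linalg.eigvalsh(P) > 0), "P must be positive definite for a valid PDF"
vars_over_time = bp_variances(P)
print(vars_over_time[-1][:5])   # belief variances of the first few variables at the last iteration
```

With the small $\zeta$ chosen here the matrix is strictly diagonally dominant, so the recursion settles quickly; a sufficiently large $\zeta$ is expected to show the fluctuating, non-convergent behaviour reported for Fig. 4.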

Fig. 4: Illustration of the divergence of the variance $\sigma^2_1(t)$, with $w$ being the optimal solution of (45) (plot of $\sigma^2_1(t)$ versus the number of iterations $t$, for the initializations $v^a(0) = 0$ and $v^a(0) = w$).

APPENDIX A
PROOF OF LEMMA 2

First, we prove that $v^a(t)$ converges for any $v^a(0) \succeq 0$, given $\mathcal{S} \neq \emptyset$. For any $w \in \mathcal{S}$, according to P5), we have $w < 0$. Thus, for any $v^a(0) \succeq 0$, the relation $w \preceq v^a(0)$ always holds. Notice that $w \in \mathcal{W}$ due to $w \in \mathcal{S}$ and $\mathcal{S} \subseteq \mathcal{W}$. Applying P2) to $w \preceq v^a(0)$, we obtain $g(w) \preceq v^a(1)$. Combining it with $w \preceq g(w)$ from P6) gives $w \preceq v^a(1)$. On the other hand, substituting $v^a(0) \succeq 0$ into (4) gives

$$
v^a(1) < 0. \qquad (62)
$$

Due to $v^a(0) \succeq 0$, we thus have $v^a(1) \preceq v^a(0)$. Combining it with $w \preceq v^a(1)$ gives $w \preceq v^a(1) \preceq v^a(0)$. Applying $g(\cdot)$ to $w \preceq v^a(1) < v^a(0)$, it can be inferred from P2) that $g(w) \preceq v^a(2) < v^a(1)$. Together with $w \preceq g(w)$ as claimed by P6), we obtain $w \preceq v^a(2) < v^a(1)$. By induction, we can infer that

$$
w \preceq v^a(t+1) \preceq v^a(t). \qquad (63)
$$

It can be seen from (63) that $v^a(t)$ is a monotonically non-increasing but lower bounded sequence; thus it converges.

Next, we prove that $v^a(t)$ converges to the same point for all $v^a(0) \succeq 0$. For any $v^a(0) \succeq 0$, according to P2), applying $g(\cdot)$ on both sides of $0 \preceq v^a(0)$ gives $g(0) \preceq g(v^a(0)) = v^a(1)$.

Combining this relation with (62) leads to $g(0) \preceq v^a(1) \preceq 0$. Applying $g(\cdot)$ on this inequality $t$ more times, it can be obtained from P2) that $g^{(t+1)}(0) \preceq v^a(t+1) \preceq g^{(t)}(0)$ for all $t \geq 0$. By denoting $\lim_{t\to\infty} g^{(t)}(0) = v^a$, it is obvious that $v^a(t)$ also converges to $v^a$. Finally, since $v^a$ is a fixed point of $g(\cdot)$, we have $v^a = g(v^a)$. From (63), we obtain $w \preceq v^a$. Due to $w \in \mathcal{S}$, we have $w \in \mathcal{W}$ by definition. Applying P1), it can be inferred that $v^a \in \mathcal{W}$. Combining this with the fact $v^a = g(v^a)$, according to the definition of $\mathcal{S}$ in (7), we obtain $v^a \in \mathcal{S}$. On the other hand, if $\mathcal{S} = \emptyset$, according to Theorem 1, the messages of Gaussian BP passed in the factor graph cannot be maintained in Gaussian form, thus the parameters of the messages $v^a(t)$ cannot converge in this case.

APPENDIX B
PROOF OF LEMMA 4

Due to $P_{\gamma,t} \succ 0$, we have $P^{-1}_{\gamma,t} \succ 0$, hence the diagonal elements $\big[P^{-1}_{\gamma,t}\big]_{ii} > 0$ [34]. With $\sigma^2_{\gamma}(t) = \big[P^{-1}_{\gamma,t}\big]_{11}$ and $\tfrac{1}{\sigma^2_{\gamma}(t)} = p_{\gamma\gamma} + \sum_{k\in\mathcal{N}(\gamma)} \tilde{v}^a_{k\to\gamma}(t)$, we have

$$
p_{\gamma\gamma} + \sum_{k\in\mathcal{N}(\gamma)\setminus\nu} \tilde{v}^a_{k\to\gamma}(t) > -\tilde{v}^a_{\nu\to\gamma}(t). \qquad (64)
$$

Next, we prove $\tilde{v}^a(t) \in \mathcal{W}$ for $t \geq 0$. Obviously, $\tilde{v}^a(0) = 0 \in \mathcal{W}$. Suppose $\tilde{v}^a(t) \in \mathcal{W}$ holds for some $t \geq 0$; according to the definition of $\mathcal{W}$ in (5), this is equivalent to $p_{\nu\nu} + \sum_{k\in\mathcal{N}(\nu)\setminus\gamma} \tilde{v}^a_{k\to\nu}(t) > 0$. Substituting this inequality into (4) gives $\tilde{v}^a_{\nu\to\gamma}(t+1) < 0$. Then applying $\tilde{v}^a_{\nu\to\gamma}(t+1) < 0$ to (64) gives $p_{\gamma\gamma} + \sum_{k\in\mathcal{N}(\gamma)\setminus\nu} \tilde{v}^a_{k\to\gamma}(t+1) > 0$ for all $(\nu,\gamma) \in \mathcal{E}$, or equivalently $\tilde{v}^a(t+1) \in \mathcal{W}$. Thus, we have $\tilde{v}^a(t) \in \mathcal{W}$ for all $t \geq 0$.

Finally, substituting $\tilde{v}^a(0) = 0$ into (3) gives $\tilde{v}^a(1) < 0$, and thereby $\tilde{v}^a(1) < \tilde{v}^a(0)$. From $\tilde{v}^a(t) \in \mathcal{W}$ and P2), applying $g(\cdot)$ on both sides of $\tilde{v}^a(1) < \tilde{v}^a(0)$ for $t$ times gives

$$
\tilde{v}^a(t+1) \preceq \tilde{v}^a(t). \qquad (65)
$$

From $\tilde{v}^a(0) = 0$ and (65), we have $\tilde{v}^a(t) \preceq 0$. Putting this result into (64) gives $\tilde{v}^a_{\nu\to\gamma}(t) > -p_{\gamma\gamma}$ for all $(\nu,\gamma) \in \mathcal{E}$. Together with (65), it can be seen that $\tilde{v}^a(t)$ is a monotonically non-increasing and lower bounded sequence. Thus, $\tilde{v}^a(t)$ converges to a fixed point $\tilde{v}^a$ with $\tilde{v}^a = g(\tilde{v}^a)$. Moreover, from $\tilde{v}^a(1) < 0$ and (65), we have $\tilde{v}^a < 0$. Putting this result into the limiting form of (64) gives $p_{\gamma\gamma} + \sum_{k\in\mathcal{N}(\gamma)\setminus\nu} \tilde{v}^a_{k\to\gamma} > 0$ for all $(\nu,\gamma) \in \mathcal{E}$, or equivalently $\tilde{v}^a \in \mathcal{W}$. Combining this with $\tilde{v}^a = g(\tilde{v}^a)$, we obtain $\tilde{v}^a \in \mathcal{S}$ by the definition of $\mathcal{S}$.
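As a small worked illustration of the monotone argument above (the numbers are made up and are not from the paper; the update is assumed to take the standard Gaussian BP form used in (58) and (64)), consider a cycle of three variables with $p_{\gamma\gamma} = 1$ and $p_{\nu\gamma} = 0.4$ on every edge. By symmetry all messages share one value $\tilde{v}(t)$, and the recursion started from $\tilde{v}(0) = 0$ behaves exactly as the proof predicts:

$$
\tilde{v}(t+1) \;=\; -\frac{p^2_{\nu\gamma}}{\,p_{\nu\nu} + \tilde{v}(t)\,} \;=\; -\frac{0.16}{\,1 + \tilde{v}(t)\,}, \qquad \tilde{v}(0) = 0,
$$
$$
\tilde{v}(1) = -0.16, \qquad \tilde{v}(2) \approx -0.1905, \qquad \tilde{v}(3) \approx -0.1976, \qquad \cdots \;\to\; \tilde{v}^a = \frac{-1 + \sqrt{1 - 4(0.16)}}{2} = -0.2 .
$$

The iterates decrease monotonically and stay above $-p_{\gamma\gamma} = -1$, exactly as (65) and the bound $\tilde{v}^a_{\nu\to\gamma}(t) > -p_{\gamma\gamma}$ require, and the limit satisfies $p_{\gamma\gamma} + \sum_{k\in\mathcal{N}(\gamma)\setminus\nu} \tilde{v}^a_{k\to\gamma} = 1 - 0.2 = 0.8 > 0$, i.e., the fixed point lies in $\mathcal{W}$, matching the conclusion of the proof.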

REFERENCES

[1] F. R. Kschischang, B. J. Frey, and H. A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 498-519, Feb. 2001.
[2] H. A. Loeliger, "An introduction to factor graphs," IEEE Signal Process. Mag., vol. 21, no. 1, pp. 28-41, Jan. 2004.
[3] H. A. Loeliger, J. Dauwels, H. Junli, S. Korl, P. Li, and F. R. Kschischang, "The factor graph approach to model-based signal processing," Proc. IEEE, vol. 95, no. 6, June 2007.
[4] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, Aug. 2006.
[5] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
[6] Y. Weiss and W. T. Freeman, "Correctness of belief propagation in Gaussian graphical models of arbitrary topology," Neural Comput., vol. 13, pp. 2173-2200, 2001.
[7] Y. Liu, V. Chandrasekaran, A. Anandkumar, and A. S. Willsky, "Feedback message passing for inference in Gaussian graphical models," IEEE Trans. Signal Process., vol. 60, no. 8, 2012.
[8] A. Montanari, B. Prabhakar, and D. Tse, "Belief propagation based multiuser detection," in Proc. IEEE Information Theory Workshop, Punta del Este, Uruguay, Mar. 2006.
[9] Q. Guo and D. Huang, "EM-based joint channel estimation and detection for frequency selective channels using Gaussian message passing," IEEE Trans. Signal Process., vol. 59, 2011.
[10] Q. Guo and P. Li, "LMMSE turbo equalization based on factor graphs," IEEE J. Sel. Areas Commun., vol. 26, pp. 311-319, 2008.
[11] Y. El-Kurdi, W. J. Gross, and D. Giannacopoulos, "Efficient implementation of Gaussian belief propagation solver for large sparse diagonally dominant linear systems," IEEE Trans. Magn., vol. 48, no. 2, Feb. 2012.
[12] O. Shental, P. H. Siegel, J. K. Wolf, D. Bickson, and D. Dolev, "Gaussian belief propagation solver for systems of linear equations," in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2008.
[13] X. Tan and J. Li, "Computationally efficient sparse Bayesian learning via belief propagation," IEEE Trans. Signal Process., vol. 58, 2010.
[14] V. Chandrasekaran, J. K. Johnson, and A. S. Willsky, "Estimation in Gaussian graphical models using tractable subgraphs: a walk-sum analysis," IEEE Trans. Signal Process., vol. 56, 2008.
[15] N. Boon Loong, J. S. Evans, S. V. Hanly, and D. Aktas, "Distributed downlink beamforming with cooperative base stations," IEEE Trans. Inf. Theory, vol. 54, 2008.
[16] F. Lehmann, "Iterative mitigation of intercell interference in cellular networks based on Gaussian belief propagation," IEEE Trans. Veh. Technol., vol. 61, 2012.
[17] M. Leng and Y. C. Wu, "Distributed clock synchronization for wireless sensor networks using belief propagation," IEEE Trans. Signal Process., vol. 59, no. 11, Nov. 2011.
[18] A. Ahmad, D. Zennaro, E. Serpedin, and L. Vangelista, "A factor graph approach to clock offset estimation in wireless sensor networks," IEEE Trans. Inf. Theory, vol. 58, no. 7, July 2012.
[19] M. Leng, T. Wee-Peng, and T. Q. S. Quek, "Cooperative and distributed localization for wireless sensor networks in multipath environments," in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2012.
[20] Y. Song, B. Wang, Z. Shi, K. Pattipati, and S. Gupta, "Distributed algorithms for energy-efficient even self-deployment in mobile sensor networks," IEEE Trans. Mobile Comput., 2013.

[21] G. Zhang, W. Xu, and Y. Wang, "Fast distributed rate control algorithm with QoS support in Ad Hoc networks," in Proc. IEEE Global Telecommunications Conf. (Globecom), 2010.
[22] D. Dolev, A. Zymnis, S. P. Boyd, D. Bickson, and Y. Tock, "Distributed large scale network utility maximization," in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2009.
[23] M. W. Seeger and D. P. Wipf, "Variational Bayesian inference techniques," IEEE Signal Process. Mag., vol. 27, no. 6, Nov. 2010.
[24] D. M. Malioutov, J. K. Johnson, and A. S. Willsky, "Walk-sums and belief propagation in Gaussian graphical models," J. Mach. Learn. Res., vol. 7, pp. 2031-2064, 2006.
[25] C. C. Moallemi and B. Van Roy, "Convergence of min-sum message passing for quadratic optimization," IEEE Trans. Inf. Theory, vol. 55, 2009.
[26] S. Tatikonda and M. I. Jordan, "Loopy belief propagation and Gibbs measures," in Proc. 18th Annu. Conf. Uncertainty Artif. Intell. (UAI-02), San Francisco, CA, Aug. 2002.
[27] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Englewood Cliffs, NJ: Prentice-Hall, 1989.
[28] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[29] C. C. Moallemi and B. Van Roy, "Convergence of min-sum message passing for convex optimization," IEEE Trans. Inf. Theory, vol. 56, no. 4, 2010.
[30] L. Vandenberghe and S. Boyd, "Semidefinite programming," SIAM Rev., vol. 38, no. 1, pp. 49-95, Mar. 1996.
[31] M. Grant, S. Boyd, and Y. Ye, "CVX: Matlab software for disciplined convex programming." [Online]. Available: http: // boyd/cvx/
[32] J. F. Sturm, "Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones," Optim. Methods Software, vol. 11-12, pp. 625-653, 1999.
[33] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1-122, 2011.
[34] R. Horn and C. R. Johnson, Matrix Analysis. New York: Cambridge Univ. Press, 1985.

Qinliang Su received the B.Eng. degree from Chongqing University in 2007 and the M.Eng. degree from Zhejiang University in 2010 (both with honors). He is currently pursuing the Ph.D. degree with the Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong. His current research interests include signal processing for wireless communications and networks; statistical inference and learning; graphical models; distributed and parallel computation; and convex optimization theories.

Yik-Chung Wu received the B.Eng. (EEE) degree in 1998 and the M.Phil. degree in 2001 from the University of Hong Kong (HKU). He received the Croucher Foundation scholarship in 2002 to study for the Ph.D. degree at Texas A&M University, College Station, and graduated in 2005. From August 2005 to August 2006, he was with Thomson Corporate Research, Princeton, NJ, as a Member of Technical Staff. Since September 2006, he has been with HKU, currently as an Associate Professor. He was a visiting scholar at Princeton University in summer 2011. His research interests are in the general area of signal processing and communication systems, in particular distributed signal processing and communications; optimization theories for communication systems; estimation and detection theories in transceiver designs; and smart grid. Dr. Wu served as an Editor for IEEE Communications Letters, and is currently an Editor for IEEE Transactions on Communications and the Journal of Communications and Networks.
