Convergence Analysis of the Variance in Gaussian Belief Propagation


Convergence Analysis of the Variance in Gaussian Belief Propagation

Qinliang Su and Yik-Chung Wu

Abstract—It is known that Gaussian belief propagation (BP) is a low-complexity algorithm for (approximately) computing the marginal distribution of a high-dimensional Gaussian distribution. However, in a loopy factor graph, it is important to determine whether Gaussian BP converges. In general, the convergence conditions for Gaussian BP variances and means are not necessarily the same, and this paper focuses on the convergence condition of the Gaussian BP variances. In particular, by describing the message-passing process of Gaussian BP as a set of updating functions, the necessary and sufficient convergence conditions of the Gaussian BP variances are derived under both synchronous and asynchronous schedulings, with the converged variances proved to be independent of the initialization as long as it is chosen from the proposed set. The necessary and sufficient convergence condition is further expressed in the form of a semi-definite programming (SDP) optimization problem, and thus can be verified more efficiently than the existing convergence condition based on the computation tree. The relationship between the proposed convergence condition and the existing one based on the computation tree is also established analytically. Numerical examples are presented to corroborate the established theories.

Index Terms—Gaussian belief propagation, loopy belief propagation, graphical model, factor graph, message passing, sum-product algorithm, convergence.

Copyright (c) 2014 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org. This work was supported in part by the HKU Seed Funding Programme. The authors are with the Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong (e-mail: qlsu@eee.hku.hk; ycwu@eee.hku.hk).

I. INTRODUCTION

In signal processing and machine learning, many problems eventually come to the issue of computing the marginal mean and variance of an individual random variable from a high-dimensional joint Gaussian probability density function (PDF). A direct way of marginalization involves the computation of the inverse of the precision matrix in the joint Gaussian PDF. The inverse operation is known to be computationally expensive for a high-dimensional matrix, and would introduce heavy communication overhead when carried out in distributed scenarios. By representing the joint PDF with a factor graph [1], Gaussian BP provides an alternative to (approximately) calculate the marginal mean and variance for each individual random variable by passing messages between neighboring nodes in the factor graph [2]–[5].

It is known that if the factor graph is of tree structure, the belief means and variances calculated with Gaussian BP converge to the true marginal means and variances, respectively [1]. For a factor graph with loops, if the belief means and variances in Gaussian BP converge, the true marginal means and approximate marginal variances are obtained [6]. Recently, a novel message-passing algorithm has been proposed in [7] for inference in Gaussian graphical models by choosing a special set of nodes to break the loops in the graph. Although this algorithm computes the true marginal means and improves the accuracy of the variances, it still needs to use Gaussian BP as an underlying inference algorithm. Thus, in this paper, we focus on the message-passing algorithm of Gaussian BP only.

With the ability to provide the true marginal means upon convergence, Gaussian BP has been successfully applied in low-complexity detection and estimation problems arising in communication systems [8]–[10], fast solvers for large sparse linear systems [11], [12], sparse Bayesian learning [13], estimation in Gaussian graphical models [14], etc. In addition, the distributed property inherited from message-passing algorithms is particularly attractive for applications requiring distributed implementation, such as distributed beamforming [15], inter-cell interference mitigation [16], distributed synchronization and localization in wireless sensor networks [17]–[19], distributed energy-efficient self-deployment in mobile sensor networks [20], distributed rate control in ad hoc networks [21], and distributed network utility maximization [22]. Moreover, Gaussian BP is also exploited to provide approximate marginal variances for large-scale sparse Bayesian learning in a computationally efficient way [23].

However, Gaussian BP only works under the prerequisite that the belief means and variances

calculated from the updating messages do converge. So far, several sufficient convergence conditions have been proposed, which can guarantee that the means and variances converge simultaneously [6], [24], [25]. However, in general, the belief means and variances do not necessarily converge under the same condition. It is reported in [24] that if the variances converge, the convergence of the means can always be observed when suitable damping is imposed. Thus, it is important to ensure that the variances of Gaussian BP converge in the first place.

In the pioneering work [24], the message-passing procedure of Gaussian BP in a loopy factor graph is equivalently translated into that in a loop-free computation tree [26]. Based on the computation tree, an almost necessary and sufficient convergence condition of the variances is derived. It is proved that if the spectral radius of the infinite-dimensional precision matrix of the computation tree is smaller than one, the variances converge; and if this spectral radius is larger than one, Gaussian BP becomes ill-posed. However, this condition is only almost necessary and sufficient since it does not cover the scenario when the spectral radius is equal to one. More crucially, this convergence condition requires the evaluation of the spectral radius of an infinite-dimensional matrix, which is almost impossible to calculate in practice.

In this paper, with the fact that the messages in Gaussian BP can be represented by linear and quadratic parameters, the message-passing process of Gaussian BP is described as a set of updating functions of parameters. Based on the monotonically non-decreasing and concave properties of the updating functions, the necessary and sufficient convergence condition of the messages' quadratic parameters is derived first. Then, with the relation between the BP messages' quadratic parameters and the belief variances, the necessary and sufficient convergence condition of the belief variances is developed under both synchronous and asynchronous schedulings. The initialization set under which the variances are guaranteed to converge is proposed as well. The convergence condition derived in this paper is proved to be equivalent to a semi-definite programming (SDP) problem, and thus can be easily verified. Furthermore, the relationship between the proposed convergence condition and the existing one based on the computation tree is also established. Our result fills in the missing part of the convergence condition in [24] for the case when the spectral radius of the infinite-dimensional matrix is equal to one.

The rest of this paper is organized as follows. Gaussian BP is reviewed in Section II. Section III analyzes the updating process of Gaussian BP messages, followed by the necessary and sufficient convergence condition of the quadratic parameters in the messages in Section IV. The derivation of the necessary and sufficient convergence condition of the belief variances is presented in Section V. The relationship between the condition proposed in this paper and the existing one based on the computation tree is established in Section VI. Numerical examples are presented in Section VII, followed by conclusions in Section VIII.

The following notations are used throughout this paper. For two vectors, $\mathbf{x}_1 \geq \mathbf{x}_2$ and $\mathbf{x}_1 > \mathbf{x}_2$ mean that the inequalities hold in all corresponding elements. Notations $\lambda(\mathbf{G})$ and $\lambda_{\max}(\mathbf{G})$ represent any eigenvalue and the maximal eigenvalue of a matrix $\mathbf{G}$, respectively. Notation $\rho(\mathbf{G})$ means the spectral radius of $\mathbf{G}$. Notations $[\cdot]_i$ and $[\cdot]_{ij}$ represent the $i$-th element of a vector and the $(i,j)$-th element of a matrix, respectively.

II. GAUSSIAN BELIEF PROPAGATION

In general, a Gaussian PDF can be written as

$$f(\mathbf{x}) \propto \exp\Big\{-\tfrac{1}{2}\mathbf{x}^T\mathbf{P}\mathbf{x} + \mathbf{h}^T\mathbf{x}\Big\}, \qquad (1)$$

where $\mathbf{x} = [x_1, x_2, \ldots, x_N]^T$ with $N$ being the number of random variables; $\mathbf{P} \succ 0$ is the precision matrix with $p_{ij}$ being its $(i,j)$-th element; and $\mathbf{h} = [h_1, h_2, \ldots, h_N]^T$ is the linear coefficient vector. In many signal processing applications, we want to find the marginalized PDF $p(x_i) = \int f(\mathbf{x})\, dx_1 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_N$. For a Gaussian $f(\mathbf{x})$, this can be done by completing the square inside the exponent of (1) as $f(\mathbf{x}) \propto \exp\{-\tfrac{1}{2}(\mathbf{x}-\mathbf{P}^{-1}\mathbf{h})^T\mathbf{P}(\mathbf{x}-\mathbf{P}^{-1}\mathbf{h})\}$, and then the marginalized PDF is obtained as

$$p(x_i) \propto \exp\Big\{-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}\Big\}, \qquad (2)$$

with the mean $\mu_i = [\mathbf{P}^{-1}\mathbf{h}]_i$ and the variance $\sigma_i^2 = [\mathbf{P}^{-1}]_{ii}$. However, the required matrix inverse is computationally expensive for a large-dimensional $\mathbf{P}$, and introduces heavy communication overhead to the network when the information $p_{ij}$ is located distributedly.

On the other hand, the Gaussian PDF in (1) can be expanded as

$$f(\mathbf{x}) \propto \prod_{i=1}^N f_i(x_i) \prod_{j=1}^N\prod_{k=j+1}^N f_{jk}(x_j,x_k),$$

where $f_i(x_i) = \exp\{-\tfrac{p_{ii}}{2}x_i^2 + h_i x_i\}$ and $f_{jk}(x_j,x_k) = \exp\{-p_{jk}x_jx_k\}$. Based on this expansion, a factor graph¹ can be constructed by connecting each variable $x_i$ with its associated factors $f_i(x_i)$ and $f_{ij}(x_i,x_j)$.

¹ The factor graph is assumed to be connected in this paper.
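As a point of reference for the message-passing scheme developed below, the following minimal sketch (assuming NumPy; the matrix and vector values are illustrative only and not from the paper) computes the exact marginals of (1) by the direct matrix inversion described above.

```python
import numpy as np

def direct_marginals(P, h):
    """Exact marginals of f(x) ∝ exp(-x^T P x / 2 + h^T x) via matrix inversion, cf. (2)."""
    Sigma = np.linalg.inv(P)   # covariance matrix P^{-1}
    mu = Sigma @ h             # marginal means  mu_i = [P^{-1} h]_i
    var = np.diag(Sigma)       # marginal variances  sigma_i^2 = [P^{-1}]_{ii}
    return mu, var

# Toy 3x3 example (illustrative values).
P = np.array([[1.0, 0.3, 0.2],
              [0.3, 1.0, 0.4],
              [0.2, 0.4, 1.0]])
h = np.array([0.5, -1.0, 2.0])
print(direct_marginals(P, h))
```

This $O(N^3)$ baseline is what Gaussian BP tries to avoid by exchanging only local messages.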

It is known that Gaussian BP can be applied on this factor graph by passing messages among neighboring nodes to obtain the true marginal mean and an approximate marginal variance for each variable. In Gaussian BP, the departing and arriving messages corresponding to variables $i$ and $j$ are updated as

$$m^d_{i\to j}(x_i,t) \propto f_i(x_i)\prod_{k\in\mathcal N(i)\setminus j} m^a_{k\to i}(x_i,t), \qquad (3)$$

$$m^a_{i\to j}(x_j,t+1) \propto \int f_{ij}(x_i,x_j)\, m^d_{i\to j}(x_i,t)\, dx_i, \qquad (4)$$

where $(i,j)\in\mathcal E$ with $\mathcal E \triangleq \{(i,j)\mid p_{ij}\neq 0, \text{ for } i,j\in\mathcal V\}$ being the set of index pairs of any two connected variables and $\mathcal V \triangleq \{1,2,\ldots,N\}$ being the set of indices of all variables; $t$ is the time index; $\mathcal N(i)$ is the set of indices of the neighboring variable nodes of node $i$, and $\mathcal N(i)\setminus j$ is the set $\mathcal N(i)$ excluding node $j$. After obtaining the messages $m^a_{k\to i}(x_i,t)$, the belief at variable node $i$ is equal to

$$b_i(x_i,t) \propto f_i(x_i)\prod_{k\in\mathcal N(i)} m^a_{k\to i}(x_i,t). \qquad (5)$$

It has been proved that $b_i(x_i,t)$ always converges to $p(x_i)$ in (2) exactly if the factor graph is of tree structure [1], [6]. For a loopy factor graph, if $b_i(x_i,t)$ converges, the mean of $b_i(x_i,t)$ will converge to the true mean $\mu_i$, while the converged variance is an approximation of $\sigma_i^2$ [6].

Assume the arriving message is in the Gaussian form $m^a_{i\to j}(x_j,t) \propto \exp\{-\tfrac{v^a_{i\to j}(t)}{2}x_j^2 + \beta^a_{i\to j}(t)x_j\}$, where $v^a_{i\to j}(t)$ and $\beta^a_{i\to j}(t)$ are the arriving precision and arriving linear coefficient, respectively. Inserting it into (3), we obtain $m^d_{i\to j}(x_i,t) \propto \exp\{-\tfrac{v^d_{i\to j}(t)}{2}x_i^2 + \beta^d_{i\to j}(t)x_i\}$, where

$$v^d_{i\to j}(t) = p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} v^a_{k\to i}(t), \qquad (6)$$

$$\beta^d_{i\to j}(t) = h_i + \sum_{k\in\mathcal N(i)\setminus j} \beta^a_{k\to i}(t) \qquad (7)$$

are the departing precision and linear coefficient, respectively. Furthermore, substituting the departing message $m^d_{i\to j}(x_i,t)$ into (4), we obtain

$$m^a_{i\to j}(x_j,t+1) \propto \exp\Big\{\frac{p_{ij}^2}{2v^d_{i\to j}(t)}x_j^2 - \frac{p_{ij}\beta^d_{i\to j}(t)}{v^d_{i\to j}(t)}x_j\Big\}\int \exp\Big\{-\frac{v^d_{i\to j}(t)}{2}\Big(x_i - \frac{\beta^d_{i\to j}(t)-p_{ij}x_j}{v^d_{i\to j}(t)}\Big)^2\Big\}\, dx_i. \qquad (8)$$

If $v^d_{i\to j}(t) > 0$, the integration equals a constant, and thus $m^a_{i\to j}(x_j,t+1) \propto \exp\{\tfrac{p_{ij}^2}{2v^d_{i\to j}(t)}x_j^2 - \tfrac{p_{ij}\beta^d_{i\to j}(t)}{v^d_{i\to j}(t)}x_j\}$. Therefore, $v^a_{i\to j}(t+1)$ and $\beta^a_{i\to j}(t+1)$ are updated as

$$v^a_{i\to j}(t+1) = \begin{cases} -\dfrac{p_{ij}^2}{v^d_{i\to j}(t)}, & \text{if } v^d_{i\to j}(t) > 0\\ \text{not defined}, & \text{otherwise}, \end{cases} \qquad (9)$$

$$\beta^a_{i\to j}(t+1) = \begin{cases} -\dfrac{p_{ij}\beta^d_{i\to j}(t)}{v^d_{i\to j}(t)}, & \text{if } v^d_{i\to j}(t) > 0\\ \text{not defined}, & \text{otherwise}, \end{cases} \qquad (10)$$

where "not defined" arises since, in the case of $v^d_{i\to j}(t) \leq 0$, the integration in (8) becomes infinite and the message $m^a_{i\to j}(x_j,t+1)$ loses its Gaussian form. After obtaining $v^a_{k\to i}(t)$, the variance of the belief at each iteration is computed as

$$\sigma_i^2(t) = \frac{1}{p_{ii} + \sum_{k\in\mathcal N(i)} v^a_{k\to i}(t)}. \qquad (11)$$

From (9) and (10), it can be seen that the messages of Gaussian BP maintain the Gaussian form only when $v^d_{i\to j}(t) > 0$ for all $t \geq 0$. Therefore, we have the following lemma.

Lemma 1. The messages of Gaussian BP are always in Gaussian form if and only if $v^d_{i\to j}(t) > 0$ for all $t \geq 0$.

III. ANALYSIS OF THE MESSAGE-PASSING PROCESS

In this section, we first analyze the updating process of $v^a_{i\to j}(t)$, and then derive the condition that guarantees the BP messages are maintained in Gaussian form.

A. Updating Function of $v^a_{i\to j}(t)$ and Its Properties

Under the Gaussian form of messages, substituting (6) into (9) gives

$$v^a_{i\to j}(t+1) = -\frac{p_{ij}^2}{p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} v^a_{k\to i}(t)}. \qquad (12)$$

By writing (12) into a vector form, we obtain

$$\mathbf{v}^a(t+1) = \mathbf{g}(\mathbf{v}^a(t)), \qquad (13)$$

where $\mathbf{g}(\cdot)$ is a vector-valued function containing components $g_{ij}(\cdot)$ with $(i,j)\in\mathcal E$ arranged in ascending order first on $j$ and then on $i$, and $g_{ij}(\cdot)$ is defined as

$$g_{ij}(\mathbf{w}) \triangleq -\frac{p_{ij}^2}{p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} w_{ki}}; \qquad (14)$$

$\mathbf{v}^a(t)$ and $\mathbf{w}$ are vectors containing elements $v^a_{i\to j}(t)$ and $w_{ij}$, respectively, both with $(i,j)\in\mathcal E$ arranged in ascending order first on $j$ and then on $i$.
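A minimal sketch of the synchronous update (12)–(13) and of the belief variance (11) is given below. It assumes NumPy and stores the arriving precisions in a dictionary keyed by directed edges; the data structures and helper names are ours, not the paper's.

```python
import numpy as np

def neighbor_lists(P):
    """Neighbor sets N(i) read off the nonzero off-diagonal entries of P."""
    N = P.shape[0]
    return {i: [k for k in range(N) if k != i and P[i, k] != 0] for i in range(N)}

def bp_variance_iteration(P, v_a):
    """One synchronous sweep of (12)-(13).  v_a maps directed edge (i, j) to v^a_{i->j}(t).
    Returns the updated dictionary, or None if some v^d_{i->j}(t) <= 0, in which case
    the messages would lose their Gaussian form (cf. Lemma 1)."""
    nbrs = neighbor_lists(P)
    v_new = {}
    for (i, j) in v_a:
        v_d = P[i, i] + sum(v_a[(k, i)] for k in nbrs[i] if k != j)   # departing precision (6)
        if v_d <= 0:
            return None
        v_new[(i, j)] = -P[i, j] ** 2 / v_d                            # arriving precision (12)
    return v_new

def belief_variances(P, v_a):
    """Belief variances sigma_i^2(t) from the arriving precisions, cf. (11)."""
    nbrs = neighbor_lists(P)
    return [1.0 / (P[i, i] + sum(v_a[(k, i)] for k in nbrs[i])) for i in range(P.shape[0])]
```

A typical usage is to initialize `v_a = {(i, j): 0.0 for every directed edge}` and call `bp_variance_iteration` repeatedly; this initialization choice is revisited in Section IV.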

Moreover, define the set

$$\mathcal W \triangleq \Big\{\mathbf{w}\ \Big|\ p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} w_{ki} > 0,\ \forall (i,j)\in\mathcal E\Big\}. \qquad (15)$$

Now, we have the following proposition about $\mathbf{g}(\cdot)$ and $\mathcal W$.

Proposition 1. The following claims hold:
P1) For any $\mathbf{w}_1\in\mathcal W$, if $\mathbf{w}_2 \geq \mathbf{w}_1$, then $\mathbf{w}_2\in\mathcal W$;
P2) For any $\mathbf{w}_1,\mathbf{w}_2\in\mathcal W$, if $\mathbf{w}_1 \geq \mathbf{w}_2$, then $\mathbf{g}(\mathbf{w}_1) \geq \mathbf{g}(\mathbf{w}_2)$;
P3) $g_{ij}(\mathbf{w})$ is a concave function with respect to $\mathbf{w}\in\mathcal W$.

Proof: Consider two vectors $\mathbf{w}$ and $\hat{\mathbf{w}}$ in $\mathcal W$, which contain elements $w_{ki}$ and $\hat w_{ki}$ with $(k,i)\in\mathcal E$ arranged in ascending order first on $k$ and then on $i$. For any $\mathbf{w}\in\mathcal W$, according to the definition of $\mathcal W$ in (15), we have $p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} w_{ki} > 0$. Then, if $\hat{\mathbf{w}} \geq \mathbf{w}$, it can be easily seen that $p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} \hat w_{ki} > 0$ as well. Thus, we have $\hat{\mathbf{w}}\in\mathcal W$.

The first-order derivative of $g_{ij}(\mathbf{w})$ with respect to $w_{ki}$ for $k\in\mathcal N(i)\setminus j$ is computed to be

$$\frac{\partial g_{ij}}{\partial w_{ki}} = \frac{p_{ij}^2}{\big(p_{ii} + \sum_{k'\in\mathcal N(i)\setminus j} w_{k'i}\big)^2} > 0. \qquad (16)$$

Thus, $g_{ij}(\mathbf{w})$ is a continuous and strictly increasing function with respect to the components $w_{ki}$ for $k\in\mathcal N(i)\setminus j$ with $\mathbf{w}\in\mathcal W$. Hence, if $\mathbf{w}_1 \geq \mathbf{w}_2$, then $\mathbf{g}(\mathbf{w}_1) \geq \mathbf{g}(\mathbf{w}_2)$.

To prove the third property, consider first the simple function $-\tfrac{p_{ij}^2}{p_{ii}+r}$ on the domain $r > -p_{ii}$. It can be easily derived that its second-order derivative is $-\tfrac{2p_{ij}^2}{(p_{ii}+r)^3} < 0$, where the inequality holds due to $p_{ii} + r > 0$. Thus, $-\tfrac{p_{ij}^2}{p_{ii}+r}$ is a concave function with respect to $r > -p_{ii}$. Since $g_{ij}(\mathbf{w}) = -\tfrac{p_{ij}^2}{p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} w_{ki}}$ is the composition of the concave function $-\tfrac{p_{ij}^2}{p_{ii}+r}$ and the affine mapping $r = \sum_{k\in\mathcal N(i)\setminus j} w_{ki}$, the composite function $g_{ij}(\mathbf{w})$ is also concave with respect to $\sum_{k\in\mathcal N(i)\setminus j} w_{ki} > -p_{ii}$ [28, p. 79]. Due to $\mathcal W \triangleq \{\mathbf{w} \mid p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} w_{ki} > 0,\ \forall(i,j)\in\mathcal E\}$ as defined in (15), $\mathbf{w}\in\mathcal W$ implies that $\sum_{k\in\mathcal N(i)\setminus j} w_{ki} > -p_{ii}$. Thus, we can infer that $g_{ij}(\mathbf{w})$ is a concave function with respect to $\mathbf{w}\in\mathcal W$.

Notice that convexity is also exploited in [29] to derive the convergence condition for a general min-sum message-passing algorithm, which reduces to Gaussian BP when applied to the particular quadratic function $\tfrac{1}{2}\mathbf{x}^T\mathbf{P}\mathbf{x} - \mathbf{h}^T\mathbf{x}$. The convexity in [29] means that the quadratic function can be decomposed into a summation of convex functions, which is different from the concavity of the updating function $g_{ij}(\mathbf{w})$ here.

B. Condition to Maintain the Gaussian Form of Messages

First, define the following set²

$$\mathcal S' \triangleq \{\mathbf{w}\in\mathcal W \mid \mathbf{w} \leq \mathbf{g}(\mathbf{w})\}. \qquad (17)$$

² The simpler symbol $\mathcal S$ is reserved for later use.

With the notations $\mathbf{g}^{(t)}(\mathbf{w}) \triangleq \mathbf{g}\big(\mathbf{g}^{(t-1)}(\mathbf{w})\big)$ and $\mathbf{g}^{(0)}(\mathbf{w}) \triangleq \mathbf{w}$, the following proposition can be established.

Proposition 2. The set $\mathcal S'$ has the following properties:
P4) $\mathcal S'$ is a convex set;
P5) If $\mathbf{s}\in\mathcal S'$, then $\mathbf{s} < 0$;
P6) If $\mathbf{s}\in\mathcal S'$, then $\mathbf{g}^{(t)}(\mathbf{s})\in\mathcal S'$ and $\mathbf{g}^{(t)}(\mathbf{s}) \leq \mathbf{g}^{(t+1)}(\mathbf{s})$ for all $t \geq 0$.

Proof: As $g_{ij}(\mathbf{w})$ is a concave function for $\mathbf{w}\in\mathcal W$ according to P3), $g_{ij}(\mathbf{w}) - w_{ij}$ is also concave for $\mathbf{w}\in\mathcal W$. The set of $\mathbf{w}\in\mathcal W$ which satisfies the condition $g_{ij}(\mathbf{w}) - w_{ij} \geq 0$ is a convex set [28]. Thus, $\mathcal S' = \{\mathbf{w} \mid \mathbf{w}\leq\mathbf{g}(\mathbf{w}),\ \mathbf{w}\in\mathcal W\} = \{\mathbf{w}\in\mathcal W \mid g_{ij}(\mathbf{w}) - w_{ij}\geq 0 \text{ for all } (i,j)\in\mathcal E\}$ is also a convex set.

If $\mathbf{s}\in\mathcal S'$, we have $\mathbf{s}\leq\mathbf{g}(\mathbf{s})$ and $\mathbf{s}\in\mathcal W$. According to the definition of $\mathcal W$ in (15), $p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} s_{ki} > 0$. Putting this fact into the definition of $g_{ij}(\cdot)$ in (14), it is obvious that $\mathbf{g}(\mathbf{s}) < 0$. Therefore, we have the relation $\mathbf{s}\leq\mathbf{g}(\mathbf{s}) < 0$ for all $\mathbf{s}\in\mathcal S'$.

Finally, if $\mathbf{s}\in\mathcal S'$, we have $\mathbf{s}\leq\mathbf{g}(\mathbf{s})$ and $\mathbf{s}\in\mathcal W$. Hence, $\mathbf{g}(\mathbf{s})\in\mathcal W$ according to P1). Applying $\mathbf{g}(\cdot)$ on both sides of $\mathbf{s}\leq\mathbf{g}(\mathbf{s})$ and using P2), we obtain $\mathbf{g}(\mathbf{s})\leq\mathbf{g}^{(2)}(\mathbf{s})$. Furthermore, since $\mathbf{g}(\mathbf{s})\in\mathcal W$ and $\mathbf{g}(\mathbf{s})\leq\mathbf{g}^{(2)}(\mathbf{s})$, we also have $\mathbf{g}(\mathbf{s})\in\mathcal S'$. By induction, we can prove in general that $\mathbf{g}^{(t)}(\mathbf{s})\in\mathcal S'$ and $\mathbf{g}^{(t)}(\mathbf{s})\leq\mathbf{g}^{(t+1)}(\mathbf{s})$ for all $t\geq 0$.

Now, we give the following theorem.

Theorem 1. There exists at least one initialization $\mathbf{v}^a(0)$ such that the messages of Gaussian BP are maintained in Gaussian form for all $t\geq 0$ if and only if $\mathcal S' \neq \emptyset$.
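The membership tests behind Theorem 1 are easy to state computationally. The following sketch (NumPy assumed; helper names and the tolerance are ours) checks whether a candidate vector $\mathbf{w}$ lies in $\mathcal W$ and in $\mathcal S'$, reusing the `neighbor_lists` helper sketched after (14).

```python
def g_map(P, w):
    """Evaluate g(w) componentwise as in (14); returns None if w is outside W of (15)."""
    nbrs = neighbor_lists(P)
    g = {}
    for (i, j) in w:
        denom = P[i, i] + sum(w[(k, i)] for k in nbrs[i] if k != j)
        if denom <= 0:          # the defining inequality of W is violated
            return None
        g[(i, j)] = -P[i, j] ** 2 / denom
    return g

def in_S_prime(P, w, tol=0.0):
    """Definition (17): w is in S' iff w is in W and w <= g(w) holds elementwise."""
    g = g_map(P, w)
    return g is not None and all(w[e] <= g[e] + tol for e in w)
```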

Proof: Sufficient Condition: If $\mathcal S' \neq \emptyset$, by choosing an initialization $\mathbf{v}^a(0)\in\mathcal S'$ and applying P6), it can be inferred that $\mathbf{v}^a(t)\in\mathcal S'$ for all $t\geq 0$. Moreover, according to the definitions of $\mathcal W$ in (15) and $\mathcal S'$ in (17), we have $\mathcal S'\subseteq\mathcal W$, and thereby $\mathbf{v}^a(t)\in\mathcal W$ for all $t\geq 0$, or equivalently $p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} v^a_{k\to i}(t) > 0$ for all $(i,j)\in\mathcal E$ according to the definition of the set $\mathcal W$ in (15). Due to $v^d_{i\to j}(t) = p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} v^a_{k\to i}(t)$ as given in (6), we obtain $v^d_{i\to j}(t) > 0$ for all $(i,j)\in\mathcal E$ and $t\geq 0$. According to Lemma 1, it can be inferred that the messages are maintained in Gaussian form for all $t\geq 0$.

Necessary Condition: We prove the necessary condition by contradiction. When $\mathcal S' = \emptyset$, suppose that there exists a $\bar{\mathbf{v}}^a(0)$ such that the messages are maintained in Gaussian form for all $t\geq 0$. According to Lemma 1, we can infer that $\bar v^d_{i\to j}(t) = p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} \bar v^a_{k\to i}(t) > 0$ for all $(i,j)\in\mathcal E$, or equivalently $\bar{\mathbf{v}}^a(t) \triangleq \mathbf{g}^{(t)}(\bar{\mathbf{v}}^a(0))\in\mathcal W$ for all $t\geq 0$ from the definition of $\mathcal W$ in (15). Now, choose another initialization $\mathbf{v}^a(0)$ satisfying both $\mathbf{v}^a(0)\geq\bar{\mathbf{v}}^a(0)$ and $\mathbf{v}^a(0)\geq 0$. Due to $\bar{\mathbf{v}}^a(0)\in\mathcal W$ and $\mathbf{v}^a(0)\geq\bar{\mathbf{v}}^a(0)$, by using P2), we have $\mathbf{g}(\mathbf{v}^a(0))\geq\mathbf{g}(\bar{\mathbf{v}}^a(0))$, that is, $\mathbf{v}^a(1)\geq\bar{\mathbf{v}}^a(1)$. Furthermore, substituting $\mathbf{v}^a(0)\geq 0$ into (14) gives $\mathbf{v}^a(1) < 0$, and thereby $\mathbf{v}^a(1)\leq\mathbf{v}^a(0)$. Combining with $\mathbf{v}^a(1)\geq\bar{\mathbf{v}}^a(1)$ leads to $\bar{\mathbf{v}}^a(1)\leq\mathbf{v}^a(1)\leq\mathbf{v}^a(0)$. Due to the assumption $\bar{\mathbf{v}}^a(1)\in\mathcal W$, applying P2) to $\bar{\mathbf{v}}^a(1)\leq\mathbf{v}^a(1)\leq\mathbf{v}^a(0)$ gives $\bar{\mathbf{v}}^a(2)\leq\mathbf{v}^a(2)\leq\mathbf{v}^a(1)$. Combining with the fact $\mathbf{v}^a(1)\leq 0$ proved above, we have $\bar{\mathbf{v}}^a(2)\leq\mathbf{v}^a(2)\leq\mathbf{v}^a(1)\leq 0$. By induction, it can be derived that

$$\bar{\mathbf{v}}^a(t+1) \leq \mathbf{v}^a(t+1) \leq \mathbf{v}^a(t) \leq 0, \quad \forall t\geq 1. \qquad (18)$$

Then, from the definition of $\mathcal W$ in (15), the assumption $\bar{\mathbf{v}}^a(t)\in\mathcal W$ is equivalent to $p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} \bar v^a_{k\to i}(t) > 0$ for all $(i,j)\in\mathcal E$, where $\bar v^a_{k\to i}(t)$ are the elements of $\bar{\mathbf{v}}^a(t)$ with $(k,i)\in\mathcal E$ arranged in the same order as $\mathbf{v}^a(t)$. By rearranging the terms in $p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} \bar v^a_{k\to i}(t) > 0$, we obtain, for any $\gamma\in\mathcal N(i)\setminus j$, $\bar v^a_{\gamma\to i}(t) > -p_{ii} - \sum_{k\in\mathcal N(i)\setminus\{j,\gamma\}} \bar v^a_{k\to i}(t)$. Applying $\bar{\mathbf{v}}^a(t)\leq 0$ shown in (18) to this inequality, it can be inferred that $\bar v^a_{\gamma\to i}(t) > -p_{ii} - \sum_{k\in\mathcal N(i)\setminus\{j,\gamma\}} \bar v^a_{k\to i}(t) \geq -p_{ii}$, that is, $\bar v^a_{\gamma\to i}(t) > -p_{ii}$ for all $(\gamma,i)\in\mathcal E$. Combining with $\mathbf{v}^a(t)\geq\bar{\mathbf{v}}^a(t)$ shown in (18), for all $(i,j)\in\mathcal E$, we have

$$v^a_{i\to j}(t) > -p_{jj}. \qquad (19)$$

It can be seen from (18) and (19) that $\mathbf{v}^a(t)$ is a monotonically non-increasing and lower-bounded sequence. Thus, $\mathbf{v}^a(t)$ must converge to a vector $\mathbf{v}^{a*}$ satisfying $\mathbf{v}^{a*} = \mathbf{g}(\mathbf{v}^{a*})$. On the other hand, substituting (12) into (19) gives $-\tfrac{p_{ij}^2}{p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} v^a_{k\to i}(t)} > -p_{jj}$. Due to $\bar{\mathbf{v}}^a(t)\in\mathcal W$, it can be inferred from P1) and (18) that $\mathbf{v}^a(t)\in\mathcal W$, or equivalently $p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} v^a_{k\to i}(t) > 0$. Together with the fact $p_{jj} > 0$, we obtain $p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} v^a_{k\to i}(t) > \tfrac{p_{ij}^2}{p_{jj}}$. Since $\mathbf{v}^a(t)$ converges to $\mathbf{v}^{a*}$, taking the limit on both sides of the inequality gives $p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} v^{a*}_{k\to i} \geq \tfrac{p_{ij}^2}{p_{jj}}$. Hence, $p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} v^{a*}_{k\to i} > 0$. From the definition of $\mathcal W$ in (15), we have $\mathbf{v}^{a*}\in\mathcal W$. Combining with $\mathbf{v}^{a*} = \mathbf{g}(\mathbf{v}^{a*})$, according to the definition of $\mathcal S'$ in (17), it is clear that $\mathbf{v}^{a*}\in\mathcal S'$. This contradicts the prerequisite $\mathcal S' = \emptyset$. Therefore, $\mathcal S'$ cannot be empty.

IV. NECESSARY AND SUFFICIENT CONVERGENCE CONDITION OF THE MESSAGE PARAMETERS $\mathbf{v}^a(t)$

According to Theorem 1, the Gaussian form of the messages may only be maintained for some $\mathbf{v}^a(0)$. Thus, the choice of the initialization $\mathbf{v}^a(0)$ also plays an important role in the convergence of $\mathbf{v}^a(t)$.

A. Synchronous Scheduling

In this section, we will investigate the convergence condition of $\mathbf{v}^a(t)$ under synchronous scheduling, which means that $\mathbf{v}^a(t)$ is updated as $\mathbf{v}^a(t+1) = \mathbf{g}(\mathbf{v}^a(t))$. The necessary and sufficient convergence condition under a nonnegative initialization set is derived first. Then, the convergence condition is proved to hold under a more general initialization set.

Lemma 2. $\mathbf{v}^a(t)$ converges to the same point $\mathbf{v}^{a*}\in\mathcal S'$ for all $\mathbf{v}^a(0)\geq 0$ if and only if $\mathcal S'\neq\emptyset$.

Proof: See Appendix A.

Obviously, the initialization $\mathbf{v}^a(0) = 0$ used by the convergence condition in [24] is implied by the proposed initialization set $\mathbf{v}^a(0)\geq 0$ in Lemma 2. In the following, we show that the initialization set can be further expanded to the set

$$\mathcal A \triangleq \{\mathbf{w}\geq 0\} \cup \{\mathbf{w}\geq\mathbf{w}_0 \mid \mathbf{w}_0\in\operatorname{int}(\mathcal S')\} \cup \Big\{\mathbf{w}\geq\mathbf{w}_0\ \Big|\ \mathbf{w}_0\in\mathcal S' \text{ and } \mathbf{w}_0 = \lim_{t\to\infty}\mathbf{g}^{(t)}(0)\Big\}, \qquad (20)$$

where $\operatorname{int}(\mathcal S')$ means the interior of $\mathcal S'$. Notice that the set $\mathcal A$ consists of the union of three parts. The first part $\{\mathbf{w}\geq 0\}$ corresponds to the initialization set given in Lemma 2. For the case $\mathcal S'\neq\emptyset$ but $\operatorname{int}(\mathcal S') = \emptyset$, the second part is empty. Also, according to Lemma 2, we have $\lim_{t\to\infty}\mathbf{g}^{(t)}(0) = \mathbf{v}^{a*}\in\mathcal S'$. Thus, the third part of the set $\mathcal A$ reduces to $\{\mathbf{w}\geq\mathbf{v}^{a*}\}$. Due to the fact that $\mathbf{v}^{a*}\in\mathcal S'$ implies $\mathbf{v}^{a*} < 0$ given by P5), we have $\{\mathbf{w}\geq 0\}\subseteq\{\mathbf{w}\geq\mathbf{v}^{a*}\}$. For the case $\operatorname{int}(\mathcal S')\neq\emptyset$, according to (63), we have $\mathbf{w}_0\leq\mathbf{v}^{a*}$ for any $\mathbf{w}_0\in\operatorname{int}(\mathcal S')$. This implies $\{\mathbf{w}\geq\mathbf{v}^{a*}\}\subseteq\{\mathbf{w}\geq\mathbf{w}_0 \mid \mathbf{w}_0\in\operatorname{int}(\mathcal S')\}$. Obviously, in this case, we also have $\{\mathbf{w}\geq 0\}\subseteq\{\mathbf{w}\geq\mathbf{w}_0 \mid \mathbf{w}_0\in\operatorname{int}(\mathcal S')\}$. Therefore, if $\mathcal S'\neq\emptyset$, the set $\mathcal A$ is always larger than the set $\{\mathbf{w}\geq 0\}$.
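In practice, the fixed point $\mathbf{v}^{a*} = \lim_{t\to\infty}\mathbf{g}^{(t)}(0)$ appearing in (20) can be approximated by simply iterating from the all-zero initialization, which lies in the first part of $\mathcal A$. A minimal sketch is shown below, reusing the `bp_variance_iteration` helper sketched after (14); the iteration count and tolerance are our own illustrative choices.

```python
def iterate_from_zero(P, num_iters=200, tol=1e-10):
    """Run v^a(t+1) = g(v^a(t)) from v^a(0) = 0.  By Lemma 2 the iteration converges
    to v^{a*} in S' whenever S' is nonempty; if some departing precision becomes
    nonpositive, the Gaussian form is lost and None is returned."""
    N = P.shape[0]
    edges = [(i, j) for i in range(N) for j in range(N) if i != j and P[i, j] != 0]
    v = {e: 0.0 for e in edges}
    for _ in range(num_iters):
        v_next = bp_variance_iteration(P, v)
        if v_next is None:
            return None                                   # messages no longer Gaussian
        if max(abs(v_next[e] - v[e]) for e in edges) < tol:
            return v_next                                 # converged to (approx.) v^{a*}
        v = v_next
    return v
```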

Theorem 2. $\mathbf{v}^a(t)$ converges to the same point $\mathbf{v}^{a*}$ for all $\mathbf{v}^a(0)\in\mathcal A$ if and only if $\mathcal S'\neq\emptyset$.

Proof: The first part $\{\mathbf{w}\geq 0\}$ of $\mathcal A$ corresponds to the case covered in Lemma 2, and thus has already been established. To prove the sufficiency, we consider the case $\operatorname{int}(\mathcal S')\neq\emptyset$ and the case $\operatorname{int}(\mathcal S') = \emptyset$ but $\mathcal S'\neq\emptyset$, respectively.

For the case $\operatorname{int}(\mathcal S')\neq\emptyset$, it is known from (20) that $\mathcal A = \{\mathbf{w}\geq\mathbf{w}_0 \mid \mathbf{w}_0\in\operatorname{int}(\mathcal S')\}$ in this case. To prove the theorem, we will first prove that $\breve{\mathbf{v}}^a(t)$ converges to the same point $\mathbf{v}^{a*}$ for all $\breve{\mathbf{v}}^a(0)\in\operatorname{int}(\mathcal S')$. Then, notice that any $\mathbf{v}^a(0)\in\{\mathbf{w}\geq\mathbf{w}_0 \mid \mathbf{w}_0\in\operatorname{int}(\mathcal S')\}$ can be upper bounded by a point from $\{\mathbf{w}\geq 0\}$ and lower bounded by a point from $\operatorname{int}(\mathcal S')$. Because both the upper bound and the lower bound converge to $\mathbf{v}^{a*}$, we can easily extend the convergence to the much larger set $\mathbf{v}^a(0)\in\{\mathbf{w}\geq\mathbf{w}_0 \mid \mathbf{w}_0\in\operatorname{int}(\mathcal S')\}$.

First, we prove that $\breve{\mathbf{v}}^a(t)$ converges to $\mathbf{v}^{a*}$ for all $\breve{\mathbf{v}}^a(0)\in\operatorname{int}(\mathcal S')$. To do this, we establish the following three facts first.

1) For any $\breve{\mathbf{v}}^a(0)\in\operatorname{int}(\mathcal S')$, according to P6), we have $\breve{\mathbf{v}}^a(t)\leq\breve{\mathbf{v}}^a(t+1)$ and $\breve{\mathbf{v}}^a(t)\in\mathcal S'$. With P5), we also have $\breve{\mathbf{v}}^a(t) < 0$. It can be seen that $\breve{\mathbf{v}}^a(t)$ is a monotonically non-decreasing and upper-bounded sequence, thus $\breve{\mathbf{v}}^a(t)$ converges to a fixed point, which is denoted as $\breve{\mathbf{v}}^{a*}$. On the other hand, let $\mathbf{v}^a(0)\geq 0$ correspond to a point in the first part of $\mathcal A$; due to $\breve{\mathbf{v}}^a(0) < 0$, we have $\breve{\mathbf{v}}^a(0) < \mathbf{v}^a(0)$. By using P2), applying $\mathbf{g}(\cdot)$ to $\breve{\mathbf{v}}^a(0) < \mathbf{v}^a(0)$ for $t$ times gives $\breve{\mathbf{v}}^a(t)\leq\mathbf{v}^a(t)$. Since $\breve{\mathbf{v}}^a(t)$ and $\mathbf{v}^a(t)$ converge to $\breve{\mathbf{v}}^{a*}$ and $\mathbf{v}^{a*}$, respectively, taking the limit on both sides of $\breve{\mathbf{v}}^a(t)\leq\mathbf{v}^a(t)$ gives $\breve{\mathbf{v}}^{a*}\leq\mathbf{v}^{a*}$. Due to $\breve{\mathbf{v}}^a(0)\in\operatorname{int}(\mathcal S')$, the definition of $\mathcal S'$ in (17) implies that $\breve{\mathbf{v}}^a(0) < \mathbf{g}(\breve{\mathbf{v}}^a(0))$, that is, $\breve{\mathbf{v}}^a(0) < \breve{\mathbf{v}}^a(1)$. Combining with $\breve{\mathbf{v}}^a(t)\leq\breve{\mathbf{v}}^a(t+1)$ established above, we obtain $\breve{\mathbf{v}}^a(0) < \breve{\mathbf{v}}^{a*}$. Together with the fact $\breve{\mathbf{v}}^{a*}\leq\mathbf{v}^{a*}$, we obtain

$$\breve{\mathbf{v}}^a(0) < \breve{\mathbf{v}}^{a*} \leq \mathbf{v}^{a*}. \qquad (21)$$

2) For a $\breve{\mathbf{v}}^a(0)\in\operatorname{int}(\mathcal S')$, construct a vector

$$\hat{\mathbf{v}}^a(0) = \lambda\breve{\mathbf{v}}^a(0) + (1-\lambda)\mathbf{v}^{a*}, \qquad (22)$$

where $\lambda\in(0,1)$. Since $g_{ij}(\mathbf{w})$ is a concave function over $\mathbf{w}\in\mathcal S'$ and $\{\breve{\mathbf{v}}^a(0), \mathbf{v}^{a*}\}\subset\mathcal S'$, we have $\mathbf{g}(\lambda\breve{\mathbf{v}}^a(0) + (1-\lambda)\mathbf{v}^{a*}) \geq \lambda\mathbf{g}(\breve{\mathbf{v}}^a(0)) + (1-\lambda)\mathbf{g}(\mathbf{v}^{a*})$, or equivalently $\mathbf{g}(\lambda\breve{\mathbf{v}}^a(0) + (1-\lambda)\mathbf{v}^{a*}) \geq \lambda\breve{\mathbf{v}}^a(1) + (1-\lambda)\mathbf{v}^{a*}$. According to P2), applying $\mathbf{g}(\cdot)$ to this inequality gives

$\mathbf{g}^{(2)}(\lambda\breve{\mathbf{v}}^a(0) + (1-\lambda)\mathbf{v}^{a*}) \geq \mathbf{g}(\lambda\breve{\mathbf{v}}^a(1) + (1-\lambda)\mathbf{v}^{a*})$. Because of the concave property of $g_{ij}(\mathbf{w})$, it is known that $\mathbf{g}(\lambda\breve{\mathbf{v}}^a(1) + (1-\lambda)\mathbf{v}^{a*}) \geq \lambda\breve{\mathbf{v}}^a(2) + (1-\lambda)\mathbf{v}^{a*}$. Thus, we have $\mathbf{g}^{(2)}(\lambda\breve{\mathbf{v}}^a(0) + (1-\lambda)\mathbf{v}^{a*}) \geq \lambda\breve{\mathbf{v}}^a(2) + (1-\lambda)\mathbf{v}^{a*}$. By induction, it can be inferred that $\mathbf{g}^{(t)}(\lambda\breve{\mathbf{v}}^a(0) + (1-\lambda)\mathbf{v}^{a*}) \geq \lambda\breve{\mathbf{v}}^a(t) + (1-\lambda)\mathbf{v}^{a*}$ for all $t\geq 0$. With $\hat{\mathbf{v}}^a(0) = \lambda\breve{\mathbf{v}}^a(0) + (1-\lambda)\mathbf{v}^{a*}$ given by (22), we obtain

$$\hat{\mathbf{v}}^a(t) \geq \lambda\breve{\mathbf{v}}^a(t) + (1-\lambda)\mathbf{v}^{a*}, \qquad (23)$$

where $\hat{\mathbf{v}}^a(t) \triangleq \mathbf{g}^{(t)}(\hat{\mathbf{v}}^a(0))$. Since $\{\breve{\mathbf{v}}^a(0), \mathbf{v}^{a*}\}\subset\mathcal S'$ and $\mathcal S'$ is a convex set as indicated in P4), we have $\hat{\mathbf{v}}^a(0) = \lambda\breve{\mathbf{v}}^a(0) + (1-\lambda)\mathbf{v}^{a*}\in\mathcal S'$. According to P6), $\hat{\mathbf{v}}^a(t) \triangleq \mathbf{g}^{(t)}(\hat{\mathbf{v}}^a(0))$ is a monotonically non-decreasing sequence. Furthermore, according to P5), $\hat{\mathbf{v}}^a(t)$ is also upper bounded by 0. Thus, $\hat{\mathbf{v}}^a(t)$ converges to a fixed point, which is denoted as $\hat{\mathbf{v}}^{a*}$. Taking limits on both sides of (23) gives

$$\hat{\mathbf{v}}^{a*} \geq \lambda\breve{\mathbf{v}}^{a*} + (1-\lambda)\mathbf{v}^{a*} = \breve{\mathbf{v}}^{a*} + (1-\lambda)(\mathbf{v}^{a*} - \breve{\mathbf{v}}^{a*}). \qquad (24)$$

3) From $\breve{\mathbf{v}}^a(0) < \breve{\mathbf{v}}^{a*} \leq \mathbf{v}^{a*}$ given by (21), we have $\mathbf{v}^{a*} - \breve{\mathbf{v}}^a(0) > 0$ and $\breve{\mathbf{v}}^{a*} - \breve{\mathbf{v}}^a(0) > 0$. If $\lambda\in(0,1)$ and is close to 1, the relation $(1-\lambda)(\mathbf{v}^{a*} - \breve{\mathbf{v}}^a(0)) < \breve{\mathbf{v}}^{a*} - \breve{\mathbf{v}}^a(0)$ always holds, that is,

$$\exists\ \lambda\in(0,1), \text{ s.t. } (1-\lambda)(\mathbf{v}^{a*} - \breve{\mathbf{v}}^a(0)) < \breve{\mathbf{v}}^{a*} - \breve{\mathbf{v}}^a(0). \qquad (25)$$

Rearranging the terms in (25) gives $\exists\ \lambda\in(0,1)$, s.t. $\lambda\breve{\mathbf{v}}^a(0) + (1-\lambda)\mathbf{v}^{a*} < \breve{\mathbf{v}}^{a*}$. Due to $\lambda\breve{\mathbf{v}}^a(0) + (1-\lambda)\mathbf{v}^{a*} = \hat{\mathbf{v}}^a(0)$ given in (22), (25) implies that there always exists a $\lambda\in(0,1)$ such that $\hat{\mathbf{v}}^a(0) < \breve{\mathbf{v}}^{a*}$. Now, applying $\mathbf{g}(\cdot)$ to $\hat{\mathbf{v}}^a(0) < \breve{\mathbf{v}}^{a*}$ for $t$ times, from P2), we obtain $\mathbf{g}^{(t)}(\hat{\mathbf{v}}^a(0)) \leq \mathbf{g}^{(t)}(\breve{\mathbf{v}}^{a*})$, or equivalently $\hat{\mathbf{v}}^a(t) \leq \breve{\mathbf{v}}^{a*}$. Taking the limit on both sides of $\hat{\mathbf{v}}^a(t)\leq\breve{\mathbf{v}}^{a*}$, it can be inferred that

$$\exists\ \lambda\in(0,1), \text{ s.t. } \hat{\mathbf{v}}^{a*} \leq \breve{\mathbf{v}}^{a*}. \qquad (26)$$

Now, we make use of (21), (24) and (26) to prove that the convergence limit $\breve{\mathbf{v}}^{a*}$ is identical to the convergence limit $\mathbf{v}^{a*}$. Combining (21) and (26) gives

$$\exists\ \lambda\in(0,1), \text{ s.t. } \hat{\mathbf{v}}^{a*} \leq \breve{\mathbf{v}}^{a*} \leq \mathbf{v}^{a*}. \qquad (27)$$

From the second inequality of (27), we have $(1-\lambda)(\mathbf{v}^{a*} - \breve{\mathbf{v}}^{a*}) \geq 0$. Substituting this relation into (24), we can infer that $\hat{\mathbf{v}}^{a*} \geq \breve{\mathbf{v}}^{a*}$. Comparing this result to the first inequality of (27), we

obtain $\hat{\mathbf{v}}^{a*} = \breve{\mathbf{v}}^{a*}$. Due to $\hat{\mathbf{v}}^{a*} = \breve{\mathbf{v}}^{a*}$, (24) becomes $(1-\lambda)(\mathbf{v}^{a*} - \breve{\mathbf{v}}^{a*}) \leq 0$. For $\lambda\in(0,1)$, this relation is equivalent to $\mathbf{v}^{a*} \leq \breve{\mathbf{v}}^{a*}$. Combining it with the second inequality in (27), we obtain

$$\breve{\mathbf{v}}^{a*} = \mathbf{v}^{a*}. \qquad (28)$$

Next, we will extend the initialization set to the whole set $\mathcal A = \{\mathbf{w}\geq\mathbf{w}_0 \mid \mathbf{w}_0\in\operatorname{int}(\mathcal S')\}$. For any $\mathbf{v}^a(0)\in\mathcal A$, we can always find a $\mathbf{v}^a_l(0)\in\operatorname{int}(\mathcal S')$ and a $\mathbf{v}^a_u(0)\geq 0$ such that $\mathbf{v}^a_l(0)\leq\mathbf{v}^a(0)\leq\mathbf{v}^a_u(0)$. According to P2), applying $\mathbf{g}(\cdot)$ to this inequality repeatedly gives $\mathbf{g}^{(t)}(\mathbf{v}^a_l(0)) \leq \mathbf{g}^{(t)}(\mathbf{v}^a(0)) \leq \mathbf{g}^{(t)}(\mathbf{v}^a_u(0))$. Since $\mathbf{v}^a_l(t) = \mathbf{g}^{(t)}(\mathbf{v}^a_l(0))$ and $\mathbf{v}^a_u(t) = \mathbf{g}^{(t)}(\mathbf{v}^a_u(0))$ both converge to $\mathbf{v}^{a*}$, $\mathbf{v}^a(t)$ converges to $\mathbf{v}^{a*}$, too.

Now, consider the case $\operatorname{int}(\mathcal S') = \emptyset$ but $\mathcal S'\neq\emptyset$. In this case, according to the discussion after (20), $\mathcal A = \{\mathbf{w}\geq\mathbf{v}^{a*}\}$. For any $\mathbf{v}^a(0)\in\mathcal A$, we can always find a $\mathbf{v}^a_u(0)\in\{\mathbf{w}\geq 0\}$ such that $\mathbf{v}^{a*}\leq\mathbf{v}^a(0)\leq\mathbf{v}^a_u(0)$. Applying $\mathbf{g}(\cdot)$ to this inequality for $t$ times, from P2), we obtain $\mathbf{v}^{a*}\leq\mathbf{g}^{(t)}(\mathbf{v}^a(0))\leq\mathbf{g}^{(t)}(\mathbf{v}^a_u(0))$. Since $\mathbf{v}^a_u(t)$ converges to $\mathbf{v}^{a*}$, $\mathbf{v}^a(t)$ also converges to $\mathbf{v}^{a*}$.

On the other hand, if $\mathcal S' = \emptyset$, according to Theorem 1, the messages of Gaussian BP passed in the factor graph lose the Gaussian form, thus the parameters of the messages $\mathbf{v}^a(t)$ cannot converge in this case.

B. Asynchronous Scheduling

In this section, convergence conditions under asynchronous scheduling schemes will be investigated. Modified from the synchronous updating equation in (12), the arriving precision in asynchronous scheduling is updated as

$$v^a_{i\to j}(t+1) = -\frac{p_{ij}^2}{p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} v^a_{k\to i}(\tau^a_{k\to i}(t))}, \quad t\in\mathcal T_{i\to j}, \qquad (29)$$

where $\mathcal T_{i\to j}$ is the set of time instants at which $v^a_{i\to j}(t)$ is updated; $0\leq\tau^a_{k\to i}(t)\leq t$ is the last updated time instant of $v^a_{k\to i}(\cdot)$ at time $t$. In this paper, we only consider the totally asynchronous scheduling defined as follows.

Definition 1 (Totally Asynchronous Scheduling) [27]: The sets $\mathcal T_{i\to j}$ are infinite, and $\tau^a_{i\to j}(t)$ satisfies $0\leq\tau^a_{i\to j}(t)\leq t$ and $\lim_{t\to\infty}\tau^a_{i\to j}(t) = \infty$ for all $(i,j)\in\mathcal E$.
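A toy simulation of one particular totally asynchronous scheduling is sketched below (NumPy precision matrix assumed). At each time step, every directed edge is refreshed only with probability $1-p_{\text{skip}}$ and always uses the latest available values of the other edges; the skipping probability and helper names are our own illustrative assumptions, not part of Definition 1.

```python
import random

def async_bp_variances(P, num_iters=500, p_skip=0.3, seed=0):
    """Asynchronous version of (29): in-place updates with randomly skipped edges."""
    rng = random.Random(seed)
    N = P.shape[0]
    nbrs = {i: [k for k in range(N) if k != i and P[i, k] != 0] for i in range(N)}
    edges = [(i, j) for i in range(N) for j in nbrs[i]]
    v = {e: 0.0 for e in edges}
    for _ in range(num_iters):
        for (i, j) in edges:
            if rng.random() < p_skip:
                continue                     # t is not in T_{i->j}; keep the old value
            v_d = P[i, i] + sum(v[(k, i)] for k in nbrs[i] if k != j)
            if v_d <= 0:
                return None                  # Gaussian form lost
            v[(i, j)] = -P[i, j] ** 2 / v_d  # uses the most recent neighbor values
    return v
```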

In order to prove the convergence of $\mathbf{v}^a(t)$ under the asynchronous scheduling given by Definition 1, we need to find a sequence of nonempty sets $\{\mathcal U(m)\}$ for $m\geq 0$ such that the following two sufficient conditions are satisfied [27]:

Synchronous Convergence Condition: the sequence of sets satisfies 1) $\mathcal U(m+1)\subseteq\mathcal U(m)$; 2) $\mathbf{g}(\mathbf{w})\in\mathcal U(m+1)$ for $\mathbf{w}\in\mathcal U(m)$; 3) $\mathcal U(m)$ converges to a fixed point;

Box Condition: $\mathcal U(m)$ can be written as a product of subsets of individual components $\mathcal U_{i\to j}(m)$ for all $(i,j)\in\mathcal E$.

While the formal proof of asynchronous convergence under these conditions is provided in [27], we give a brief intuitive explanation below. Suppose that when $t\geq t_m$, $v^a_{i\to j}(t)\in\mathcal U_{i\to j}(m)$ for all $(i,j)\in\mathcal E$, or, expressed using the Box Condition, $\mathbf{v}^a(t)\in\mathcal U(m)$ for $t\geq t_m$. Now, consider the updating of an individual component $v^a_{i\to j}(t)$. Due to $\lim_{t\to\infty}\tau^a_{k\to i}(t) = \infty$, we can always find a $t^{(i\to j)}_{m+1} > t_m$ such that when $t\geq t^{(i\to j)}_{m+1}$, we have $\tau^a_{k\to i}(t)\geq t_m$ and thereby $v^a_{k\to i}(\tau^a_{k\to i}(t))\in\mathcal U_{k\to i}(m)$ for all $k\in\mathcal N(i)\setminus j$. Putting $v^a_{k\to i}(\tau^a_{k\to i}(t))$ into the function $g_{ij}(\cdot)$ in (29) and applying condition 2) of the Synchronous Convergence Condition, we have $v^a_{i\to j}(t+1)\in\mathcal U_{i\to j}(m+1)$ for $t\geq t^{(i\to j)}_{m+1}$. Thus, by choosing $t_{m+1} = \max_{(i,j)\in\mathcal E} t^{(i\to j)}_{m+1}$, we have that when $t\geq t_{m+1}$, $v^a_{i\to j}(t)\in\mathcal U_{i\to j}(m+1)$ for all $(i,j)\in\mathcal E$. Further, due to conditions 1) and 3) of the Synchronous Convergence Condition, we know that $\mathbf{v}^a(t)$ will gradually converge to a fixed point.

Theorem 3. $\mathbf{v}^a(t)$ converges for any $\mathbf{v}^a(0)\in\mathcal A$ and all choices of schedulings if and only if $\mathcal S'\neq\emptyset$.

Proof: We first prove the sufficient condition. Since $\mathcal S'\neq\emptyset$, as given in the proof of Theorem 2, for any $\mathbf{v}^a(0)\in\mathcal A$, there always exist an element $\breve{\mathbf{v}}^a(0)\in\operatorname{int}(\mathcal S')\cup\{\mathbf{v}^{a*}\}$ and a $\hat{\mathbf{v}}^a(0)\geq 0$ such that $\breve{\mathbf{v}}^a(0)\leq\mathbf{v}^a(0)\leq\hat{\mathbf{v}}^a(0)$. Construct a sequence of sets as

$$\mathcal U(m) = \{\mathbf{w} \mid \breve{\mathbf{v}}^a(m)\leq\mathbf{w}\leq\hat{\mathbf{v}}^a(m)\}. \qquad (30)$$

Since $\breve{\mathbf{v}}^a(0)\in\operatorname{int}(\mathcal S')\cup\{\mathbf{v}^{a*}\}$, we have $\breve{\mathbf{v}}^a(m)\leq\breve{\mathbf{v}}^a(m+1)$ according to P6). Further, for any $\hat{\mathbf{v}}^a(0)\geq 0$, putting $\hat{\mathbf{v}}^a(0)$ into (14), we have $\hat{\mathbf{v}}^a(1) < 0$, and thereby $\hat{\mathbf{v}}^a(1) < \hat{\mathbf{v}}^a(0)$. Since $\hat{\mathbf{v}}^a(m)$ converges according to Theorem 2, $\hat{\mathbf{v}}^a(m)\in\mathcal W$ for all $m\geq 0$. Thus, according to P2), by applying $\mathbf{g}(\cdot)$ to $\hat{\mathbf{v}}^a(1) < \hat{\mathbf{v}}^a(0)$ and by induction, we have $\hat{\mathbf{v}}^a(m+1)\leq\hat{\mathbf{v}}^a(m)$. By exploiting $\breve{\mathbf{v}}^a(m)\leq\breve{\mathbf{v}}^a(m+1)$, $\hat{\mathbf{v}}^a(m+1)\leq\hat{\mathbf{v}}^a(m)$ and the definition of $\mathcal U(m)$ in (30), we have

$$\mathcal U(m+1)\subseteq\mathcal U(m). \qquad (31)$$

Now, for any $\mathbf{w}\in\mathcal U(m)$, by applying $\mathbf{g}(\cdot)$ to $\breve{\mathbf{v}}^a(m)\leq\mathbf{w}\leq\hat{\mathbf{v}}^a(m)$, we obtain $\breve{\mathbf{v}}^a(m+1)\leq\mathbf{g}(\mathbf{w})\leq\hat{\mathbf{v}}^a(m+1)$. Thus, from the definition of $\mathcal U(m)$ in (30), we have the relation

$$\mathbf{g}(\mathbf{w})\in\mathcal U(m+1), \quad \forall\ \mathbf{w}\in\mathcal U(m). \qquad (32)$$

Furthermore, according to Theorem 2, we have $\lim_{m\to\infty}\breve{\mathbf{v}}^a(m) = \lim_{m\to\infty}\hat{\mathbf{v}}^a(m) = \mathbf{v}^{a*}$, hence $\mathcal U(m)$ will converge to the fixed point $\mathbf{v}^{a*}$. Thus, the set $\mathcal U(m)$ satisfies the Synchronous Convergence Condition [27]. Furthermore, according to the definition of $\mathcal U(m)$ in (30), $\mathbf{v}^a(m)\in\mathcal U(m)$ means $\breve v^a_{i\to j}(m)\leq v^a_{i\to j}(m)\leq\hat v^a_{i\to j}(m)$ for all $(i,j)\in\mathcal E$, that is, $v^a_{i\to j}(m)\in\mathcal U_{i\to j}(m)$ with $\mathcal U_{i\to j}(m) = \{w_{ij} \mid \breve v^a_{i\to j}(m)\leq w_{ij}\leq\hat v^a_{i\to j}(m)\}$. Thus, $\mathcal U(m)$ can be represented in the form of a product of subsets of individual components $\mathcal U_{i\to j}(m)$ for all $(i,j)\in\mathcal E$, and thereby the Box Condition is satisfied. Hence, the sufficient condition is proved. On the other hand, if $\mathcal S' = \emptyset$, according to Theorem 1, the messages of Gaussian BP passed in the factor graph lose the Gaussian form, thus the parameters of the messages $\mathbf{v}^a(t)$ cannot converge in this case.

From Theorems 2 and 3, it can be seen that the convergence condition of $\mathbf{v}^a(t)$ under asynchronous scheduling is the same as that under synchronous scheduling.

V. NECESSARY AND SUFFICIENT CONVERGENCE CONDITION OF BELIEF VARIANCE $\sigma_i^2(t)$

Although the convergence conditions for $\mathbf{v}^a(t)$ were derived in the last section, what we are really interested in is the convergence condition for the belief variance $\sigma_i^2(t)$. According to (11), the variance $\sigma_i^2(t)$ and $\mathbf{v}^a(t)$ are related by $\sigma_i^2(t) = \tfrac{1}{p_{ii} + \sum_{k\in\mathcal N(i)} v^a_{k\to i}(t)}$, or equivalently $\tfrac{1}{\sigma_i^2(t)} = p_{ii} + \sum_{k\in\mathcal N(i)} v^a_{k\to i}(t)$.

A. Convergence Condition

In order to investigate the convergence condition of $\sigma_i^2(t)$, we present the following lemma first.

Lemma 3. If $\mathcal S'\neq\emptyset$ and $\mathbf{v}^a(0)\in\mathcal A$, then $\lim_{t\to\infty}\tfrac{1}{\sigma_i^2(t)} > 0$ for all $i\in\mathcal V$ or $\lim_{t\to\infty}\tfrac{1}{\sigma_i^2(t)} = 0$ for all $i\in\mathcal V$.

Proof: In the proof, we first consider the special case of a tree-structured factor graph as well as the impact of initializations. Then, for the case of a loopy factor graph, we prove that there exists at least one node $\gamma\in\mathcal V$ such that $\lim_{t\to\infty}\tfrac{1}{\sigma_\gamma^2(t)}\neq 0$. At last, we demonstrate that either $\lim_{t\to\infty}\tfrac{1}{\sigma_\gamma^2(t)} > 0$ for all $\gamma\in\mathcal V$ or $\lim_{t\to\infty}\tfrac{1}{\sigma_\gamma^2(t)} = 0$ for all $\gamma\in\mathcal V$ will hold.

First, if the factor graph is of tree structure, according to the properties of BP in a tree-structured factor graph, it is known that $\lim_{t\to\infty}\sigma_\gamma^2(t) = [\mathbf{P}^{-1}]_{\gamma\gamma}$ [1]. Due to $\mathbf{P}\succ 0$, we have $\lim_{t\to\infty}\sigma_\gamma^2(t) = [\mathbf{P}^{-1}]_{\gamma\gamma} > 0$, and thereby $\lim_{t\to\infty}\tfrac{1}{\sigma_\gamma^2(t)} > 0$ for all $\gamma\in\mathcal V$. In the following, we focus on the factor graph containing loops.

If $\mathcal S'\neq\emptyset$, according to Theorem 2, $\mathbf{v}^a(t)$ converges to the same point for any initialization $\mathbf{v}^a(0)\in\mathcal A$. Due to $\tfrac{1}{\sigma_i^2(t)} = p_{ii} + \sum_{k\in\mathcal N(i)} v^a_{k\to i}(t)$, the limit $\lim_{t\to\infty}\tfrac{1}{\sigma_i^2(t)}$ always exists and is identical for all $\mathbf{v}^a(0)\in\mathcal A$. Therefore, we only need to consider a specific initialization, e.g., $\mathbf{v}^a(0) = 0$.

Second, we prove $\lim_{t\to\infty}\tfrac{1}{\sigma_\gamma^2(t)}\neq 0$ for at least one node $\gamma\in\mathcal V$. Due to the factor graph containing loops, we can always find a node $\gamma$ such that $|\mathcal N(\gamma)|\geq 2$, where $|\cdot|$ means the cardinality of a set. Denote nodes $i,j$ as two neighbors of node $\gamma$, that is, $i,j\in\mathcal N(\gamma)$. Since $\mathbf{v}^a(0) = 0$, from (63), we have $\mathbf{v}^{a*}\leq\mathbf{v}^a(t)$. Applying this relation to $\tfrac{1}{\sigma_\gamma^2(t)} = p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus i} v^a_{k\to\gamma}(t) + v^a_{i\to\gamma}(t)$ gives

$$\frac{1}{\sigma_\gamma^2(t)} \geq p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus i} v^a_{k\to\gamma}(t) + v^{a*}_{i\to\gamma}. \qquad (33)$$

By further using $\mathbf{v}^{a*}\leq\mathbf{v}^a(t)$, we can easily obtain

$$p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus i} v^a_{k\to\gamma}(t) + v^{a*}_{i\to\gamma} = p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus\{i,j\}} v^a_{k\to\gamma}(t) + v^{a*}_{i\to\gamma} + v^a_{j\to\gamma}(t) \geq p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus j} v^{a*}_{k\to\gamma} + v^a_{j\to\gamma}(t). \qquad (34)$$

It can be further shown that

$$p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus j} v^{a*}_{k\to\gamma} + v^a_{j\to\gamma}(t) \overset{(a)}{=} \frac{\big(p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus j} v^{a*}_{k\to\gamma}\big)\big(p_{jj} + \sum_{k\in\mathcal N(j)\setminus\gamma} v^a_{k\to j}(t-1)\big) - p_{\gamma j}^2}{p_{jj} + \sum_{k\in\mathcal N(j)\setminus\gamma} v^a_{k\to j}(t-1)} \overset{(b)}{=} \frac{\big(p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus j} v^{a*}_{k\to\gamma}\big)\big(p_{jj} + \sum_{k\in\mathcal N(j)\setminus\gamma} v^a_{k\to j}(t-1) + v^{a*}_{\gamma\to j}\big)}{p_{jj} + \sum_{k\in\mathcal N(j)\setminus\gamma} v^a_{k\to j}(t-1)}, \qquad (35)$$

where the equality $(a)$ holds because $v^a_{j\to\gamma}(t) = -\tfrac{p_{\gamma j}^2}{p_{jj} + \sum_{k\in\mathcal N(j)\setminus\gamma} v^a_{k\to j}(t-1)}$ from (12), and $(b)$ is obtained by using the limiting form of (12), $v^{a*}_{\gamma\to j} = -\tfrac{p_{\gamma j}^2}{p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus j} v^{a*}_{k\to\gamma}}$.

Combining (34) and (35) gives

$$p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus i} v^a_{k\to\gamma}(t) + v^{a*}_{i\to\gamma} \geq \frac{p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus j} v^{a*}_{k\to\gamma}}{p_{jj} + \sum_{k\in\mathcal N(j)\setminus\gamma} v^a_{k\to j}(t-1)}\Big(p_{jj} + \sum_{k\in\mathcal N(j)\setminus\gamma} v^a_{k\to j}(t-1) + v^{a*}_{\gamma\to j}\Big). \qquad (36)$$

Due to $\mathbf{v}^{a*}\in\mathcal S'$ from Lemma 2, we have $\mathbf{v}^{a*}\in\mathcal W$. From the definition of $\mathcal W$ in (15), for all $(\nu_0,\nu_1)\in\mathcal E$, we have

$$p_{\nu_0\nu_0} + \sum_{k\in\mathcal N(\nu_0)\setminus\nu_1} v^{a*}_{k\to\nu_0} > 0. \qquad (37)$$

By using (37) and $\mathbf{v}^a(t)\geq\mathbf{v}^{a*}$ implied by (63), we can infer $p_{\nu_0\nu_0} + \sum_{k\in\mathcal N(\nu_0)\setminus\nu_1} v^a_{k\to\nu_0}(t) > 0$ for all $(\nu_0,\nu_1)\in\mathcal E$, and thereby

$$\frac{p_{\nu_0\nu_0} + \sum_{k\in\mathcal N(\nu_0)\setminus\nu_1} v^{a*}_{k\to\nu_0}}{p_{\nu_1\nu_1} + \sum_{k\in\mathcal N(\nu_1)\setminus\nu_0} v^a_{k\to\nu_1}(t)} > 0. \qquad (38)$$

Since the factor graph contains loops, there always exists a walk $(j_0, j_1, \ldots, j_{t-1}, j_t, i)$ with $j_t = \gamma$ such that all $(j_l, j_{l+1})\in\mathcal E$ and $|\mathcal N(j_l)|\geq 2$ for $l = 0, 1, \ldots, t$. Then, using (36) repeatedly on such a walk gives

$$p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus i} v^a_{k\to\gamma}(t) + v^{a*}_{i\to\gamma} \geq \Bigg(\prod_{l=0}^{t-1}\frac{p_{j_{l+1}j_{l+1}} + \sum_{k\in\mathcal N(j_{l+1})\setminus j_l} v^{a*}_{k\to j_{l+1}}}{p_{j_lj_l} + \sum_{k\in\mathcal N(j_l)\setminus j_{l+1}} v^a_{k\to j_l}(l)}\Bigg)\Big(p_{j_0j_0} + \sum_{k\in\mathcal N(j_0)\setminus j_1} v^a_{k\to j_0}(0) + v^{a*}_{j_1\to j_0}\Big) = \Bigg(\prod_{l=0}^{t-1}\frac{p_{j_{l+1}j_{l+1}} + \sum_{k\in\mathcal N(j_{l+1})\setminus j_l} v^{a*}_{k\to j_{l+1}}}{p_{j_lj_l} + \sum_{k\in\mathcal N(j_l)\setminus j_{l+1}} v^a_{k\to j_l}(l)}\Bigg)\big(p_{j_0j_0} + v^{a*}_{j_1\to j_0}\big), \qquad (39)$$

where the last equality holds due to $\mathbf{v}^a(0) = 0$. Due to $|\mathcal N(j_0)|\geq 2$, there exists a node $\nu$ such that $j_1\in\mathcal N(j_0)\setminus\nu$. By exploiting the fact that $\mathbf{v}^{a*} < 0$ due to $\mathbf{v}^{a*}\in\mathcal S'$, we obtain $p_{j_0j_0} + v^{a*}_{j_1\to j_0} \geq p_{j_0j_0} + \sum_{k\in\mathcal N(j_0)\setminus\nu} v^{a*}_{k\to j_0}$. Combining with (37), we can infer that

$$p_{j_0j_0} + v^{a*}_{j_1\to j_0} > 0. \qquad (40)$$

Putting (38) and (40) into (39), we obtain $p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus i} v^a_{k\to\gamma}(t) + v^{a*}_{i\to\gamma} > 0$. Applying this result to (33) gives $\tfrac{1}{\sigma_\gamma^2(t)} > 0$ for all $t\geq 0$, and thereby $\lim_{t\to\infty}\tfrac{1}{\sigma_\gamma^2(t)}\neq 0$.

At last, we will prove that $\lim_{t\to\infty}\tfrac{1}{\sigma_\gamma^2(t)} > 0$ for all $\gamma\in\mathcal V$ or $\lim_{t\to\infty}\tfrac{1}{\sigma_\gamma^2(t)} = 0$ for all $\gamma\in\mathcal V$. Notice that the relation in (35) holds for any $(\gamma,j)\in\mathcal E$. Thus, taking the limit on both sides of (35) and recognizing that $\lim_{t\to\infty}\tfrac{1}{\sigma_\gamma^2(t)} = p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)} v^{a*}_{k\to\gamma}$ and $\lim_{t\to\infty}\tfrac{1}{\sigma_j^2(t)} = p_{jj} + \sum_{k\in\mathcal N(j)} v^{a*}_{k\to j}$, we obtain

$$\lim_{t\to\infty}\frac{1}{\sigma_\gamma^2(t)} = \frac{p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus j} v^{a*}_{k\to\gamma}}{p_{jj} + \sum_{k\in\mathcal N(j)\setminus\gamma} v^{a*}_{k\to j}}\lim_{t\to\infty}\frac{1}{\sigma_j^2(t)}. \qquad (41)$$

Since $p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus j} v^{a*}_{k\to\gamma} > 0$ for all $(\gamma,j)\in\mathcal E$ as given in (37), according to (41), if $\lim_{t\to\infty}\tfrac{1}{\sigma_\gamma^2(t)} > 0$, then $\lim_{t\to\infty}\tfrac{1}{\sigma_j^2(t)} > 0$ for all nodes $j\in\mathcal N(\gamma)$, and vice versa. Similarly, if $\lim_{t\to\infty}\tfrac{1}{\sigma_\gamma^2(t)} = 0$, then $\lim_{t\to\infty}\tfrac{1}{\sigma_j^2(t)} = 0$ for all nodes $j\in\mathcal N(\gamma)$, and vice versa. Recall that we have proved there exists a node $\gamma$ with $\lim_{t\to\infty}\tfrac{1}{\sigma_\gamma^2(t)}\neq 0$. Therefore, for a connected factor graph, either $\lim_{t\to\infty}\tfrac{1}{\sigma_i^2(t)} > 0$ for all $i\in\mathcal V$ or $\lim_{t\to\infty}\tfrac{1}{\sigma_i^2(t)} = 0$ for all $i\in\mathcal V$.

Obviously, if $\lim_{t\to\infty}\tfrac{1}{\sigma_i^2(t)}\neq 0$, then $\lim_{t\to\infty}\sigma_i^2(t)$ exists, and hence the convergence condition of $\sigma_i^2(t)$ is the same as that of $\tfrac{1}{\sigma_i^2(t)}$. However, Lemma 3 reveals that the scenario $\lim_{t\to\infty}\tfrac{1}{\sigma_i^2(t)} = 0$ cannot be excluded. Thus, we have the following theorem.

Theorem 4. $\sigma_i^2(t)$ with $i\in\mathcal V$ converges to the same positive value for all $\mathbf{v}^a(0)\in\mathcal A$ under synchronous and asynchronous schedulings if and only if $\mathcal S\neq\emptyset$, where

$$\mathcal S \triangleq \Big\{\mathbf{w}\ \Big|\ \mathbf{w}\in\mathcal S' \text{ and } p_{ii} + \sum_{k\in\mathcal N(i)} w_{ki} > 0,\ \forall\ i\in\mathcal V\Big\}. \qquad (42)$$

Proof: First, we prove that if $\mathcal S\neq\emptyset$, then $\sigma_i^2(t)$ converges for any $\mathbf{v}^a(0)\in\mathcal A$. From the definition of $\mathcal S$ in (42), if $\mathcal S\neq\emptyset$, then $\mathcal S'\neq\emptyset$. According to Theorems 2 and 3, $\mathbf{v}^a(t)$ converges to $\mathbf{v}^{a*}$ for all $\mathbf{v}^a(0)\in\mathcal A$ under both synchronous and asynchronous schedulings. Thus, $\tfrac{1}{\sigma_i^2(t)} = p_{ii} + \sum_{k\in\mathcal N(i)} v^a_{k\to i}(t)$ will converge to $p_{ii} + \sum_{k\in\mathcal N(i)} v^{a*}_{k\to i}$. Due to $\mathcal S\neq\emptyset$, there exists a $\mathbf{w}$ such that $\mathbf{w}\in\mathcal S'$ and $p_{ii} + \sum_{k\in\mathcal N(i)} w_{ki} > 0$ for all $i\in\mathcal V$. Due to $\mathbf{w}\in\mathcal S'$, it can be inferred from (63) that $\mathbf{w}\leq\mathbf{v}^{a*}$. Putting this result into $p_{ii} + \sum_{k\in\mathcal N(i)} w_{ki} > 0$, we obtain $p_{ii} + \sum_{k\in\mathcal N(i)} v^{a*}_{k\to i} > 0$, and its inverse exists. Thus, if $\mathcal S\neq\emptyset$, the variance $\sigma_i^2(t) = \tfrac{1}{p_{ii} + \sum_{k\in\mathcal N(i)} v^a_{k\to i}(t)}$ converges to $\tfrac{1}{p_{ii} + \sum_{k\in\mathcal N(i)} v^{a*}_{k\to i}} > 0$.

Next, we prove by contradiction that if $\sigma_i^2(t)$ converges for $\mathbf{v}^a(0)\in\mathcal A$, then $\mathcal S\neq\emptyset$. Suppose that $\mathcal S = \emptyset$. This assumption contains two cases: 1) $\mathcal S' = \emptyset$; 2) $\mathcal S'\neq\emptyset$, but for any $\mathbf{w}\in\mathcal S'$, $p_{ii} + \sum_{k\in\mathcal N(i)} w_{ki} \leq 0$ for some $i\in\mathcal V$. First, consider the case $\mathcal S' = \emptyset$. Due to $\mathcal S' = \emptyset$, according to Theorem 1, the messages of Gaussian BP passed in the factor graph cannot maintain the Gaussian form, hence $\mathbf{v}^a(t)$ becomes undefined. Therefore, the variance $\sigma_i^2(t)$ cannot converge in this case, which contradicts the prerequisite that $\sigma_i^2(t)$ converges. Second, consider the case $\mathcal S'\neq\emptyset$. Due to $\mathcal S'\neq\emptyset$, according to Theorems 2 and 3, $\mathbf{v}^a(t)$ converges to $\mathbf{v}^{a*}\in\mathcal S'$ for any $\mathbf{v}^a(0)\in\mathcal A$. According to Lemma 3, we further have $\lim_{t\to\infty}\tfrac{1}{\sigma_i^2(t)} = p_{ii} + \sum_{k\in\mathcal N(i)} v^{a*}_{k\to i} > 0$ for all $i\in\mathcal V$, or $\lim_{t\to\infty}\tfrac{1}{\sigma_i^2(t)} = p_{ii} + \sum_{k\in\mathcal N(i)} v^{a*}_{k\to i} = 0$ for all $i\in\mathcal V$. If $p_{ii} + \sum_{k\in\mathcal N(i)} v^{a*}_{k\to i} > 0$ for all $i\in\mathcal V$, combining with $\mathbf{v}^{a*}\in\mathcal S'$, we can infer from the definition of $\mathcal S$ in (42) that $\mathbf{v}^{a*}\in\mathcal S$, which

contradicts the assumption $\mathcal S = \emptyset$. On the other hand, if $p_{ii} + \sum_{k\in\mathcal N(i)} v^{a*}_{k\to i} = 0$ for all $i\in\mathcal V$, obviously, the variance $\sigma_i^2(t) = \tfrac{1}{p_{ii} + \sum_{k\in\mathcal N(i)} v^a_{k\to i}(t)}$ cannot converge, which contradicts the prerequisite that $\sigma_i^2(t)$ converges. In summary, we have proved that if $\sigma_i^2(t)$ converges for all $\mathbf{v}^a(0)\in\mathcal A$, then $\mathcal S = \emptyset$ cannot hold, or equivalently, we must have $\mathcal S\neq\emptyset$.

Remark 1: Due to the belief mean $\mu_i(t) = \sigma_i^2(t)\big(h_i + \sum_{k\in\mathcal N(i)}\beta^a_{k\to i}(t)\big)$, if the convergence condition of the belief variance $\sigma_i^2(t)$ is already known, the convergence of $\mu_i(t)$ is determined by that of the message parameters $\beta^a_{k\to i}(t)$. As pointed out in [24], $\beta^a_{k\to j}(t)$ is updated in accordance with a set of linear equations upon the convergence of the belief variances. Thus, after deriving the convergence condition of $\sigma_i^2(t)$, we can then apply the theory of linear equations to analyze the convergence condition of the belief mean $\mu_i(t)$. However, the convergence condition of the belief mean $\mu_i(t)$ would be lengthy and will be investigated in the future.

B. Verification Using Semi-Definite Programming [30]

Next, we show that whether $\mathcal S\neq\emptyset$ can be determined by solving the following SDP problem:

$$\min_{\mathbf{w},\alpha}\ \alpha \quad \text{s.t.}\quad \begin{bmatrix} p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} w_{ki} & p_{ij}\\ p_{ij} & -w_{ij}\end{bmatrix}\succeq 0,\ \forall(i,j)\in\mathcal E;\qquad \alpha + p_{ii} + \sum_{k\in\mathcal N(i)} w_{ki}\geq 0,\ \forall i\in\mathcal V. \qquad (43)$$

The SDP problem (43) can be solved efficiently by existing software, such as CVX [31] and SeDuMi [32], etc.

Theorem 5. $\mathcal S\neq\emptyset$ if and only if the optimal solution $\alpha^*$ of (43) satisfies $\alpha^* < 0$.

Proof: First, notice that the SDP problem in (43) is equivalent to the following optimization problem:

$$\min_{\mathbf{w},\alpha}\ \alpha \quad \text{s.t.}\quad w_{ij} - g_{ij}(\mathbf{w})\leq 0,\ \forall(i,j)\in\mathcal E;\quad -p_{ii} - \sum_{k\in\mathcal N(i)\setminus j} w_{ki}\leq 0,\ \forall(i,j)\in\mathcal E;\quad -p_{ii} - \sum_{k\in\mathcal N(i)} w_{ki}\leq\alpha,\ \forall i\in\mathcal V. \qquad (44)$$

If $\mathcal S\neq\emptyset$, according to the definition of $\mathcal S$ in (42), there must exist a $\mathbf{w}$ such that $\mathbf{w}\leq\mathbf{g}(\mathbf{w})$, $\mathbf{w}\in\mathcal W$, and $p_{ii} + \sum_{k\in\mathcal N(i)} w_{ki} > 0$ for all $i\in\mathcal V$. Obviously, these three conditions are equivalent to $w_{ij} - g_{ij}(\mathbf{w})\leq 0$ and $-p_{ii} - \sum_{k\in\mathcal N(i)\setminus j} w_{ki} < 0$ for all $(i,j)\in\mathcal E$, and $-p_{ii} - \sum_{k\in\mathcal N(i)} w_{ki} < 0$ for all $i\in\mathcal V$. Thus, by defining $\alpha = \max_{i\in\mathcal V}\big(-p_{ii} - \sum_{k\in\mathcal N(i)} w_{ki}\big)$, it can be seen that $(\mathbf{w},\alpha)$ satisfies the constraints in (44). Due to $-p_{ii} - \sum_{k\in\mathcal N(i)} w_{ki} < 0$, we have $\alpha < 0$. Since $\alpha$ is a feasible solution of the minimization problem in (44), the optimal solution of (44) cannot be greater than $\alpha$, hence it can be inferred that $\alpha^* < 0$.

Next, we prove the reverse also holds. If $(\mathbf{w}^*,\alpha^*)$ is the optimal solution of (44) with $\alpha^* < 0$, then $(\mathbf{w}^*,\alpha^*)$ must satisfy the constraints in (44), thus the following three conditions hold: 1) $w^*_{ij} - g_{ij}(\mathbf{w}^*)\leq 0$; 2) $-p_{ii} - \sum_{k\in\mathcal N(i)\setminus j} w^*_{ki}\leq 0$; 3) $-p_{ii} - \sum_{k\in\mathcal N(i)} w^*_{ki}\leq\alpha^*$. For the second constraint, if $-p_{ii} - \sum_{k\in\mathcal N(i)\setminus j} w^*_{ki} = 0$, the function $g_{ij}(\mathbf{w}^*) = -\tfrac{p_{ij}^2}{p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} w^*_{ki}}$ becomes undefined, thus $-p_{ii} - \sum_{k\in\mathcal N(i)\setminus j} w^*_{ki} = 0$ will never happen. Hence, we always have $-p_{ii} - \sum_{k\in\mathcal N(i)\setminus j} w^*_{ki} < 0$. For the third constraint, due to $\alpha^* < 0$, it can be inferred that $-p_{ii} - \sum_{k\in\mathcal N(i)} w^*_{ki} < 0$. Now, comparing $w^*_{ij} - g_{ij}(\mathbf{w}^*)\leq 0$, $-p_{ii} - \sum_{k\in\mathcal N(i)\setminus j} w^*_{ki} < 0$ and $-p_{ii} - \sum_{k\in\mathcal N(i)} w^*_{ki} < 0$ with the definition of the set $\mathcal S$ in (42), we have $\mathbf{w}^*\in\mathcal S$, and hence $\mathcal S\neq\emptyset$.
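The following sketch formulates (43) with CVXPY as the modeling layer; the paper itself points to CVX and SeDuMi, so CVXPY here is only an assumed alternative, and the function and variable names are ours. It returns the verdict of Theorem 5.

```python
import cvxpy as cp

def check_S_nonempty(P, tol=1e-8):
    """Solve the SDP (43) and apply Theorem 5: S is nonempty iff alpha* < 0."""
    N = P.shape[0]
    nbrs = {i: [k for k in range(N) if k != i and P[i, k] != 0] for i in range(N)}
    edges = [(i, j) for i in range(N) for j in nbrs[i]]
    w = {e: cp.Variable() for e in edges}
    alpha = cp.Variable()
    cons = []
    for (i, j) in edges:
        top_left = P[i, i] + sum(w[(k, i)] for k in nbrs[i] if k != j)
        # 2x2 linear matrix inequality of (43); by the Schur complement it enforces
        # w_ij <= -p_ij^2 / (p_ii + sum_{k in N(i)\j} w_ki) = g_ij(w).
        cons.append(cp.bmat([[top_left, P[i, j]],
                             [P[i, j], -w[(i, j)]]]) >> 0)
    for i in range(N):
        cons.append(alpha + P[i, i] + sum(w[(k, i)] for k in nbrs[i]) >= 0)
    prob = cp.Problem(cp.Minimize(alpha), cons)
    prob.solve()
    if prob.status not in ("optimal", "optimal_inaccurate"):
        return False                       # infeasible problem: S must be empty
    return alpha.value < -tol
```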

Remark 2: Using the alternating direction method of multipliers (ADMM) [27], the SDP problem in (43) can be reformulated into $N$ locally connected low-dimensional SDP subproblems, and thus can be solved distributively. Not only does this avoid the gathering of information at a central processing unit, the complexity is also reduced from $O\big((\sum_{i=1}^N |\mathcal N(i)|)^4\big)$ of directly solving the SDP to $O\big(\sum_{i=1}^N |\mathcal N(i)|^4\big)$ per iteration when using the ADMM technique. Furthermore, since what we are really interested in is to know whether $\mathcal S\neq\emptyset$, ADMM can stop its updating immediately if an intermediary vector is found to be within $\mathcal S$. Thus, the required number of iterations can be reduced significantly. Because the derivation of ADMM is well documented in [27], [33], we do not give the details here. It should be emphasized that, despite the low complexity of ADMM at each iteration, due to the unknown number of required iterations, it cannot be proved that the overall complexity is always lower than that of the direct matrix inverse, $O(N^3)$.

Remark 3: As discussed after (20), we know that $\mathcal A = \{\mathbf{w}\geq\mathbf{w}_0 \mid \mathbf{w}_0\in\operatorname{int}(\mathcal S')\}$ if $\operatorname{int}(\mathcal S')\neq\emptyset$. In particular, with a proof similar to that of Theorem 5, it can be shown that if the SDP

$$\min_{\mathbf{w},\alpha}\ \alpha \quad \text{s.t.}\quad \begin{bmatrix} p_{ii} + \sum_{k\in\mathcal N(i)\setminus j} w_{ki} & p_{ij}\\ p_{ij} & -\alpha - w_{ij}\end{bmatrix}\succeq 0,\ \forall(i,j)\in\mathcal E \qquad (45)$$

has an optimal solution $\alpha^* < 0$, then the optimal $\mathbf{w}^*$ must be a point in $\operatorname{int}(\mathcal S')$.

VI. RELATIONSHIP WITH THE CONDITION BASED ON COMPUTATION TREE

In this section, we will establish the relation between the proposed convergence condition and the one proposed in [24], which is the best currently available. For the convergence condition in [24], the original PDF in (1) is first transformed into the normalized form

$$f(\tilde{\mathbf{x}}) \propto \exp\Big\{-\tfrac{1}{2}\tilde{\mathbf{x}}^T\tilde{\mathbf{P}}\tilde{\mathbf{x}} + \tilde{\mathbf{h}}^T\tilde{\mathbf{x}}\Big\}, \qquad (46)$$

where $\tilde{\mathbf{x}} = \operatorname{diag}^{1/2}(\mathbf{P})\mathbf{x}$, $\tilde{\mathbf{P}} = \operatorname{diag}^{-1/2}(\mathbf{P})\,\mathbf{P}\,\operatorname{diag}^{-1/2}(\mathbf{P})$ and $\tilde{\mathbf{h}} = \operatorname{diag}^{-1/2}(\mathbf{P})\mathbf{h}$, with $\operatorname{diag}(\mathbf{P})$ denoting the diagonal matrix obtained by retaining only the diagonal elements of $\mathbf{P}$. Then, Gaussian BP is carried out with $\tilde{\mathbf{P}}$ and $\tilde{\mathbf{h}}$ using the updating equations in (6), (7), (9) and (10). Under the normalized $\tilde{\mathbf{P}}$ and $\tilde{\mathbf{h}}$, we denote the corresponding message parameters as $\tilde{\mathbf{v}}^a(t)$ and $\tilde{\boldsymbol\beta}^a(t)$. With $\tilde{\mathbf{v}}^a(t)$, the corresponding belief variance can be calculated using (11) as $\tilde\sigma_i^2(t) = \tfrac{1}{\tilde p_{ii} + \sum_{k\in\mathcal N(i)} \tilde v^a_{k\to i}(t)}$. The variance of the original PDF $f(\mathbf{x})$ can then be recovered as $\tilde\sigma_i^2(t)/p_{ii}$. Moreover, under the normalized $\tilde{\mathbf{P}}$, the sets $\tilde{\mathcal W}$, $\tilde{\mathcal S}'$, $\tilde{\mathcal A}$ and $\tilde{\mathcal S}$ can also be defined similarly to (15), (17), (20) and (42), respectively.
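A minimal sketch of the normalization step in (46) (NumPy assumed) is given below; the original variances are then recovered by dividing the normalized ones by $p_{ii}$, as stated above.

```python
import numpy as np

def normalize_model(P, h):
    """Form the normalized model (46): P_tilde has unit diagonal."""
    d = np.sqrt(np.diag(P))
    D_inv = np.diag(1.0 / d)
    P_tilde = D_inv @ P @ D_inv
    h_tilde = D_inv @ h
    return P_tilde, h_tilde
```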

By noticing that the message-passing process of Gaussian BP in the factor graph can be translated exactly into the message passing in a computation tree, [24] proposed a convergence condition for the belief variance $\tilde\sigma_i^2(t)$ in terms of the computation tree. A computation tree $\mathcal T(\gamma,\ell)$ is constructed by choosing one of the variable nodes $\gamma$ as the root node and $\mathcal N(\gamma)$ as its direct descendants, connected through the factor nodes $\tilde f_{\gamma\nu}(\tilde x_\gamma, \tilde x_\nu) = \exp\{-\tilde p_{\gamma\nu}\tilde x_\gamma\tilde x_\nu\}$, where $\nu$ is the index of any descendant. For each descendant $\nu$, the neighbors of $\nu$ excluding its parent node are connected as next-level descendants, also connected through the corresponding factor nodes. The process is repeated until the computation tree $\mathcal T(\gamma,\ell)$ has $\ell + 1$ layers of variable nodes (including the root node). The computation tree is completed by further connecting the factor node $\tilde f_i(\tilde x_i) = \exp\{-\tfrac{1}{2}\tilde x_i^2 + \tilde h_i\tilde x_i\}$ to variable node $\tilde x_i$ for $i\in\mathcal V$. Furthermore, a computation sub-tree $\mathcal T_s(\gamma,\nu,\ell)$ can be obtained by cutting the branch of $\tilde x_\nu$ in the first layer from $\mathcal T(\gamma,\ell)$. An example of the computation tree $\mathcal T(1,4)$, as well as the computation sub-tree $\mathcal T_s(1,4,4)$, corresponding to a fully connected factor graph with four variables, is illustrated in Fig. 1.

Fig. 1: Illustration of computation tree $\mathcal T(1,4)$ and computation sub-tree $\mathcal T_s(1,4,4)$ within the dashed line.

By assigning each variable node a new index $i = 1, 2, \ldots, |\mathcal T(\gamma,\ell)|$, $\mathcal T(\gamma,\ell)$ can be viewed as a factor graph for a new PDF, where $|\mathcal T(\gamma,\ell)|$ denotes the number of variable nodes in $\mathcal T(\gamma,\ell)$. The new PDF represented by $\mathcal T(\gamma,\ell)$ is

$$\tilde f_{\gamma,\ell}(\tilde{\mathbf{x}}_{\gamma,\ell}) \propto \exp\Big\{-\tfrac{1}{2}\tilde{\mathbf{x}}_{\gamma,\ell}^T\tilde{\mathbf{P}}_{\gamma,\ell}\tilde{\mathbf{x}}_{\gamma,\ell} + \tilde{\mathbf{h}}_{\gamma,\ell}^T\tilde{\mathbf{x}}_{\gamma,\ell}\Big\}, \qquad (47)$$

where $\tilde{\mathbf{x}}_{\gamma,\ell}$ is a $|\mathcal T(\gamma,\ell)|\times 1$ vector; $\tilde{\mathbf{P}}_{\gamma,\ell}$ and $\tilde{\mathbf{h}}_{\gamma,\ell}$ are the corresponding precision matrix and linear coefficient vector, respectively. According to the equivalence between the message-passing process in the original factor graph and that in the computation tree [6], [24], under the prerequisite of $\tilde{\mathbf{P}}_{\gamma,\ell}\succ 0$ and $\tilde{\mathbf{v}}^a(0) = 0$, it is known that

$$\tilde\sigma_\gamma^2(\ell) = \big[\tilde{\mathbf{P}}_{\gamma,\ell}^{-1}\big]_{11}. \qquad (48)$$
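The unrolling of the loopy graph into a computation tree can be sketched programmatically as follows (NumPy assumed; names are ours, and 0-based indexing is used, so the $[\cdot]_{11}$ entry of (48) corresponds to index $(0,0)$ here). Note that the tree grows exponentially with the depth for dense graphs, so this is only practical for small examples.

```python
import numpy as np

def computation_tree_precision(P, gamma, depth):
    """Precision matrix of the computation tree rooted at variable gamma with `depth`
    layers of descendants: breadth-first unrolling, each tree node carrying the diagonal
    entry of its underlying variable and one edge to its parent."""
    N = P.shape[0]
    nbrs = {i: [k for k in range(N) if k != i and P[i, k] != 0] for i in range(N)}
    nodes = [(gamma, None)]            # (underlying variable, parent tree-node index)
    frontier = [(0, gamma, None)]      # (tree index, variable, parent variable)
    for _ in range(depth):
        new_frontier = []
        for (idx, var, parent_var) in frontier:
            for k in nbrs[var]:
                if k == parent_var:
                    continue
                nodes.append((k, idx))
                new_frontier.append((len(nodes) - 1, k, var))
        frontier = new_frontier
    T = np.zeros((len(nodes), len(nodes)))
    for t_idx, (var, parent) in enumerate(nodes):
        T[t_idx, t_idx] = P[var, var]
        if parent is not None:
            T[t_idx, parent] = T[parent, t_idx] = P[var, nodes[parent][0]]
    return T

# Sanity check of (48): with v^a(0) = 0, the BP belief variance at node gamma after
# `depth` synchronous iterations on the normalized model should equal
# np.linalg.inv(computation_tree_precision(P_tilde, gamma, depth))[0, 0].
```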

Similarly, the computation sub-tree $\mathcal T_s(\gamma,\nu,\ell)$ can also be viewed as a factor graph, with $\tilde{\mathbf{P}}_{\gamma\setminus\nu,\ell}$ being the corresponding symmetric precision matrix in the PDF. Furthermore, if (48) holds, we also have

$$\tilde\sigma_{\gamma\setminus\nu}^2(\ell) = \big[\tilde{\mathbf{P}}_{\gamma\setminus\nu,\ell}^{-1}\big]_{11}, \qquad (49)$$

where $\tilde\sigma_{\gamma\setminus\nu}^2(\ell) \triangleq \tfrac{1}{\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)\setminus\nu}\tilde v^a_{k\to\gamma}(\ell)}$. In [24, Proposition 25], it is proved that if $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) < 1$, the variance $\tilde\sigma_\gamma^2(t)$ converges for $\tilde{\mathbf{v}}^a(0) = 0$; but if $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) > 1$, Gaussian BP becomes ill-posed. Notice that the limit $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I})$ always exists and is identical for all $\gamma\in\mathcal V$ [24, Lemma 24]. The following lemma and theorem reveal the relation between the proposed convergence condition and that based on the computation tree.

Lemma 4. If $\tilde{\mathbf{P}}_{\gamma,\ell}\succ 0$ for all $\gamma\in\mathcal V$ and $\ell\geq 0$, then $\tilde{\mathbf{v}}^a(t)$ converges to $\tilde{\mathbf{v}}^{a*}\in\tilde{\mathcal S}'$ for $\tilde{\mathbf{v}}^a(0) = 0$.

Proof: See Appendix B.

Now, we give the following theorem.

Theorem 6. $\tilde{\mathcal S}\neq\emptyset$ if and only if either of the following two conditions holds: 1) $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) < 1$; 2) $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) = 1$, $\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^{a*}_{k\to\gamma}\neq 0$ with $\tilde{\mathbf{v}}^a(0) = 0$, and $\tilde{\mathbf{P}}_{\nu,\ell}\succ 0$ for all $\nu\in\mathcal V$ and $\ell\geq 0$.

Proof: Sufficient Condition: First, we prove that if $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) < 1$, then $\tilde{\mathcal S}\neq\emptyset$. With the monotonically increasing property of $\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I})$ [24, Lemma 23] and the assumption that its limit is smaller than 1, we have $\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I})\leq\rho_u$ with $\rho_u \triangleq \lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I})\in(0,1)$. Expressing this in terms of eigenvalues, we have $-\rho_u\leq\lambda(\tilde{\mathbf{P}}_{\gamma,\ell}) - 1\leq\rho_u$, or equivalently

$$1 - \rho_u \leq \lambda(\tilde{\mathbf{P}}_{\gamma,\ell}) \leq 1 + \rho_u. \qquad (50)$$

For a symmetric $\tilde{\mathbf{P}}_{\gamma,\ell}$, the eigenvalues of $\tilde{\mathbf{P}}_{\gamma,\ell}^{-1}$ are $\{1/\lambda(\tilde{\mathbf{P}}_{\gamma,\ell})\}$, and (50) is equivalent to

$$\frac{1}{1 + \rho_u} \leq \lambda(\tilde{\mathbf{P}}_{\gamma,\ell}^{-1}) \leq \frac{1}{1 - \rho_u}. \qquad (51)$$

Due to $0 < \rho_u < 1$, it can be obtained from (51) that $\lambda(\tilde{\mathbf{P}}_{\gamma,\ell}^{-1}) > 0$, that is,

$$\tilde{\mathbf{P}}_{\gamma,\ell}\succ 0, \quad \forall\ \gamma\in\mathcal V \text{ and } \ell\geq 0. \qquad (52)$$

Notice that a diagonal element of a symmetric matrix is always smaller than or equal to the matrix's maximum eigenvalue [34], that is, $[\tilde{\mathbf{P}}_{\gamma,\ell}^{-1}]_{ii}\leq\lambda_{\max}(\tilde{\mathbf{P}}_{\gamma,\ell}^{-1})$. Combining with the fact that, for a positive definite matrix, the diagonal elements are positive, we obtain $0 < [\tilde{\mathbf{P}}_{\gamma,\ell}^{-1}]_{ii}\leq\lambda_{\max}(\tilde{\mathbf{P}}_{\gamma,\ell}^{-1})$. Since the upper bound of (51) applies to all eigenvalues, we obtain

$$0 < \big[\tilde{\mathbf{P}}_{\gamma,\ell}^{-1}\big]_{ii} \leq \frac{1}{1 - \rho_u}. \qquad (53)$$

Due to $[\tilde{\mathbf{P}}_{\gamma,\ell}^{-1}]_{11} = \tilde\sigma_\gamma^2(\ell)$ from (48) and $\tilde\sigma_\gamma^2(\ell) = \tfrac{1}{\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^a_{k\to\gamma}(\ell)}$, it can be inferred from (53) that $\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^a_{k\to\gamma}(\ell)\geq 1 - \rho_u$ for all $\ell\geq 0$. Due to $\tilde{\mathbf{P}}_{\gamma,\ell}\succ 0$ from (52) for all $\gamma\in\mathcal V$ and $\ell\geq 0$, according to Lemma 4, $\tilde{\mathbf{v}}^a(t)$ always converges to $\tilde{\mathbf{v}}^{a*}\in\tilde{\mathcal S}'$. Thus, it can be inferred that $\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^{a*}_{k\to\gamma}\geq 1 - \rho_u$. With $\rho_u < 1$, we obtain

$$\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^{a*}_{k\to\gamma} > 0, \quad \forall\ \gamma\in\mathcal V. \qquad (54)$$

Combining (54) with $\tilde{\mathbf{v}}^{a*}\in\tilde{\mathcal S}'$, it can be inferred that $\tilde{\mathbf{v}}^{a*}\in\tilde{\mathcal S}$ by the definition, thus $\tilde{\mathcal S}\neq\emptyset$.

Next, we prove the second scenario. Due to $\tilde{\mathbf{P}}_{\gamma,\ell}\succ 0$, according to Lemma 4, $\tilde{\mathbf{v}}^a(t)$ converges to $\tilde{\mathbf{v}}^{a*}\in\tilde{\mathcal S}'$, and thereby $\tilde{\mathcal S}'\neq\emptyset$ and $\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^{a*}_{k\to\gamma}$ exists. Due to $\tilde{\mathcal S}'\neq\emptyset$, according to Lemma 3, it is known that $\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^{a*}_{k\to\gamma} > 0$ for all $\gamma\in\mathcal V$ or $\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^{a*}_{k\to\gamma} = 0$ for all $\gamma\in\mathcal V$. From the prerequisite $\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^{a*}_{k\to\gamma}\neq 0$, we can infer that $\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^{a*}_{k\to\gamma} > 0$ holds for all $\gamma\in\mathcal V$. Combining with $\tilde{\mathbf{v}}^{a*}\in\tilde{\mathcal S}'$, we can see that $\tilde{\mathbf{v}}^{a*}\in\tilde{\mathcal S}$, thus $\tilde{\mathcal S}\neq\emptyset$.

Necessary Condition: In the following, we will first prove that the precision matrix of a computation tree can always be written into a special structure by re-indexing the nodes in the tree. Based on the special structure, the necessity is proved by the method of induction.

First, notice that changing indexing schemes in the computation tree does not affect the positive definiteness of the corresponding precision matrix $\tilde{\mathbf{P}}_{\gamma,\ell}$. So, we consider an indexing scheme

such that the precision matrix $\tilde{\mathbf{P}}_{\gamma,\ell+1}$ for $\ell\geq 0$ can be represented in the form

$$\tilde{\mathbf{P}}_{\gamma,\ell+1} = \begin{bmatrix} \tilde p_{\gamma\gamma} & \tilde{\mathbf{a}}^T_{k_1\setminus\gamma,\ell} & \tilde{\mathbf{a}}^T_{k_2\setminus\gamma,\ell} & \cdots & \tilde{\mathbf{a}}^T_{k_{|\mathcal N(\gamma)|}\setminus\gamma,\ell}\\ \tilde{\mathbf{a}}_{k_1\setminus\gamma,\ell} & \tilde{\mathbf{P}}_{k_1\setminus\gamma,\ell} & \mathbf 0 & \cdots & \mathbf 0\\ \tilde{\mathbf{a}}_{k_2\setminus\gamma,\ell} & \mathbf 0 & \tilde{\mathbf{P}}_{k_2\setminus\gamma,\ell} & \cdots & \mathbf 0\\ \vdots & \vdots & & \ddots & \vdots\\ \tilde{\mathbf{a}}_{k_{|\mathcal N(\gamma)|}\setminus\gamma,\ell} & \mathbf 0 & \cdots & \mathbf 0 & \tilde{\mathbf{P}}_{k_{|\mathcal N(\gamma)|}\setminus\gamma,\ell}\end{bmatrix}, \qquad (55)$$

where $k_i\in\mathcal N(\gamma)$ for $i = 1, 2, \ldots, |\mathcal N(\gamma)|$; $\tilde{\mathbf{a}}_{k_i\setminus\gamma,\ell} = [\tilde p_{k_i\gamma}, 0, \ldots, 0]^T$ is a vector of length $|\mathcal T_s(k_i,\gamma,\ell)|$. Notice that the $(\ell+1)$-order computation tree $\mathcal T(\gamma,\ell+1)$ consists of the root node $[\tilde{\mathbf{x}}]_\gamma$ and a set of $\ell$-order computation sub-trees $\mathcal T_s(k_i,\gamma,\ell)$ with the corresponding root nodes $k_i$. The lower-right block-diagonal structure of (55) can be easily obtained by assigning consecutive indices to the nodes inside each sub-tree $\mathcal T_s(k_i,\gamma,\ell)$. Moreover, since there is only one connection from node $\gamma$ to the root node $k_i$ of each sub-tree $\mathcal T_s(k_i,\gamma,\ell)$, $\tilde{\mathbf{a}}_{k_i\setminus\gamma,\ell}$ contains only one nonzero element $\tilde p_{k_i\gamma}$. Then, by assigning the smallest index to the root node in each $\mathcal T_s(k_i,\gamma,\ell)$, the only nonzero element in $\tilde{\mathbf{a}}_{k_i\setminus\gamma,\ell}$ must be located at the first position. Therefore, the precision matrix $\tilde{\mathbf{P}}_{\gamma,\ell+1}$ can be represented in the form of (55).

When $\ell = 0$, obviously, $\tilde{\mathbf{P}}_{\gamma,0} = \tilde p_{\gamma\gamma} > 0$. Suppose $\tilde{\mathbf{P}}_{\gamma,\ell}\succ 0$ for all $\gamma\in\mathcal V$ and some $\ell\geq 0$; we need to prove $\tilde{\mathbf{P}}_{\gamma,\ell+1}\succ 0$. From (55), it is clear that $\tilde{\mathbf{P}}_{\gamma,\ell+1}\succ 0$ if and only if the following two conditions are satisfied [34, p. 472]:

$$\operatorname{Bdiag}\big(\tilde{\mathbf{P}}_{k_1\setminus\gamma,\ell}, \ldots, \tilde{\mathbf{P}}_{k_{|\mathcal N(\gamma)|}\setminus\gamma,\ell}\big)\succ 0, \qquad (56)$$

$$\tilde p_{\gamma\gamma} - \sum_{k\in\mathcal N(\gamma)}\tilde{\mathbf{a}}^T_{k\setminus\gamma,\ell}\tilde{\mathbf{P}}_{k\setminus\gamma,\ell}^{-1}\tilde{\mathbf{a}}_{k\setminus\gamma,\ell} > 0, \qquad (57)$$

where $\operatorname{Bdiag}(\cdot)$ denotes a block-diagonal matrix with the elements located along the main diagonal. Due to $\tilde{\mathbf{P}}_{k,\ell}\succ 0$ for all $k\in\mathcal V$ by assumption, its sub-matrices satisfy $\tilde{\mathbf{P}}_{k\setminus\gamma,\ell}\succ 0$ for all $(k,\gamma)\in\mathcal E$, thus the first condition (56) holds. On the other hand, for the second condition (57), we write

$$\tilde p_{\gamma\gamma} - \sum_{k\in\mathcal N(\gamma)}\tilde{\mathbf{a}}^T_{k\setminus\gamma,\ell}\tilde{\mathbf{P}}_{k\setminus\gamma,\ell}^{-1}\tilde{\mathbf{a}}_{k\setminus\gamma,\ell} \overset{(a)}{=} \tilde p_{\gamma\gamma} - \sum_{k\in\mathcal N(\gamma)}\tilde p_{k\gamma}^2\,\tilde\sigma_{k\setminus\gamma}^2(\ell) \overset{(b)}{=} \tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^a_{k\to\gamma}(\ell+1), \qquad (58)$$

where the equality $(a)$ holds since $\tilde{\mathbf{a}}_{k\setminus\gamma,\ell}$ has only one nonzero element, $\tilde p_{k\gamma}$, in the first position, and $[\tilde{\mathbf{P}}_{k\setminus\gamma,\ell}^{-1}]_{11} = \tilde\sigma_{k\setminus\gamma}^2(\ell)$ from (49); and $(b)$ holds due to $-\tilde p_{k\gamma}^2\,\tilde\sigma_{k\setminus\gamma}^2(\ell) = -\tfrac{\tilde p_{k\gamma}^2}{\tilde p_{kk} + \sum_{k'\in\mathcal N(k)\setminus\gamma}\tilde v^a_{k'\to k}(\ell)} = \tilde v^a_{k\to\gamma}(\ell+1)$ as given in (12).

If $\tilde{\mathcal S}\neq\emptyset$, according to Theorem 4, we have

$$\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^{a*}_{k\to\gamma} > 0. \qquad (59)$$

Due to $\tilde{\mathcal S}\subseteq\tilde{\mathcal S}'$, $\tilde{\mathcal S}\neq\emptyset$ also implies $\tilde{\mathcal S}'\neq\emptyset$. According to (63), it can be inferred that $\tilde{\mathbf{v}}^a(t)\geq\tilde{\mathbf{v}}^{a*}$. Combining with (59), we obtain $\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^a_{k\to\gamma}(t) > 0$. Substituting this result into (58), it can be inferred that the second condition (57) holds as well. Thus, we have $\tilde{\mathbf{P}}_{\gamma,\ell}\succ 0$ for all $\gamma\in\mathcal V$ and $\ell\geq 0$. Furthermore, from (59), it is obvious that $\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^{a*}_{k\to\gamma}\neq 0$.

From $\tilde{\mathbf{P}}_{\gamma,\ell}\succ 0$, we obtain $\mathbf{I} - (\mathbf{I} - \tilde{\mathbf{P}}_{\gamma,\ell})\succ 0$, and hence $1 - \lambda_{\max}(\mathbf{I} - \tilde{\mathbf{P}}_{\gamma,\ell}) > 0$. Since $\tilde{\mathbf{P}}_{\gamma,\ell}$ represents a tree-structured factor graph, it is proved in [24, Proposition 5] that $\lambda_{\max}(\mathbf{I} - \tilde{\mathbf{P}}_{\gamma,\ell}) = \rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I})$. Therefore, it can be obtained that $\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) < 1$, and thereby

$$\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) \leq 1. \qquad (60)$$

Finally, if $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) < 1$, then $\tilde{\mathbf{P}}_{\gamma,\ell}\succ 0$ from (52). Together with (54), it can be inferred that under the prerequisite of $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) < 1$, the conditions $\tilde{\mathbf{P}}_{\gamma,\ell}\succ 0$ and $\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^{a*}_{k\to\gamma}\neq 0$ are automatically satisfied. Therefore, if $\tilde{\mathcal S}\neq\emptyset$, we have either $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) < 1$, or $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) = 1$ together with $\tilde p_{\gamma\gamma} + \sum_{k\in\mathcal N(\gamma)}\tilde v^{a*}_{k\to\gamma}\neq 0$ and $\tilde{\mathbf{P}}_{\gamma,\ell}\succ 0$.

From Theorems 4 and 6, it can be obtained that the variance $\tilde\sigma_\gamma^2(t)$ converges when $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) < 1$, and diverges when $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) > 1$, which is consistent with the results proposed in [24]. Moreover, it can be seen from Theorem 6 that it is not sufficient to determine the convergence of the variance $\tilde\sigma_\gamma^2(t)$ by using $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) = 1$ only. This fills in the gap of [24] in the scenario of $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) = 1$. Albeit with similar conclusions to [24], we need to emphasize that the criterion $\lim_{\ell\to\infty}\rho(\tilde{\mathbf{P}}_{\gamma,\ell} - \mathbf{I}) < 1$ proposed in [24] is not easy to check in practice due to the infinite dimension involved, while our condition $\tilde{\mathcal S}\neq\emptyset$ can be verified by solving the SDP problem given in Theorem 5. Moreover, the initialization is expanded from the single choice $\tilde{\mathbf{v}}^a(0) = 0$ in [24] to a much larger set $\tilde{\mathbf{v}}^a(0)\in\tilde{\mathcal A}$ in this paper. The flexibility in the choice of initialization is useful to accelerate the convergence of the variance $\tilde\sigma_i^2(t)$ if the initialization is chosen close to the convergent point.

VII. NUMERICAL EXAMPLES

In this section, numerical experiments are presented to corroborate the theories in this paper. The example is based on precision matrices $P$ constructed as

$$
p_{ij} =
\begin{cases}
1, & \text{if } i = j, \\
\zeta\, \theta_{\mathrm{mod}(i+j,10)+1}, & \text{if } i \neq j,
\end{cases}
\qquad (61)
$$

where $\zeta$ is a coefficient indicating the correlation strength among the variables, and $\theta_k$ is the $k$-th element of the vector $\theta = [0.3, 0.0, 0.7, 0.05, 0, 0.2, 0.07, 0., 0.02, 0.03]^T$. Varying the correlation strength $\zeta$ induces a series of matrices, and the positive definite constraint $P \succ 0$ required by a valid PDF is guaranteed when $\zeta$ is smaller than a limiting value.

Fig. 2 illustrates how the optimal solution $\alpha$ of (43) varies with the correlation strength $\zeta$. It can be seen that the optimal solution $\alpha$ always exists and the condition $\alpha < 0$ holds for all $\zeta$ up to a critical value, while no feasible solution exists in the SDP problem (43) when $\zeta$ exceeds this critical value. According to Theorem 4, this means that below the critical value of $\zeta$, the variance $\sigma^2_i(t)$ with $i \in \mathcal{V}$ converges to the same point for all initializations $v^a(0) \in \mathcal{A}$ under both synchronous and asynchronous schedulings. On the other hand, beyond the critical value of $\zeta$, the variance $\sigma^2_i(t)$ cannot converge.

To verify the convergence of the belief variances below the critical value, Fig. 3 shows how the variance $\sigma^2_1(t)$ of the 1-st variable evolves as a function of $t$ for a $\zeta$ slightly smaller than the critical value. It can be observed that the variance $\sigma^2_1(t)$ converges to the same value under both synchronous and asynchronous schedulings and under the different initializations $v^a(0) = 0$ and $v^a(0) = w$, where $w$ is the optimal solution of (45). For the asynchronous case, a scheduling with a 30% chance of not updating the messages at each iteration is considered. On the other hand, Fig. 4 verifies the divergence of the variance $\sigma^2_1(t)$ for a $\zeta$ slightly larger than the critical value. In this figure, synchronous scheduling and the same initializations as those of Fig. 3 are used. It can be seen that $\sigma^2_1(t)$ fluctuates as the iterations proceed and shows no sign of convergence.

VIII. CONCLUSIONS

In this paper, the necessary and sufficient convergence condition for the variances of Gaussian BP was developed for synchronous and asynchronous schedulings. The initialization set of the proposed condition is much larger than the usual choice of a single point of zero. It is proved that

the convergence condition can be verified efficiently by solving an SDP problem. Furthermore, the relationship between the convergence condition proposed in this paper and the one based on the computation tree was established. This relationship fills in a missing piece of the result in [24] for the case where the spectral radius of the computation tree is equal to one. Numerical examples were further presented to verify the proposed convergence conditions.

Fig. 2: The value of $\alpha$ under different correlation strengths $\zeta$ (plot of the convergence metric $\alpha$ versus the correlation strength $\zeta$).

Fig. 3: Illustration of the convergence of the variance $\sigma^2_1(t)$ under different schedulings (synchronous and asynchronous) and initializations $v^a(0) = 0$ and $v^a(0) = w$, with $w$ being the optimal solution of (45) (plot of $\sigma^2_1(t)$ versus the number of iterations $t$).
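To make the experiment behind Figs. 2-4 concrete, the sketch below builds a precision matrix with the structure of (61) and runs the synchronous Gaussian BP variance recursion started from all-zero messages. It assumes the message update takes the standard Gaussian BP form $\tilde{v}^a_{\nu\to\gamma}(t+1) = -p^2_{\nu\gamma}\big/\big(p_{\nu\nu} + \sum_{k\in\mathcal{N}(\nu)\setminus\gamma}\tilde{v}^a_{k\to\nu}(t)\big)$, which is consistent with the relations used in (58) and (64). The dimension $n$, the correlation strength $\zeta$, and the $\theta$ entries below are placeholders rather than the exact values used to produce the figures.

```python
import numpy as np

def build_precision(n, zeta, theta):
    """Precision matrix with the structure of (61): unit diagonal,
    zeta * theta[(i + j) mod 10] off the diagonal (theta indexed from 0 here)."""
    P = np.eye(n)
    for i in range(n):
        for j in range(n):
            if i != j:
                P[i, j] = zeta * theta[(i + j) % 10]
    return P

def bp_variances(P, num_iters=200):
    """Synchronous Gaussian BP variance recursion started from v(0) = 0.
    v[i, j] plays the role of v^a_{i->j}; returns the belief variances per iteration."""
    n = P.shape[0]
    v = np.zeros((n, n))                       # v^a_{i->j}(0) = 0
    history = []
    for _ in range(num_iters):
        col_sum = v.sum(axis=0)                # sum_k v^a_{k->i} for each node i
        new_v = np.zeros_like(v)
        for i in range(n):
            for j in range(n):
                if i != j and P[i, j] != 0.0:
                    denom = P[i, i] + col_sum[i] - v[j, i]   # exclude the message from j
                    new_v[i, j] = -P[i, j] ** 2 / denom
        v = new_v
        belief_prec = np.diag(P) + v.sum(axis=0)
        history.append(1.0 / belief_prec)      # sigma_i^2(t) for every variable
    return np.array(history)

# Placeholder parameters: the exact theta, zeta, and dimension of the paper's example
# are not reproduced here.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 0.3, size=10)
P = build_precision(n=20, zeta=0.1, theta=theta)
assert np.all(np.linalg.eigvalsh(P) > 0), "P must be positive definite for a valid PDF"
vars_over_time = bp_variances(P)
print(vars_over_time[-1][:5])   # belief variances of the first few variables at the last iteration
```

With the small $\zeta$ chosen here the matrix is strictly diagonally dominant, so the recursion settles quickly; a sufficiently large $\zeta$ is expected to show the fluctuating, non-convergent behaviour reported for Fig. 4.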

Fig. 4: Illustration of the divergence of the variance $\sigma^2_1(t)$, with $w$ being the optimal solution of (45) (plot of $\sigma^2_1(t)$ versus the number of iterations $t$, for the initializations $v^a(0) = 0$ and $v^a(0) = w$).

APPENDIX A
PROOF OF LEMMA 2

First, we prove that $v^a(t)$ converges for any $v^a(0) \succeq 0$, given $\mathcal{S} \neq \emptyset$. For any $w \in \mathcal{S}$, according to P5), we have $w < 0$. Thus, for any $v^a(0) \succeq 0$, the relation $w \preceq v^a(0)$ always holds. Notice that $w \in \mathcal{W}$ due to $w \in \mathcal{S}$ and $\mathcal{S} \subseteq \mathcal{W}$. Applying P2) to $w \preceq v^a(0)$, we obtain $g(w) \preceq v^a(1)$. Combining it with $w \preceq g(w)$ from P6) gives $w \preceq v^a(1)$. On the other hand, substituting $v^a(0) \succeq 0$ into (4) gives

$$
v^a(1) < 0. \qquad (62)
$$

Due to $v^a(0) \succeq 0$, we thus have $v^a(1) \preceq v^a(0)$. Combining it with $w \preceq v^a(1)$ gives $w \preceq v^a(1) \preceq v^a(0)$. Applying $g(\cdot)$ to $w \preceq v^a(1) < v^a(0)$, it can be inferred from P2) that $g(w) \preceq v^a(2) < v^a(1)$. Together with $w \preceq g(w)$ as claimed by P6), we obtain $w \preceq v^a(2) < v^a(1)$. By induction, we can infer that

$$
w \preceq v^a(t+1) \preceq v^a(t). \qquad (63)
$$

It can be seen from (63) that $v^a(t)$ is a monotonically non-increasing but lower bounded sequence; thus it converges.

Next, we prove that $v^a(t)$ converges to the same point for all $v^a(0) \succeq 0$. For any $v^a(0) \succeq 0$, according to P2), applying $g(\cdot)$ on both sides of $0 \preceq v^a(0)$ gives $g(0) \preceq g(v^a(0)) = v^a(1)$.

Combining this relation with (62) leads to $g(0) \preceq v^a(1) \preceq 0$. Applying $g(\cdot)$ on this inequality $t$ more times, it can be obtained from P2) that $g^{(t+1)}(0) \preceq v^a(t+1) \preceq g^{(t)}(0)$ for all $t \geq 0$. By denoting $\lim_{t\to\infty} g^{(t)}(0) = v^a$, it is obvious that $v^a(t)$ also converges to $v^a$. Finally, since $v^a$ is a fixed point of $g(\cdot)$, we have $v^a = g(v^a)$. From (63), we obtain $w \preceq v^a$. Due to $w \in \mathcal{S}$, we have $w \in \mathcal{W}$ by definition. Applying P1), it can be inferred that $v^a \in \mathcal{W}$. Combining this with the fact $v^a = g(v^a)$, according to the definition of $\mathcal{S}$ in (7), we obtain $v^a \in \mathcal{S}$. On the other hand, if $\mathcal{S} = \emptyset$, according to Theorem 1, the messages of Gaussian BP passed in the factor graph cannot be maintained in Gaussian form, thus the parameters of the messages $v^a(t)$ cannot converge in this case.

APPENDIX B
PROOF OF LEMMA 4

Due to $P_{\gamma,t} \succ 0$, we have $P^{-1}_{\gamma,t} \succ 0$, hence the diagonal elements $\big[P^{-1}_{\gamma,t}\big]_{ii} > 0$ [34]. With $\sigma^2_{\gamma}(t) = \big[P^{-1}_{\gamma,t}\big]_{11}$ and $\tfrac{1}{\sigma^2_{\gamma}(t)} = p_{\gamma\gamma} + \sum_{k\in\mathcal{N}(\gamma)} \tilde{v}^a_{k\to\gamma}(t)$, we have

$$
p_{\gamma\gamma} + \sum_{k\in\mathcal{N}(\gamma)\setminus\nu} \tilde{v}^a_{k\to\gamma}(t) > -\tilde{v}^a_{\nu\to\gamma}(t). \qquad (64)
$$

Next, we prove $\tilde{v}^a(t) \in \mathcal{W}$ for $t \geq 0$. Obviously, $\tilde{v}^a(0) = 0 \in \mathcal{W}$. Suppose $\tilde{v}^a(t) \in \mathcal{W}$ holds for some $t \geq 0$; according to the definition of $\mathcal{W}$ in (5), this is equivalent to $p_{\nu\nu} + \sum_{k\in\mathcal{N}(\nu)\setminus\gamma} \tilde{v}^a_{k\to\nu}(t) > 0$. Substituting this inequality into (4) gives $\tilde{v}^a_{\nu\to\gamma}(t+1) < 0$. Then applying $\tilde{v}^a_{\nu\to\gamma}(t+1) < 0$ to (64) gives $p_{\gamma\gamma} + \sum_{k\in\mathcal{N}(\gamma)\setminus\nu} \tilde{v}^a_{k\to\gamma}(t+1) > 0$ for all $(\nu,\gamma) \in \mathcal{E}$, or equivalently $\tilde{v}^a(t+1) \in \mathcal{W}$. Thus, we have $\tilde{v}^a(t) \in \mathcal{W}$ for all $t \geq 0$.

Finally, substituting $\tilde{v}^a(0) = 0$ into (3) gives $\tilde{v}^a(1) < 0$, and thereby $\tilde{v}^a(1) < \tilde{v}^a(0)$. From $\tilde{v}^a(t) \in \mathcal{W}$ and P2), applying $g(\cdot)$ on both sides of $\tilde{v}^a(1) < \tilde{v}^a(0)$ for $t$ times gives

$$
\tilde{v}^a(t+1) \preceq \tilde{v}^a(t). \qquad (65)
$$

From $\tilde{v}^a(0) = 0$ and (65), we have $\tilde{v}^a(t) \preceq 0$. Putting this result into (64) gives $\tilde{v}^a_{\nu\to\gamma}(t) > -p_{\gamma\gamma}$ for all $(\nu,\gamma) \in \mathcal{E}$. Together with (65), it can be seen that $\tilde{v}^a(t)$ is a monotonically non-increasing and lower bounded sequence. Thus, $\tilde{v}^a(t)$ converges to a fixed point $\tilde{v}^a$ with $\tilde{v}^a = g(\tilde{v}^a)$. Moreover, from $\tilde{v}^a(1) < 0$ and (65), we have $\tilde{v}^a < 0$. Putting this result into the limiting form of (64) gives $p_{\gamma\gamma} + \sum_{k\in\mathcal{N}(\gamma)\setminus\nu} \tilde{v}^a_{k\to\gamma} > 0$ for all $(\nu,\gamma) \in \mathcal{E}$, or equivalently $\tilde{v}^a \in \mathcal{W}$. Combining this with $\tilde{v}^a = g(\tilde{v}^a)$, we obtain $\tilde{v}^a \in \mathcal{S}$ by the definition of $\mathcal{S}$.
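As a small worked illustration of the monotone argument above (the numbers are made up and are not from the paper; the update is assumed to take the standard Gaussian BP form used in (58) and (64)), consider a cycle of three variables with $p_{\gamma\gamma} = 1$ and $p_{\nu\gamma} = 0.4$ on every edge. By symmetry all messages share one value $\tilde{v}(t)$, and the recursion started from $\tilde{v}(0) = 0$ behaves exactly as the proof predicts:

$$
\tilde{v}(t+1) \;=\; -\frac{p^2_{\nu\gamma}}{\,p_{\nu\nu} + \tilde{v}(t)\,} \;=\; -\frac{0.16}{\,1 + \tilde{v}(t)\,}, \qquad \tilde{v}(0) = 0,
$$
$$
\tilde{v}(1) = -0.16, \qquad \tilde{v}(2) \approx -0.1905, \qquad \tilde{v}(3) \approx -0.1976, \qquad \cdots \;\to\; \tilde{v}^a = \frac{-1 + \sqrt{1 - 4(0.16)}}{2} = -0.2 .
$$

The iterates decrease monotonically and stay above $-p_{\gamma\gamma} = -1$, exactly as (65) and the bound $\tilde{v}^a_{\nu\to\gamma}(t) > -p_{\gamma\gamma}$ require, and the limit satisfies $p_{\gamma\gamma} + \sum_{k\in\mathcal{N}(\gamma)\setminus\nu} \tilde{v}^a_{k\to\gamma} = 1 - 0.2 = 0.8 > 0$, i.e., the fixed point lies in $\mathcal{W}$, matching the conclusion of the proof.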

REFERENCES

[1] F. R. Kschischang, B. J. Frey, and H. A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 498-519, Feb. 2001.
[2] H. A. Loeliger, "An introduction to factor graphs," IEEE Signal Process. Mag., vol. 21, no. 1, pp. 28-41, Jan. 2004.
[3] H. A. Loeliger, J. Dauwels, H. Junli, S. Korl, P. Li, and F. R. Kschischang, "The factor graph approach to model-based signal processing," Proc. IEEE, vol. 95, no. 6, June 2007.
[4] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, Aug. 2006.
[5] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
[6] Y. Weiss and W. T. Freeman, "Correctness of belief propagation in Gaussian graphical models of arbitrary topology," Neural Comput., vol. 13, pp. 2173-2200, 2001.
[7] Y. Liu, V. Chandrasekaran, A. Anandkumar, and A. S. Willsky, "Feedback message passing for inference in Gaussian graphical models," IEEE Trans. Signal Process., vol. 60, no. 8, 2012.
[8] A. Montanari, B. Prabhakar, and D. Tse, "Belief propagation based multiuser detection," in Proc. IEEE Information Theory Workshop, Punta del Este, Uruguay, Mar. 2006.
[9] Q. Guo and D. Huang, "EM-based joint channel estimation and detection for frequency selective channels using Gaussian message passing," IEEE Trans. Signal Process., vol. 59, 2011.
[10] Q. Guo and P. Li, "LMMSE turbo equalization based on factor graphs," IEEE J. Sel. Areas Commun., vol. 26, pp. 311-319, 2008.
[11] Y. El-Kurdi, W. J. Gross, and D. Giannacopoulos, "Efficient implementation of Gaussian belief propagation solver for large sparse diagonally dominant linear systems," IEEE Trans. Magn., vol. 48, no. 2, Feb. 2012.
[12] O. Shental, P. H. Siegel, J. K. Wolf, D. Bickson, and D. Dolev, "Gaussian belief propagation solver for systems of linear equations," in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2008.
[13] X. Tan and J. Li, "Computationally efficient sparse Bayesian learning via belief propagation," IEEE Trans. Signal Process., vol. 58, 2010.
[14] V. Chandrasekaran, J. K. Johnson, and A. S. Willsky, "Estimation in Gaussian graphical models using tractable subgraphs: a walk-sum analysis," IEEE Trans. Signal Process., vol. 56, 2008.
[15] N. Boon Loong, J. S. Evans, S. V. Hanly, and D. Aktas, "Distributed downlink beamforming with cooperative base stations," IEEE Trans. Inf. Theory, vol. 54, 2008.
[16] F. Lehmann, "Iterative mitigation of intercell interference in cellular networks based on Gaussian belief propagation," IEEE Trans. Veh. Technol., vol. 61, 2012.
[17] M. Leng and Y. C. Wu, "Distributed clock synchronization for wireless sensor networks using belief propagation," IEEE Trans. Signal Process., vol. 59, no. 11, Nov. 2011.
[18] A. Ahmad, D. Zennaro, E. Serpedin, and L. Vangelista, "A factor graph approach to clock offset estimation in wireless sensor networks," IEEE Trans. Inf. Theory, vol. 58, no. 7, July 2012.
[19] M. Leng, T. Wee-Peng, and T. Q. S. Quek, "Cooperative and distributed localization for wireless sensor networks in multipath environments," in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2012.
[20] Y. Song, B. Wang, Z. Shi, K. Pattipati, and S. Gupta, "Distributed algorithms for energy-efficient even self-deployment in mobile sensor networks," IEEE Trans. Mobile Comput., 2013.

[21] G. Zhang, W. Xu, and Y. Wang, "Fast distributed rate control algorithm with QoS support in Ad Hoc networks," in Proc. IEEE Global Telecommunications Conf. (Globecom), 2010.
[22] D. Dolev, A. Zymnis, S. P. Boyd, D. Bickson, and Y. Tock, "Distributed large scale network utility maximization," in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2009.
[23] M. W. Seeger and D. P. Wipf, "Variational Bayesian inference techniques," IEEE Signal Process. Mag., vol. 27, no. 6, Nov. 2010.
[24] D. M. Malioutov, J. K. Johnson, and A. S. Willsky, "Walk-sums and belief propagation in Gaussian graphical models," J. Mach. Learn. Res., vol. 7, pp. 2031-2064, 2006.
[25] C. C. Moallemi and B. Van Roy, "Convergence of min-sum message passing for quadratic optimization," IEEE Trans. Inf. Theory, vol. 55, 2009.
[26] S. Tatikonda and M. I. Jordan, "Loopy belief propagation and Gibbs measures," in Proc. 18th Annu. Conf. Uncertainty Artif. Intell. (UAI-02), San Francisco, CA, Aug. 2002.
[27] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Englewood Cliffs, NJ: Prentice-Hall, 1989.
[28] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[29] C. C. Moallemi and B. Van Roy, "Convergence of min-sum message passing for convex optimization," IEEE Trans. Inf. Theory, vol. 56, no. 4, 2010.
[30] L. Vandenberghe and S. Boyd, "Semidefinite programming," SIAM Rev., vol. 38, no. 1, pp. 49-95, Mar. 1996.
[31] M. Grant, S. Boyd, and Y. Ye, "CVX: Matlab software for disciplined convex programming." [Online]. Available: http: // boyd/cvx/
[32] J. F. Sturm, "Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones," Optim. Methods Software, vol. 11-12, pp. 625-653, 1999.
[33] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1-122, 2011.
[34] R. Horn and C. R. Johnson, Matrix Analysis. New York: Cambridge Univ. Press, 1985.

Qinliang Su received the B.Eng. degree from Chongqing University in 2007 and the M.Eng. degree from Zhejiang University in 2010 (both with honors). He is currently pursuing the Ph.D. degree with the Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong. His current research interests include signal processing for wireless communications and networks; statistical inference and learning; graphical models; distributed and parallel computation; and convex optimization theories.

Yik-Chung Wu received the B.Eng. (EEE) degree in 1998 and the M.Phil. degree in 2001 from the University of Hong Kong (HKU). He received the Croucher Foundation scholarship in 2002 to study for the Ph.D. degree at Texas A&M University, College Station, and graduated in 2005. From August 2005 to August 2006, he was with Thomson Corporate Research, Princeton, NJ, as a Member of Technical Staff. Since September 2006, he has been with HKU, currently as an Associate Professor. He was a visiting scholar at Princeton University in summer 2011. His research interests are in the general area of signal processing and communication systems, in particular distributed signal processing and communications; optimization theories for communication systems; estimation and detection theories in transceiver designs; and smart grid. Dr. Wu served as an Editor for IEEE Communications Letters, and is currently an Editor for IEEE Transactions on Communications and the Journal of Communications and Networks.
