Semidefinite relaxations for approximate inference on graphs with cycles

Martin J. Wainwright
Electrical Engineering and Computer Science, UC Berkeley, Berkeley, CA

Michael I. Jordan
Computer Science and Statistics, UC Berkeley, Berkeley, CA

Abstract

We present a new method for calculating approximate marginals for probability distributions defined by graphs with cycles, based on a Gaussian entropy bound combined with a semidefinite outer bound on the marginal polytope. This combination leads to a log-determinant maximization problem that can be solved by efficient interior point methods [8]. As with the Bethe approximation and its generalizations [12], the optimizing arguments of this problem can be taken as approximations to the exact marginals. In contrast to Bethe/Kikuchi approaches, our variational problem is strictly convex and so has a unique global optimum. An additional desirable feature is that the value of the optimal solution is guaranteed to provide an upper bound on the log partition function. In experimental trials, the performance of the log-determinant relaxation is comparable to or better than the sum-product algorithm, and better by a substantial margin for certain problem classes. Finally, the zero-temperature limit of our log-determinant relaxation recovers a class of well-known semidefinite relaxations for integer programming [e.g., 3].

1 Introduction

Given a probability distribution defined by a graphical model (e.g., a Markov random field or factor graph), a key problem is the computation of marginal distributions. Although highly efficient algorithms exist for trees, exact solutions are prohibitively complex for more general graphs of any substantial size. This difficulty motivates the use of algorithms for computing approximations to marginal distributions, a problem to which we refer as approximate inference. One widely used algorithm is belief propagation, also known as the sum-product algorithm. As shown by Yedidia et al. [12], it can be interpreted as a method for attempting to solve a variational problem in which the exact entropy is replaced by the Bethe approximation. Moreover, Yedidia et al. proposed extensions of the Bethe approximation based on clustering operations. A first unattractive feature of the Bethe approach and its extensions is that, with certain exceptions [e.g., 6], the associated variational problems are typically not convex, which leads to algorithmic complications and raises the possibility of multiple local optima. A second is that, in contrast to other variational methods (e.g., mean field [4]), the optimal values of Bethe-type variational problems fail to provide bounds on the log partition function.

The log partition function arises in various contexts, including approximate parameter estimation and large deviations exponents, so such bounds are of interest in their own right. This paper introduces a new class of variational problems that are both convex and provide upper bounds. Our derivation relies on a Gaussian upper bound on the discrete entropy of a suitably regularized random vector, and a semidefinite outer bound on the set of valid marginal distributions. The combination leads to a log-determinant maximization problem with a unique optimum that can be found by efficient interior point methods [8]. As with the Bethe/Kikuchi approximations and sum-product algorithms, the optimizing arguments of this problem can be taken as approximations to the marginal distributions of the underlying graphical model. Moreover, taking the zero-temperature limit recovers a class of well-known semidefinite programming relaxations for integer programming problems [e.g., 3].

2 Problem set-up

We consider an undirected graph G = (V, E) with n = |V| nodes. Associated with each vertex s ∈ V is a random variable x_s taking values in the discrete space X = {0, 1, ..., m−1}. We let x = {x_s | s ∈ V} denote a random vector taking values in the Cartesian product space X^n. Our analysis makes use of the following exponential representation of a graph-structured distribution p(x). For some index set I, we let φ = {φ_α | α ∈ I} denote a collection of potential functions associated with the cliques of G, and let θ = {θ_α | α ∈ I} be a vector of parameters associated with these potential functions. The exponential family determined by φ is the following collection:

$$p(x; \theta) = \exp\Big\{ \sum_{\alpha} \theta_\alpha \phi_\alpha(x) - \Phi(\theta) \Big\}, \qquad (1a)$$
$$\Phi(\theta) = \log \sum_{x \in \mathcal{X}^n} \exp\Big\{ \sum_{\alpha} \theta_\alpha \phi_\alpha(x) \Big\}. \qquad (1b)$$

Here Φ(θ) is the log partition function that serves to normalize the distribution. In a minimal representation, the functions {φ_α} are affinely independent, and d = |I| corresponds to the dimension of the family. For example, one minimal representation of a binary-valued random vector on a graph with pairwise cliques is the standard Ising model, in which φ = {x_s | s ∈ V} ∪ {x_s x_t | (s, t) ∈ E}. Here the index set is I = V ∪ E, and d = n + |E|. In order to incorporate higher order interactions, we simply add higher degree monomials (e.g., x_s x_t x_u for a third order interaction) to the collection of potential functions. Similar representations exist for discrete processes on alphabets with m > 2 elements.

2.1 Duality and marginal polytopes

It is well known that Φ is convex in θ, and strictly so for a minimal representation. Accordingly, it is natural to consider its conjugate dual function, defined by the relation

$$\Phi^*(\mu) = \sup_{\theta \in \mathbb{R}^d} \big\{ \langle \mu, \theta \rangle - \Phi(\theta) \big\}. \qquad (2)$$

Here the vector of dual variables µ has the same dimension as the exponential parameter θ (i.e., µ ∈ R^d). It is straightforward to show that the partial derivatives of Φ with respect to θ correspond to cumulants of φ(x); in particular, the first order derivatives define mean parameters:

$$\frac{\partial \Phi}{\partial \theta_\alpha}(\theta) = \sum_{x \in \mathcal{X}^n} p(x; \theta)\, \phi_\alpha(x) = \mathbb{E}_\theta[\phi_\alpha(x)]. \qquad (3)$$
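For intuition, both the log partition function in Eqn. (1b) and the mean parameters in Eqn. (3) can be computed by exhaustive summation when the graph is small. The sketch below is a minimal illustration (assuming numpy; the 3-node cycle and the parameter values are arbitrary choices for this example, not values used in the paper) for an Ising model on spins {−1, +1}.

```python
import itertools
import numpy as np

# Brute-force illustration of Eqns. (1) and (3) for a small Ising model on spins
# x_s in {-1, +1}.  The 3-node cycle and parameter values are arbitrary choices.
edges = [(0, 1), (1, 2), (0, 2)]
n = 3
theta_s = np.array([0.1, -0.2, 0.3])          # single-node parameters
theta_st = {e: 0.5 for e in edges}            # edgewise parameters

def inner_product(x):
    """Inner product <theta, phi(x)> for the Ising potentials."""
    val = float(theta_s @ x)
    for (s, t), th in theta_st.items():
        val += th * x[s] * x[t]
    return val

configs = list(itertools.product([-1, +1], repeat=n))
scores = np.array([inner_product(np.array(x)) for x in configs])

# Log partition function, Eqn. (1b).
log_Z = np.log(np.sum(np.exp(scores)))
probs = np.exp(scores - log_Z)

# Mean parameters, Eqn. (3): mu_s = E[x_s], mu_st = E[x_s x_t].
X = np.array(configs, dtype=float)
mu_s = probs @ X
mu_st = {(s, t): float(np.sum(probs * X[:, s] * X[:, t])) for (s, t) in edges}
print("Phi(theta) =", log_Z)
print("mu_s =", mu_s, " mu_st =", mu_st)
```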

In order to compute Φ*(µ) for a given µ, we take the derivative with respect to θ of the quantity within the braces in Eqn. (2). Setting this derivative to zero and making use of Eqn. (3) yields the defining conditions for a vector θ(µ) attaining the optimum in Eqn. (2):

$$\mu_\alpha = \mathbb{E}_{\theta(\mu)}[\phi_\alpha(x)] \qquad \text{for all } \alpha \in I. \qquad (4)$$

It can be shown [10] that Eqn. (4) has a solution if and only if µ belongs to the relative interior of the set

$$\mathrm{MARG}(G; \phi) := \Big\{ \mu \in \mathbb{R}^d \;\Big|\; \sum_{x \in \mathcal{X}^n} p(x)\, \phi(x) = \mu \ \text{ for some distribution } p(\cdot) \Big\}. \qquad (5)$$

Note that this set is equivalent to the convex hull of the finite collection of vectors {φ(x) | x ∈ X^n}; consequently, the Minkowski-Weyl theorem [7] guarantees that it can be characterized by a finite number of linear inequality constraints. We refer to this set as the marginal polytope associated with the graph G and the potentials φ. (When φ_α corresponds to an indicator function, µ_α is a marginal probability; otherwise, this choice entails a minor abuse of terminology.)

In order to calculate an explicit form for Φ*(µ) for any µ ∈ MARG(G; φ), we substitute the relation in Eqn. (4) into the definition of Φ*, thereby obtaining

$$\Phi^*(\mu) = \langle \mu, \theta(\mu) \rangle - \Phi(\theta(\mu)) = \sum_{x \in \mathcal{X}^n} p(x; \theta(\mu)) \log p(x; \theta(\mu)). \qquad (6)$$

This relation establishes that for µ in the relative interior of MARG(G; φ), the value of the conjugate dual Φ*(µ) is given by the negative entropy of the distribution p(x; θ(µ)), where the pair θ(µ) and µ are dually coupled via Eqn. (4). For µ outside the closure of MARG(G; φ), it can be shown [10] that the value of the dual is +∞. Since Φ is lower semi-continuous, taking the conjugate twice recovers the original function [7]; applying this fact to Φ and Φ*, we obtain the following relation:

$$\Phi(\theta) = \max_{\mu \in \mathrm{MARG}(G;\phi)} \big\{ \langle \theta, \mu \rangle - \Phi^*(\mu) \big\}. \qquad (7)$$

Moreover, we are guaranteed that the optimum is attained uniquely at the exact marginals µ = {µ_α} of p(x; θ). This variational formulation plays a central role in our development in the sequel.

2.2 Challenges with the variational formulation

There are two difficulties associated with the variational formulation (7). First, the (negative) entropy Φ*, as a function of only the mean parameters µ, is implicitly defined; indeed, it is typically impossible to specify an explicit form for Φ*. Key exceptions are trees and hypertrees, for which the entropy is well known to decompose into a sum of local entropies defined by local marginals on the (hyper)edges [1]. Second, for a general graph with cycles, the marginal polytope MARG(G; φ) is defined by a number of inequalities that grows rapidly with the graph size [e.g., 2]. Trees and hypertrees are again important exceptions: in this case, the junction tree theorem [e.g., 1] provides a compact representation of the associated marginal polytopes.

The Bethe approach (and its generalizations) can be understood as consisting of two steps: (a) replacing the exact entropy −Φ* with a tree (or hypertree) approximation; and (b) replacing the marginal polytope MARG(G; φ) with constraint sets defined by tree (or hypertree) consistency conditions. However, since the (hyper)tree approximations used do not bound the exact entropy, the optimal values of Bethe-type variational problems do not provide a bound on the value of the log partition function Φ(θ). Bounding Φ requires both an outer bound on the marginal polytope and an upper bound on the entropy −Φ*.
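The conjugate duality expressed by Eqns. (6) and (7) can be checked numerically on a small model: evaluating ⟨θ, µ⟩ − Φ*(µ) at the exact mean parameters µ of p(x; θ), with Φ*(µ) the negative entropy from Eqn. (6), should recover Φ(θ). The following sketch (again assuming numpy, with the same arbitrary illustrative model as above) performs this check by brute force.

```python
import itertools
import numpy as np

# Numerical check of Eqns. (6)-(7) on a 3-node Ising model with illustrative parameters.
edges = [(0, 1), (1, 2), (0, 2)]
n = 3
theta_s = np.array([0.1, -0.2, 0.3])
theta_st = {e: 0.5 for e in edges}

configs = np.array(list(itertools.product([-1, +1], repeat=n)), dtype=float)
scores = configs @ theta_s + sum(th * configs[:, s] * configs[:, t]
                                 for (s, t), th in theta_st.items())
log_Z = np.log(np.sum(np.exp(scores)))            # Phi(theta)
probs = np.exp(scores - log_Z)

# Exact mean parameters mu (Eqn. (4) holds at theta(mu) = theta).
mu_s = probs @ configs
mu_st = {(s, t): float(np.sum(probs * configs[:, s] * configs[:, t])) for s, t in edges}

# Phi*(mu) is the negative entropy of p(x; theta), by Eqn. (6).
neg_entropy = float(np.sum(probs * np.log(probs)))

# The objective of Eqn. (7), evaluated at the exact marginals, should equal Phi(theta).
inner = float(theta_s @ mu_s) + sum(theta_st[e] * mu_st[e] for e in edges)
print("Phi(theta)             =", log_Z)
print("<theta, mu> - Phi*(mu) =", inner - neg_entropy)
```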

3 Log-determinant relaxation

In this section, we state and prove a set of upper bounds based on the solution of a variational problem involving determinant maximization and semidefinite constraints. Although the ideas and methods described here are more generally applicable, for clarity of exposition we focus on the case of a binary vector x ∈ {−1, +1}^n of spins. It is also convenient to define all problems with respect to the complete graph K_n (i.e., the fully connected graph). We use the standard (minimal) Ising representation for a binary problem, in terms of the potential functions φ = {x_s | s ∈ V} ∪ {x_s x_t | (s, t) an edge of K_n}. On the complete graph, there are d = n + n(n−1)/2 such potential functions in total. Of course, any problem can be embedded into the complete graph by setting a subset of the {θ_st} parameters to zero. (In particular, for a graph G = (V, E), we simply set θ_st = 0 for all pairs (s, t) ∉ E.)

3.1 Outer bounds on the marginal polytope

We first focus on the marginal polytope MARG(K_n) ≡ MARG(K_n; φ) of valid dual variables {µ_s, µ_st}, as defined in Eqn. (5). In this section, we describe a set of semidefinite and linear constraints that any valid dual vector µ ∈ MARG(K_n) must satisfy.

3.1.1 Semidefinite constraints

Given an arbitrary vector µ ∈ R^d, consider the following (n+1) × (n+1) matrix:

$$M_1[\mu] := \begin{bmatrix}
1 & \mu_1 & \mu_2 & \cdots & \mu_{n-1} & \mu_n \\
\mu_1 & 1 & \mu_{12} & \cdots & \cdots & \mu_{1n} \\
\mu_2 & \mu_{21} & 1 & \cdots & \cdots & \mu_{2n} \\
\vdots & \vdots & & \ddots & & \vdots \\
\mu_{n-1} & \vdots & & & 1 & \mu_{(n-1),n} \\
\mu_n & \mu_{n1} & \mu_{n2} & \cdots & \mu_{n,(n-1)} & 1
\end{bmatrix} \qquad (8)$$

The motivation underlying this definition is the following: suppose that the given dual vector µ actually belongs to MARG(K_n), in which case there exists some distribution p(x; θ) such that µ_s = Σ_x p(x; θ) x_s and µ_st = Σ_x p(x; θ) x_s x_t. Thus, if µ ∈ MARG(K_n), the matrix M_1[µ] can be interpreted as the matrix of second order moments for the vector (1, x), as computed under p(x; θ). (Note in particular that the diagonal elements are all equal to one, since x_s² = 1 when x_s ∈ {−1, +1}.) Since any such moment matrix must be positive semidefinite (to be explicit, letting z = (1, x), for any vector a ∈ R^{n+1} we have aᵀ M_1[µ] a = aᵀ E[zzᵀ] a = E[(aᵀz)²] ≥ 0), we have established the following:

Lemma 1 (Semidefinite outer bound). The binary marginal polytope MARG(K_n) is contained within the semidefinite constraint set

$$\mathrm{SDEF}_1 := \big\{ \mu \in \mathbb{R}^d \;\big|\; M_1[\mu] \succeq 0 \big\}. \qquad (9)$$

This semidefinite relaxation can be further strengthened by including higher order terms in the moment matrices [5].
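Lemma 1 is easy to verify empirically: mean parameters generated by any valid distribution over {−1, +1}^n yield a positive semidefinite moment matrix M_1[µ] as in Eqn. (8). A minimal sketch (assuming numpy; the random distribution used to produce µ is an arbitrary illustrative choice) follows.

```python
import itertools
import numpy as np

# Build M_1[mu] from Eqn. (8) for the mean parameters of an arbitrary distribution
# over spins in {-1, +1}^n, and verify positive semidefiniteness (Lemma 1).
rng = np.random.default_rng(0)
n = 4
configs = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
p = rng.random(len(configs))
p /= p.sum()                                   # an arbitrary valid distribution p(x)

mu_s = p @ configs                             # mu_s  = E[x_s]
second = configs.T @ (configs * p[:, None])    # E[x_s x_t], with ones on the diagonal

M1 = np.empty((n + 1, n + 1))
M1[0, 0] = 1.0
M1[0, 1:] = mu_s
M1[1:, 0] = mu_s
M1[1:, 1:] = second

eigvals = np.linalg.eigvalsh(M1)
print("min eigenvalue of M_1[mu]:", eigvals.min())   # should be >= 0 up to round-off
```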

3.1.2 Additional linear constraints

It is straightforward to augment these semidefinite constraints with additional linear constraints. Here we focus on two classes of constraints, referred to as rooted and unrooted triangle inequalities by Deza and Laurent [2], that are of especial relevance in the graphical model setting.

Pairwise edge constraints: It is natural to require that the subset of mean parameters associated with each pair of random variables (x_s, x_t), namely µ_s, µ_t and µ_st, specifies a valid pairwise marginal distribution. Letting (a, b) range over {−1, +1}², consider the set of four linear constraints of the following form:

$$1 + a\,\mu_s + b\,\mu_t + ab\,\mu_{st} \;\geq\; 0. \qquad (10)$$

It can be shown [11, 10] that these constraints are necessary and sufficient to guarantee the existence of a consistent pairwise marginal. By the junction tree theorem [1], this pairwise consistency guarantees that the constraints of Eqn. (10) provide a complete description of the binary marginal polytope for any tree-structured graph. Moreover, for a general graph with cycles, they are equivalent to the tree-consistent constraint set used in the Bethe approach [12] when applied to a binary vector x ∈ {−1, +1}^n.

Triplet constraints: Local consistency can be extended to triplets {x_s, x_t, x_u}, and even more generally to higher order subsets. For the triplet case, consider the following set of constraints (and permutations thereof) among the pairwise mean parameters {µ_st, µ_su, µ_tu}:

$$\mu_{st} + \mu_{su} + \mu_{tu} \geq -1, \qquad \mu_{st} - \mu_{su} - \mu_{tu} \geq -1. \qquad (11)$$

It can be shown [11, 10] that these constraints, in conjunction with the pairwise constraints (10), are necessary and sufficient to ensure that the collection of mean parameters {µ_s, µ_t, µ_u, µ_st, µ_su, µ_tu} uniquely determines a valid marginal over the triplet (x_s, x_t, x_u). Once again, by applying the junction tree theorem [1], we conclude that the constraints (10) and (11) provide a complete characterization of the binary marginal polytope for hypertrees of width two. It is worth observing that this set of constraints is equivalent to those implicitly enforced by any Kikuchi approximation [12] with clusters of size three (when applied to a binary problem).

3.2 Gaussian entropy bound

We now turn to the task of upper bounding the entropy. Our starting point is the familiar interpretation of the Gaussian as the maximum entropy distribution subject to covariance constraints:

Lemma 2. The (differential) entropy h(x̃) := −∫ p(x̃) log p(x̃) dx̃ is upper bounded by the entropy (1/2) log det cov(x̃) + (n/2) log(2πe) of a Gaussian with matched covariance.

Of interest to us is the discrete entropy of a discrete-valued random vector x ∈ {−1, +1}^n, whereas the Gaussian bound of Lemma 2 applies to the differential entropy of a continuous-valued random vector. Therefore, we need to convert our discrete vector to the continuous space. To do so, we define a new continuous random vector via x̃ = (1/2) x + u, where u is a random vector independent of x, with each element independently and identically distributed as u_s ∼ U[−1/2, 1/2]. (The notation U[a, b] denotes the uniform distribution on the interval [a, b].) The motivation for rescaling x by 1/2 is to pack the resulting boxes together as tightly as possible.

Lemma 3. We have h(x̃) = H(x), where h and H denote the differential and discrete entropies of x̃ and x respectively.

Proof. By construction, the differential entropy can be decomposed as a sum of integrals over hyperboxes of unit volume, one for each configuration, over which the probability density of x̃ is constant.
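Combining Lemmas 2 and 3 gives the discrete entropy bound H(x) ≤ (1/2) log det cov(x̃) + (n/2) log(2πe), where cov(x̃) = (1/4) cov(x) + (1/12) I_n (an identity used in the proof of Theorem 1 below, since the independent dither u has covariance (1/12) I_n). The sketch below (assuming numpy; the distribution over spin configurations is an arbitrary illustrative choice) compares the exact discrete entropy against this Gaussian upper bound.

```python
import itertools
import numpy as np

# Numerical check of the Gaussian entropy bound from Lemmas 2-3:
#   H(x) = h(x~) <= (1/2) log det cov(x~) + (n/2) log(2*pi*e),
# with x~ = x/2 + u and cov(x~) = cov(x)/4 + I/12 (u_s ~ U[-1/2, 1/2], independent).
# The distribution below is an arbitrary illustrative choice.
rng = np.random.default_rng(1)
n = 4
configs = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
p = rng.random(len(configs))
p /= p.sum()

H_discrete = -np.sum(p * np.log(p))                  # exact discrete entropy H(x)

mean = p @ configs
cov_x = configs.T @ (configs * p[:, None]) - np.outer(mean, mean)
cov_xt = 0.25 * cov_x + np.eye(n) / 12.0             # covariance of the dithered vector

gauss_bound = 0.5 * np.linalg.slogdet(cov_xt)[1] + 0.5 * n * np.log(2 * np.pi * np.e)
print("H(x) =", H_discrete, " Gaussian bound =", gauss_bound)   # bound should be >= H(x)
```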

3.3 Log-determinant relaxation

Equipped with these building blocks, we are now ready to state and prove a log-determinant relaxation for the log partition function.

Theorem 1. Let x ∈ {−1, +1}^n, and let OUT(K_n) be any convex outer bound on MARG(K_n) that is contained within SDEF_1. Then

$$\Phi(\theta) \;\leq\; \max_{\mu \in \mathrm{OUT}(K_n)} \Big\{ \langle \theta, \mu \rangle + \tfrac{1}{2} \log \det\big( M_1[\mu] + \tfrac{1}{3}\, \mathrm{blkdiag}[0, I_n] \big) \Big\} + \tfrac{n}{2} \log\Big(\tfrac{\pi e}{2}\Big), \qquad (12)$$

where blkdiag[0, I_n] is the (n+1) × (n+1) block-diagonal matrix with a zero in the upper left block and I_n in the lower right block. Moreover, the optimum is attained at a unique µ* ∈ OUT(K_n).

Proof. For any µ ∈ MARG(K_n), let x be a random vector with these mean parameters, and consider the continuous-valued random vector x̃ = (1/2) x + u. From Lemma 3, we have H(x) = h(x̃); combining this equality with Lemma 2, we obtain the upper bound H(x) ≤ (1/2) log det cov(x̃) + (n/2) log(2πe). Since x and u are independent and u_s ∼ U[−1/2, 1/2], we can write cov(x̃) = (1/4) cov(x) + (1/12) I_n. Next we use the Schur complement formula [8] to express the log determinant as follows:

$$\log \det \mathrm{cov}(\tilde{x}) = \log \det\big( M_1[\mu] + \tfrac{1}{3}\, \mathrm{blkdiag}[0, I_n] \big) + n \log \tfrac{1}{4}. \qquad (13)$$

Combining Eqn. (13) with the Gaussian upper bound leads to the following expression:

$$H(x) = -\Phi^*(\mu) \;\leq\; \tfrac{1}{2} \log \det\big( M_1[\mu] + \tfrac{1}{3}\, \mathrm{blkdiag}[0, I_n] \big) + \tfrac{n}{2} \log\Big(\tfrac{\pi e}{2}\Big).$$

Substituting this upper bound into the variational representation of Eqn. (7), and using the fact that OUT(K_n) is an outer bound on MARG(K_n), yields Eqn. (12). By construction, the cost function is strictly concave, so the optimum is unique.

The inclusion OUT(K_n) ⊆ SDEF_1 in the statement of Theorem 1 guarantees that the matrix M_1[µ] will always be positive semidefinite. Importantly, the optimization problem in Eqn. (12) is a determinant maximization problem, for which efficient interior point methods have been developed [e.g., 8].
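As noted above, Eqn. (12) is a determinant maximization problem over a spectrahedron, so beyond the special-purpose interior point methods of [8] it can also be posed with a generic convex modelling tool. The sketch below is one such illustration, not the authors' implementation: it assumes cvxpy together with a conic solver that handles semidefinite and exponential cones (e.g., SCS), enforces only the semidefinite constraint SDEF_1, and uses arbitrary illustrative parameter values.

```python
import numpy as np
import cvxpy as cp

# Sketch of the log-determinant relaxation of Theorem 1, Eqn. (12), using only the
# semidefinite constraint SDEF_1.  Assumes cvxpy with a solver that supports both
# semidefinite and exponential cones (e.g., SCS).  Parameters are arbitrary
# illustrative values, not taken from the paper.
n = 5
rng = np.random.default_rng(2)
theta_s = rng.uniform(-0.25, 0.25, size=n)                       # single-node parameters
theta_st = np.triu(rng.uniform(-0.5, 0.5, size=(n, n)), k=1)     # pairwise, s < t

M = cp.Variable((n + 1, n + 1), symmetric=True)    # plays the role of M_1[mu]
constraints = [M >> 0, cp.diag(M) == 1]            # SDEF_1 with the unit diagonal

mu_s = M[0, 1:]                                    # first row holds the means mu_s
mu_st = M[1:, 1:]                                  # lower block holds the mu_st

# Objective of Eqn. (12): <theta, mu> + (1/2) log det(M_1[mu] + (1/3) blkdiag[0, I_n]).
shift = np.diag(np.r_[0.0, np.full(n, 1.0 / 3.0)])
objective = (cp.sum(cp.multiply(theta_s, mu_s))
             + cp.sum(cp.multiply(theta_st, mu_st))
             + 0.5 * cp.log_det(M + shift))
problem = cp.Problem(cp.Maximize(objective), constraints)
problem.solve()

upper_bound = problem.value + 0.5 * n * np.log(np.pi * np.e / 2.0)
print("Upper bound on Phi(theta):", upper_bound)
print("Approximate means mu_s:", mu_s.value)
```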

4 Experimental results

The relevance of the log-determinant relaxation for applications is two-fold: it provides upper bounds on the log partition function, and the maximizing arguments µ* ∈ OUT(K_n) of Eqn. (12) can be taken as approximations to the exact marginals of the distribution p(x; θ). To test its performance in computing approximate marginals, we performed extensive experiments on the complete graph (fully connected) and the 2-D nearest-neighbor lattice model. We treated relatively small problems with 16 nodes so as to enable comparison to the exact answer. For any given trial, we specified the distribution p(x; θ) by choosing θ randomly as follows. The single node parameters were chosen as θ_s ∼ U[−0.25, 0.25], independently for each node. For a given coupling strength d_coup > 0, we investigated three possible types of coupling: (a) for repulsive interactions, θ_st ∼ U[−2 d_coup, 0]; (b) for mixed interactions, θ_st ∼ U[−d_coup, +d_coup]; and (c) for attractive interactions, θ_st ∼ U[0, 2 d_coup].

For each distribution p(x; θ), we performed the following computations: (a) the exact marginal probability p(x_s = 1; θ) at each node; (b) approximate marginals computed from the Bethe approximation with the sum-product algorithm; and (c) log-determinant approximate marginals from Theorem 1, using the outer bound OUT(K_n) given by the first semidefinite relaxation SDEF_1 in conjunction with the pairwise linear constraints in Eqn. (10). We computed the exact marginal values either by exhaustive summation (complete graph) or by the junction tree algorithm (lattices). We used the standard parallel message-passing form of the sum-product algorithm with damping; more precisely, we updated messages in the log domain as γ log M_st^new + (1 − γ) log M_st^old. The log-determinant problem of Theorem 1 was solved using interior point methods [8]. For each graph (fully connected or grid), we examined a total of 6 conditions: 2 different potential strengths (weak or strong) for each of the 3 types of coupling (attractive, mixed, and repulsive). We computed the ℓ1-error (1/n) Σ_{s=1}^n |p(x_s = 1; θ) − ν_s|, where ν_s is the approximate marginal computed either by the sum-product (SP) or the log-determinant (LD) method.

Graph | Coupling | Strength (d_obs, d_coup) | SP range     | LD range
Full  | R        | (0.25, 0.25)             | [0.01, 0.10] | [0.01, 0.03]
Full  | R        | (0.25, 0.50)             | [0.03, 0.20] | [0.01, 0.04]
Full  | M        | (0.25, 0.25)             | [0.00, 0.04] | [0.01, 0.03]
Full  | M        | (0.25, 0.50)             | [0.01, 0.31] | [0.01, 0.06]
Full  | A        | (0.25, 0.06)             | [0.00, 0.08] | [0.01, 0.06]
Full  | A        | (0.25, 0.12)             | [0.08, 0.86] | [0.01, 0.09]
Grid  | R        | (0.25, 1.0)              | [0.04, 0.59] | [0.01, 0.12]
Grid  | R        | (0.25, 2.0)              | [0.04, 0.78] | [0.00, 0.12]
Grid  | M        | (0.25, 1.0)              | [0.00, 0.20] | [0.01, 0.02]
Grid  | M        | (0.25, 2.0)              | [0.01, 0.54] | [0.01, 0.11]
Grid  | A        | (0.25, 1.0)              | [0.06, 0.90] | [0.01, 0.13]
Grid  | A        | (0.25, 2.0)              | [0.06, 0.94] | [0.00, 0.12]

Table 1. Ranges [min, max] of the ℓ1-approximation error for the sum-product (SP) and log-determinant (LD) methods on the fully connected graph K_16 and on the 4-nearest-neighbor grid with 16 nodes, with varying coupling types and potential strengths.

Table 1 shows quantitative results for 100 trials performed in each of the 12 experimental conditions, including only those trials for which SP converged. The potential strength is given as the pair (d_obs, d_coup); note that d_obs = 0.25 in all trials. For each method, we show the range [min, max] of the errors. Overall, the performance of LD is better than that of SP, and often substantially so. The performance of SP is slightly better in the regime of weak coupling and relatively strong observations (θ_s values); see the weakly coupled conditions in Table 1. In the remaining cases, the LD method outperforms SP, and with a large margin for many examples with strong coupling. The two methods also differ substantially in the ranges of the approximation error. The SP method exhibits some instability, with the error for certain problems being larger than 0.5; for the same problems, the LD error ranges are much smaller, with a worst-case maximum error over all trials and conditions of 0.13. In addition, the behavior of SP can change dramatically between the weakly coupled and strongly coupled conditions, whereas the LD results remain stable.
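The trial protocol of this section amounts to sampling θ for a chosen coupling type, computing the exact node marginals by exhaustive summation, and scoring an approximation by its average ℓ1 error. A minimal sketch of that loop is given below (assuming numpy; the helper names are hypothetical, the 4-node cycle is an arbitrary illustrative graph, and the approximate marginals would in practice come from SP or from the relaxation of Theorem 1).

```python
import itertools
import numpy as np

# Sketch of one experimental trial: sample theta for a coupling type, compute the exact
# node marginals p(x_s = 1; theta) by exhaustive summation, and report the average l1
# error of a supplied vector of approximate marginals.  Helper names are hypothetical.
def sample_theta(n, edges, coupling, d_coup, rng):
    theta_s = rng.uniform(-0.25, 0.25, size=n)
    lo, hi = {"repulsive": (-2 * d_coup, 0.0),
              "mixed": (-d_coup, d_coup),
              "attractive": (0.0, 2 * d_coup)}[coupling]
    theta_st = {e: rng.uniform(lo, hi) for e in edges}
    return theta_s, theta_st

def exact_node_marginals(n, edges, theta_s, theta_st):
    configs = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
    scores = configs @ theta_s + sum(th * configs[:, s] * configs[:, t]
                                     for (s, t), th in theta_st.items())
    p = np.exp(scores - scores.max())
    p /= p.sum()
    return p @ (configs == 1.0)          # p(x_s = 1; theta) for each node s

def l1_error(exact, approx):
    return np.mean(np.abs(exact - approx))

rng = np.random.default_rng(3)
n, edges = 4, [(0, 1), (1, 2), (2, 3), (3, 0)]   # a small cycle, for illustration only
theta_s, theta_st = sample_theta(n, edges, "attractive", d_coup=1.0, rng=rng)
exact = exact_node_marginals(n, edges, theta_s, theta_st)
print("exact marginals:", exact, " example error:", l1_error(exact, np.full(n, 0.5)))
```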

5 Discussion

In this paper, we developed a new method for approximate inference based on the combination of a Gaussian entropy bound with semidefinite constraints on the marginal polytope. The resulting log-determinant maximization problem can be solved by efficient interior point methods [8]. In experimental trials, the log-determinant method was either comparable to or better than the sum-product algorithm, and better by a substantial margin for certain problem classes. Of particular interest is that, in contrast to the sum-product algorithm, its performance degrades gracefully as the interaction strength is increased. It can be shown [11, 10] that in the zero-temperature limit, the log-determinant relaxation (12) reduces to a class of semidefinite relaxations that are widely used in combinatorial optimization. One open question is whether techniques for bounding the performance of such semidefinite relaxations [e.g., 3] can be adapted to the finite temperature case. Although this paper focused exclusively on the binary problem, the methods described here can be extended to other classes of random variables. It remains to develop a deeper understanding of the interaction between the two components of these approximations (i.e., the entropy bound and the outer bound on the marginal polytope), as well as of how to tailor approximations to particular graph structures. Finally, semidefinite constraints can be combined with entropy approximations (preferably convex) other than the Gaussian bound used in this paper, among them convexified Bethe/Kikuchi entropy approximations [9].

Acknowledgements: Thanks to Constantine Caramanis and Laurent El Ghaoui for helpful discussions. Work funded by NSF grant IIS, ARO MURI DAA, and a grant from Intel Corporation.

References

[1] R. G. Cowell, A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter. Probabilistic Networks and Expert Systems. Statistics for Engineering and Information Science. Springer-Verlag.
[2] M. Deza and M. Laurent. Geometry of Cuts and Metric Embeddings. Springer-Verlag, New York.
[3] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42.
[4] M. Jordan, editor. Learning in Graphical Models. MIT Press, Cambridge, MA.
[5] J. B. Lasserre. Global optimization with polynomials and the problem of moments. SIAM Journal on Optimization, 11(3).
[6] R. J. McEliece and M. Yildirim. Belief propagation on partially ordered sets. In D. Gilliam and J. Rosenthal, editors, Mathematical Theory of Systems and Networks. Institute for Mathematics and its Applications.
[7] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton.
[8] L. Vandenberghe, S. Boyd, and S. Wu. Determinant maximization with linear matrix inequality constraints. SIAM Journal on Matrix Analysis and Applications, 19.
[9] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky. A new class of upper bounds on the log partition function. In Uncertainty in Artificial Intelligence, volume 18, August.
[10] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Technical Report 649, UC Berkeley, Department of Statistics.
[11] M. J. Wainwright and M. I. Jordan. Semidefinite relaxations for approximate inference on graphs with cycles. Technical Report UCB/CSD, UC Berkeley, January.
[12] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Understanding belief propagation and its generalizations. Technical Report TR, Mitsubishi Electric Research Labs, January 2002.
