An Importance Sampling Algorithm for Models with Weak Couplings
An Importance Sampling Algorithm for Models with Weak Couplings

Mehdi Molkaraie
ETH Zurich

arXiv v1 [cs.IT] 4 Jul 2016

Abstract—We propose an importance sampling algorithm to estimate the partition function of the Ising model and the q-state Potts model. The proposal (auxiliary) distribution is defined on a spanning tree of the Forney factor graph representing the model, and computations are done on the remaining edges. In contrast, in an analogous importance sampling algorithm in the dual Forney factor graph, computations are done on a spanning tree, and the proposal distribution is defined on the remaining edges.

I. INTRODUCTION

We consider the problem of estimating the partition function of the ferromagnetic Ising and q-state Potts models with spatially varying (bond-dependent) coupling parameters. The partition function is an important quantity in statistical physics [1], [2], in machine learning [3], and in information theory [4]. In general, the partition function is not available analytically, but only as a summation with an exponential number of terms, which makes its exact computation intractable. Therefore, we resort to approximating the partition function [5] or to deriving bounds on it [6], [7].

In this paper, we first represent the models of interest by their modified Forney factor graphs (FFGs), which are constructed via simple manipulations of the original FFG [8]. We then define a proposal distribution on a spanning tree of the modified FFG to obtain an importance sampling algorithm for estimating the partition function. The algorithm can efficiently compute an estimate of the partition function when the coupling parameters associated with the edges that lie outside the spanning tree are weak. In contrast, similar importance sampling algorithms can be designed in the dual FFG of the models by defining the proposal distribution on the edges that lie outside a spanning tree of the model.
In this case, the partition function can be efficiently estimated when the coupling parameters on the spanning tree are strong [9]–[11].

The paper is organized as follows. In Section II, we review the Ising model, the q-state Potts model, and their graphical-model representations in terms of FFGs. The modified FFGs of the models are presented in Section III. In Section IV, we describe the importance sampling algorithm for estimating the partition function. The contrast with analogous algorithms in the dual FFG is discussed in Section V.

II. THE MODEL

Let X_1, X_2, ..., X_N be a collection of discrete random variables. Suppose each random variable takes on values in a finite alphabet 𝒳, which in this context is equal to the abelian group ℤ/qℤ = {0, 1, ..., q−1}. Let x_i represent a possible realization of X_i, let x stand for a configuration (x_1, x_2, ..., x_N), and let X stand for (X_1, X_2, ..., X_N). For simplicity, we assume ferromagnetic models with periodic boundaries, with pairwise interactions, and without an external magnetic field, although some of our results are applicable to more general settings.

Let f : 𝒳^N → ℝ_{≥0} be a non-negative function that factors into a product of local functions υ_{k,l} : 𝒳² → ℝ_{≥0} as

    f(x) = ∏_{(k,l)∈E} υ_{k,l}(x_k, x_l)    (1)

where E contains all the unordered pairs (k, l) with non-zero interactions. A real coupling parameter J_{k,l} is associated with each interacting pair (x_k, x_l). From (1), we define the following probability mass function (known as the Boltzmann distribution [1])

    p(x) = f(x)/Z    (2)

Here, the normalization constant Z is the partition function, given by

    Z = ∑_{x∈𝒳^N} f(x)    (3)

A. The Ising Model

In the Ising model, q = 2 and

    υ_{k,l}(x_k, x_l) = e^{J_{k,l}},  if x_k = x_l
                        e^{−J_{k,l}}, if x_k ≠ x_l    (4)

The model is called ferromagnetic (resp. antiferromagnetic) if J_{k,l} > 0 (resp. J_{k,l} < 0) for each (k, l) ∈ E. If the couplings can be both positive and negative, the model is known as an Ising spin glass.

B. The q-State Potts Model

In the Potts model, q > 2 and

    υ_{k,l}(x_k, x_l) = e^{J_{k,l}}, if x_k = x_l
                        1,           if x_k ≠ x_l    (5)

where J_{k,l} > 0 in a ferromagnetic model.
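As a concrete reference point, the quantities (1)–(4) can be evaluated by brute force on a toy instance (a Python sketch with our own variable names; the paper itself contains no code). For a single Ising cycle of length n, the result can be checked against the transfer-matrix closed form (2 cosh J)^n + (2 sinh J)^n:

```python
import itertools
import math

def ising_factor(J, xk, xl):
    # Pairwise Ising factor (4): exp(J) if the spins agree, exp(-J) otherwise.
    return math.exp(J if xk == xl else -J)

def partition_function(N, edges, J):
    # Z in (3): sum of f(x) over all 2^N configurations (intractable in
    # general, which is the motivation for the sampling algorithm here).
    Z = 0.0
    for x in itertools.product(range(2), repeat=N):
        f = 1.0
        for (k, l) in edges:
            f *= ising_factor(J[(k, l)], x[k], x[l])
        Z += f
    return Z

# Toy instance: a cycle of length 4 with constant coupling J = 0.5.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
J = {e: 0.5 for e in edges}
Z = partition_function(4, edges, J)
# Transfer-matrix check for a length-n cycle: Z = (2 cosh J)^n + (2 sinh J)^n.
Z_exact = (2 * math.cosh(0.5)) ** 4 + (2 * math.sinh(0.5)) ** 4
```

The exponential cost of this enumeration (2^N terms) is exactly what the importance sampling algorithm of Section IV avoids.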
C. FFG of the Models

The factorization in (1) can be represented by a FFG, in which nodes represent the factors and edges represent the variables. The edge that represents some variable x is connected to the node representing the factor υ(·) if and only if x is an argument of υ(·). If a variable (an edge) appears in more than two factors, such a variable is replicated using equality indicator factors [8].

The FFG of the 2D Ising model with pairwise (nearest-neighbor) interactions is shown in Fig. 1, where the unlabeled boxes represent factors (4) and the boxes labeled "=" are equality indicator factors. E.g., in Fig. 1, for variables X, X′, X″, and X‴ the equality indicator factor is given by

    Φ_=(x, x′, x″, x‴) = δ(x − x′) δ(x′ − x″) δ(x″ − x‴)    (6)

where δ(·) is the Kronecker delta function. Similarly, Fig. 1 shows the FFG of the 2D Potts model with pairwise interactions, where the unlabeled boxes represent factors as in (5). Note that in a 2D model with periodic boundary conditions

    |E| = 2N    (7)

III. THE MODIFIED FFG

In this section, we present the modified FFG of the Ising and Potts models. Recall that all arithmetic manipulations are done modulo 2 in the case of the Ising model, and modulo q in the case of the Potts model.

A. Modified FFG of the Ising Model

We note that each factor (4) is only a function of x_k + x_l; we can thus represent υ_{k,l}(·) using only one variable y_m. We therefore let

    υ_m(y_m) = e^{J_m},  if y_m = 0
               e^{−J_m}, if y_m = 1    (8)

Following this observation, we can build the modified FFG of the 2D Ising model as shown in Fig. 2, where the unlabeled boxes represent (8) and boxes labeled "+" are mod-2 indicator factors, which impose the constraint that all their incident variables sum to zero (modulo 2). E.g., in Fig. 2, for binary variables X_1, X_2, and Y_1 the mod-2 indicator factor is given by

    Φ_+(y_1, x_1, x_2) = δ(y_1 + x_1 + x_2)    (9)

Let Y = (Y_1, Y_2, ..., Y_{|E|}) be the set of all the variables attached to the mod-2 indicator factors and let y be a realization of Y.
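The change of variables behind the modified FFG can be checked numerically (a Python sketch on our own toy edge set): the pairwise factor (4) evaluated at (x_k, x_l) equals the single-variable factor (8) evaluated at y_m = x_k + x_l (mod 2).

```python
import math

def y_from_x(x, edges):
    # Each factor in the modified FFG sees only y_m = x_k + x_l (mod 2),
    # the relation enforced by the mod-2 indicator factor (9).
    return [(x[k] + x[l]) % 2 for (k, l) in edges]

def upsilon(J, y):
    # Single-variable factor (8).
    return math.exp(J if y == 0 else -J)

# Check the equivalence on one configuration of a length-4 cycle, J = 0.5.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
x = (0, 1, 1, 0)
y = y_from_x(x, edges)
pairwise = [math.exp(0.5 if x[k] == x[l] else -0.5) for (k, l) in edges]
single = [upsilon(0.5, ym) for ym in y]
```

Since the two factor lists coincide edge by edge, the modified FFG defines the same unnormalized measure as (1) on the valid configurations y.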
Here, |E| denotes the cardinality of E, which is also equal to the number of unordered interacting pairs in the model.

Lemma 1. Consider a cycle of length c in the modified FFG of the Ising model. For the variables Y_1, Y_2, ..., Y_c attached to the mod-2 indicator factors in the cycle, it holds that

    ∑_{m=1}^{c} Y_m = 0    (10)

Fig. 1: The FFG of the 2D Ising model with nearest-neighbor interactions, where the unlabeled boxes represent (4) and boxes containing "=" symbols are given by (6).

Fig. 2: Modified FFG of the 2D Ising model with nearest-neighbor interactions, where the unlabeled boxes represent (8), boxes containing "+" symbols are as in (9), and boxes containing "=" symbols are given by (6). The thick edges show a cycle of length c, where the variables Y_1, Y_2, ..., Y_c on the cycle are marked blue.

Proof. In (10), each Y_m can be expanded as the symmetric difference of the corresponding adjacent variables (X_k, X_l) in the cycle. Moreover, each variable appears twice in this expansion. We conclude that ∑_{m=1}^{c} Y_m = 0.

An example of a cycle is shown by thick edges in Fig. 2, where Y_1, Y_2, ..., Y_c are marked by blue edges.

B. Modified FFG of the q-State Potts Model

In this case, each factor (5) is only a function of x_k − x_l. Similar to our approach in Section III-A, we represent (5) as

    υ_m(y_m) = e^{J_m}, if y_m = 0
               1,        otherwise    (11)

The modified FFG of the 2D Potts model is shown in Fig. 3, where the unlabeled boxes represent factors (11) and boxes labeled "+" are mod-q indicator factors, which impose the constraint that all their incident variables sum to zero (modulo q). E.g., in Fig. 4, for variables X_1, X_2, and Y_1 the mod-q indicator factor is given by

    Φ_+(y_1, x_1, x_2) = δ(y_1 + x_1 − x_2)    (12)

Fig. 3: Modified FFG of the 2D Potts model with nearest-neighbor interactions, where the unlabeled boxes represent (11), boxes containing "+" symbols are as in (12), boxes containing "=" symbols are given by (6), and the small circles attached to the mod-q indicator factors denote sign inverters.

A sign inverter (depicted by a small circle) is inserted in one of the edges incident to each mod-q indicator factor. The choice of the side on which to insert it can be made arbitrarily (because of the symmetry in the factors).

There is again a linear dependency among the variables Y_1, Y_2, ..., Y_c in any cycle of length c in the modified FFG of the Potts model. However, the dependency is affected by the arrangement of the sign inverters. As an example, in Fig. 4 we have arranged the sign inverters in such a way that the sum of the variables in any cycle of length four is zero: each Y_m can be expanded as the difference between the corresponding adjacent variables (X_k, X_l) in the cycle; furthermore, each variable appears twice in this expansion, once with a positive sign and once with a negative sign.

C. Variables in the Modified FFG

We partition E into two disjoint subsets T and T̄, where T is a spanning tree in the modified FFG. Thus Y is also partitioned into Y_T and Y_T̄. In such a partitioning, Y_T̄ can be computed as a linear combination of Y_T (cf. Lemma 1). An example of a spanning tree in the modified FFG of the 2D Ising model is illustrated in Fig. 5, where the thick blue edges represent Y_T and the thin red edges represent Y_T̄. Here, Y_T̄ is a linear combination of Y_T.
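The role of the spanning tree can be illustrated on a toy Ising cycle (a Python sketch with our own edge choice): fixing x_1 = 0 and propagating y_m = x_k + x_l (mod 2) along the tree recovers a configuration x, after which every off-tree variable is determined, consistent with Lemma 1.

```python
def recover_x_from_tree(N, tree, y_tree):
    # Fix x[0] = 0 and propagate y_m = x_k + x_l (mod 2) along the tree edges.
    x = [None] * N
    x[0] = 0
    adj = {i: [] for i in range(N)}
    for m, (k, l) in enumerate(tree):
        adj[k].append((l, m))
        adj[l].append((k, m))
    stack = [0]
    while stack:
        k = stack.pop()
        for l, m in adj[k]:
            if x[l] is None:
                x[l] = (x[k] + y_tree[m]) % 2
                stack.append(l)
    return x

# Length-4 cycle: the tree is the first three edges, and the single
# cycle-closing edge is off the tree.
tree = [(0, 1), (1, 2), (2, 3)]
off_tree = [(3, 0)]
y_tree = [1, 0, 1]          # any assignment on the tree edges is valid
x = recover_x_from_tree(4, tree, y_tree)
y_off = [(x[k] + x[l]) % 2 for (k, l) in off_tree]
# Lemma 1: the y's around the cycle sum to zero (mod 2).
cycle_sum = (sum(y_tree) + sum(y_off)) % 2
```

The recovered x is unique only up to a global spin flip, which is exactly the two-to-one correspondence used in Lemma 2 below.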
In a 2D grid with periodic boundary conditions, we have

    |T| = N − 1    (13)
    |T̄| = N + 1    (14)

Fig. 4: Another modified FFG of a 2D Potts model with nearest-neighbor interactions, where in every cycle of length four, the sum of the variables attached to the mod-q indicator factors is zero (modulo q).

Accordingly, let

    Υ_T(y_T) = ∏_{m∈T} υ_m(y_m)    (15)
    Υ_T̄(y_T̄) = ∏_{m∈T̄} υ_m(y_m)    (16)

We define the following proposal probability mass function on a spanning tree in the modified FFG

    q(y_T) = Υ_T(y_T)/Z_q    (17)

with

    Z_q = ∑_{y_T} Υ_T(y_T)    (18)

In this set-up, Z_q is available in closed form. For the Ising model, we obtain

    Z_q = ∏_{m∈T} (e^{J_m} + e^{−J_m})    (19)
        = ∏_{m∈T} 2 cosh J_m    (20)

and for the q-state Potts model

    Z_q = ∏_{m∈T} ∑_{t=0}^{q−1} υ_m(t)    (21)
        = ∏_{m∈T} (e^{J_m} + q − 1)    (22)

We also let

    Υ(y) = ∏_{m∈E} υ_m(y_m)    (23)

The global probability mass function in the modified FFG can then be defined as

    p_M(y) = Υ(y)/Z_M    (24)
where Z_M is the partition function of the modified FFG. The partition functions Z and Z_M are closely related.

Lemma 2. In a ferromagnetic Ising model, the partition functions Z and Z_M are related to each other by

    Z = 2 Z_M    (25)

Proof. Let x̄ be the component-wise addition of x and the all-ones vector, i.e., in x̄, components of x that are 0 become 1, and those that are 1 become 0. From each x, we can create a valid configuration y in the modified FFG. But then, x̄ gives rise to the same configuration y. Hence, each valid configuration y in the modified FFG corresponds to two configurations x and x̄. Due to the symmetry in the factors, these configurations contribute equally to the sum in (3); therefore, Z = 2 Z_M.

Lemma 3. For a ferromagnetic q-state Potts model, the partition functions Z and Z_M are related to each other by

    Z = q Z_M    (26)

The proof follows along the same lines as the proof of Lemma 2.

In Section IV, we propose an importance sampling algorithm in the modified FFG of the Ising model and the q-state Potts model to estimate Z_M, which can then be used to compute an estimate of Z.

IV. IMPORTANCE SAMPLING IN THE MODIFIED FFG

The importance sampling algorithm works as follows. We first draw independent samples y_T^(1), y_T^(2), ... according to q(y_T) in (17), and therefrom compute y_T̄^(1), y_T̄^(2), .... These samples are then used to compute an estimate of Z_M.

Drawing independent samples according to q(y_T) is straightforward. For the Ising model, the product form of (15) suggests that to draw y_T^(l) we can do the following:

1: draw u_1^(l), u_2^(l), ..., u_{|T|}^(l) i.i.d. ∼ U[0, 1]
2: for m = 1 to |T| do
3:   if u_m^(l) < 1/(1 + e^{−2J_m}) then
4:     y_m^(l) = 0
5:   else
6:     y_m^(l) = 1
7:   end if
8: end for

In line 3,

    1/(1 + e^{−2J_m}) = υ_m(0)/(υ_m(0) + υ_m(1))    (27)

which is equal to sigm(2J_m), where sigm(·) denotes the sigmoid (logistic) function [3, Chapter 1].
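The Ising subroutine above, combined with the importance sampling estimator of Section IV, can be sketched in Python on a toy cycle (our own instance; the brute-force Z is computed only to verify Lemma 2, Z = 2 Z_M):

```python
import itertools
import math
import random

def sample_y_tree(J_tree, rng):
    # The subroutine above: each tree variable is drawn independently with
    # P(Y_m = 0) = 1 / (1 + exp(-2 J_m)) = sigm(2 J_m), cf. (27).
    return [0 if rng.random() < 1.0 / (1.0 + math.exp(-2.0 * J)) else 1
            for J in J_tree]

def recover_x_from_tree(N, tree, y_tree):
    # Fix x[0] = 0 and propagate y_m = x_k + x_l (mod 2) along the tree.
    x = [None] * N
    x[0] = 0
    adj = {i: [] for i in range(N)}
    for m, (k, l) in enumerate(tree):
        adj[k].append((l, m))
        adj[l].append((k, m))
    stack = [0]
    while stack:
        k = stack.pop()
        for l, m in adj[k]:
            if x[l] is None:
                x[l] = (x[k] + y_tree[m]) % 2
                stack.append(l)
    return x

def estimate_ZM(N, tree, off_tree, J, L, rng):
    # Importance sampling estimate of Z_M: draw y on the tree from the
    # proposal, deduce the off-tree y's, average the off-tree weights,
    # and scale by the closed-form Z_q of (20).
    Zq = 1.0
    for e in tree:
        Zq *= 2.0 * math.cosh(J[e])
    total = 0.0
    for _ in range(L):
        y_tree = sample_y_tree([J[e] for e in tree], rng)
        x = recover_x_from_tree(N, tree, y_tree)
        w = 1.0
        for (k, l) in off_tree:
            y = (x[k] + x[l]) % 2
            w *= math.exp(J[(k, l)] if y == 0 else -J[(k, l)])
        total += w
    return Zq * total / L

# Toy instance: length-4 cycle, with a weak off-tree coupling (the regime
# where this estimator is efficient).
tree = [(0, 1), (1, 2), (2, 3)]
off_tree = [(3, 0)]
J = {(0, 1): 0.5, (1, 2): 0.5, (2, 3): 0.5, (3, 0): 0.1}
est = estimate_ZM(4, tree, off_tree, J, L=20000, rng=random.Random(0))

# Brute-force Z over all 2^4 configurations, to check Z = 2 Z_M (Lemma 2).
Z = sum(
    math.prod(math.exp(J[e] if x[e[0]] == x[e[1]] else -J[e])
              for e in tree + off_tree)
    for x in itertools.product(range(2), repeat=4)
)
```

With the off-tree coupling small, the off-tree weights are nearly constant, so the estimate concentrates tightly around Z/2, illustrating the variance discussion of Section IV-A.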
Similarly, in the case of the q-state Potts model, we can apply the following subroutine to draw independent samples y_T^(l) according to the corresponding proposal distribution.

Fig. 5: A spanning tree in the modified FFG of the 2D Ising model. The thick blue edges represent Y_T and the thin red edges represent Y_T̄. Here, Y_T̄ is a linear combination of Y_T.

1: draw u_1^(l), u_2^(l), ..., u_{|T|}^(l) i.i.d. ∼ U[0, 1]
2: for m = 1 to |T| do
3:   if u_m^(l) < 1/(1 + (q − 1)e^{−J_m}) then
4:     y_m^(l) = 0
5:   else
6:     draw y_m^(l) uniformly at random from {1, 2, ..., q − 1}
7:   end if
8: end for

After drawing y_T^(l), we compute y_T̄^(l). We then use the samples in the following importance sampling algorithm.

1: for l = 1 to L do
2:   draw y_T^(l) according to q(y_T)
3:   compute y_T̄^(l)
4: end for
5: compute

    Ẑ_M^IS = (Z_q/L) ∑_{l=1}^{L} Υ_T̄(y_T̄^(l))    (28)

We show that Ẑ_M^IS is an unbiased estimator of Z_M:

    E_q[Ẑ_M^IS] = Z_q E_q[Υ_T̄(Y_T̄)]
                = ∑_{y_T} Υ_T(y_T) Υ_T̄(y_T̄)
                = Z_M

A. The Variance of Ẑ_M^IS

For a finite-size model, the variance of Ẑ_M^IS can be computed as

    Var[Ẑ_M^IS] = E[(Ẑ_M^IS)²] − (E[Ẑ_M^IS])²    (29)
                = (1/L)(Z_q² E_q[Υ_T̄²(Y_T̄)] − Z_M²)    (30)

Hence

    Var[Ẑ_M^IS]/Z_M² = (1/L)((Z_q/Z_M)² E_q[Υ_T̄²(Y_T̄)] − 1)    (31)
                     = (1/L)(∑_y p_M²(y)/q(y_T) − 1)    (32)
                     = χ²(p_M, q)/L    (33)

where χ²(·, ·) denotes the chi-squared divergence, which is non-negative, with equality to zero if and only if its two arguments are equal [12, Chapter 4].

For simplicity, let us assume that for m ∈ T̄, the coupling parameters of the model are constant, denoted by J. In the limit J → 0, we have

    lim_{J→0} p_M(y) = q(y_T)    (34)

Hence

    lim_{J→0} χ²(p_M, q) = 0    (35)

Therefore, Z_M can be estimated efficiently via the proposed importance sampling estimator when J_m is small for m ∈ T̄.

V. IMPORTANCE SAMPLING IN THE DUAL FFG

We briefly discuss an analogous importance sampling algorithm in the dual FFG of the q-state Potts model. In the dual FFG, we denote the partition function by Z_d. We will use the tilde symbol to denote variables in the dual domain. The partition functions Z and Z_d are related to each other via the normal factor graph duality theorem [13], [14]. In a 2D q-state Potts model with periodic boundary conditions, according to the normal factor graph duality theorem

    Z_d = q^N Z    (36)

From (26), we obtain

    Z_d = q^{N+1} Z_M    (37)

The primal and dual FFGs have the same topology. In the dual FFG, factors are replaced by their Fourier transforms and variables are replaced by their corresponding dual variables. The dual FFG of the 2D q-state Potts model is shown in Fig. 6, where the unlabeled boxes represent factors

    γ_m(ỹ_m) = e^{J_m} + q − 1, if ỹ_m = 0
               e^{J_m} − 1,     otherwise    (38)

which is the one-dimensional discrete Fourier transform of (11), boxes labeled "+" are mod-q indicator factors as in (9), and boxes containing "=" symbols are equality indicator factors given by (6). The sign inverters are depicted by small circles attached to the equality indicator factors. For more details on the dual FFG of the Potts model, see [11].

We again partition E into two disjoint subsets T and T̄, where T is a spanning tree in the dual FFG.
The set of random variables Ỹ (represented by the edges/bonds) is also partitioned into Ỹ_T and Ỹ_T̄. However, in the dual FFG, Ỹ_T can be computed as a linear combination of Ỹ_T̄. In other words, for a given realization of Ỹ_T̄, we can compute Ỹ_T deterministically. Notice that |Ỹ| = |E|.

Fig. 6: The dual FFG of the 2D Potts model with nearest-neighbor interactions, where the unlabeled boxes represent (38), boxes containing "+" symbols are as in (9), boxes containing "=" symbols are given by (6), and the small circles attached to the equality indicator factors denote sign inverters. The thick edges show a spanning tree in the dual FFG.

An example of such a partitioning is shown in Fig. 6, where Ỹ_T is the set of all the variables associated with the thick edges and Ỹ_T̄ is the set of all the variables associated with the remaining thin edges. Let

    Γ_T(ỹ_T) = ∏_{m∈T} γ_m(ỹ_m)    (39)
    Γ_T̄(ỹ_T̄) = ∏_{m∈T̄} γ_m(ỹ_m)    (40)

We define the following proposal probability mass function

    q_d(ỹ_T̄) = Γ_T̄(ỹ_T̄)/Z_{q_d}    (41)

where Z_{q_d} is analytically available as

    Z_{q_d} = ∑_{ỹ_T̄} Γ_T̄(ỹ_T̄)    (42)
            = ∏_{m∈T̄} ∑_{t=0}^{q−1} γ_m(t)    (43)
            = ∏_{m∈T̄} q exp(J_m)    (44)

We let

    Γ(ỹ) = ∏_{m∈E} γ_m(ỹ_m)    (45)

and define the global probability mass function in the dual FFG as

    p_d(ỹ) = Γ(ỹ)/Z_d    (46)
where Z_d is the partition function of the dual FFG.

The importance sampling algorithm works as follows: at iteration l, we draw a sample ỹ_T̄^(l) according to the proposal distribution (41). The product form of (40) suggests that in order to draw ỹ_T̄^(l) we can apply the following subroutine [11].

1: draw u_1^(l), u_2^(l), ..., u_{|T̄|}^(l) i.i.d. ∼ U[0, 1]
2: for m = 1 to |T̄| do
3:   if u_m^(l) < (1 + (q − 1)e^{−J_m})/q then
4:     ỹ_m^(l) = 0
5:   else
6:     draw ỹ_m^(l) uniformly at random from {1, 2, ..., q − 1}
7:   end if
8: end for

After drawing ỹ_T̄^(l), we compute ỹ_T^(l). Finally, we use the following importance sampling algorithm to estimate Z_d.

1: for l = 1 to L do
2:   draw ỹ_T̄^(l) according to q_d(ỹ_T̄)
3:   compute ỹ_T^(l)
4: end for
5: compute

    Ẑ_d^IS = (Z_{q_d}/L) ∑_{l=1}^{L} Γ_T(ỹ_T^(l))    (47)

Here, Ẑ_d^IS is an unbiased estimator of Z_d, i.e.,

    E_{q_d}[Ẑ_d^IS] = Z_d    (48)

see [10]. Similar to our approach in Section IV-A, we can show that

    Var[Ẑ_d^IS]/Z_d² = χ²(p_d, q_d)/L    (49)

For simplicity, we assume that for m ∈ T, the coupling parameters of the model are constant, denoted by J. In the limit J → ∞, we have

    lim_{J→∞} p_d(ỹ) = q_d(ỹ_T̄)    (50)

Thus

    lim_{J→∞} Var[Ẑ_d^IS]/Z_d² = 0    (51)

We conclude that Z_d can be estimated efficiently via the importance sampling estimator when J_m is large for m ∈ T. For more details on constructing the dual FFG of the Ising model and the q-state Potts model, see [15], [16], [9]–[11].

VI. CONCLUSION

We proposed an importance sampling algorithm in the modified FFG of the Ising model and the q-state Potts model to estimate the partition function. The proposal distribution of the importance sampling algorithm is defined on a spanning tree of the model. The algorithm can efficiently compute an estimate of the partition function when the coupling parameters associated with the edges that lie outside the spanning tree are weak. In contrast, the proposal distribution for the analogous importance sampling algorithm in the dual FFG is defined on the edges that lie outside the spanning tree.
In this case, accurate estimates of the partition function can be obtained when the couplings associated with the edges of the spanning tree are strong. The methods can handle more demanding cases when combined with annealed importance sampling [17].

REFERENCES

[1] J. M. Yeomans, Statistical Mechanics of Phase Transitions. Oxford University Press.
[2] R. J. Baxter, Exactly Solved Models in Statistical Mechanics. Dover Publications.
[3] K. P. Murphy, Machine Learning: A Probabilistic Perspective. The MIT Press.
[4] D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms. Cambridge University Press.
[5] G. Potamianos and J. Goutsias, "Stochastic approximation algorithms for partition function estimation of Gibbs random fields," IEEE Trans. Information Theory, vol. 43, Nov. 1997.
[6] J. S. Yedidia, "An idiosyncratic journey beyond mean field theory," in Advanced Mean Field Methods: Theory and Practice. The MIT Press, 2001.
[7] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky, "A new class of upper bounds on the log partition function," IEEE Trans. Information Theory, vol. 51, July 2005.
[8] G. D. Forney, Jr., "Codes on graphs: normal realizations," IEEE Trans. Information Theory, vol. 47, Feb. 2001.
[9] M. Molkaraie, "An importance sampling scheme for models in a strong external field," Proc. 2015 IEEE Int. Symp. on Information Theory, Hong Kong, June 14–19, 2015.
[10] M. Molkaraie, "An importance sampling algorithm for the Ising model with strong couplings," Proc. 2016 Int. Zurich Seminar on Communications (IZS), March 2–4, 2016.
[11] M. Molkaraie and V. Gómez, "Efficient Monte Carlo methods for the Potts model at low temperature," arXiv preprint.
[12] I. Csiszár and P. C. Shields, Information Theory and Statistics: A Tutorial. Now Publishers Inc.
[13] A. Al-Bashabsheh and Y. Mao, "Normal factor graphs and holographic transformations," IEEE Trans. Information Theory, vol. 57, Feb. 2011.
[14] G. D. Forney, Jr., "Codes on graphs: duality and MacWilliams identities," IEEE Trans. Information Theory, vol. 57, Feb. 2011.
[15] M. Molkaraie and H.-A. Loeliger, "Partition function of the Ising model via factor graph duality," Proc. 2013 IEEE Int. Symp. on Information Theory, Istanbul, Turkey, July 7–12, 2013.
[16] A. Al-Bashabsheh and Y. Mao, "On stochastic estimation of the partition function," Proc. 2014 IEEE Int. Symp. on Information Theory, Honolulu, USA, June 29–July 4, 2014.
[17] R. M. Neal, "Annealed importance sampling," Statistics and Computing, vol. 11, 2001.
Does the Wake-sleep Algorithm Produce Good Density Estimators? Brendan J. Frey, Geoffrey E. Hinton Peter Dayan Department of Computer Science Department of Brain and Cognitive Sciences University of Toronto
More informationIntroduction to Graphical Models. Srikumar Ramalingam School of Computing University of Utah
Introduction to Graphical Models Srikumar Ramalingam School of Computing University of Utah Reference Christopher M. Bishop, Pattern Recognition and Machine Learning, Jonathan S. Yedidia, William T. Freeman,
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos & Aarti Singh Contents Markov Chain Monte Carlo Methods Goal & Motivation Sampling Rejection Importance Markov
More informationDEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY
DEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY OUTLINE 3.1 Why Probability? 3.2 Random Variables 3.3 Probability Distributions 3.4 Marginal Probability 3.5 Conditional Probability 3.6 The Chain
More informationPartition Functions of Normal Factor Graphs
Partition Functions of Normal Factor Graphs G. David Forney, r. Laboratory for nformation and Decision Systems Massachusetts nstitute of Technology Cambridge, M 02139, US forneyd@comcast.net Pascal O.
More informationKrammers-Wannier Duality in Lattice Systems
Krammers-Wannier Duality in Lattice Systems Sreekar Voleti 1 1 Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A7. (Dated: December 9, 2018) I. INTRODUCTION It was shown by
More informationProbabilistic Graphical Models Lecture Notes Fall 2009
Probabilistic Graphical Models Lecture Notes Fall 2009 October 28, 2009 Byoung-Tak Zhang School of omputer Science and Engineering & ognitive Science, Brain Science, and Bioinformatics Seoul National University
More informationCSC 412 (Lecture 4): Undirected Graphical Models
CSC 412 (Lecture 4): Undirected Graphical Models Raquel Urtasun University of Toronto Feb 2, 2016 R Urtasun (UofT) CSC 412 Feb 2, 2016 1 / 37 Today Undirected Graphical Models: Semantics of the graph:
More information13: Variational inference II
10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational
More informationMarkov Chains and MCMC
Markov Chains and MCMC Markov chains Let S = {1, 2,..., N} be a finite set consisting of N states. A Markov chain Y 0, Y 1, Y 2,... is a sequence of random variables, with Y t S for all points in time
More information11. Learning graphical models
Learning graphical models 11-1 11. Learning graphical models Maximum likelihood Parameter learning Structural learning Learning partially observed graphical models Learning graphical models 11-2 statistical
More informationReplica Condensation and Tree Decay
Replica Condensation and Tree Decay Arthur Jaffe and David Moser Harvard University Cambridge, MA 02138, USA Arthur Jaffe@harvard.edu, David.Moser@gmx.net June 7, 2007 Abstract We give an intuitive method
More informationarxiv: v2 [math.pr] 26 Aug 2017
CONSTRAINED PERCOLATION, ISING MODEL AND XOR ISING MODEL ON PLANAR LATTICES ZHONGYANG LI arxiv:1707.04183v2 [math.pr] 26 Aug 2017 Abstract. We study constrained percolation models on planar lattices including
More informationRepresentation of undirected GM. Kayhan Batmanghelich
Representation of undirected GM Kayhan Batmanghelich Review Review: Directed Graphical Model Represent distribution of the form ny p(x 1,,X n = p(x i (X i i=1 Factorizes in terms of local conditional probabilities
More informationAn ABC interpretation of the multiple auxiliary variable method
School of Mathematical and Physical Sciences Department of Mathematics and Statistics Preprint MPS-2016-07 27 April 2016 An ABC interpretation of the multiple auxiliary variable method by Dennis Prangle
More informationBayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems
Bayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems Scott W. Linderman Matthew J. Johnson Andrew C. Miller Columbia University Harvard and Google Brain Harvard University Ryan
More informationIntroduction to Machine Learning Midterm Exam Solutions
10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,
More informationRapid Introduction to Machine Learning/ Deep Learning
Rapid Introduction to Machine Learning/ Deep Learning Hyeong In Choi Seoul National University 1/24 Lecture 5b Markov random field (MRF) November 13, 2015 2/24 Table of contents 1 1. Objectives of Lecture
More informationHandout 5. α a1 a n. }, where. xi if a i = 1 1 if a i = 0.
Notes on Complexity Theory Last updated: October, 2005 Jonathan Katz Handout 5 1 An Improved Upper-Bound on Circuit Size Here we show the result promised in the previous lecture regarding an upper-bound
More informationDistributed Estimation, Information Loss and Exponential Families. Qiang Liu Department of Computer Science Dartmouth College
Distributed Estimation, Information Loss and Exponential Families Qiang Liu Department of Computer Science Dartmouth College Statistical Learning / Estimation Learning generative models from data Topic
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Variational Inference IV: Variational Principle II Junming Yin Lecture 17, March 21, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3 X 3 Reading: X 4
More informationIntelligent Systems:
Intelligent Systems: Undirected Graphical models (Factor Graphs) (2 lectures) Carsten Rother 15/01/2015 Intelligent Systems: Probabilistic Inference in DGM and UGM Roadmap for next two lectures Definition
More informationLow-Density Parity-Check Codes
Department of Computer Sciences Applied Algorithms Lab. July 24, 2011 Outline 1 Introduction 2 Algorithms for LDPC 3 Properties 4 Iterative Learning in Crowds 5 Algorithm 6 Results 7 Conclusion PART I
More informationModels of Language Acquisition: Part II
Models of Language Acquisition: Part II Matilde Marcolli CS101: Mathematical and Computational Linguistics Winter 2015 Probably Approximately Correct Model of Language Learning General setting of Statistical
More informationDavid B. Lukatsky and Ariel Afek Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva Israel
Sequence correlations shape protein promiscuity David B. Lukatsky and Ariel Afek Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva 84105 Israel Abstract We predict that diagonal correlations
More informationMarkov Chain Monte Carlo Inference. Siamak Ravanbakhsh Winter 2018
Graphical Models Markov Chain Monte Carlo Inference Siamak Ravanbakhsh Winter 2018 Learning objectives Markov chains the idea behind Markov Chain Monte Carlo (MCMC) two important examples: Gibbs sampling
More informationarxiv:cond-mat/ v2 [cond-mat.stat-mech] 26 May 1998
Self-dual property of the Potts model in one dimension arxiv:cond-mat/9805301v2 [cond-mat.stat-mech] 26 May 1998 F. Y. Wu Department of Physics Northeastern University, Boston, Massachusetts 02115 Abstract
More informationWalk-Sum Interpretation and Analysis of Gaussian Belief Propagation
Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation Jason K. Johnson, Dmitry M. Malioutov and Alan S. Willsky Department of Electrical Engineering and Computer Science Massachusetts Institute
More informationMarkov Chain Monte Carlo (MCMC)
Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can
More informationS i J <ij> h mf = h + Jzm (4) and m, the magnetisation per spin, is just the mean value of any given spin. S i = S k k (5) N.
Statistical Physics Section 10: Mean-Field heory of the Ising Model Unfortunately one cannot solve exactly the Ising model or many other interesting models) on a three dimensional lattice. herefore one
More informationMIT Algebraic techniques and semidefinite optimization February 14, Lecture 3
MI 6.97 Algebraic techniques and semidefinite optimization February 4, 6 Lecture 3 Lecturer: Pablo A. Parrilo Scribe: Pablo A. Parrilo In this lecture, we will discuss one of the most important applications
More informationRegister machines L2 18
Register machines L2 18 Algorithms, informally L2 19 No precise definition of algorithm at the time Hilbert posed the Entscheidungsproblem, just examples. Common features of the examples: finite description
More information17 Variational Inference
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms for Inference Fall 2014 17 Variational Inference Prompted by loopy graphs for which exact
More information4 : Exact Inference: Variable Elimination
10-708: Probabilistic Graphical Models 10-708, Spring 2014 4 : Exact Inference: Variable Elimination Lecturer: Eric P. ing Scribes: Soumya Batra, Pradeep Dasigi, Manzil Zaheer 1 Probabilistic Inference
More informationReview: Directed Models (Bayes Nets)
X Review: Directed Models (Bayes Nets) Lecture 3: Undirected Graphical Models Sam Roweis January 2, 24 Semantics: x y z if z d-separates x and y d-separation: z d-separates x from y if along every undirected
More informationUNDERSTANDING BELIEF PROPOGATION AND ITS GENERALIZATIONS
UNDERSTANDING BELIEF PROPOGATION AND ITS GENERALIZATIONS JONATHAN YEDIDIA, WILLIAM FREEMAN, YAIR WEISS 2001 MERL TECH REPORT Kristin Branson and Ian Fasel June 11, 2003 1. Inference Inference problems
More informationBayesian Learning in Undirected Graphical Models
Bayesian Learning in Undirected Graphical Models Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK http://www.gatsby.ucl.ac.uk/ Work with: Iain Murray and Hyun-Chul
More informationApproximate inference, Sampling & Variational inference Fall Cours 9 November 25
Approimate inference, Sampling & Variational inference Fall 2015 Cours 9 November 25 Enseignant: Guillaume Obozinski Scribe: Basile Clément, Nathan de Lara 9.1 Approimate inference with MCMC 9.1.1 Gibbs
More informationAutomatic Differentiation Equipped Variable Elimination for Sensitivity Analysis on Probabilistic Inference Queries
Automatic Differentiation Equipped Variable Elimination for Sensitivity Analysis on Probabilistic Inference Queries Anonymous Author(s) Affiliation Address email Abstract 1 2 3 4 5 6 7 8 9 10 11 12 Probabilistic
More informationExpectation Consistent Free Energies for Approximate Inference
Expectation Consistent Free Energies for Approximate Inference Manfred Opper ISIS School of Electronics and Computer Science University of Southampton SO17 1BJ, United Kingdom mo@ecs.soton.ac.uk Ole Winther
More informationMachine Learning, Fall 2009: Midterm
10-601 Machine Learning, Fall 009: Midterm Monday, November nd hours 1. Personal info: Name: Andrew account: E-mail address:. You are permitted two pages of notes and a calculator. Please turn off all
More information6 Markov Chain Monte Carlo (MCMC)
6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution
More informationStreaming Algorithms for Optimal Generation of Random Bits
Streaming Algorithms for Optimal Generation of Random Bits ongchao Zhou Electrical Engineering Department California Institute of echnology Pasadena, CA 925 Email: hzhou@caltech.edu Jehoshua Bruck Electrical
More informationAlternative Parameterizations of Markov Networks. Sargur Srihari
Alternative Parameterizations of Markov Networks Sargur srihari@cedar.buffalo.edu 1 Topics Three types of parameterization 1. Gibbs Parameterization 2. Factor Graphs 3. Log-linear Models with Energy functions
More informationGraphical Models and Independence Models
Graphical Models and Independence Models Yunshu Liu ASPITRG Research Group 2014-03-04 References: [1]. Steffen Lauritzen, Graphical Models, Oxford University Press, 1996 [2]. Christopher M. Bishop, Pattern
More informationStatistical Thermodynamics Solution Exercise 8 HS Solution Exercise 8
Statistical Thermodynamics Solution Exercise 8 HS 05 Solution Exercise 8 Problem : Paramagnetism - Brillouin function a According to the equation for the energy of a magnetic dipole in an external magnetic
More informationUndirected Graphical Models
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional
More informationExpectation Propagation in Factor Graphs: A Tutorial
DRAFT: Version 0.1, 28 October 2005. Do not distribute. Expectation Propagation in Factor Graphs: A Tutorial Charles Sutton October 28, 2005 Abstract Expectation propagation is an important variational
More informationBayesian Model Scoring in Markov Random Fields
Bayesian Model Scoring in Markov Random Fields Sridevi Parise Bren School of Information and Computer Science UC Irvine Irvine, CA 92697-325 sparise@ics.uci.edu Max Welling Bren School of Information and
More informationWORLD SCIENTIFIC (2014)
WORLD SCIENTIFIC (2014) LIST OF PROBLEMS Chapter 1: Magnetism of Free Electrons and Atoms 1. Orbital and spin moments of an electron: Using the theory of angular momentum, calculate the orbital
More informationThe Generalized Distributive Law and Free Energy Minimization
The Generalized Distributive Law and Free Energy Minimization Srinivas M. Aji Robert J. McEliece Rainfinity, Inc. Department of Electrical Engineering 87 N. Raymond Ave. Suite 200 California Institute
More informationInformation, Physics, and Computation
Information, Physics, and Computation Marc Mezard Laboratoire de Physique Thdorique et Moales Statistiques, CNRS, and Universit y Paris Sud Andrea Montanari Department of Electrical Engineering and Department
More information