An Importance Sampling Algorithm for Models with Weak Couplings
An Importance Sampling Algorithm for Models with Weak Couplings

Mehdi Molkaraie
ETH Zurich

arXiv v1 [cs.IT] 4 Jul 2016

Abstract—We propose an importance sampling algorithm to estimate the partition function of the Ising model and the q-state Potts model. The proposal (auxiliary) distribution is defined on a spanning tree of the Forney factor graph representing the model, and computations are done on the remaining edges. In contrast, in an analogous importance sampling algorithm in the dual Forney factor graph, computations are done on a spanning tree, and the proposal distribution is defined on the remaining edges.

I. INTRODUCTION

We consider the problem of estimating the partition function of the ferromagnetic Ising and q-state Potts models with spatially varying (bond-dependent) coupling parameters. The partition function is an important quantity in statistical physics [1], [2], in machine learning [3], and in information theory [4]. In general, the partition function is not available analytically, but only as a summation with an exponential number of terms, which makes its exact computation intractable. Therefore, we resort to approximating the partition function [5] or to deriving bounds on it [6], [7].

In this paper, we first represent the models of interest by their modified Forney factor graphs (FFGs), which are constructed via simple manipulations of the original FFG [8]. We then define a proposal distribution on a spanning tree of the modified FFG to obtain an importance sampling algorithm for estimating the partition function. The algorithm can efficiently compute an estimate of the partition function when the coupling parameters associated with the edges that lie outside the spanning tree are weak. In contrast, similar importance sampling algorithms can be designed in the dual FFG of the models by defining the proposal distribution on the edges that lie outside a spanning tree of the model.
In this case, the partition function can be efficiently estimated when the coupling parameters on the spanning tree are strong [9]–[11].

The paper is organized as follows. In Section II, we review the Ising model, the q-state Potts model, and their graphical-model representations in terms of FFGs. The modified FFGs of the models are presented in Section III. In Section IV, we describe the importance sampling algorithm for estimating the partition function. The contrast with analogous algorithms in the dual FFG is discussed in Section V.

II. THE MODEL

Let X_1, X_2, ..., X_N be a collection of discrete random variables. Suppose each random variable takes on values in a finite alphabet 𝒳, which in this context is equal to the abelian group ℤ/qℤ = {0, 1, ..., q−1}. Let x_i represent a possible realization of X_i, let x stand for a configuration (x_1, x_2, ..., x_N), and let X stand for (X_1, X_2, ..., X_N). For simplicity, we assume ferromagnetic models with periodic boundaries, with pairwise interactions, and without an external magnetic field, although some of our results are applicable to more general settings.

Let f : 𝒳^N → ℝ_{≥0} be a non-negative function that factors into a product of local functions υ_{k,l} : 𝒳² → ℝ_{≥0} as

    f(x) = ∏_{(k,l)∈E} υ_{k,l}(x_k, x_l)    (1)

where E contains all the unordered pairs (k, l) with non-zero interactions. A real coupling parameter J_{k,l} is associated with each interacting pair (x_k, x_l). From (1), we define the following probability mass function (known as the Boltzmann distribution [1])

    p(x) = f(x)/Z    (2)

Here, the normalization constant Z is the partition function, given by

    Z = ∑_{x∈𝒳^N} f(x)    (3)

A. The Ising Model

In the Ising model, q = 2 and

    υ_{k,l}(x_k, x_l) = e^{J_{k,l}},  if x_k = x_l
                        e^{−J_{k,l}}, if x_k ≠ x_l    (4)

The model is called ferromagnetic (resp. antiferromagnetic) if J_{k,l} > 0 (resp. J_{k,l} < 0) for each (k, l) ∈ E. If the couplings can be both positive and negative, the model is known as an Ising spin glass.

B. The q-State Potts Model

In the Potts model, q > 2 and

    υ_{k,l}(x_k, x_l) = e^{J_{k,l}}, if x_k = x_l
                        1,           if x_k ≠ x_l    (5)

where J_{k,l} > 0 in a ferromagnetic model.
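As a concrete reference point, the quantities (1)–(4) can be evaluated by brute force on a toy instance (a Python sketch with our own variable names; the paper itself contains no code). For a single Ising cycle of length n, the result can be checked against the transfer-matrix closed form (2 cosh J)^n + (2 sinh J)^n:

```python
import itertools
import math

def ising_factor(J, xk, xl):
    # Pairwise Ising factor (4): exp(J) if the spins agree, exp(-J) otherwise.
    return math.exp(J if xk == xl else -J)

def partition_function(N, edges, J):
    # Z in (3): sum of f(x) over all 2^N configurations (intractable in
    # general, which is the motivation for the sampling algorithm here).
    Z = 0.0
    for x in itertools.product(range(2), repeat=N):
        f = 1.0
        for (k, l) in edges:
            f *= ising_factor(J[(k, l)], x[k], x[l])
        Z += f
    return Z

# Toy instance: a cycle of length 4 with constant coupling J = 0.5.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
J = {e: 0.5 for e in edges}
Z = partition_function(4, edges, J)
# Transfer-matrix check for a length-n cycle: Z = (2 cosh J)^n + (2 sinh J)^n.
Z_exact = (2 * math.cosh(0.5)) ** 4 + (2 * math.sinh(0.5)) ** 4
```

The exponential cost of this enumeration (2^N terms) is exactly what the importance sampling algorithm of Section IV avoids.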
C. FFG of the Models

The factorization in (1) can be represented by a FFG, in which nodes represent the factors and edges represent the variables. The edge that represents some variable x is connected to the node representing the factor υ(·) if and only if x is an argument of υ(·). If a variable (an edge) appears in more than two factors, such a variable is replicated using equality indicator factors [8].

The FFG of the 2D Ising model with pairwise (nearest-neighbor) interactions is shown in Fig. 1, where the unlabeled boxes represent factors (4) and the boxes labeled "=" are equality indicator factors. E.g., in Fig. 1, for variables X, X′, X″, and X‴ the equality indicator factor is given by

    Φ_=(x, x′, x″, x‴) = δ(x − x′) δ(x′ − x″) δ(x″ − x‴)    (6)

where δ(·) is the Kronecker delta function. Similarly, Fig. 1 shows the FFG of the 2D Potts model with pairwise interactions, where the unlabeled boxes represent factors as in (5). Note that in a 2D model with periodic boundary conditions

    |E| = 2N    (7)

III. THE MODIFIED FFG

In this section, we present the modified FFG of the Ising and Potts models. Recall that all arithmetic manipulations are done modulo 2 in the case of the Ising model, and modulo q in the case of the Potts model.

A. Modified FFG of the Ising Model

We note that each factor (4) is only a function of x_k + x_l; we can thus represent υ_{k,l}(·) using only one variable y_m. We therefore let

    υ_m(y_m) = e^{J_m},  if y_m = 0
               e^{−J_m}, if y_m = 1    (8)

Following this observation, we can build the modified FFG of the 2D Ising model as shown in Fig. 2, where the unlabeled boxes represent (8) and boxes labeled "+" are mod-2 indicator factors, which impose the constraint that all their incident variables sum to zero (modulo 2). E.g., in Fig. 2, for binary variables X_1, X_2, and Y_1 the mod-2 indicator factor is given by

    Φ_+(y_1, x_1, x_2) = δ(y_1 + x_1 + x_2)    (9)

Let Y = (Y_1, Y_2, ..., Y_{|E|}) be the set of all the variables attached to the mod-2 indicator factors and let y be a realization of Y.
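The change of variables behind the modified FFG can be checked numerically (a Python sketch on our own toy edge set): the pairwise factor (4) evaluated at (x_k, x_l) equals the single-variable factor (8) evaluated at y_m = x_k + x_l (mod 2).

```python
import math

def y_from_x(x, edges):
    # Each factor in the modified FFG sees only y_m = x_k + x_l (mod 2),
    # the relation enforced by the mod-2 indicator factor (9).
    return [(x[k] + x[l]) % 2 for (k, l) in edges]

def upsilon(J, y):
    # Single-variable factor (8).
    return math.exp(J if y == 0 else -J)

# Check the equivalence on one configuration of a length-4 cycle, J = 0.5.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
x = (0, 1, 1, 0)
y = y_from_x(x, edges)
pairwise = [math.exp(0.5 if x[k] == x[l] else -0.5) for (k, l) in edges]
single = [upsilon(0.5, ym) for ym in y]
```

Since the two factor lists coincide edge by edge, the modified FFG defines the same unnormalized measure as (1) on the valid configurations y.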
Here, |E| denotes the cardinality of E, which is also equal to the number of unordered interacting pairs in the model.

Lemma 1. Consider a cycle of length c in the modified FFG of the Ising model. For the variables Y_1, Y_2, ..., Y_c attached to the mod-2 indicator factors in the cycle, it holds that

    ∑_{m=1}^{c} Y_m = 0    (10)

Fig. 1: The FFG of the 2D Ising model with nearest-neighbor interactions, where the unlabeled boxes represent (4) and boxes containing "=" symbols are given by (6).

Fig. 2: Modified FFG of the 2D Ising model with nearest-neighbor interactions, where the unlabeled boxes represent (8), boxes containing "+" symbols are as in (9), and boxes containing "=" symbols are given by (6). The thick edges show a cycle of length c, where the variables Y_1, Y_2, ..., Y_c on the cycle are marked blue.

Proof. In (10), each Y_m can be expanded as the symmetric difference of the corresponding adjacent variables (X_k, X_l) in the cycle. Moreover, each variable appears twice in this expansion. We conclude that ∑_{m=1}^{c} Y_m = 0.

An example of a cycle is shown by thick edges in Fig. 2, where Y_1, Y_2, ..., Y_c are marked by blue edges.

B. Modified FFG of the q-State Potts Model

In this case, each factor (5) is only a function of x_k − x_l. Similar to our approach in Section III-A, we represent (5) as

    υ_m(y_m) = e^{J_m}, if y_m = 0
               1,        otherwise    (11)

The modified FFG of the 2D Potts model is shown in Fig. 3, where the unlabeled boxes represent factors (11) and boxes labeled "+" are mod-q indicator factors, which impose the constraint that all their incident variables sum to zero (modulo q). E.g., in Fig. 4, for variables X_1, X_2, and Y_1 the mod-q indicator factor is given by

    Φ_+(y_1, x_1, x_2) = δ(y_1 + x_1 − x_2)    (12)

Fig. 3: Modified FFG of the 2D Potts model with nearest-neighbor interactions, where the unlabeled boxes represent (11), boxes containing "+" symbols are as in (12), boxes containing "=" symbols are given by (6), and the small circles attached to the mod-q indicator factors denote sign inverters.

A sign inverter (depicted by a small circle) is inserted in one of the edges incident to each mod-q indicator factor. The choice of the side on which to insert it can be made arbitrarily (because of the symmetry in the factors).

There is again a linear dependency among the variables Y_1, Y_2, ..., Y_c in any cycle of length c in the modified FFG of the Potts model. However, the dependency is affected by the arrangement of the sign inverters. As an example, in Fig. 4 we have arranged the sign inverters in such a way that the sum of the variables in any cycle of length four is zero: each Y_m can be expanded as the difference between the corresponding adjacent variables (X_k, X_l) in the cycle; furthermore, each variable appears twice in this expansion, once with a positive sign and once with a negative sign.

C. Variables in the Modified FFG

We partition E into two disjoint subsets T and T̄, where T is a spanning tree in the modified FFG. Thus Y is also partitioned into Y_T and Y_T̄. In such a partitioning, Y_T̄ can be computed as a linear combination of Y_T (cf. Lemma 1). An example of a spanning tree in the modified FFG of the 2D Ising model is illustrated in Fig. 5, where the thick blue edges represent Y_T and the thin red edges represent Y_T̄. Here, Y_T̄ is a linear combination of Y_T.
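The role of the spanning tree can be illustrated on a toy Ising cycle (a Python sketch with our own edge choice): fixing x_1 = 0 and propagating y_m = x_k + x_l (mod 2) along the tree recovers a configuration x, after which every off-tree variable is determined, consistent with Lemma 1.

```python
def recover_x_from_tree(N, tree, y_tree):
    # Fix x[0] = 0 and propagate y_m = x_k + x_l (mod 2) along the tree edges.
    x = [None] * N
    x[0] = 0
    adj = {i: [] for i in range(N)}
    for m, (k, l) in enumerate(tree):
        adj[k].append((l, m))
        adj[l].append((k, m))
    stack = [0]
    while stack:
        k = stack.pop()
        for l, m in adj[k]:
            if x[l] is None:
                x[l] = (x[k] + y_tree[m]) % 2
                stack.append(l)
    return x

# Length-4 cycle: the tree is the first three edges, and the single
# cycle-closing edge is off the tree.
tree = [(0, 1), (1, 2), (2, 3)]
off_tree = [(3, 0)]
y_tree = [1, 0, 1]          # any assignment on the tree edges is valid
x = recover_x_from_tree(4, tree, y_tree)
y_off = [(x[k] + x[l]) % 2 for (k, l) in off_tree]
# Lemma 1: the y's around the cycle sum to zero (mod 2).
cycle_sum = (sum(y_tree) + sum(y_off)) % 2
```

The recovered x is unique only up to a global spin flip, which is exactly the two-to-one correspondence used in Lemma 2 below.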
In a 2D grid with periodic boundary conditions, we have

    |T| = N − 1    (13)
    |T̄| = N + 1    (14)

Fig. 4: Another modified FFG of a 2D Potts model with nearest-neighbor interactions, where in every cycle of length four, the sum of the variables attached to the mod-q indicator factors is zero (modulo q).

Accordingly, let

    Υ_T(y_T) = ∏_{m∈T} υ_m(y_m)    (15)
    Υ_T̄(y_T̄) = ∏_{m∈T̄} υ_m(y_m)    (16)

We define the following proposal probability mass function on a spanning tree in the modified FFG

    q(y_T) = Υ_T(y_T)/Z_q    (17)

with

    Z_q = ∑_{y_T} Υ_T(y_T)    (18)

In this set-up, Z_q is available in closed form. For the Ising model, we obtain

    Z_q = ∏_{m∈T} (e^{J_m} + e^{−J_m})    (19)
        = ∏_{m∈T} 2 cosh J_m    (20)

and for the q-state Potts model

    Z_q = ∏_{m∈T} ∑_{t=0}^{q−1} υ_m(t)    (21)
        = ∏_{m∈T} (e^{J_m} + q − 1)    (22)

We also let

    Υ(y) = ∏_{m∈E} υ_m(y_m)    (23)

The global probability mass function in the modified FFG can then be defined as

    p_M(y) = Υ(y)/Z_M    (24)
where Z_M is the partition function of the modified FFG. The partition functions Z and Z_M are closely related.

Lemma 2. In a ferromagnetic Ising model, the partition functions Z and Z_M are related to each other by

    Z = 2 Z_M    (25)

Proof. Let x̄ be the component-wise addition of x and the all-ones vector, i.e., in x̄, components of x that are 0 become 1, and those that are 1 become 0. From each x, we can create a valid configuration y in the modified FFG. But then, x̄ gives rise to the same configuration y. Hence, each valid configuration y in the modified FFG corresponds to two configurations x and x̄. Due to the symmetry in the factors, these configurations contribute equally to the sum in (3); therefore, Z = 2 Z_M.

Lemma 3. For a ferromagnetic q-state Potts model, the partition functions Z and Z_M are related to each other by

    Z = q Z_M    (26)

The proof follows along the same lines as the proof of Lemma 2.

In Section IV, we propose an importance sampling algorithm in the modified FFG of the Ising model and the q-state Potts model to estimate Z_M, which can then be used to compute an estimate of Z.

IV. IMPORTANCE SAMPLING IN THE MODIFIED FFG

The importance sampling algorithm works as follows. We first draw independent samples y_T^(1), y_T^(2), ... according to q(y_T) in (17), and therefrom compute y_T̄^(1), y_T̄^(2), .... These samples are then used to compute an estimate of Z_M.

Drawing independent samples according to q(y_T) is straightforward. For the Ising model, the product form of (15) suggests that to draw y_T^(l) we can do the following:

1: draw u_1^(l), u_2^(l), ..., u_{|T|}^(l) i.i.d. ∼ U[0, 1]
2: for m = 1 to |T| do
3:   if u_m^(l) < 1/(1 + e^{−2J_m}) then
4:     y_m^(l) = 0
5:   else
6:     y_m^(l) = 1
7:   end if
8: end for

In line 3,

    1/(1 + e^{−2J_m}) = υ_m(0)/(υ_m(0) + υ_m(1))    (27)

which is equal to sigm(2J_m), where sigm(·) denotes the sigmoid (logistic) function [3, Chapter 1].
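The Ising subroutine above, combined with the importance sampling estimator of Section IV, can be sketched in Python on a toy cycle (our own instance; the brute-force Z is computed only to verify Lemma 2, Z = 2 Z_M):

```python
import itertools
import math
import random

def sample_y_tree(J_tree, rng):
    # The subroutine above: each tree variable is drawn independently with
    # P(Y_m = 0) = 1 / (1 + exp(-2 J_m)) = sigm(2 J_m), cf. (27).
    return [0 if rng.random() < 1.0 / (1.0 + math.exp(-2.0 * J)) else 1
            for J in J_tree]

def recover_x_from_tree(N, tree, y_tree):
    # Fix x[0] = 0 and propagate y_m = x_k + x_l (mod 2) along the tree.
    x = [None] * N
    x[0] = 0
    adj = {i: [] for i in range(N)}
    for m, (k, l) in enumerate(tree):
        adj[k].append((l, m))
        adj[l].append((k, m))
    stack = [0]
    while stack:
        k = stack.pop()
        for l, m in adj[k]:
            if x[l] is None:
                x[l] = (x[k] + y_tree[m]) % 2
                stack.append(l)
    return x

def estimate_ZM(N, tree, off_tree, J, L, rng):
    # Importance sampling estimate of Z_M: draw y on the tree from the
    # proposal, deduce the off-tree y's, average the off-tree weights,
    # and scale by the closed-form Z_q of (20).
    Zq = 1.0
    for e in tree:
        Zq *= 2.0 * math.cosh(J[e])
    total = 0.0
    for _ in range(L):
        y_tree = sample_y_tree([J[e] for e in tree], rng)
        x = recover_x_from_tree(N, tree, y_tree)
        w = 1.0
        for (k, l) in off_tree:
            y = (x[k] + x[l]) % 2
            w *= math.exp(J[(k, l)] if y == 0 else -J[(k, l)])
        total += w
    return Zq * total / L

# Toy instance: length-4 cycle, with a weak off-tree coupling (the regime
# where this estimator is efficient).
tree = [(0, 1), (1, 2), (2, 3)]
off_tree = [(3, 0)]
J = {(0, 1): 0.5, (1, 2): 0.5, (2, 3): 0.5, (3, 0): 0.1}
est = estimate_ZM(4, tree, off_tree, J, L=20000, rng=random.Random(0))

# Brute-force Z over all 2^4 configurations, to check Z = 2 Z_M (Lemma 2).
Z = sum(
    math.prod(math.exp(J[e] if x[e[0]] == x[e[1]] else -J[e])
              for e in tree + off_tree)
    for x in itertools.product(range(2), repeat=4)
)
```

With the off-tree coupling small, the off-tree weights are nearly constant, so the estimate concentrates tightly around Z/2, illustrating the variance discussion of Section IV-A.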
Similarly, in the case of the q-state Potts model, we can apply the following subroutine to draw independent samples y_T^(l) according to the corresponding proposal distribution.

Fig. 5: A spanning tree in the modified FFG of the 2D Ising model. The thick blue edges represent Y_T and the thin red edges represent Y_T̄. Here, Y_T̄ is a linear combination of Y_T.

1: draw u_1^(l), u_2^(l), ..., u_{|T|}^(l) i.i.d. ∼ U[0, 1]
2: for m = 1 to |T| do
3:   if u_m^(l) < 1/(1 + (q − 1)e^{−J_m}) then
4:     y_m^(l) = 0
5:   else
6:     draw y_m^(l) uniformly at random from {1, 2, ..., q − 1}
7:   end if
8: end for

After drawing y_T^(l), we compute y_T̄^(l). We then use the samples in the following importance sampling algorithm.

1: for l = 1 to L do
2:   draw y_T^(l) according to q(y_T)
3:   compute y_T̄^(l)
4: end for
5: compute

    Ẑ_M^IS = (Z_q/L) ∑_{l=1}^{L} Υ_T̄(y_T̄^(l))    (28)

We show that Ẑ_M^IS is an unbiased estimator of Z_M:

    E_q[Ẑ_M^IS] = Z_q E_q[Υ_T̄(Y_T̄)]
                = ∑_{y_T} Υ_T(y_T) Υ_T̄(y_T̄)
                = Z_M

A. The Variance of Ẑ_M^IS

For a finite-size model, the variance of Ẑ_M^IS can be computed as

    Var[Ẑ_M^IS] = E[(Ẑ_M^IS)²] − (E[Ẑ_M^IS])²    (29)
                = (1/L)(Z_q² E_q[Υ_T̄²(Y_T̄)] − Z_M²)    (30)

Hence

    Var[Ẑ_M^IS]/Z_M² = (1/L)((Z_q/Z_M)² E_q[Υ_T̄²(Y_T̄)] − 1)    (31)
                     = (1/L)(∑_y p_M²(y)/q(y_T) − 1)    (32)
                     = χ²(p_M, q)/L    (33)

where χ²(·, ·) denotes the chi-squared divergence, which is non-negative, with equality to zero if and only if its two arguments are equal [12, Chapter 4].

For simplicity, let us assume that for m ∈ T̄, the coupling parameters of the model are constant, denoted by J. In the limit J → 0, we have

    lim_{J→0} p_M(y) = q(y_T)    (34)

Hence

    lim_{J→0} χ²(p_M, q) = 0    (35)

Therefore, Z_M can be estimated efficiently via the proposed importance sampling estimator when J_m is small for m ∈ T̄.

V. IMPORTANCE SAMPLING IN THE DUAL FFG

We briefly discuss an analogous importance sampling algorithm in the dual FFG of the q-state Potts model. In the dual FFG, we denote the partition function by Z_d. We will use the tilde symbol to denote variables in the dual domain. The partition functions Z and Z_d are related to each other via the normal factor graph duality theorem [13], [14]. In a 2D q-state Potts model with periodic boundary conditions, according to the normal factor graph duality theorem

    Z_d = q^N Z    (36)

From (26), we obtain

    Z_d = q^{N+1} Z_M    (37)

The primal and dual FFGs have the same topology. In the dual FFG, factors are replaced by their Fourier transforms and variables are replaced by their corresponding dual variables. The dual FFG of the 2D q-state Potts model is shown in Fig. 6, where the unlabeled boxes represent factors

    γ_m(ỹ_m) = e^{J_m} + q − 1, if ỹ_m = 0
               e^{J_m} − 1,     otherwise    (38)

which is the one-dimensional discrete Fourier transform of (11), boxes labeled "+" are mod-q indicator factors as in (9), and boxes containing "=" symbols are equality indicator factors given by (6). The sign inverters are depicted by small circles attached to the equality indicator factors. For more details on the dual FFG of the Potts model, see [11].

We again partition E into two disjoint subsets T and T̄, where T is a spanning tree in the dual FFG.
The set of random variables Ỹ (represented by the edges/bonds) is also partitioned into Ỹ_T and Ỹ_T̄. However, in the dual FFG, Ỹ_T can be computed as a linear combination of Ỹ_T̄. In other words, for a given realization of Ỹ_T̄, we can compute Ỹ_T deterministically. Notice that |Ỹ| = |E|.

Fig. 6: The dual FFG of the 2D Potts model with nearest-neighbor interactions, where the unlabeled boxes represent (38), boxes containing "+" symbols are as in (9), boxes containing "=" symbols are given by (6), and the small circles attached to the equality indicator factors denote sign inverters. The thick edges show a spanning tree in the dual FFG.

An example of such a partitioning is shown in Fig. 6, where Ỹ_T is the set of all the variables associated with the thick edges and Ỹ_T̄ is the set of all the variables associated with the remaining thin edges. Let

    Γ_T(ỹ_T) = ∏_{m∈T} γ_m(ỹ_m)    (39)
    Γ_T̄(ỹ_T̄) = ∏_{m∈T̄} γ_m(ỹ_m)    (40)

We define the following proposal probability mass function

    q_d(ỹ_T̄) = Γ_T̄(ỹ_T̄)/Z_{q_d}    (41)

where Z_{q_d} is analytically available as

    Z_{q_d} = ∑_{ỹ_T̄} Γ_T̄(ỹ_T̄)    (42)
            = ∏_{m∈T̄} ∑_{t=0}^{q−1} γ_m(t)    (43)
            = ∏_{m∈T̄} q exp(J_m)    (44)

We let

    Γ(ỹ) = ∏_{m∈E} γ_m(ỹ_m)    (45)

and define the global probability mass function in the dual FFG as

    p_d(ỹ) = Γ(ỹ)/Z_d    (46)
where Z_d is the partition function of the dual FFG.

The importance sampling algorithm works as follows: at iteration l, we draw a sample ỹ_T̄^(l) according to the proposal distribution (41). The product form of (40) suggests that in order to draw ỹ_T̄^(l) we can apply the following subroutine [11].

1: draw u_1^(l), u_2^(l), ..., u_{|T̄|}^(l) i.i.d. ∼ U[0, 1]
2: for m = 1 to |T̄| do
3:   if u_m^(l) < (1 + (q − 1)e^{−J_m})/q then
4:     ỹ_m^(l) = 0
5:   else
6:     draw ỹ_m^(l) uniformly at random from {1, 2, ..., q − 1}
7:   end if
8: end for

After drawing ỹ_T̄^(l), we compute ỹ_T^(l). Finally, we use the following importance sampling algorithm to estimate Z_d.

1: for l = 1 to L do
2:   draw ỹ_T̄^(l) according to q_d(ỹ_T̄)
3:   compute ỹ_T^(l)
4: end for
5: compute

    Ẑ_d^IS = (Z_{q_d}/L) ∑_{l=1}^{L} Γ_T(ỹ_T^(l))    (47)

Here, Ẑ_d^IS is an unbiased estimator of Z_d, i.e.,

    E_{q_d}[Ẑ_d^IS] = Z_d    (48)

see [10]. Similar to our approach in Section IV-A, we can show that

    Var[Ẑ_d^IS]/Z_d² = χ²(p_d, q_d)/L    (49)

For simplicity, we assume that for m ∈ T, the coupling parameters of the model are constant, denoted by J. In the limit J → ∞, we have

    lim_{J→∞} p_d(ỹ) = q_d(ỹ_T̄)    (50)

Thus

    lim_{J→∞} Var[Ẑ_d^IS]/Z_d² = 0    (51)

We conclude that Z_d can be estimated efficiently via the importance sampling estimator when J_m is large for m ∈ T. For more details on constructing the dual FFG of the Ising model and the q-state Potts model, see [15], [16], [9]–[11].

VI. CONCLUSION

We proposed an importance sampling algorithm in the modified FFG of the Ising model and the q-state Potts model to estimate the partition function. The proposal distribution of the importance sampling algorithm is defined on a spanning tree of the model. The algorithm can efficiently compute an estimate of the partition function when the coupling parameters associated with the edges that lie outside the spanning tree are weak. In contrast, the proposal distribution for the analogous importance sampling algorithm in the dual FFG is defined on the edges that lie outside the spanning tree.
In this case, accurate estimates of the partition function can be obtained when the couplings associated with the edges of the spanning tree are strong. The methods can handle more demanding cases when combined with annealed importance sampling [17].

REFERENCES

[1] J. M. Yeomans, Statistical Mechanics of Phase Transitions. Oxford University Press.
[2] R. J. Baxter, Exactly Solved Models in Statistical Mechanics. Dover Publications.
[3] K. P. Murphy, Machine Learning: A Probabilistic Perspective. The MIT Press.
[4] D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms. Cambridge University Press.
[5] G. Potamianos and J. Goutsias, "Stochastic approximation algorithms for partition function estimation of Gibbs random fields," IEEE Trans. Information Theory, vol. 43, Nov. 1997.
[6] J. S. Yedidia, "An idiosyncratic journey beyond mean field theory," in Advanced Mean Field Methods: Theory and Practice. The MIT Press, 2001.
[7] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky, "A new class of upper bounds on the log partition function," IEEE Trans. Information Theory, vol. 51, July 2005.
[8] G. D. Forney, Jr., "Codes on graphs: normal realizations," IEEE Trans. Information Theory, vol. 47, Feb. 2001.
[9] M. Molkaraie, "An importance sampling scheme for models in a strong external field," Proc. 2015 IEEE Int. Symp. on Information Theory, Hong Kong, June 14–19, 2015.
[10] M. Molkaraie, "An importance sampling algorithm for the Ising model with strong couplings," Proc. 2016 Int. Zurich Seminar on Communications (IZS), March 2–4, 2016.
[11] M. Molkaraie and V. Gómez, "Efficient Monte Carlo methods for the Potts model at low temperature," arXiv preprint.
[12] I. Csiszár and P. C. Shields, Information Theory and Statistics: A Tutorial. Now Publishers Inc.
[13] A. Al-Bashabsheh and Y. Mao, "Normal factor graphs and holographic transformations," IEEE Trans. Information Theory, vol. 57, Feb. 2011.
[14] G. D. Forney, Jr., "Codes on graphs: duality and MacWilliams identities," IEEE Trans. Information Theory, vol. 57, Feb. 2011.
[15] M. Molkaraie and H.-A. Loeliger, "Partition function of the Ising model via factor graph duality," Proc. 2013 IEEE Int. Symp. on Information Theory, Istanbul, Turkey, July 7–12, 2013.
[16] A. Al-Bashabsheh and Y. Mao, "On stochastic estimation of the partition function," Proc. 2014 IEEE Int. Symp. on Information Theory, Honolulu, USA, June 29–July 4, 2014.
[17] R. M. Neal, "Annealed importance sampling," Statistics and Computing, vol. 11, 2001.
Does the Wake-sleep Algorithm Produce Good Density Estimators? Brendan J. Frey, Geoffrey E. Hinton Peter Dayan Department of Computer Science Department of Brain and Cognitive Sciences University of Toronto
More informationIntroduction to Graphical Models. Srikumar Ramalingam School of Computing University of Utah
Introduction to Graphical Models Srikumar Ramalingam School of Computing University of Utah Reference Christopher M. Bishop, Pattern Recognition and Machine Learning, Jonathan S. Yedidia, William T. Freeman,
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos & Aarti Singh Contents Markov Chain Monte Carlo Methods Goal & Motivation Sampling Rejection Importance Markov
More informationDEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY
DEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY OUTLINE 3.1 Why Probability? 3.2 Random Variables 3.3 Probability Distributions 3.4 Marginal Probability 3.5 Conditional Probability 3.6 The Chain
More informationPartition Functions of Normal Factor Graphs
Partition Functions of Normal Factor Graphs G. David Forney, r. Laboratory for nformation and Decision Systems Massachusetts nstitute of Technology Cambridge, M 02139, US forneyd@comcast.net Pascal O.
More informationKrammers-Wannier Duality in Lattice Systems
Krammers-Wannier Duality in Lattice Systems Sreekar Voleti 1 1 Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A7. (Dated: December 9, 2018) I. INTRODUCTION It was shown by
More informationProbabilistic Graphical Models Lecture Notes Fall 2009
Probabilistic Graphical Models Lecture Notes Fall 2009 October 28, 2009 Byoung-Tak Zhang School of omputer Science and Engineering & ognitive Science, Brain Science, and Bioinformatics Seoul National University
More informationCSC 412 (Lecture 4): Undirected Graphical Models
CSC 412 (Lecture 4): Undirected Graphical Models Raquel Urtasun University of Toronto Feb 2, 2016 R Urtasun (UofT) CSC 412 Feb 2, 2016 1 / 37 Today Undirected Graphical Models: Semantics of the graph:
More information13: Variational inference II
10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational
More informationMarkov Chains and MCMC
Markov Chains and MCMC Markov chains Let S = {1, 2,..., N} be a finite set consisting of N states. A Markov chain Y 0, Y 1, Y 2,... is a sequence of random variables, with Y t S for all points in time
More information11. Learning graphical models
Learning graphical models 11-1 11. Learning graphical models Maximum likelihood Parameter learning Structural learning Learning partially observed graphical models Learning graphical models 11-2 statistical
More informationReplica Condensation and Tree Decay
Replica Condensation and Tree Decay Arthur Jaffe and David Moser Harvard University Cambridge, MA 02138, USA Arthur Jaffe@harvard.edu, David.Moser@gmx.net June 7, 2007 Abstract We give an intuitive method
More informationarxiv: v2 [math.pr] 26 Aug 2017
CONSTRAINED PERCOLATION, ISING MODEL AND XOR ISING MODEL ON PLANAR LATTICES ZHONGYANG LI arxiv:1707.04183v2 [math.pr] 26 Aug 2017 Abstract. We study constrained percolation models on planar lattices including
More informationRepresentation of undirected GM. Kayhan Batmanghelich
Representation of undirected GM Kayhan Batmanghelich Review Review: Directed Graphical Model Represent distribution of the form ny p(x 1,,X n = p(x i (X i i=1 Factorizes in terms of local conditional probabilities
More informationAn ABC interpretation of the multiple auxiliary variable method
School of Mathematical and Physical Sciences Department of Mathematics and Statistics Preprint MPS-2016-07 27 April 2016 An ABC interpretation of the multiple auxiliary variable method by Dennis Prangle
More informationBayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems
Bayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems Scott W. Linderman Matthew J. Johnson Andrew C. Miller Columbia University Harvard and Google Brain Harvard University Ryan
More informationIntroduction to Machine Learning Midterm Exam Solutions
10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,
More informationRapid Introduction to Machine Learning/ Deep Learning
Rapid Introduction to Machine Learning/ Deep Learning Hyeong In Choi Seoul National University 1/24 Lecture 5b Markov random field (MRF) November 13, 2015 2/24 Table of contents 1 1. Objectives of Lecture
More informationHandout 5. α a1 a n. }, where. xi if a i = 1 1 if a i = 0.
Notes on Complexity Theory Last updated: October, 2005 Jonathan Katz Handout 5 1 An Improved Upper-Bound on Circuit Size Here we show the result promised in the previous lecture regarding an upper-bound
More informationDistributed Estimation, Information Loss and Exponential Families. Qiang Liu Department of Computer Science Dartmouth College
Distributed Estimation, Information Loss and Exponential Families Qiang Liu Department of Computer Science Dartmouth College Statistical Learning / Estimation Learning generative models from data Topic
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Variational Inference IV: Variational Principle II Junming Yin Lecture 17, March 21, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3 X 3 Reading: X 4
More informationIntelligent Systems:
Intelligent Systems: Undirected Graphical models (Factor Graphs) (2 lectures) Carsten Rother 15/01/2015 Intelligent Systems: Probabilistic Inference in DGM and UGM Roadmap for next two lectures Definition
More informationLow-Density Parity-Check Codes
Department of Computer Sciences Applied Algorithms Lab. July 24, 2011 Outline 1 Introduction 2 Algorithms for LDPC 3 Properties 4 Iterative Learning in Crowds 5 Algorithm 6 Results 7 Conclusion PART I
More informationModels of Language Acquisition: Part II
Models of Language Acquisition: Part II Matilde Marcolli CS101: Mathematical and Computational Linguistics Winter 2015 Probably Approximately Correct Model of Language Learning General setting of Statistical
More informationDavid B. Lukatsky and Ariel Afek Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva Israel
Sequence correlations shape protein promiscuity David B. Lukatsky and Ariel Afek Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva 84105 Israel Abstract We predict that diagonal correlations
More informationMarkov Chain Monte Carlo Inference. Siamak Ravanbakhsh Winter 2018
Graphical Models Markov Chain Monte Carlo Inference Siamak Ravanbakhsh Winter 2018 Learning objectives Markov chains the idea behind Markov Chain Monte Carlo (MCMC) two important examples: Gibbs sampling
More informationarxiv:cond-mat/ v2 [cond-mat.stat-mech] 26 May 1998
Self-dual property of the Potts model in one dimension arxiv:cond-mat/9805301v2 [cond-mat.stat-mech] 26 May 1998 F. Y. Wu Department of Physics Northeastern University, Boston, Massachusetts 02115 Abstract
More informationWalk-Sum Interpretation and Analysis of Gaussian Belief Propagation
Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation Jason K. Johnson, Dmitry M. Malioutov and Alan S. Willsky Department of Electrical Engineering and Computer Science Massachusetts Institute
More informationMarkov Chain Monte Carlo (MCMC)
Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can
More informationS i J <ij> h mf = h + Jzm (4) and m, the magnetisation per spin, is just the mean value of any given spin. S i = S k k (5) N.
Statistical Physics Section 10: Mean-Field heory of the Ising Model Unfortunately one cannot solve exactly the Ising model or many other interesting models) on a three dimensional lattice. herefore one
More informationMIT Algebraic techniques and semidefinite optimization February 14, Lecture 3
MI 6.97 Algebraic techniques and semidefinite optimization February 4, 6 Lecture 3 Lecturer: Pablo A. Parrilo Scribe: Pablo A. Parrilo In this lecture, we will discuss one of the most important applications
More informationRegister machines L2 18
Register machines L2 18 Algorithms, informally L2 19 No precise definition of algorithm at the time Hilbert posed the Entscheidungsproblem, just examples. Common features of the examples: finite description
More information17 Variational Inference
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms for Inference Fall 2014 17 Variational Inference Prompted by loopy graphs for which exact
More information4 : Exact Inference: Variable Elimination
10-708: Probabilistic Graphical Models 10-708, Spring 2014 4 : Exact Inference: Variable Elimination Lecturer: Eric P. ing Scribes: Soumya Batra, Pradeep Dasigi, Manzil Zaheer 1 Probabilistic Inference
More informationReview: Directed Models (Bayes Nets)
X Review: Directed Models (Bayes Nets) Lecture 3: Undirected Graphical Models Sam Roweis January 2, 24 Semantics: x y z if z d-separates x and y d-separation: z d-separates x from y if along every undirected
More informationUNDERSTANDING BELIEF PROPOGATION AND ITS GENERALIZATIONS
UNDERSTANDING BELIEF PROPOGATION AND ITS GENERALIZATIONS JONATHAN YEDIDIA, WILLIAM FREEMAN, YAIR WEISS 2001 MERL TECH REPORT Kristin Branson and Ian Fasel June 11, 2003 1. Inference Inference problems
More informationBayesian Learning in Undirected Graphical Models
Bayesian Learning in Undirected Graphical Models Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK http://www.gatsby.ucl.ac.uk/ Work with: Iain Murray and Hyun-Chul
More informationApproximate inference, Sampling & Variational inference Fall Cours 9 November 25
Approimate inference, Sampling & Variational inference Fall 2015 Cours 9 November 25 Enseignant: Guillaume Obozinski Scribe: Basile Clément, Nathan de Lara 9.1 Approimate inference with MCMC 9.1.1 Gibbs
More informationAutomatic Differentiation Equipped Variable Elimination for Sensitivity Analysis on Probabilistic Inference Queries
Automatic Differentiation Equipped Variable Elimination for Sensitivity Analysis on Probabilistic Inference Queries Anonymous Author(s) Affiliation Address email Abstract 1 2 3 4 5 6 7 8 9 10 11 12 Probabilistic
More informationExpectation Consistent Free Energies for Approximate Inference
Expectation Consistent Free Energies for Approximate Inference Manfred Opper ISIS School of Electronics and Computer Science University of Southampton SO17 1BJ, United Kingdom mo@ecs.soton.ac.uk Ole Winther
More informationMachine Learning, Fall 2009: Midterm
10-601 Machine Learning, Fall 009: Midterm Monday, November nd hours 1. Personal info: Name: Andrew account: E-mail address:. You are permitted two pages of notes and a calculator. Please turn off all
More information6 Markov Chain Monte Carlo (MCMC)
6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution
More informationStreaming Algorithms for Optimal Generation of Random Bits
Streaming Algorithms for Optimal Generation of Random Bits ongchao Zhou Electrical Engineering Department California Institute of echnology Pasadena, CA 925 Email: hzhou@caltech.edu Jehoshua Bruck Electrical
More informationAlternative Parameterizations of Markov Networks. Sargur Srihari
Alternative Parameterizations of Markov Networks Sargur srihari@cedar.buffalo.edu 1 Topics Three types of parameterization 1. Gibbs Parameterization 2. Factor Graphs 3. Log-linear Models with Energy functions
More informationGraphical Models and Independence Models
Graphical Models and Independence Models Yunshu Liu ASPITRG Research Group 2014-03-04 References: [1]. Steffen Lauritzen, Graphical Models, Oxford University Press, 1996 [2]. Christopher M. Bishop, Pattern
More informationStatistical Thermodynamics Solution Exercise 8 HS Solution Exercise 8
Statistical Thermodynamics Solution Exercise 8 HS 05 Solution Exercise 8 Problem : Paramagnetism - Brillouin function a According to the equation for the energy of a magnetic dipole in an external magnetic
More informationUndirected Graphical Models
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional
More informationExpectation Propagation in Factor Graphs: A Tutorial
DRAFT: Version 0.1, 28 October 2005. Do not distribute. Expectation Propagation in Factor Graphs: A Tutorial Charles Sutton October 28, 2005 Abstract Expectation propagation is an important variational
More informationBayesian Model Scoring in Markov Random Fields
Bayesian Model Scoring in Markov Random Fields Sridevi Parise Bren School of Information and Computer Science UC Irvine Irvine, CA 92697-325 sparise@ics.uci.edu Max Welling Bren School of Information and
More informationWORLD SCIENTIFIC (2014)
WORLD SCIENTIFIC (2014) LIST OF PROBLEMS Chapter 1: Magnetism of Free Electrons and Atoms 1. Orbital and spin moments of an electron: Using the theory of angular momentum, calculate the orbital
More informationThe Generalized Distributive Law and Free Energy Minimization
The Generalized Distributive Law and Free Energy Minimization Srinivas M. Aji Robert J. McEliece Rainfinity, Inc. Department of Electrical Engineering 87 N. Raymond Ave. Suite 200 California Institute
More informationInformation, Physics, and Computation
Information, Physics, and Computation Marc Mezard Laboratoire de Physique Thdorique et Moales Statistiques, CNRS, and Universit y Paris Sud Andrea Montanari Department of Electrical Engineering and Department
More information