Density Propagation and Improved Bounds on the Partition Function
Stefano Ermon, Carla P. Gomes — Dept. of Computer Science, Cornell University, Ithaca, NY 14853, U.S.A.
Ashish Sabharwal — IBM Watson Research Ctr., Yorktown Heights, NY 10598, U.S.A.
Bart Selman — Dept. of Computer Science, Cornell University, Ithaca, NY 14853, U.S.A.

Abstract

Given a probabilistic graphical model, its density of states is a function that, for any likelihood value, gives the number of configurations with that probability. We introduce a novel message-passing algorithm called Density Propagation (DP) for estimating this function. We show that DP is exact for tree-structured graphical models and is, in general, a strict generalization of both sum-product and max-product algorithms. Further, we use density of states and tree decomposition to introduce a new family of upper and lower bounds on the partition function. For any tree decomposition, the new upper bound based on finer-grained density of state information is provably at least as tight as previously known bounds based on convexity of the log-partition function, and strictly stronger if a general condition holds. We conclude with empirical evidence of improvement over convex relaxations and mean-field based bounds.

1 Introduction

Associated with any undirected graphical model [1] is the so-called density of states, a term borrowed from statistical physics indicating a function that, for any likelihood value, gives the number of configurations with that probability. The density of states plays an important role in statistical physics because it provides a fine-grained description of the system, and can be used to efficiently compute many properties of interest, such as the partition function and its parameterized version [2, 3]. It can be seen that computing the density of states is computationally intractable in the worst case, since it subsumes a #P-complete problem (computing the partition function) and an NP-hard one (MAP inference).
All current approximate techniques estimating the density of states are based on sampling, the most prominent being the Wang-Landau algorithm [3] and its improved variants [2]. These methods have been shown to be very effective in practice. However, they do not provide any guarantee on the quality of the results. Furthermore, they ignore the structure of the underlying graphical model, effectively treating the energy function (which gives the log-likelihood of a configuration) as a black box. As a first step towards exploiting the structure of the graphical model when computing the density of states, we propose an algorithm called DENSITYPROPAGATION (DP). The algorithm is based on dynamic programming and can be conveniently expressed in terms of message passing on the graphical model. We show that DENSITYPROPAGATION computes the density of states exactly for any tree-structured graphical model. It is closely related to the popular Sum-Product (Belief Propagation, BP) and Max-Product (MP) algorithms, and can be seen as a generalization of both. However, it computes something much richer, namely the density of states, which contains information such as the partition function and variable marginals. Although we do not work at the level of individual configurations, DENSITYPROPAGATION allows us to reason in terms of groups of configurations with the same probability (energy). Being able to solve inference tasks for certain tractable classes of problems (e.g., trees) is important because one can often decompose a complex problem into tractable subproblems (such as spanning
trees) [4], and the solutions to these simpler problems can be combined to recover useful properties of the original graphical model [5, 6]. In this paper we show that by combining the additional information given by the density of states, we can obtain a new family of upper and lower bounds on the partition function. We prove that the new upper bound is always at least as tight as the one based on the convexity of the log-partition function [4], and we provide a general condition under which the new bound is strictly tighter. Further, we illustrate empirically that the new upper bound improves upon the convexity-based one on Ising grid and clique models, and that the new lower bound is empirically slightly stronger than the one given by mean-field theory [4, 7].

2 Problem definition and setup

We consider a graphical model specified as a factor graph with N = |V| discrete random variables x_i, i ∈ V, where x_i ∈ X_i. The global random vector x = {x_s, s ∈ V} takes values in the Cartesian product X = X_1 × X_2 × ... × X_N, with cardinality D = |X| = ∏_{i=1}^N |X_i|. We consider a probability distribution over elements x ∈ X (called configurations)

    p(x) = (1/Z) ∏_{α∈I} ψ_α({x}_α)    (1)

that factors into potentials or factors ψ_α : {x}_α → R+, where I is an index set and {x}_α ⊆ V is the subset of variables that factor ψ_α depends on. The corresponding factor graph is a bipartite graph with vertex set V ∪ I. In the factor graph, each variable node i ∈ V is connected with all the factors α ∈ I that depend on i. Similarly, each factor node α ∈ I is connected with all the variable nodes i ∈ {x}_α. We denote the neighbors of i and α by N(i) and N(α), respectively. We will also make use of the related exponential representation [8]. Let φ be a collection of potential functions {φ_α, α ∈ I}, defined over the index set I.
Given an exponential parameter vector Θ = {Θ_α, α ∈ I}, the exponential family defined by φ is the family of probability distributions over X defined as follows:

    p(x, Θ) = (1/Z(Θ)) exp(Θ · φ(x)) = (1/Z(Θ)) exp( Σ_{α∈I} Θ_α φ_α({x}_α) )    (2)

Given an exponential family, we define the density of states [2] as the function

    n : (E, Θ) ↦ |{x ∈ X : Θ · φ(x) = E}|

where for any exponential parameter Θ it holds that ∫ n(E, Θ) dE = |X|. We will refer to the quantity Σ_{α∈I} Θ_α φ_α({x}_α) as the energy of a configuration x. The density n(E, Θ) is the partition function for the microcanonical ensemble (isolated system at equilibrium with constant energy, volume, and number of particles), while Z(Θ) is the partition function for the traditional canonical ensemble. We will denote with n(E) (omitting the parameter) the density of states of the original factor graph.

3 Density Propagation

Since any propositional Satisfiability (SAT) instance can be efficiently encoded as a factor graph (e.g., by defining a uniform probability measure over satisfying assignments), it is clear that computing the density of states is computationally intractable in the worst case, as a generalization of an NP-complete problem (satisfiability testing) and a #P-complete problem (model counting). We show that the density of states can be computed efficiently¹ for acyclic graphical models. We provide a dynamic programming algorithm, which can also be interpreted as a message passing algorithm on the factor graph, called DENSITYPROPAGATION (DP), which computes the density of states exactly for acyclic graphical models.

¹ Polynomial in the cardinality of the function's support, which could be exponential in N in the worst case.
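To make the definition concrete, here is a minimal brute-force sketch (not part of the paper's algorithm; the two-factor chain over three binary variables is a hypothetical model chosen only for illustration) that tabulates n(E) by enumeration and recovers Z = Σ_E n(E) exp(E):

```python
from collections import Counter
from itertools import product
from math import exp, isclose

def density_of_states(num_vars, factors):
    """Tabulate n(E): the number of configurations at each energy level.

    `factors` maps a scope (tuple of variable indices) to a function
    returning that factor's energy contribution log psi_alpha.
    """
    n = Counter()
    for x in product([0, 1], repeat=num_vars):
        E = sum(f(tuple(x[i] for i in scope)) for scope, f in factors.items())
        n[E] += 1
    return dict(n)

# Hypothetical 3-variable chain with two "agreement" factors of weight 1:
# energy(x) = 1{x0 = x1} + 1{x1 = x2}.
factors = {(0, 1): lambda v: float(v[0] == v[1]),
           (1, 2): lambda v: float(v[0] == v[1])}
n = density_of_states(3, factors)          # {2.0: 2, 1.0: 4, 0.0: 2}
Z = sum(c * exp(E) for E, c in n.items())  # canonical partition function
```

The density sums to |X| = 8 as required, and Z is recovered exactly from the bucketed counts, which illustrates why the density of states is a strictly richer object than Z alone.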
3.1 Density propagation equations

DENSITYPROPAGATION works by exchanging messages from variable to factor nodes and vice versa. Unlike traditional message passing algorithms, where messages represent marginal probabilities (vectors of real numbers), for every x_i ∈ X_i a DENSITYPROPAGATION message m_{i→a}(x_i) represents an unnormalized discrete probability distribution with a finite alphabet (a marginal density of states). We use the notation m_{i→a}(x_i)(E) to denote the value of the function m_{i→a}(x_i) evaluated at point E. At every iteration, messages are updated according to the following rules. The message from variable node i to factor node a is updated as follows:

    m_{i→a}(x_i) = ⊛_{b∈N(i)\a} m_{b→i}(x_i)    (3)

where ⊛ is the convolution operator (commutative, associative, and distributive). Intuitively, the convolution operation corresponds to working with the sum of (conditionally) independent random variables, such as the ones corresponding to different subtrees in a tree-structured graphical model. The message from factor a to variable i is updated as follows:

    m_{a→i}(x_i) = Σ_{{x}_α \ x_i} ( ⊛_{j∈N(a)\i} m_{j→a}(x_j) ) ⊛ δ_{E_α({x}_α)}    (4)

where δ_{E_α({x}_α)} is a Dirac delta function centered at E_α({x}_α) = log ψ_α({x}_α). For tree-structured graphical models, DENSITYPROPAGATION converges after a finite number of iterations, independent of the initial condition, to the true density of states. Formally,

Theorem 1. For any variable s ∈ V and any E, for any initial condition, after a finite number of iterations

    Σ_{q∈X_s} ( ⊛_{b∈N(s)} m_{b→s}(q) )(E) = |{x ∈ X : Σ_α log ψ_α({x}_α) = E}|.

The proof is by induction on the size of the tree (omitted due to lack of space). The most efficient message update schedule for tree-structured models is a two-pass procedure where messages are first sent from the leaves to the root node, and then propagated backwards from the root to the leaves. However, as with other message-passing algorithms, for tree-structured problems the algorithm will converge with either a sequential or a parallel update schedule, with any initial condition for the messages.
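As a sketch of the update in equation (3), a message can be represented as a dictionary mapping energy to configuration count, and the product over incoming messages becomes a convolution of these bucketed densities (the bucket values below are hypothetical, not taken from the paper):

```python
from collections import Counter

def convolve(m1, m2):
    """Convolution of two bucketed densities (energy -> count): the density
    of the sum of two independent energy contributions, as in equation (3)."""
    out = Counter()
    for e1, c1 in m1.items():
        for e2, c2 in m2.items():
            out[e1 + e2] += c1 * c2
    return dict(out)

# Two incoming messages m_{b->i}(x_i) for a fixed value x_i (illustrative):
m_b = {0.0: 1, 1.0: 1}
m_c = {0.0: 1, 2.0: 1}
m_i_to_a = convolve(m_b, m_c)  # {0.0: 1, 1.0: 1, 2.0: 1, 3.0: 1}
```

Since convolution is commutative and associative, the order in which incoming messages are combined does not matter; the cost of one update is the product of the bucket counts, which is why DP updates are more expensive than BP updates.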
Although DP requires the same number of message updates as BP and MP, DP updates are more expensive because they require the computation of convolutions. In the worst case, the density of states can have an exponential number of non-zero entries (i.e., values E such that n(E) > 0, which we will also refer to as "buckets"), for instance when potentials are set to logarithms of prime numbers, so that every x ∈ X has a different probability. However, in many practical problems of interest (e.g., Ising models, grounded Markov Logic Networks [9]), the number of energy buckets is limited. Another key property of equations (4) and (3) is that, unlike in the Belief Propagation and Max-Product algorithms, the message update operator is linear, although in a higher dimensional space of probability distributions.

3.2 Relationship with sum and max product algorithms

DENSITYPROPAGATION is closely related to traditional message passing algorithms such as BP (Belief Propagation, Sum-Product) and MP (Max-Product), since it is based on the same (conditional) independence assumptions. Specifically, as shown by the next theorem, both BP and MP can be seen as simplified versions of DENSITYPROPAGATION that only consider certain global statistics of the distributions represented by DENSITYPROPAGATION messages.

Theorem 2. Assuming the same initial condition and message update schedule, at every iteration k we can recover Belief Propagation and Max-Product marginals from DENSITYPROPAGATION messages.

Proof. The Max-Product algorithm corresponds to considering only the entry associated with the highest probability, i.e., max{E : m_{i→j}(x_j)(E) > 0}. For compactness, let us define this quantity γ_{i→j}(x_j) = max_E {E : m_{i→j}(x_j)(E) > 0}. According to the DP update in equation (3), the quantities γ_{i→a}(x_i) are updated as follows:

    γ_{i→a}(x_i) = max{E : ( ⊛_{b∈N(i)\a} m_{b→i}(x_i) )(E) > 0} = Σ_{b∈N(i)\a} γ_{b→i}(x_i)
Using equation (4),

    γ_{a→i}(x_i) = max{E : Σ_{{x}_α\x_i} ( ⊛_j m_{j→a}(x_j) ⊛ δ_{E_α({x}_α)} )(E) > 0} = max_{{x}_α\x_i} ( Σ_{j∈N(a)\i} γ_{j→a}(x_j) + E_α({x}_α) )

These results show that the quantities γ_{i→j}(x_j) are updated according to the Max-Product algorithm (with messages in log-scale). To see the relationship with BP, for every DP message m_{i→j}(x_j), let us define

    µ_{i→j}(x_j) = ⟨m_{i→j}(x_j), exp(E)⟩ = Σ_E m_{i→j}(x_j)(E) exp(E)

Notice that µ_{i→j}(x_j) would correspond to an unnormalized marginal probability, assuming that m_{i→j}(x_j) is the density of states of the problem when variable j is clamped to value x_j. According to the DP update in equation (3), the quantities µ_{i→a}(x_i) are updated as follows:

    µ_{i→a}(x_i) = ⟨m_{i→a}(x_i), exp(E)⟩ = ⟨⊛_{b∈N(i)\a} m_{b→i}(x_i), exp(E)⟩ = ∏_{b∈N(i)\a} µ_{b→i}(x_i)

that is, we recover the BP updates of messages from variable to factor nodes. Similarly, using (4),

    µ_{a→i}(x_i) = ⟨m_{a→i}(x_i), exp(E)⟩ = Σ_{{x}_α\x_i} ⟨( ⊛_j m_{j→a}(x_j) ) ⊛ δ_{E_α({x}_α)}, exp(E)⟩ = Σ_{{x}_α\x_i} ψ_α({x}_α) ∏_{j∈N(a)\i} µ_{j→a}(x_j)

and we recover the BP updates from factors to variable nodes for the µ quantities (which correspond to marginals computed according to the estimated densities). Similarly, if we define temperature versions of the marginals, µ^T_{i→j}(x_j) = ⟨m_{i→j}(x_j), exp(E/T)⟩, we recover the temperature versions of the Belief Propagation updates, similar to [10] and [11].

As with other message passing algorithms, DENSITYPROPAGATION updates are well defined also for loopy graphical models, even though there is no guarantee of convergence or correctness [12]. The correspondence with BP and MP (Theorem 2) however still holds: if loopy BP converges, then the corresponding quantities µ_{i→j} computed from DP messages will converge as well, and to the same value (assuming the same initial condition and update schedule). Notice however that the convergence of the µ_{i→j} does not imply the convergence of the DENSITYPROPAGATION messages (e.g., in probability, law, or L_p). In fact, we have observed empirically that the situation where the µ_{i→j} converge but the m_{i→j} do not converge (not even in distribution) is fairly common.
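The two statistics used in the proof can be sketched directly (the message below is a hypothetical bucketed density, not one computed in the paper): the max-product quantity γ is the largest energy with a non-empty bucket, while the sum-product quantity µ is the inner product of the message with exp(E):

```python
from math import exp, isclose

m = {0.0: 2, 1.0: 4, 2.0: 2}  # a toy DP message: energy -> count

# Max-Product statistic (log-scale MP message): highest non-empty bucket.
gamma = max(E for E, c in m.items() if c > 0)

# Sum-Product statistic: unnormalized BP marginal mu = <m, exp(E)>.
mu = sum(c * exp(E) for E, c in m.items())

# Temperature version mu_T = <m, exp(E/T)>, recovering tempered BP at T != 1.
def mu_temp(m, T):
    return sum(c * exp(E / T) for E, c in m.items())
```

Both statistics discard most of the information carried by m, which is the sense in which BP and MP are simplified versions of DP.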
It would be interesting to see if there is a variational interpretation for the DENSITYPROPAGATION equations, as in [13]. Notice also that Junction Tree style algorithms could be used in conjunction with DP updates for the messages.

4 Bounding the density of states using tractable families

Using techniques like DENSITYPROPAGATION, we can compute the density of states exactly for tractable families such as tree-structured graphical models. Let p(x, Θ*) be a general (intractable) probabilistic model of interest, and let Θ_1, ..., Θ_n be a family of tractable parameters (e.g., corresponding to trees) such that Θ* is a convex combination of the Θ_i, as defined formally below and used previously by Wainwright et al. [5, 6]. See below (Figure 1) for an example of a possible decomposition of an Ising model into tractable distributions. By computing the partition function or MAP estimates for the tree-structured subproblems, Wainwright et al. showed that one can recover useful information about the original intractable problem, for instance by exploiting convexity of the log-partition function log Z(Θ).
We present a way to exploit the decomposition idea to derive an upper bound on the density of states n(E, Θ*) of the original intractable model, despite the fact that the density of states is not a convex function. The result below gives a point-by-point upper bound which, to the best of our knowledge, is the first bound of this kind for the density of states.

Theorem 3. Let Θ* = Σ_{i=1}^n γ_i Θ_i with Σ_{i=1}^n γ_i = 1, and let y_n = E − Σ_{i=1}^{n−1} y_i. Then

    n(E, Θ*) ≤ ∫ ... ∫ min_{i=1,...,n} { n(y_i, γ_i Θ_i) } dy_1 dy_2 ... dy_{n−1}

Proof. From the definition of the density of states and using 1{·} to denote the 0-1 indicator function,

    n(E, Θ*) = Σ_x 1{Θ*·φ(x) = E} = Σ_x 1{(Σ_i γ_i Θ_i)·φ(x) = E}
             = Σ_x ∫ ... ∫ ∏_{i=1}^n 1{γ_i Θ_i·φ(x) = y_i} dy_1 ... dy_{n−1}
             = ∫ ... ∫ Σ_x ∏_{i=1}^n 1{γ_i Θ_i·φ(x) = y_i} dy_1 ... dy_{n−1}    (exchanging the finite sum and the integrals)
             ≤ ∫ ... ∫ min_{i=1,...,n} { Σ_x 1{γ_i Θ_i·φ(x) = y_i} } dy_1 ... dy_{n−1}

where y_n = E − Σ_{i=1}^{n−1} y_i. Observing that Σ_x 1{γ_i Θ_i·φ(x) = y_i} is precisely n(y_i, γ_i Θ_i) finishes the proof.

5 New bounds on the partition function

The density of states n(E, Θ*) can be used to compute the partition function, since by definition Z(Θ*) = Σ_E n(E, Θ*) exp(E). We can therefore get an upper bound on Z(Θ*) by integrating the point-by-point upper bound on n(E, Θ*) from Theorem 3. This bound can be tighter than the known bound [6] obtained by applying Jensen's inequality to the log-partition function (which is convex), given by log Z(Θ*) ≤ Σ_i γ_i log Z(Θ_i). For instance, consider a graphical model with weights large enough that the density-of-states based sum defining Z(Θ*) is dominated by the contribution of the highest-energy bucket. As a concrete example, consider the decomposition in Figure 1, where the original edge weight is w (w = 1 in the figure, so the tractable distributions have edge weight 2w = 2). As w grows, the convexity-based bound will approximately equal the geometric average of 2 exp(6w) and 8 exp(2w), which is 4 exp(4w). On the other hand, the bound based on Theorem 3 will approximately equal min{2, 8} exp((2 + 6)w/2) = 2 exp(4w). In general, the latter bound will always be strictly better for large enough w unless the highest-energy bucket counts are identical across all Θ_i.
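In the common case of bucketed (discrete) densities and n = 2 components, the integral in Theorem 3 becomes a sum, and the bound reads n(E, Θ*) ≤ Σ_y min{n(y, γ_1Θ_1), n(E − y, γ_2Θ_2)}. A small sketch with made-up component densities (chosen only for illustration):

```python
def pointwise_upper_bound(n1, n2):
    """Discrete Theorem 3 for n = 2: bound(E) = sum_y min{n1(y), n2(E - y)},
    where n1, n2 are the bucketed densities of gamma_i * Theta_i."""
    support = {y1 + y2 for y1 in n1 for y2 in n2}
    return {E: sum(min(c1, n2.get(E - y1, 0)) for y1, c1 in n1.items())
            for E in support}

# Illustrative component densities (energy -> count), each summing to |X| = 3:
n1 = {0.0: 1, 1.0: 2}
n2 = {0.0: 1, 1.0: 2}
bound = pointwise_upper_bound(n1, n2)  # {0.0: 1, 1.0: 2, 2.0: 2}
```

Note that the bound sums to 5, exceeding |X| = 3; the matching-based bounds of Section 5 tighten this by enforcing the total configuration count.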
While this is already promising, we can, in fact, obtain a much tighter bound by taking into account the interactions between different energy levels across any parameter decomposition, e.g., by enforcing the fact that there are a total of |X| configurations. For compactness, in the following let us define y_i(x) = exp(Θ_i · φ(x)) for any x ∈ X and i = 1, ..., n. Then,

    Z(Θ*) = Σ_x exp(Θ* · φ(x)) = Σ_x ∏_i y_i(x)^{γ_i}

Theorem 4. Let Π be the (finite) set of all possible permutations of X. Given σ = (σ_1, ..., σ_n) ∈ Π^n, let Z(Θ*, σ) = Σ_x ∏_i y_i(σ_i(x))^{γ_i}. Then,

    min_{σ∈Π^n} Z(Θ*, σ) ≤ Z(Θ*) ≤ max_{σ∈Π^n} Z(Θ*, σ)    (5)

Proof. Let σ_I ∈ Π^n denote a collection of n identity permutations. Then we have Z(Θ*) = Z(Θ*, σ_I), which proves the upper and lower bounds in equation (5).
Algorithm 1 Greedy algorithm for the maximum matching (upper bound).
1: while there exists E such that n(E, Θ_i) > 0 do
2:   E_max(Θ_i) ← max_E {E : n(E, Θ_i) > 0}, for i = 1, ..., n
3:   c ← min {n(E_max(Θ_1), Θ_1), ..., n(E_max(Θ_n), Θ_n)}
4:   u_b(γ_1 E_max(Θ_1) + ... + γ_n E_max(Θ_n)) ← c
5:   n(E_max(Θ_i), Θ_i) ← n(E_max(Θ_i), Θ_i) − c, for i = 1, ..., n
6: end while

We can think of σ ∈ Π^n as an n-dimensional matching. For any i, j, σ_i(x) matches with σ_j(x), and σ(x) gives the corresponding hyper-edge. If we define the weight of each hyper-edge in the matching graph as w(σ(x)) = ∏_i y_i(σ_i(x))^{γ_i}, then Z(Θ*, σ) = Σ_x w(σ(x)) corresponds to the weight of the matching represented by σ. We can therefore think of the bounds in equation (5) as given by a maximum and a minimum matching, respectively. Intuitively, the maximum matching corresponds to the case where the configurations in the high energy buckets of the densities happen to be the same configuration (matching), so that their energies are summed up.

5.1 Upper bound

The maximum matching max_σ Z(Θ*, σ) (i.e., the upper bound on the partition function) can be computed using Algorithm 1. Algorithm 1 returns a distribution u_b such that Σ_E u_b(E) = |X| and Σ_E u_b(E) exp(E) = max_σ Z(Θ*, σ). Notice however that u_b is not a valid point-by-point upper bound on the density n(E, Θ*) of the original model.

Proposition 1. Algorithm 1 computes the maximum matching and its runtime is bounded by the total number of non-empty buckets Σ_i |{E : n(E, Θ_i) > 0}|.

Proof. The correctness of Algorithm 1 follows from observing that exp(E_1 + E_2) + exp(E'_1 + E'_2) ≥ exp(E_1 + E'_2) + exp(E'_1 + E_2) when E_1 ≥ E'_1 and E_2 ≥ E'_2. Intuitively, this means that for n = 2 parameters it is always optimal to connect the highest energy configurations, therefore the greedy method is optimal. This result can be generalized to n > 2 by induction. The runtime is proportional to the total number of buckets because we remove one bucket from at least one density at every iteration.
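Algorithm 1 can be sketched as follows, representing each density n(·, Θ_i) as a dictionary from energy to count (the helper names are ours, and the example densities are our reconstruction of the Figure 1 decomposition described in the text: a 3-edge chain and a single edge, each with weight 2):

```python
from math import exp, isclose

def max_matching_bound(densities, gammas):
    """Greedy Algorithm 1: repeatedly match the highest non-empty energy
    buckets across all component densities. Returns u_b as {energy: count}."""
    ds = [dict(d) for d in densities]  # work on copies
    u_b = {}
    while all(ds):                     # all densities still have buckets
        tops = [max(d) for d in ds]                    # line 2: E_max(Theta_i)
        c = min(d[t] for d, t in zip(ds, tops))        # line 3
        E = sum(g * t for g, t in zip(gammas, tops))   # line 4
        u_b[E] = u_b.get(E, 0) + c
        for d, t in zip(ds, tops):                     # line 5: shrink buckets
            d[t] -= c
            if d[t] == 0:
                del d[t]
    return u_b

def bound_value(dist):
    """Partition function value of a bucketed distribution."""
    return sum(c * exp(E) for E, c in dist.items())

# Densities of the Figure 1 decomposition (gamma_1 = gamma_2 = 1/2):
n1 = {0: 2, 2: 6, 4: 6, 6: 2}   # chain with edge weight 2
n2 = {0: 8, 2: 8}               # single edge with weight 2
u_b = max_matching_bound([n1, n2], [0.5, 0.5])  # {4.0: 2, 3.0: 6, 1.0: 6, 0.0: 2}
```

Each iteration empties at least one bucket, so the runtime is bounded by the total number of buckets, matching Proposition 1.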
The key property of Algorithm 1 is that even though it defines a matching over an exponential number of configurations |X|, its runtime is proportional to the total number of buckets, because it matches configurations in groups at the bucket level. We can show that the value of the maximum matching is at least as tight as the bound provided by the convexity of the log-partition function, which is used for example by Tree-Reweighted Belief Propagation (TRW-BP) [6].

Theorem 5. For any parameter decomposition Σ_{i=1}^n γ_i Θ_i = Θ*, the upper bound given by the maximum matching in (5) and computed using Algorithm 1 is always at least as tight as the bound obtained using the convexity of the log-partition function.

Proof. The bound obtained by applying Jensen's inequality to the log-partition function (which is convex), given by log Z(Θ*) ≤ Σ_i γ_i log Z(Θ_i) [6], leads to the following geometric average bound: Z(Θ*) ≤ ∏_i (Σ_x y_i(x))^{γ_i}. Given any n permutations of the configurations σ_i : X → X for i = 1, ..., n (in particular, the ones attaining the maximum matching value), we have

    Σ_x ∏_i y_i(σ_i(x))^{γ_i} ≤ ∏_i ‖ y_i(σ_i(·))^{γ_i} ‖_{1/γ_i} = ∏_i ( Σ_x y_i(σ_i(x)) )^{γ_i} = ∏_i ( Σ_x y_i(x) )^{γ_i}

where we used the generalized Hölder inequality and the norm ‖·‖ indicates a sum over X.

5.2 Lower bound

We also provide Algorithm 2 to compute the minimum matching when there are n = 2 parameters. The proof of correctness is similar to that for Proposition 1.

Proposition 2. For n = 2, Algorithm 2 computes the minimum matching and its runtime is bounded by the total number of non-empty buckets Σ_i |{E : n(E, Θ_i) > 0}|.
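The n = 2 greedy minimum matching of Algorithm 2 can be sketched in the same bucketed representation (helper names are ours; the example densities are our reconstruction of the Figure 1 decomposition, a chain and a single edge with weight 2):

```python
from math import exp, isclose

def min_matching_bound(n1, n2, g1=0.5, g2=0.5):
    """Greedy Algorithm 2 (n = 2): match the highest bucket of the first
    density with the lowest bucket of the second. Returns l_b."""
    d1, d2 = dict(n1), dict(n2)
    l_b = {}
    while d1 and d2:
        e1, e2 = max(d1), min(d2)       # E_max(Theta_1), E_min(Theta_2)
        c = min(d1[e1], d2[e2])
        E = g1 * e1 + g2 * e2
        l_b[E] = l_b.get(E, 0) + c
        for d, e in ((d1, e1), (d2, e2)):
            d[e] -= c
            if d[e] == 0:
                del d[e]
    return l_b

# Figure 1 example densities (chain and single edge, weight 2):
l_b = min_matching_bound({0: 2, 2: 6, 4: 6, 6: 2}, {0: 8, 2: 8})
Z_lb = sum(c * exp(E) for E, c in l_b.items())  # 2e^3 + 12e^2 + 2e
```

As in Algorithm 1, each iteration empties at least one bucket, giving the runtime stated in Proposition 2.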
Algorithm 2 Greedy algorithm for the minimum matching with n = 2 parameters (lower bound).
1: while there exists E such that n(E, Θ_i) > 0 do
2:   E_max(Θ_1) ← max_E {E : n(E, Θ_1) > 0};  E_min(Θ_2) ← min_E {E : n(E, Θ_2) > 0}
3:   c ← min {n(E_max(Θ_1), Θ_1), n(E_min(Θ_2), Θ_2)}
4:   l_b(γ_1 E_max(Θ_1) + γ_2 E_min(Θ_2)) ← c
5:   n(E_max(Θ_1), Θ_1) ← n(E_max(Θ_1), Θ_1) − c;  n(E_min(Θ_2), Θ_2) ← n(E_min(Θ_2), Θ_2) − c
6: end while

For the minimum matching case, the induction argument does not apply and the result cannot be extended to the case n > 2. For that case, we can obtain a weaker lower bound by applying the reverse generalized Hölder inequality [14]. Specifically, let s_1, ..., s_{n−1} < 0 and s_n be such that Σ_i 1/s_i = 1. We then have

    min_σ Z(Θ*, σ) = Σ_x ∏_i y_i(σ_min,i(x))^{γ_i}    (6)
                   ≥ ∏_i ( Σ_x y_i(σ_min,i(x))^{s_i γ_i} )^{1/s_i} = ∏_i ( Σ_x y_i(x)^{s_i γ_i} )^{1/s_i}    (7)

Notice this result cannot be applied if y_i(x) = 0 for some i and x, i.e., if there are factors assigning probability zero (hard constraints) in the probabilistic model.

6 Empirical Evaluation

To evaluate the quality of the bounds, we consider an Ising model from statistical physics, where given a graph (V, E), single node variables x_s, s ∈ V are Bernoulli distributed (x_s ∈ {0, 1}), and the global random vector is distributed according to

    p(x, Θ) = (1/Z(Θ)) exp( Σ_{s∈V} Θ_s x_s + Σ_{(i,j)∈E} Θ_ij 1{x_i = x_j} )

Figure 1 shows a simple 2 × 2 grid Ising model with exponential parameter Θ* = [0, 0, 0, 0, 1, 1, 1, 1] (Θ_s = 0 and Θ_ij = 1) decomposed as the convex sum of two parameters Θ_1 and Θ_2 corresponding to tractable distributions, i.e., Θ* = (1/2)Θ_1 + (1/2)Θ_2. The corresponding partition function is Z(Θ*) = 2 + 12 exp(2) + 2 exp(4) ≈ 199.9. In panels (a) and (b) we report the corresponding densities of states n(E, Θ_1) and n(E, Θ_2) as histograms. For instance, for the model corresponding to Θ_1 there are only two global configurations (all variables positive and all negative) that give an energy of 6.
It can be seen from the densities reported that Z(Θ_1) = 2 + 6 exp(2) + 6 exp(4) + 2 exp(6) ≈ 1180.8, while Z(Θ_2) = 8 + 8 exp(2) ≈ 67.1. The corresponding geometric average (obtained from the convexity of the log-partition function) is (Z(Θ_1))^{1/2} (Z(Θ_2))^{1/2} ≈ 281.5. In panels (c1) and (c2) we show u_b and l_b computed using Algorithms 1 and 2, i.e., the solutions to the maximum and minimum matching problems, respectively. For instance, for the maximum matching case the 2 configurations with energy 6 from n(E, Θ_1) are matched with 2 of the 8 with energy 2 from n(E, Θ_2), giving an energy 6/2 + 2/2 = 4. Notice that u_b and l_b are not valid bounds on individual densities of states themselves, but they nonetheless provide upper and lower bounds on the partition function as shown in the figure: ≈ 248.0 and ≈ 134.3, respectively. The bound (7) given by the reverse Hölder inequality with s_1 = −1, s_2 = 1/2 and the mean-field lower bound [4, 7] are both weaker in this case. In this case, the additional information provided by the density leads to tighter upper and lower bounds on the partition function.

In Figure 2 we report the upper bounds obtained for several types of Ising models (in all cases, Θ_s = 0, i.e., there is no external field). In the two left plots, we consider a 5 × 5 square Ising model, once with attractive interactions (Θ_ij ∈ [0.1w, 0.2w]) and once with mixed interactions (Θ_ij ∈ [−0.1w, 0.1w]). In the two right plots, we use a complete graph (a clique) with N = 9 vertices. For each model, we compute the upper bound given by TRW-BP (with edge appearance probabilities µ_e based on a subset of randomly selected spanning trees) and the mean-field bound using the implementations in libDAI [15]. We then compute the bound based on the maximum matching using the same set of spanning trees. For the grid case, we also use a combination of 2 spanning trees and compute the corresponding lower bound based on the minimum matching (notice
it is not possible to cover all the edges in a clique with only 2 spanning trees). For each bound, we report the relative error, defined as (log(bound) − log(Z)) / log(Z), where Z is the true partition function, computed using the junction tree method.

[Figure 1: Decomposition of a 2 × 2 Ising model, density of states solutions obtained with the maximum and minimum matching algorithms, and the corresponding upper and lower bounds on Z(Θ*). Panels (a) and (b): histograms of n(E, Θ_1) and n(E, Θ_2); panels (c1) and (c2): the matching solutions u_b (upper bound Z_ub = 2 + 6e + 6e³ + 2e⁴) and l_b (lower bound Z_lb = 2e + 12e² + 2e³).]

[Figure 2: Relative error of the upper bounds (Convexity vs. MaxMatching) as a function of edge strength. Panels: (a) 5 × 5 grid, attractive interactions; (b) 5 × 5 grid, mixed interactions; (c) 9-clique, attractive; (d) 9-clique, mixed.]

In these experiments, both our upper and lower bounds improve over the ones obtained with TRW-BP [6] and mean-field, respectively. The lower bound based on minimum matching visually overlaps with the mean-field bound and is thus omitted from Figure 2. It is, however, strictly better, even if by a small amount. Notice that we might be able to get a better bound by choosing a different set of parameters Θ_i (which may be suboptimal for TRW-BP). We also used numerical optimization (BFGS and BOBYQA [16]) to select the values of s_i in the reverse Hölder bound (7) (notice that once we have computed the densities n(E, Θ_i), evaluating the bound is cheap, i.e., it does not require solving an inference task). We found that both optimization strategies are very sensitive to the initial condition; however, by optimizing the parameters we were always able to obtain a lower bound at least as good as the one given by mean field.
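The 2 × 2 example of Figure 1 is small enough to verify by brute force. The sketch below (our reconstruction of the decomposition described in the text: a 3-edge chain plus the remaining single edge, each with weight 2) enumerates all 16 configurations and checks the component partition functions and the convexity-based geometric-average bound:

```python
from itertools import product
from math import exp, isclose, sqrt

def ising_Z(edges, weights, n_vars):
    """Partition function of p(x) proportional to exp(sum_ij Theta_ij 1{x_i = x_j})."""
    return sum(exp(sum(w for (i, j), w in zip(edges, weights) if x[i] == x[j]))
               for x in product([0, 1], repeat=n_vars))

cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]   # the 2x2 grid is a 4-cycle
Z_star = ising_Z(cycle, [1, 1, 1, 1], 4)   # 2 + 12 e^2 + 2 e^4
Z1 = ising_Z(cycle[:3], [2, 2, 2], 4)      # chain: 2 + 6e^2 + 6e^4 + 2e^6
Z2 = ising_Z(cycle[3:], [2], 4)            # edge:  8 + 8 e^2
geo_avg = sqrt(Z1 * Z2)                    # convexity-based upper bound
```

Numerically Z* ≈ 199.9 and geo_avg ≈ 281.5, leaving the gap that the matching-based bounds of Section 5 partially close.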
7 Conclusions

We presented DENSITYPROPAGATION, a novel message passing algorithm to compute the density of states while exploiting the structure of the underlying graphical model. We showed that DENSITYPROPAGATION computes the exact density for tree-structured graphical models, is closely related to the Belief Propagation and Max-Product algorithms, and is in fact a generalization of both. We introduced a new family of bounds on the partition function based on tree decomposition but without relying on convexity. We showed both theoretically and empirically that the additional information provided by the density of states leads to better bounds than standard convexity-based ones. This work opens up several interesting directions. These include an exploration of the convergence properties of (loopy) DENSITYPROPAGATION, specifically in relation to BP and MP [17, 18, 19, 20], an investigation of the existence of a variational interpretation for the updates, and devising an efficient strategy to select the parameters Θ_i of the tree decomposition such that the proposed bound is further optimized.
References

[1] M.J. Wainwright and M.I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1-305, 2008.
[2] S. Ermon, C. Gomes, A. Sabharwal, and B. Selman. Accelerated adaptive Markov chain for partition function computation. In Neural Information Processing Systems, 2011.
[3] F. Wang and D.P. Landau. Efficient, multiple-range random walk algorithm to calculate the density of states. Physical Review Letters, 86(10), 2001.
[4] M.J. Wainwright. Stochastic processes on graphs with cycles: geometric and variational approaches. PhD thesis, Massachusetts Institute of Technology, 2002.
[5] M. Wainwright, T. Jaakkola, and A. Willsky. Exact MAP estimates by (hyper)tree agreement. In Advances in Neural Information Processing Systems, 2003.
[6] M.J. Wainwright. Tree-reweighted belief propagation algorithms and approximate ML estimation via pseudo-moment matching. In AISTATS, 2003.
[7] G. Parisi and R. Shankar. Statistical field theory. Physics Today, 41:110, 1988.
[8] L.D. Brown. Fundamentals of Statistical Exponential Families: with Applications in Statistical Decision Theory. Institute of Mathematical Statistics, 1986.
[9] M. Richardson and P. Domingos. Markov logic networks. Machine Learning, 62(1), 2006.
[10] Y. Weiss, C. Yanover, and T. Meltzer. MAP estimation, linear programming and belief propagation with convex free energies. In Uncertainty in Artificial Intelligence, 2007.
[11] T. Hazan and A. Shashua. Norm-product belief propagation: Primal-dual message-passing for approximate inference. IEEE Transactions on Information Theory, 56(12), 2010.
[12] K.P. Murphy, Y. Weiss, and M.I. Jordan. Loopy belief propagation for approximate inference: An empirical study. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., 1999.
[13] J.S. Yedidia, W.T. Freeman, and Y. Weiss. Understanding belief propagation and its generalizations. Exploring Artificial Intelligence in the New Millennium, 2003.
[14] W.S. Cheung. Generalizations of Hölder's inequality. International Journal of Mathematics and Mathematical Sciences, 26:7-10, 2001.
[15] J.M. Mooij. libDAI: A free and open source C++ library for discrete approximate inference in graphical models. The Journal of Machine Learning Research, 11, 2010.
[16] M.J.D. Powell. The BOBYQA algorithm for bound constrained optimization without derivatives. University of Cambridge Technical Report, 2009.
[17] D. Sontag, T. Meltzer, A. Globerson, T. Jaakkola, and Y. Weiss. Tightening LP relaxations for MAP using message passing. In Uncertainty in Artificial Intelligence, 2008.
[18] T. Meltzer, A. Globerson, and Y. Weiss. Convergent message passing algorithms: a unifying view. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. AUAI Press, 2009.
[19] A.T. Ihler, J.W. Fisher, and A.S. Willsky. Loopy belief propagation: Convergence and effects of message errors. Journal of Machine Learning Research, 6(1):905, 2006.
[20] J.M. Mooij and H.J. Kappen. Sufficient conditions for convergence of the sum-product algorithm. IEEE Transactions on Information Theory, 53(12), 2007.
More informationGlobal Sensitivity. Tuesday 20 th February, 2018
Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values
More information8 : Learning in Fully Observed Markov Networks. 1 Why We Need to Learn Undirected Graphical Models. 2 Structural Learning for Completely Observed MRF
10-708: Probablstc Graphcal Models 10-708, Sprng 2014 8 : Learnng n Fully Observed Markov Networks Lecturer: Erc P. Xng Scrbes: Meng Song, L Zhou 1 Why We Need to Learn Undrected Graphcal Models In the
More information4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA
4 Analyss of Varance (ANOVA) 5 ANOVA 51 Introducton ANOVA ANOVA s a way to estmate and test the means of multple populatons We wll start wth one-way ANOVA If the populatons ncluded n the study are selected
More informationP exp(tx) = 1 + t 2k M 2k. k N
1. Subgaussan tals Defnton. Say that a random varable X has a subgaussan dstrbuton wth scale factor σ< f P exp(tx) exp(σ 2 t 2 /2) for all real t. For example, f X s dstrbuted N(,σ 2 ) then t s subgaussan.
More informationOutline. Bayesian Networks: Maximum Likelihood Estimation and Tree Structure Learning. Our Model and Data. Outline
Outlne Bayesan Networks: Maxmum Lkelhood Estmaton and Tree Structure Learnng Huzhen Yu janey.yu@cs.helsnk.f Dept. Computer Scence, Unv. of Helsnk Probablstc Models, Sprng, 200 Notces: I corrected a number
More informationAPPENDIX A Some Linear Algebra
APPENDIX A Some Lnear Algebra The collecton of m, n matrces A.1 Matrces a 1,1,..., a 1,n A = a m,1,..., a m,n wth real elements a,j s denoted by R m,n. If n = 1 then A s called a column vector. Smlarly,
More information2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification
E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton
More informationErrors for Linear Systems
Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.65/15.070J Fall 013 Lecture 1 10/1/013 Martngale Concentraton Inequaltes and Applcatons Content. 1. Exponental concentraton for martngales wth bounded ncrements.
More informationComparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method
Appled Mathematcal Scences, Vol. 7, 0, no. 47, 07-0 HIARI Ltd, www.m-hkar.com Comparson of the Populaton Varance Estmators of -Parameter Exponental Dstrbuton Based on Multple Crtera Decson Makng Method
More informationThe Expectation-Maximization Algorithm
The Expectaton-Maxmaton Algorthm Charles Elan elan@cs.ucsd.edu November 16, 2007 Ths chapter explans the EM algorthm at multple levels of generalty. Secton 1 gves the standard hgh-level verson of the algorthm.
More informationDepartment of Computer Science Artificial Intelligence Research Laboratory. Iowa State University MACHINE LEARNING
MACHINE LEANING Vasant Honavar Bonformatcs and Computatonal Bology rogram Center for Computatonal Intellgence, Learnng, & Dscovery Iowa State Unversty honavar@cs.astate.edu www.cs.astate.edu/~honavar/
More informationCourse 395: Machine Learning - Lectures
Course 395: Machne Learnng - Lectures Lecture 1-2: Concept Learnng (M. Pantc Lecture 3-4: Decson Trees & CC Intro (M. Pantc Lecture 5-6: Artfcal Neural Networks (S.Zaferou Lecture 7-8: Instance ased Learnng
More informationThe Study of Teaching-learning-based Optimization Algorithm
Advanced Scence and Technology Letters Vol. (AST 06), pp.05- http://dx.do.org/0.57/astl.06. The Study of Teachng-learnng-based Optmzaton Algorthm u Sun, Yan fu, Lele Kong, Haolang Q,, Helongang Insttute
More informationModule 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur
Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:
More informationTHE CHINESE REMAINDER THEOREM. We should thank the Chinese for their wonderful remainder theorem. Glenn Stevens
THE CHINESE REMAINDER THEOREM KEITH CONRAD We should thank the Chnese for ther wonderful remander theorem. Glenn Stevens 1. Introducton The Chnese remander theorem says we can unquely solve any par of
More informationWeek 5: Neural Networks
Week 5: Neural Networks Instructor: Sergey Levne Neural Networks Summary In the prevous lecture, we saw how we can construct neural networks by extendng logstc regresson. Neural networks consst of multple
More informationj) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1
Random varables Measure of central tendences and varablty (means and varances) Jont densty functons and ndependence Measures of assocaton (covarance and correlaton) Interestng result Condtonal dstrbutons
More informationFinding Dense Subgraphs in G(n, 1/2)
Fndng Dense Subgraphs n Gn, 1/ Atsh Das Sarma 1, Amt Deshpande, and Rav Kannan 1 Georga Insttute of Technology,atsh@cc.gatech.edu Mcrosoft Research-Bangalore,amtdesh,annan@mcrosoft.com Abstract. Fndng
More informationPhysical Fluctuomatics Applied Stochastic Process 9th Belief propagation
Physcal luctuomatcs ppled Stochastc Process 9th elef propagaton Kazuyuk Tanaka Graduate School of Informaton Scences Tohoku Unversty kazu@smapp.s.tohoku.ac.jp http://www.smapp.s.tohoku.ac.jp/~kazu/ Stochastc
More informationKernel Methods and SVMs Extension
Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general
More informationProbability-Theoretic Junction Trees
Probablty-Theoretc Juncton Trees Payam Pakzad, (wth Venkat Anantharam, EECS Dept, U.C. Berkeley EPFL, ALGO/LMA Semnar 2/2/2004 Margnalzaton Problem Gven an arbtrary functon of many varables, fnd (some
More informationA Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach
A Bayes Algorthm for the Multtask Pattern Recognton Problem Drect Approach Edward Puchala Wroclaw Unversty of Technology, Char of Systems and Computer etworks, Wybrzeze Wyspanskego 7, 50-370 Wroclaw, Poland
More informationLectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix
Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could
More informationStat260: Bayesian Modeling and Inference Lecture Date: February 22, Reference Priors
Stat60: Bayesan Modelng and Inference Lecture Date: February, 00 Reference Prors Lecturer: Mchael I. Jordan Scrbe: Steven Troxler and Wayne Lee In ths lecture, we assume that θ R; n hgher-dmensons, reference
More informationProblem Set 9 Solutions
Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem
More informationEcon107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)
I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes
More informationEM and Structure Learning
EM and Structure Learnng Le Song Machne Learnng II: Advanced Topcs CSE 8803ML, Sprng 2012 Partally observed graphcal models Mxture Models N(μ 1, Σ 1 ) Z X N N(μ 2, Σ 2 ) 2 Gaussan mxture model Consder
More information3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X
Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number
More informationThe Order Relation and Trace Inequalities for. Hermitian Operators
Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence
More informationComputation of Higher Order Moments from Two Multinomial Overdispersion Likelihood Models
Computaton of Hgher Order Moments from Two Multnomal Overdsperson Lkelhood Models BY J. T. NEWCOMER, N. K. NEERCHAL Department of Mathematcs and Statstcs, Unversty of Maryland, Baltmore County, Baltmore,
More informationOn the Multicriteria Integer Network Flow Problem
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 5, No 2 Sofa 2005 On the Multcrtera Integer Network Flow Problem Vassl Vasslev, Marana Nkolova, Maryana Vassleva Insttute of
More informationLecture 12: Discrete Laplacian
Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly
More informationMMA and GCMMA two methods for nonlinear optimization
MMA and GCMMA two methods for nonlnear optmzaton Krster Svanberg Optmzaton and Systems Theory, KTH, Stockholm, Sweden. krlle@math.kth.se Ths note descrbes the algorthms used n the author s 2007 mplementatons
More informationOpen Systems: Chemical Potential and Partial Molar Quantities Chemical Potential
Open Systems: Chemcal Potental and Partal Molar Quanttes Chemcal Potental For closed systems, we have derved the followng relatonshps: du = TdS pdv dh = TdS + Vdp da = SdT pdv dg = VdP SdT For open systems,
More informationCalculation of time complexity (3%)
Problem 1. (30%) Calculaton of tme complexty (3%) Gven n ctes, usng exhaust search to see every result takes O(n!). Calculaton of tme needed to solve the problem (2%) 40 ctes:40! dfferent tours 40 add
More informationCS : Algorithms and Uncertainty Lecture 17 Date: October 26, 2016
CS 29-128: Algorthms and Uncertanty Lecture 17 Date: October 26, 2016 Instructor: Nkhl Bansal Scrbe: Mchael Denns 1 Introducton In ths lecture we wll be lookng nto the secretary problem, and an nterestng
More informationLecture 17: Lee-Sidford Barrier
CSE 599: Interplay between Convex Optmzaton and Geometry Wnter 2018 Lecturer: Yn Tat Lee Lecture 17: Lee-Sdford Barrer Dsclamer: Please tell me any mstake you notced. In ths lecture, we talk about the
More informationTree Block Coordinate Descent for MAP in Graphical Models
ree Block Coordnate Descent for MAP n Graphcal Models Davd Sontag omm Jaakkola Computer Scence and Artfcal Intellgence Laboratory Massachusetts Insttute of echnology Cambrdge, MA 02139 Abstract A number
More informationMarkov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement
Markov Chan Monte Carlo MCMC, Gbbs Samplng, Metropols Algorthms, and Smulated Annealng 2001 Bonformatcs Course Supplement SNU Bontellgence Lab http://bsnuackr/ Outlne! Markov Chan Monte Carlo MCMC! Metropols-Hastngs
More informationCOS 521: Advanced Algorithms Game Theory and Linear Programming
COS 521: Advanced Algorthms Game Theory and Lnear Programmng Moses Charkar February 27, 2013 In these notes, we ntroduce some basc concepts n game theory and lnear programmng (LP). We show a connecton
More informationLecture 12: Classification
Lecture : Classfcaton g Dscrmnant functons g The optmal Bayes classfer g Quadratc classfers g Eucldean and Mahalanobs metrcs g K Nearest Neghbor Classfers Intellgent Sensor Systems Rcardo Guterrez-Osuna
More informationTAIL BOUNDS FOR SUMS OF GEOMETRIC AND EXPONENTIAL VARIABLES
TAIL BOUNDS FOR SUMS OF GEOMETRIC AND EXPONENTIAL VARIABLES SVANTE JANSON Abstract. We gve explct bounds for the tal probabltes for sums of ndependent geometrc or exponental varables, possbly wth dfferent
More informationA Robust Method for Calculating the Correlation Coefficient
A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal
More informationInductance Calculation for Conductors of Arbitrary Shape
CRYO/02/028 Aprl 5, 2002 Inductance Calculaton for Conductors of Arbtrary Shape L. Bottura Dstrbuton: Internal Summary In ths note we descrbe a method for the numercal calculaton of nductances among conductors
More informationThe Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction
ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also
More informationBézier curves. Michael S. Floater. September 10, These notes provide an introduction to Bézier curves. i=0
Bézer curves Mchael S. Floater September 1, 215 These notes provde an ntroducton to Bézer curves. 1 Bernsten polynomals Recall that a real polynomal of a real varable x R, wth degree n, s a functon of
More informationBoostrapaggregating (Bagging)
Boostrapaggregatng (Baggng) An ensemble meta-algorthm desgned to mprove the stablty and accuracy of machne learnng algorthms Can be used n both regresson and classfcaton Reduces varance and helps to avod
More informationComputing MLE Bias Empirically
Computng MLE Bas Emprcally Kar Wa Lm Australan atonal Unversty January 3, 27 Abstract Ths note studes the bas arses from the MLE estmate of the rate parameter and the mean parameter of an exponental dstrbuton.
More informationVARIATION OF CONSTANT SUM CONSTRAINT FOR INTEGER MODEL WITH NON UNIFORM VARIABLES
VARIATION OF CONSTANT SUM CONSTRAINT FOR INTEGER MODEL WITH NON UNIFORM VARIABLES BÂRZĂ, Slvu Faculty of Mathematcs-Informatcs Spru Haret Unversty barza_slvu@yahoo.com Abstract Ths paper wants to contnue
More informationMaximizing the number of nonnegative subsets
Maxmzng the number of nonnegatve subsets Noga Alon Hao Huang December 1, 213 Abstract Gven a set of n real numbers, f the sum of elements of every subset of sze larger than k s negatve, what s the maxmum
More informationSome modelling aspects for the Matlab implementation of MMA
Some modellng aspects for the Matlab mplementaton of MMA Krster Svanberg krlle@math.kth.se Optmzaton and Systems Theory Department of Mathematcs KTH, SE 10044 Stockholm September 2004 1. Consdered optmzaton
More informationDynamic Programming. Preview. Dynamic Programming. Dynamic Programming. Dynamic Programming (Example: Fibonacci Sequence)
/24/27 Prevew Fbonacc Sequence Longest Common Subsequence Dynamc programmng s a method for solvng complex problems by breakng them down nto smpler sub-problems. It s applcable to problems exhbtng the propertes
More informationChapter 8 Indicator Variables
Chapter 8 Indcator Varables In general, e explanatory varables n any regresson analyss are assumed to be quanttatve n nature. For example, e varables lke temperature, dstance, age etc. are quanttatve n
More informationThe Geometry of Logit and Probit
The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.
More informationU.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017
U.C. Berkeley CS94: Beyond Worst-Case Analyss Handout 4s Luca Trevsan September 5, 07 Summary of Lecture 4 In whch we ntroduce semdefnte programmng and apply t to Max Cut. Semdefnte Programmng Recall that
More informationTime-Varying Systems and Computations Lecture 6
Tme-Varyng Systems and Computatons Lecture 6 Klaus Depold 14. Januar 2014 The Kalman Flter The Kalman estmaton flter attempts to estmate the actual state of an unknown dscrete dynamcal system, gven nosy
More informationThe Second Anti-Mathima on Game Theory
The Second Ant-Mathma on Game Theory Ath. Kehagas December 1 2006 1 Introducton In ths note we wll examne the noton of game equlbrum for three types of games 1. 2-player 2-acton zero-sum games 2. 2-player
More information4DVAR, according to the name, is a four-dimensional variational method.
4D-Varatonal Data Assmlaton (4D-Var) 4DVAR, accordng to the name, s a four-dmensonal varatonal method. 4D-Var s actually a drect generalzaton of 3D-Var to handle observatons that are dstrbuted n tme. The
More informationP R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering /
Theory and Applcatons of Pattern Recognton 003, Rob Polkar, Rowan Unversty, Glassboro, NJ Lecture 4 Bayes Classfcaton Rule Dept. of Electrcal and Computer Engneerng 0909.40.0 / 0909.504.04 Theory & Applcatons
More informationChapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems
Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons
More informationHidden Markov Models
Hdden Markov Models Namrata Vaswan, Iowa State Unversty Aprl 24, 204 Hdden Markov Model Defntons and Examples Defntons:. A hdden Markov model (HMM) refers to a set of hdden states X 0, X,..., X t,...,
More informationMore metrics on cartesian products
More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of
More information= z 20 z n. (k 20) + 4 z k = 4
Problem Set #7 solutons 7.2.. (a Fnd the coeffcent of z k n (z + z 5 + z 6 + z 7 + 5, k 20. We use the known seres expanson ( n+l ( z l l z n below: (z + z 5 + z 6 + z 7 + 5 (z 5 ( + z + z 2 + z + 5 5
More informationarxiv:cs.cv/ Jun 2000
Correlaton over Decomposed Sgnals: A Non-Lnear Approach to Fast and Effectve Sequences Comparson Lucano da Fontoura Costa arxv:cs.cv/0006040 28 Jun 2000 Cybernetc Vson Research Group IFSC Unversty of São
More informationVapnik-Chervonenkis theory
Vapnk-Chervonenks theory Rs Kondor June 13, 2008 For the purposes of ths lecture, we restrct ourselves to the bnary supervsed batch learnng settng. We assume that we have an nput space X, and an unknown
More informationConvergence of random processes
DS-GA 12 Lecture notes 6 Fall 216 Convergence of random processes 1 Introducton In these notes we study convergence of dscrete random processes. Ths allows to characterze phenomena such as the law of large
More informationInner Product. Euclidean Space. Orthonormal Basis. Orthogonal
Inner Product Defnton 1 () A Eucldean space s a fnte-dmensonal vector space over the reals R, wth an nner product,. Defnton 2 (Inner Product) An nner product, on a real vector space X s a symmetrc, blnear,
More informationNatural Language Processing and Information Retrieval
Natural Language Processng and Informaton Retreval Support Vector Machnes Alessandro Moschtt Department of nformaton and communcaton technology Unversty of Trento Emal: moschtt@ds.untn.t Summary Support
More informationU.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016
U.C. Berkeley CS94: Spectral Methods and Expanders Handout 8 Luca Trevsan February 7, 06 Lecture 8: Spectral Algorthms Wrap-up In whch we talk about even more generalzatons of Cheeger s nequaltes, and
More information1 Motivation and Introduction
Instructor: Dr. Volkan Cevher EXPECTATION PROPAGATION September 30, 2008 Rce Unversty STAT 63 / ELEC 633: Graphcal Models Scrbes: Ahmad Beram Andrew Waters Matthew Nokleby Index terms: Approxmate nference,
More informationTHE SUMMATION NOTATION Ʃ
Sngle Subscrpt otaton THE SUMMATIO OTATIO Ʃ Most of the calculatons we perform n statstcs are repettve operatons on lsts of numbers. For example, we compute the sum of a set of numbers, or the sum of the
More informationSimulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests
Smulated of the Cramér-von Mses Goodness-of-Ft Tests Steele, M., Chaselng, J. and 3 Hurst, C. School of Mathematcal and Physcal Scences, James Cook Unversty, Australan School of Envronmental Studes, Grffth
More informationPredictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore
Sesson Outlne Introducton to classfcaton problems and dscrete choce models. Introducton to Logstcs Regresson. Logstc functon and Logt functon. Maxmum Lkelhood Estmator (MLE) for estmaton of LR parameters.
More informationLecture 7: Boltzmann distribution & Thermodynamics of mixing
Prof. Tbbtt Lecture 7 etworks & Gels Lecture 7: Boltzmann dstrbuton & Thermodynamcs of mxng 1 Suggested readng Prof. Mark W. Tbbtt ETH Zürch 13 März 018 Molecular Drvng Forces Dll and Bromberg: Chapters
More informationBayesian predictive Configural Frequency Analysis
Psychologcal Test and Assessment Modelng, Volume 54, 2012 (3), 285-292 Bayesan predctve Confgural Frequency Analyss Eduardo Gutérrez-Peña 1 Abstract Confgural Frequency Analyss s a method for cell-wse
More informationA new construction of 3-separable matrices via an improved decoding of Macula s construction
Dscrete Optmzaton 5 008 700 704 Contents lsts avalable at ScenceDrect Dscrete Optmzaton journal homepage: wwwelsevercom/locate/dsopt A new constructon of 3-separable matrces va an mproved decodng of Macula
More informationCSC321 Tutorial 9: Review of Boltzmann machines and simulated annealing
CSC321 Tutoral 9: Revew of Boltzmann machnes and smulated annealng (Sldes based on Lecture 16-18 and selected readngs) Yue L Emal: yuel@cs.toronto.edu Wed 11-12 March 19 Fr 10-11 March 21 Outlne Boltzmann
More informationUniversity of Washington Department of Chemistry Chemistry 453 Winter Quarter 2015
Lecture 2. 1/07/15-1/09/15 Unversty of Washngton Department of Chemstry Chemstry 453 Wnter Quarter 2015 We are not talkng about truth. We are talkng about somethng that seems lke truth. The truth we want
More informationCredit Card Pricing and Impact of Adverse Selection
Credt Card Prcng and Impact of Adverse Selecton Bo Huang and Lyn C. Thomas Unversty of Southampton Contents Background Aucton model of credt card solctaton - Errors n probablty of beng Good - Errors n
More informationNumerical Heat and Mass Transfer
Master degree n Mechancal Engneerng Numercal Heat and Mass Transfer 06-Fnte-Dfference Method (One-dmensonal, steady state heat conducton) Fausto Arpno f.arpno@uncas.t Introducton Why we use models and
More information