Lifted Generalized Dual Decomposition
Nicholas Gallo, University of California Irvine, Irvine, CA
Alexander Ihler, University of California Irvine, Irvine, CA

Abstract

Many real-world problems, such as Markov Logic Networks (MLNs) with evidence, can be represented as a highly symmetric graphical model perturbed by additional potentials. In these models, variational inference approaches that exploit exact model symmetries are often forced to ground the entire problem, while methods that exploit approximate symmetries (such as by constructing an over-symmetric approximate model) offer no guarantees on solution quality. In this paper, we present a method based on a lifted variant of the generalized dual decomposition (GenDD) for marginal MAP inference, which provides a principled way to exploit symmetric sub-structures in a graphical model. We develop a coarse-to-fine inference procedure that provides any-time upper bounds on the objective. The upper bound property of GenDD provides a principled way to guide the refinement process, giving good any-time performance and eventually arriving at the ground optimal solution.

Introduction

A central task in many application domains (Wainwright and Jordan 2008) is computing likelihoods and marginal probabilities over distributions defined by a graphical model. These tasks are, in general, intractable, and this has motivated the development of many approximate inference techniques. Of significant theoretical and practical interest are variational techniques, which formulate the inference problem as a constrained optimization problem. Within this class, methods that provide upper bounds on the partition function and give rise to convex optimization problems are particularly attractive, since convergence can be guaranteed and the quality of two bounds can be easily compared. This class contains the tree-reweighted bounds (Wainwright, Jaakkola, and Willsky 2005) and primal variants such as GenDD (Ping, Liu, and Ihler 2015), which we study here.
Recently, there has been growing interest in the development of lifted inference techniques, both exact (Bui, Huynh, and de Salvo Braz 2012; Poole 2003) and approximate (Bui, Huynh, and Riedel 2012; Mladenov, Ahmadi, and Kersting 2012; Mladenov and Kersting 2015; Bui, Huynh, and Sontag 2014; Mladenov, Globerson, and Kersting 2014), which exploit model symmetries. [Copyright © 2018, Association for the Advancement of Artificial Intelligence. All rights reserved.] Most of these works take a well-established variational approximation and identify exact problem symmetries either algorithmically (e.g., redundant message computations of loopy belief propagation (LBP) (Singla and Domingos 2008; Kersting, Ahmadi, and Natarajan 2009)) or via graph-theoretic notions of symmetry (Bui, Huynh, and Riedel 2012; Mladenov, Ahmadi, and Kersting 2012; Mladenov and Kersting 2015; Bui, Huynh, and Sontag 2014; Mladenov, Globerson, and Kersting 2014). These methods, however, often have significant trouble dealing with models possessing highly symmetric substructures but few exact symmetries. Some works (Venugopal and Gogate 2014; Van den Broeck and Darwiche 2013) address this by generating a symmetric approximation to an MLN (Richardson and Domingos 2006) with evidence, but offer no guarantees on the quality of the approximation. (Van den Broeck and Niepert 2014) correct this bias by using an over-symmetric approximation to construct a proposal distribution for a Metropolis-Hastings chain, which guarantees convergence to the true marginals but gives no guarantees on mixing time. Other works approximate lifted BP by grouping messages with similar values (Kersting et al. 2010) or messages satisfying a desired structural property (Singla, Nath, and Domingos 2014). (Singla, Nath, and Domingos 2014) analyzes the errors the approximation induces on the true BP messages but, like BP, offers no guarantees on solution quality or convergence. Similar in spirit to our work is (Habeeb et al.
2017), which sequentially solves a set of relaxed MAP inference problems for computer vision tasks. Their method operates in a coarse-to-fine manner, solving at each level a MAP problem in which groups of pixels are restricted to have the same value. The pixel grouping is then relaxed, and the finer MAP problem is initialized with the solution to the coarser problem, guaranteeing monotonic improvement of the solution.

Our paper provides a framework for exploiting approximate model symmetries, based on a lifted variant of GenDD (Ping, Liu, and Ihler 2015). In contrast to works that create an over-symmetric model approximation, our method imposes a set of over-symmetric restrictions on the variational cost-shifting updates (messages) associated with the GenDD problem. Guided by a measure of solution quality, these constraints are gradually relaxed, producing a sequence of problems of increasing accuracy, but increasing cost. Our experimental results show the superiority of the objective-based refinement criterion, and good anytime performance compared to methods that exploit exact symmetries.

Background

A Markov random field (MRF) over n discrete random variables X = [X_1 ... X_n], taking values x = [x_1 ... x_n] ∈ (X_1 × ... × X_n), has probability density function

  p(X = x; θ) = exp[ Σ_{α∈F} θ_α(x_α) − Φ(θ) ],
  Φ(θ) = log Σ_{x∈X^n} exp[ Σ_{α∈F} θ_α(x_α) ],

where F represents the set of cliques of the distribution, with α ∈ F being a subset of the variables, associated with a potential table θ_α. Φ(θ) is the log partition function, which normalizes the distribution. Furthermore, denote the set of variable indices as V = {1 ... n}.

Generalized dual decomposition

Motivated by the intractability of computing Φ(θ), (Ping, Liu, and Ihler 2015) develops efficient upper bounds on Φ(θ) via a decomposition based on Hölder's inequality. Their bound generalizes the dual decomposition bound (Globerson and Jaakkola 2008) for the MAP problem, maintaining two key properties: (1) it decomposes into a sum of terms defined over cliques of the graphical model, and (2) a family of bounds can be indexed by a set of variational parameters, where finding the parameter setting yielding the tightest bound can be formulated as a convex optimization problem. We review their main results.

For any non-negative function f(x) and w ∈ R_{>0}, the power sum operator is defined as

  Σ^w_x f(x) = [ Σ_x f(x)^{1/w} ]^w.

The power sum reduces to the standard sum when w = 1 and approaches max_x f(x) as w → 0^+. We also define the vector power sum

  Σ^w_x F(x) = Σ^{w_k}_{x_k} ... Σ^{w_1}_{x_1} F(x),

where w is a vector of weights, x is a vector of random variables, and F is a non-negative function with k inputs.

The set of variational parameters δ = {δ^r_α | α ∈ F, r ∈ N^+_{|α|}} and w = {w^r_α | α ∈ F, r ∈ N^+_{|α|}}, where N^+_i = {1 ... i} for any positive integer i, reparameterizes the distribution to provide a tighter bound. The bound of (Ping, Liu, and Ihler 2015) can be written as

  Φ(θ) ≤ Σ_{v∈V} u_v(δ, w) + Σ_{α∈F} c_α(δ, w)  =:  L(δ, w),    (1)

where, letting w_α = [w^1_α ... w^{|α|}_α], the clique terms are

  c_α(δ, w) = log Σ^{w_α}_{x_α} exp[ θ_α(x_α) − Σ_{r=1}^{|α|} δ^r_α(x_{α_r}) ]    (2)

and the unary terms are

  u_v(δ, w) = log Σ^{w^0_v}_{x_v} exp( δ^0_v(x_v) ),    (3)

with δ^0_v(x_v) = Σ_{(α,r)∈N_v} δ^r_α(x_v) and w_v = 1 − Σ_{(α,r)∈N_v} w^r_α, where N_v = {(α, r) | α_r = v} are the neighbors of v. Furthermore, we define w^0_v = max(0, w_v) and require w^r_α ≥ 0 for all (α, r).[1] We also require that the local elimination order of each clique power sum term (2) be consistent with a global elimination order o, meaning that for all i < j, o(α_i) < o(α_j). If this is violated for an α in the input, that α and its associated potential are permuted to obey this order.

Figure 1: Position-explicit factor graph associated with α_1 = (x_1, x_2), α_2 = (x_2, x_3). Factor cells (α_i, r) are associated with parameters (δ^r_{α_i}, w^r_{α_i}) and unary terms are sums of neighboring cost-shifting variables: δ^0_1 = δ^1_{α_1}, δ^0_2 = δ^2_{α_1} + δ^1_{α_2}, δ^0_3 = δ^2_{α_2} (similarly for the weights).

Ground factor graphs

In order to facilitate development of our lifted inference procedure, we modify the standard factor graph depiction of graphical models to make each variable's position in its participating factors explicit. To this end, we draw a circular node associated with each variable v and a rectangle partitioned into |α| cells associated with each factor α. Position cell (α, r) is associated with terms (δ^r_α, w^r_α) (labeled with just δ^r_α for compactness) and has an edge with its variable neighbor v = α_r, which is associated with terms δ^0_v, w^0_v, u_v (labeled with just δ^0_v). A simple example is illustrated in Figure 1.[2]

Exploiting symmetries

Lifted inference techniques identify a set of symmetries in the ground variational problem that can be exploited computationally.
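Before turning to symmetries, the power sum operator from the Background section can be made concrete with a minimal sketch. The function name and the NumPy-based implementation are ours, not from the paper; the rescaling by the maximum is just a standard numerical-stability device.

```python
import numpy as np

def power_sum(f, w):
    """Power sum of a non-negative vector f with weight w > 0:
    (sum_x f(x)^(1/w))^w.  Reduces to the ordinary sum at w = 1
    and approaches max(f) as w -> 0+."""
    f = np.asarray(f, dtype=float)
    # rescale by the maximum for numerical stability
    m = float(np.max(f))
    if m == 0.0:
        return 0.0
    return m * float(np.sum((f / m) ** (1.0 / w)) ** w)

vals = [1.0, 2.0, 3.0]
print(power_sum(vals, 1.0))   # -> 6.0 (ordinary sum)
print(power_sum(vals, 1e-6))  # -> 3.0 (approaches the max)
```

The vector power sum in the text is then just this operator applied sequentially, eliminating one variable at a time with its own weight.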
Exact lifted variational inference (Singla and Domingos 2008; Mladenov, Globerson, and Kersting 2014; Mladenov and Kersting 2015; Mladenov, Ahmadi, and Kersting 2012) identifies a stable partition (Berkholz, Bonsma, and Grohe 2013) of the graph as sufficient to characterize symmetries in the solution to the ground variational problem. In this paper, we impose a set of over-symmetric constraints on the ground variational parameters (δ, w), inducing symmetries in the objective (1) that will, in general, be

[Footnote 1] In (Ping, Liu, and Ihler 2015), w^0_v is a free variable with restrictions w^0_v ≥ 0 and w^0_v + Σ_{(α,r)∈N_v} w^r_α = 1. In our formulation, this sum is always larger than or equal to 1 (providing a valid bound), with equality holding at the optimum.

[Footnote 2] (Mladenov and Kersting 2015) used a similar representation where each position cell is its own node, connected to its factor node. For compactness, we prefer the notation presented here.
coarser than those implied by a stable partition, yielding looser but computationally cheaper bounds.

Parameter and objective symmetries

Consider a disjoint partition P_F = {F_1 ... F_{|P_F|}} of the ground factors F (F_i ∩ F_j = ∅ for i ≠ j, ∪_f F_f = F) where all factors in the same partition are required to have the same potential function. That is, for each f there exists a representative factor θ̄_f (with associated scope size r̄_f and domains X̄^r_f for r ∈ N^+_{r̄_f}) such that for all α ∈ F_f, θ_α = θ̄_f (and |α| = r̄_f and X_{α_r} = X̄^r_f for r ∈ N^+_{r̄_f}).

In this paper, we consider over-symmetric constraints on the variational parameters which force terms associated with factors in the same partition to be identical. That is, we consider parameters in the set

  S(P_F) = { (δ, w) | ∃(δ̄, w̄) : ∀F_f ∈ P_F, ∀α ∈ F_f, ∀r ∈ N^+_{|α|}:  δ^r_α(x̄^r_f) = δ̄^r_f(x̄^r_f),  w^r_α = w̄^r_f },

where δ̄ = {δ̄^r_f | F_f ∈ P_F, r ∈ N^+_{r̄_f}} and w̄ = {w̄^r_f | F_f ∈ P_F, r ∈ N^+_{r̄_f}} are restricted sets of parameters.

For any (δ, w) ∈ S(P_F), we identify symmetries in the clique terms (2) as c_α(δ, w) = c̄_f(δ̄, w̄) for all α ∈ F_f, where

  c̄_f(δ̄, w̄) := log Σ^{w̄_f}_{x̄} exp[ θ̄_f(x̄) − Σ_{r=1}^{r̄_f} δ̄^r_f(x̄^r_f) ]    (4)

and w̄_f = [w̄^1_f ... w̄^{r̄_f}_f] and x̄ = [x̄^1_f ... x̄^{r̄_f}_f].

To recognize the symmetries induced in the aggregated cost-shifting δ^0_v, w^0_v (and hence u_v (3)) terms, first let N^{(f,r)}_v = {α | α ∈ F_f, (α, r) ∈ N_v} represent the neighbors of v which contribute δ^r_α(x_v) = δ̄^r_f(x_v) to the sum. There are M^{(f,r)}_v = |N^{(f,r)}_v| such neighbors. Furthermore, define a disjoint partition P_V = {V_1 ... V_K} of V where all variables in the same partition have the same domain (X_v = X̄_k for all v ∈ V_k) and the same neighbor counts given P_F. That is, there exist representative counts M̄^{(f,r)}_k such that M^{(f,r)}_v = M̄^{(f,r)}_k for all v ∈ V_k and all (f, r). Now, for all v ∈ V_k we have

  δ^0_v(x_v) = δ̄^0_k(x_v) := Σ_{(f,r)∈N̄_k} M̄^{(f,r)}_k δ̄^r_f(x_v)    (5)

for all values x_v, where N̄_k = {(f, r) | M̄^{(f,r)}_k > 0} (and similarly, w_v = w̄_k := 1 − Σ_{(f,r)∈N̄_k} M̄^{(f,r)}_k w̄^r_f). These symmetries imply that u_v(δ, w) = ū_k(δ̄, w̄) for all v ∈ V_k.
Our objective can now be evaluated over the smaller set of representative parameters (δ̄, w̄), with symmetries fully specified by P = (P_F, P_V). We say that such a P is valid, and we have L(δ, w) = L̄(δ̄, w̄) where

  L̄(δ̄, w̄) := Σ_{F_f∈P_F} |F_f| c̄_f(δ̄, w̄) + Σ_{V_k∈P_V} |V_k| ū_k(δ̄, w̄).    (6)

Lifted factor graphs

Associated with any valid partition P is a lifted graph which compactly illustrates the symmetries in the variational parameters and problem terms (1). A lifted factor graph has a super-node associated with each V_k and a super-factor with r̄_f position cells associated with each F_f. Position cell (f, r) is associated with δ̄^r_f, w̄^r_f (labeled with just δ̄^r_f) and has a super-edge labeled M̄^{(f,r)}_k with super-node k if M̄^{(f,r)}_k > 0. The term N̄_k defined after (5) can be seen as the neighborhood of k in the lifted graph. We similarly define the neighborhood of (f, r) as N̄^{(f,r)} = {k | M̄^{(f,r)}_k > 0}.

Example 1. Consider a model with factor graph depicted in Figure 2a (with elimination order o = (R_1, R_2, T_1, T_2)) and clique symmetries θ_{α_2} = θ_{α_3} = θ_{α_4} = θ_{α_5} = θ̄. Associated with the graph is the partition P_F = {F_g, F_y} where F_g = {α_1} (green), F_y = {α_2, α_3, α_4, α_5} (yellow), and P_V = {V_p, V_r, V_b} where V_p = {R_1} (purple), V_r = {R_2} (red), and V_b = {T_1, T_2} (blue).[3] The top half of the middle panel indicates the over-symmetric parameter constraints specified by P_F. The bottom half of that panel indicates the induced symmetries in the unary terms. For example, by examining the neighborhood of R_1 in the factor graph, we see that δ^0_{R_1} = δ^1_{α_1} + δ^1_{α_2} + δ^1_{α_3}. Via the parameter symmetries δ^1_{α_2} = δ^1_{α_3} = δ̄^1_y and δ^1_{α_1} = δ̄^1_g, we see that this is equivalent to δ^0_{R_1} = 2δ̄^1_y + δ̄^1_g.

Gradient symmetries

Just as the over-symmetric parameter constraints induce symmetries in the ground objective function, they also induce (a different set of) symmetries in the gradient of the ground objective.
These gradient symmetries will be used to derive the gradient of the lifted problem, to identify exact symmetries in the ground problem, and to guide our coarse-to-fine inference procedure.

First, notice that the parameter symmetries imply not only symmetries in the values of the clique (4) and unary (5) terms, but also in their gradients. That is, for any (δ, w) ∈ S(P_F), we have ∂c_α/∂δ^r_α(x̄^r_f) = ∂c̄_f/∂δ̄^r_f(x̄^r_f) and ∂u_v/∂δ^0_v(x̄_k) = ∂ū_k/∂δ̄^0_k(x̄_k) for any α ∈ F_f and v ∈ V_k, and for all values of x̄^r_f and x̄_k.

Now, consider the set E^{(f,r)}_v = N^{(f,r)}_v of ground factors α ∈ F_f whose r-th position cell (α, r) has neighbor v = α_r ∈ V_k. For any α ∈ E^{(f,r)}_v, the parameter δ^r_α appears only in clique term c_α in position r (multiplied by −1) and unary term u_v (multiplied by +1). We can therefore identify symmetries in the objective gradient as

  ∂L/∂δ^r_α(x) = ∂u_v/∂δ^0_v(x) − ∂c_α/∂δ^r_α(x) = ∂ū_k/∂δ̄^0_k(x) − ∂c̄_f/∂δ̄^r_f(x)  =:  ḡ_{k;δ^r_f}(x)    (7)

[Footnote 3] The main text used integers to index factor and variable partitions; here we use letters, representing figure colors, for clarity.
for all values of x (and similarly for the w gradients).

Figure 2: (a) A valid, non-stable partition P specifies over-symmetric constraints on the variational parameters of the ground problem, yielding a looser bound. (b) A stable partition P specifies symmetries in the variational parameters that yield an exact solution to the ground problem. More details in Examples 1 and 2 of the text.

Lifted gradients and ground optimality

The gradients of the lifted objective (6) can be written using the gradient symmetries of the ground problem. Specifying the lifted gradients in this way allows us to identify partitions whose symmetric parameter constraints specify an exact solution to the ground problem. Since, via (7), ∂L/∂δ^r_α(x̄^r_f) = ḡ_{k;δ^r_f}(x̄^r_f) for all α ∈ E^{(f,r)}_k (and all values of x̄^r_f), we can write

  ∂L̄/∂δ̄^r_f(x̄^r_f) = Σ_{α∈F_f} ∂L/∂δ^r_α(x̄^r_f) = Σ_{k∈N̄^{(f,r)}} |E^{(f,r)}_k| ḡ_{k;δ^r_f}(x̄^r_f),    (8)

where E^{(f,r)}_k = ∪_{v∈V_k} E^{(f,r)}_v (and similarly for the w gradients). In the special case where (f, r) has one lifted neighbor, N̄^{(f,r)} = {k}, we have F_f = E^{(f,r)}_k and (8) simplifies to ∂L̄/∂δ̄^r_f(x̄^r_f) = |F_f| ∂L/∂δ^r_α(x̄^r_f), meaning that ∂L/∂δ^r_α(x̄^r_f) = 0 when ∂L̄/∂δ̄^r_f(x̄^r_f) = 0. Now, if all of the gradients of the lifted problem are zero, and its lifted graph has |N̄^{(f,r)}| = 1 for all (f, r), then all of the ground gradients are also zero. This means that a solution of the lifted problem provides a solution to the ground problem. A partition P whose lifted graph satisfies the property that |N̄^{(f,r)}| = 1 for all (f, r) is called stable.

Example 2. If we replace P_F from Example 1 with P_F = {F_g, F_y, F_o} where F_g = {α_1} (green), F_y = {α_2, α_3} (yellow), and F_o = {α_4, α_5} (orange), then the partition is stable and S(P_F) contains a solution to the ground variational problem. This is illustrated in Figure 2b.

Optimizing the lifted objective

In this paper, we use a black-box procedure to perform global optimization utilizing the gradient of L̄.
To ensure positivity of the weights, we define the parameters η̄ = {η̄^r_f | F_f ∈ P_F, r ∈ N^+_{r̄_f}}, substituting w̄^r_f with ε + exp(η̄^r_f) (so that ∂L̄/∂η̄^r_f = (∂L̄/∂w̄^r_f) exp(η̄^r_f)). To ensure that the function is differentiable, we set w^0_v = a(w_v), where a(x) = ε + t log(1 + exp(x/t)).[4] This enables unconstrained optimization over the η̄ and δ̄ parameters. The gradients are straightforward to derive from the gradients in (Ping, Liu, and Ihler 2015) and are omitted for space.

[Footnote 4] Any t > 0, ε > 0 bound the objective, since this generates larger unary weights: a_t(w_v) > ε + max(0, w_v) > max(0, w_v).

Coarse-to-fine approximation

We now develop a coarse-to-fine procedure that interleaves relaxations of the over-symmetric parameter constraints S(P_F) with optimization, eventually arriving at the coarsest stable partition P and a solution to the ground problem. While optimization touches only the lifted graph, relaxation steps must touch a subset of the ground factors and variables. To control this cost, we adapt to our work the key ideas from (Berkholz, Bonsma, and Grohe 2013), which presented an asymptotically optimal algorithm for generating the coarsest stable partition of a graph. This procedure is
used as a pre-processing step in works on exact lifted variational inference such as (Mladenov, Globerson, and Kersting 2014).

Algorithm 1 Syntactic coarsest stable partition
1: Set P_V to group variables with identical X_v
2: Set F^0_f for f ∈ N^+_{F_0} to group factors with identical θ_α
3: Q ← ∅
4: for F = 1 to F_0 do
5:   F_F ← F^0_F; refinePV(0)
6: end for
7: while Q ≠ ∅ do
8:   Choose any (f, r) ∈ Q; set Q ← Q \ (f, r)
9:   Choose k ∈ N̄^{(f,r)} where |E^{(f,r)}_k| ≤ (1/2)|F_f|
10:  F ← F + 1
11:  F_F ← E^{(f,r)}_k; F_f ← F_f \ F_F
12:  refinePV(F)
13: end while

Syntactic coarsest stable partition

Throughout our description, we maintain a valid partition and its associated lifted graph. Recall from the subsection on lifted gradients (and Figure 2b) that a valid partition is stable if and only if each position cell in its lifted graph has exactly one neighbor. Algorithm 1 maintains the set Q = {(f, r) | |N̄^{(f,r)}| ≥ 2} of position cells that violate this property. The algorithm iteratively selects an (f, r) ∈ Q and one of its neighbors k ∈ N̄^{(f,r)}. Ground factors in F_f are then split into two groups (line 11), based on whether their r-th position neighbor is a variable in V_k (recall from the section on gradient symmetries that this is E^{(f,r)}_k), or not. In the appendix, we define the function refinePV, which updates the partition P_V to be consistent with the new neighbor counts M^{(f,r)}_v and M^{(F,r)}_v for all v, updating the lifted graph and Q as needed. The restriction of choosing k such that |E^{(f,r)}_k| ≤ (1/2)|F_f| (line 9) is necessary to maintain the complexity results of (Berkholz, Bonsma, and Grohe 2013), as discussed in the appendix.

Coarse-to-fine algorithm

One simple way to construct a coarse-to-fine inference procedure is to interrupt the main loop of Algorithm 1 to perform lifted inference with relaxations and symmetries specified by the current partition P. This would yield any-time bounds and ensure that we never produce a model finer than the coarsest stable partition.
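The grouping idea behind Algorithm 1 is classic color refinement. The sketch below is ours and is deliberately simplified: it refines only the variable side against fixed factor colors, and it makes no attempt at the asymptotically optimal bookkeeping of the paper's algorithm. All names are illustrative.

```python
from collections import Counter

def color_refine(var_neighbors, factor_colors, max_iters=100):
    """Iteratively split variable groups until variables in the same
    group have identical multisets of (factor-color, position)
    neighbor counts.  Illustrative only: the full coarsest-stable-
    partition algorithm also splits factor groups."""
    # var_neighbors: {v: [(factor_id, position), ...]}
    # factor_colors: {factor_id: color}
    var_color = {v: 0 for v in var_neighbors}
    for _ in range(max_iters):
        # signature = (own color, counts of neighboring cells by color/position)
        sig = {v: (var_color[v],
                   frozenset(Counter((factor_colors[f], r)
                                     for f, r in nbrs).items()))
               for v, nbrs in var_neighbors.items()}
        palette, new_color = {}, {}
        for v in sorted(var_neighbors):      # relabel with small integers
            if sig[v] not in palette:
                palette[sig[v]] = len(palette)
            new_color[v] = palette[sig[v]]
        if new_color == var_color:           # fixed point reached
            break
        var_color = new_color
    return var_color

# A star model: one center c shared by two identical factors a1, a2.
star = {"c": [("a1", 1), ("a2", 1)], "l1": [("a1", 2)], "l2": [("a2", 2)]}
colors = color_refine(star, {"a1": "t", "a2": "t"})
# l1 and l2 land in the same group; c is separated by its position counts
```

Position matters in the signature, mirroring the position-explicit factor graphs of the paper: a variable in slot 1 of a factor is never grouped with a variable in slot 2, even when the potentials match.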
In practice, however, superior results are obtained by choosing refinement operations based on their predicted decrease of the objective value. Algorithm 2 illustrates this idea, where S^r_f (detailed in the next section) judges the quality of a split of (f, r), cost(P) is the estimated cost of one inference iteration (we count the number of operations used to compute the lifted score (6)), and β controls how much refinement we allow between inference calls.

Algorithm 2 Coarse-to-fine inference
1: Execute lines 1 to 6 of Algorithm 1    ▷ Initialize P
2: Ψ ← 0
3: repeat
4:   if cost(P) > Ψ then
5:     [δ̄, w̄] = inference(P, δ̄, w̄)
6:     Set S^r_f via (9) ∀(f, r) ∈ Q
7:     Ψ ← cost(P) · β
8:   end if
9:   (f*, r*) ← arg min_Q S^r_f
10:  Set N̄_1 ⊂ N̄^{(f*,r*)} as the arg min of S^{r*}_{f*} via (9)
11:  F ← F + 1
12:  F_F ← ∪_{k∈N̄_1} E^{(f*,r*)}_k; F_{f*} ← F_{f*} \ F_F
13:  refinePV(F)
14:  Set S^r_f via (9) for f ∈ {f*, F}, r ∈ N^+_{r̄_f}
15: until P is stable
16: [δ̄, w̄] = inference(P, δ̄, w̄)    ▷ Stable P inference

Objective-based scoring function

We first define a metric to measure the quality of a proposed parameter partition; then we show how to optimize this metric efficiently, using only information contained in the lifted graph. For the set of ground parameters associated with an (f, r), we measure the error induced by the current parameter-tying restrictions as the distance of the ground gradients to their projected gradient. Letting g^r_α be the set of ground gradients associated with an (α, r), this metric is

  d(F_f, r) = Σ_{α∈F_f} ‖g^r_α − γ‖²₂,  with  γ = (Σ_{α∈F_f} g^r_α) / |F_f|.

We would like to find a split of F_f into F_{f_1} and F_{f_2} that minimizes d(F_{f_1}, r) + d(F_{f_2}, r) − d(F_f, r), which is the change in the projected gradient distance. By symmetry, this split groups together factors with the same g^r_α. Recalling from (7) that g^r_α = ḡ_{k;δ^r_f} for all α ∈ E^{(f,r)}_k, we write this grouping as F_{f_i} = ∪_{k∈N̄_i} E^{(f,r)}_k, where N̄_1 ⊂ N̄^{(f,r)} and N̄_2 = N̄^{(f,r)} \ N̄_1 (adopting the convention |F_{f_1}| ≤ |F_{f_2}|). We can now form a compact optimization problem to find the best split.
For any N̄' ⊆ N̄ = N̄^{(f,r)} we have d(∪_{k∈N̄'} E^{(f,r)}_k, r) = d̄^r_f(N̄'), where

  d̄^r_f(N̄') = Σ_{k∈N̄'} |E^{(f,r)}_k| ‖ḡ_k − γ̄‖²₂,  with  γ̄ = (Σ_{k∈N̄'} |E^{(f,r)}_k| ḡ_k) / (Σ_{k∈N̄'} |E^{(f,r)}_k|).

Letting P(N̄) be the power set (all subsets) of N̄, we solve

  S^r_f = min_{N̄_1 ∈ P(N̄)} Σ_{i=1}^{2} d̄^r_f(N̄_i) − d̄^r_f(N̄).    (9)

This is a weighted K-means (K = 2) problem with data ḡ_k and weights |E^{(f,r)}_k|. If we approximate this by using only one of the elements of ḡ_k (for example, only the gradient with respect to δ̄ at a single state, as we use on the binary problems in this paper's experiments), the optimum can be found exactly by sorting the ḡ_k and looping over all splits.
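The sort-and-scan solution to this one-dimensional weighted 2-means problem can be sketched as follows. The function names are ours; `counts` plays the role of the weights |E| and `g` the scalarized gradients.

```python
import numpy as np

def best_weighted_split(g, counts):
    """One-dimensional weighted 2-means by sorting: try every
    contiguous split of the sorted values and return the one that
    most decreases the weighted within-cluster sum of squares.
    Returns (score change, set of indices in the left group)."""
    order = np.argsort(g)
    gs = np.asarray(g, dtype=float)[order]
    ws = np.asarray(counts, dtype=float)[order]

    def wss(x, w):
        # weighted sum of squared distances to the weighted mean
        if w.sum() == 0.0:
            return 0.0
        mu = np.average(x, weights=w)
        return float(np.sum(w * (x - mu) ** 2))

    total = wss(gs, ws)
    best_score, best_i = 0.0, 0          # 0 means "no split"
    for i in range(1, len(gs)):
        change = wss(gs[:i], ws[:i]) + wss(gs[i:], ws[i:]) - total
        if change < best_score:
            best_score, best_i = change, i
    return best_score, {int(j) for j in order[:best_i]}

score, left = best_weighted_split([0.0, 10.0, 0.1, 9.9], [1, 1, 1, 1])
# left == {0, 2}: the two near-zero gradients are grouped together
```

Sorting works here because an optimal 2-means clustering in one dimension is always a contiguous interval of the sorted data, so only n − 1 splits need to be checked.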
Figure 3: Test models with distinct soft evidence (u and l terms).

(a) Complete Graph
  Weight b:    (x ≠ y)  V(x) ∧ V(y)
  Weight u_x:  V(x)

(b) Binary Collective Classification
  Weight b:    (x ≠ y)  L(x, y) ∧ (C(y) ⇔ C(x))
  Weight u_x:  C(x)
  Weight l_xy: (x ≠ y)  L(x, y)

(c) Clique-Cycle
  Weight b:    (x ≠ y)  Q_1(x) ∧ Q_2(y)
  Weight b:    (x ≠ y)  Q_2(x) ∧ Q_3(y)
  Weight b:    (x ≠ y)  Q_3(x) ∧ Q_1(y)
  Weight u^i_x: Q_i(x), i ∈ {1, 2, 3}

Figure 4: Comparison of optimization at the stable coloring (exact model symmetries) with our coarse-to-fine inference framework for three different splitting methods ("C2F - Objective", "C2F - Size", "C2F - Syntactic"), on each of the test models — (a) Complete, (b) Collective Classification, (c) Clique Cycle — with domain size d = 160 for the logical variables. Vertical black dashed lines indicate coarse-to-fine transitions for objective-based splitting (the "C2F - Objective" curve). Transitions for the other C2F curves occur at approximately the same positions.

Experiments

This section provides an empirical illustration of our lifted GenDD algorithm. We demonstrate the superiority of our objective-based approach over purely syntactic approaches for generating a coarse-to-fine approximation sequence. We also demonstrate excellent any-time performance on MLNs with distinct soft evidence on every node, compared to exact lifted variational approaches, which are forced to ground the entire problem.

Datasets and methodology

Datasets. Figure 3 shows the models we use (similar models without evidence were used in (Mladenov and Kersting 2015; Bui, Huynh, and Sontag 2014) to evaluate performance on marginalization tasks).
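The distinct soft evidence (the u terms of Figure 3) can be generated with a small sketch like the one below. The function name is ours, and we assume a unary weight u on a binary variable is read as log-odds, so that bounding |u| bounds the implied local probability.

```python
import math
import random

def soft_evidence(p_min=0.01):
    """Draw a random unary evidence weight u uniformly on [-zeta, zeta]
    with zeta = log((1 - p_min) / p_min).  Interpreting u as log-odds,
    sigmoid(u) then lies in [p_min, 1 - p_min]."""
    zeta = math.log((1.0 - p_min) / p_min)
    u = random.uniform(-zeta, zeta)
    p = 1.0 / (1.0 + math.exp(-u))  # implied local probability
    return u, p

u, p = soft_evidence()  # with the default, p falls in [0.01, 0.99]
```

With p_min = 0.01 this matches evidence acting like probabilities between 0.01 and 0.99: the extreme draws u = ±zeta map exactly to those endpoints under the sigmoid.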
In all models, b was set to 2.5 and the u terms specify random evidence uniformly distributed on [−ζ, ζ], where ζ = log((1 − 10⁻²)/10⁻²), so that the u terms act like probabilities between 0.01 and 0.99. For the collective classification problem, we also choose a random 10% of the (x, y) values associated with links L(x, y) and set l_xy to a random value on [0, 5.0]. This can be interpreted as the strength of a link between two web pages.

Parameter settings. In our experiments, we set ε = 10⁻³ and t = 10⁻² (in the definition of a(x)) and perform L-BFGS black-box optimization with a rank-20 Hessian correction. For the coarse-to-fine procedure, it is important to balance performing inference with model refinement. We found it worked well to perform a small number of inference iterations (30), followed by a small amount of model refinement (setting β = 1.25 in Algorithm 2). This worked better than trying to diagnose inference convergence to determine the transition point. Future work will look at more principled methods of balancing inference work with refinement.

Timing methodology. In our experiments, we measure only the cost of performing inference, ignoring the cost of identifying symmetries. This is common in experiments on approximate lifted inference in other works (Mladenov, Globerson, and Kersting 2014; Van den Broeck and Niepert 2014; Venugopal and Gogate 2014), as well as exact lifted variational inference approaches (Bui, Huynh, and Riedel 2012; Bui, Huynh, and Sontag 2014; Mladenov and Kersting 2015), which assume problem symmetries are available before inference. Works such as (Singla, Nath, and Domingos 2014; Kersting et al. 2010) note that identifying (approximate) symmetries can be a bottleneck. (Singla, Nath, and Domingos 2014) used a sparse hyper-cube based representation to specify approximate symmetries in MLNs; adapting lifted GenDD to leverage a similar representation is an interesting direction for future work.

Results

Figure 4 compares our coarse-to-fine procedure vs.
performing inference on the stable coloring (the ground model in our setup) on each of our three test models. The blue curve ("C2F - Objective") performs objective-based splitting as per Algorithm 2, using the metric (9) to guide the splitting choice. We also show syntactic splitting ("C2F - Syntactic", orange) as per Algorithm 1, interleaving inference when the predicted inference cost exceeds a cost bound (then multiplying that bound by β as in Algorithm 2). "C2F - Size" (red) splits the largest super-factor in Q and chooses a random split that is as even as possible (adding random edges from N̄ to
N̄_2, stopping when |∪_{k∈N̄_1} E^{(f,r)}_k| > (1/2)|F_f|, and setting N̄_1 = N̄ \ N̄_2 in line 10 of Algorithm 2), interleaving inference with refinement as in the other C2F methods. The stable (here, ground) inference process is shown in green. We see that objective-based refinement significantly outperforms the others, with size-based refinement worse, but still better than syntactic splitting. Objective-based splitting provides orders of magnitude speed-up over the stable coloring for similar-quality results, and all methods provide better early performance than the stable coloring, which must touch the entire model to even provide an initial bound.

Figure 5: Scalability comparison. Panels (a)-(e) show bound vs. time on instances of the complete graph with evidence, varying the logical variable domain size d (from d = 20 up to d = 320 in the MLN specification). C2F exhibits an increasing advantage as the model grows larger.

Figure 5 demonstrates the scalability of our approach. Plots (a)-(e) show the running time for our coarse-to-fine method ("C2F - Objective", blue) and the stable coloring method on instances of the complete graph, varying its size.

Conclusions

We presented a framework for lifting GenDD for approximate inference by imposing over-symmetric constraints on the optimization parameters to induce symmetry in the variational bound. We developed a coarse-to-fine procedure that, guided by the objective, provides high-quality any-time approximations even in models with no exact symmetry. Our method provides orders of magnitude speed-up on our benchmarks, with increasing advantage for larger models.

Acknowledgements

This work is sponsored in part by NSF grants IIS , IIS , and by the United States Air Force under Contract No. FA C-0011 and FA C.

Appendix

Description of refinePV.
After creating a new partition F_F (possibly via splitting it from F_f), Algorithms 1 and 2 call refinePV to update P_V such that variables having identical neighbor counts M_v = [M^{(1)}_v, ..., M^{(F)}_v] are grouped together (where M^{(f)}_v := [M^{(f,1)}_v ... M^{(f,r̄_f)}_v] for all f). We assume that at entry, the neighbor counts M^{(f)}_v are correct for all v and f = 1 ... F − 1, and that P_V correctly groups variables by these counts, i.e., M^{(f)}_v = M̄^{(f)}_{K_v} for all v, where K_v is the partition holding variable v (v ∈ V_{K_v}).

Algorithm 3 refinePV
Input: f (if f ≠ 0, F_F was deleted from F_f before the call)
1: for r = 1 to r̄_F do
2:   for α ∈ F_F do            ▷ Update ground neighbor lists
3:     v = α_r
4:     N^{(f,r)}_v ← N^{(f,r)}_v \ α;  M^{(f,r)}_v ← M^{(f,r)}_v − 1
5:     N^{(F,r)}_v ← N^{(F,r)}_v ∪ α;  M^{(F,r)}_v ← M^{(F,r)}_v + 1
6:   end for
7:   V^m_k ← ∅  ∀(k, m)        ▷ New variable groupings
8:   for α ∈ F_F do
9:     v = α_r;  k = K_v;  m = M^{(F,r)}_v
10:    V_k ← V_k \ v;  V^m_k ← V^m_k ∪ v
11:  end for
12:  for (k, m) : V^m_k ≠ ∅ do  ▷ Update lifted graph
13:    K' ← k
14:    if m ≠ last m(k) then    ▷ New super-node
15:      K ← K + 1;  K' ← K
16:      M̄^{(f',r')}_{K'} ← M̄^{(f',r')}_k  ∀(f', r') ∈ N̄_k
17:    end if
18:    V_{K'} ← V^m_k;  K_v ← K' (∀v ∈ V_{K'})
19:    M̄^{(F,r)}_{K'} ← m;  M̄^{(f,r)}_{K'} ← M̄^{(f,r)}_{K'} − m
20:  end for
21: end for
Note: Updating M̄^{(f',r')}_{K'} (for any K', f', r') changes the N̄_k, N̄^{(f',r')}, and Q sets. For compactness, these changes are not shown.

The first loop updates N^{(f,r)}_v and N^{(F,r)}_v and their counts. The second loop groups variables by their signature [K_v, M^{(F,r)}_v], creating the sets V^m_k = {v | K_v = k, M^{(F,r)}_v = m}. This produces the same grouping as would be obtained by grouping on the full signature [M^{(1)}_v ... M^{(F)}_v], because only the M^{(f,r)}_v and M^{(F,r)}_v counts change during the call. Note that the change to M^{(f,r)}_v does not affect our grouping, since for all v ∈ V^m_k, M^{(f,r)}_v is equal to its value before the call minus M^{(F,r)}_v (the updates on line 19 reflect this fact). Lastly, the third loop adds the V^m_k sets to P_V and updates quantities associated with the lifted graph (the M̄ and N̄ terms). V_k will be overwritten with V^m_k, where m is the last (k, m) encountered in the loop. In all other cases, V^m_k splits into a new super-node and its neighbors are copied (line 16).
Complexity analysis

Each set is implemented as a hash table supporting O(1) addition and deletion (N^{(f,r)}_v stores pointers to factor scope vectors). Hence, each α ∈ F_F contributes O(|α|) time to the first two loops of refinePV. Throughout Algorithms 1 and 2, an α is moved to an F_F that is at most half the size of its current set. Thus, each α participates in at most log₂|F| sets throughout either algorithm. Equivalent to the result of (Berkholz, Bonsma, and Grohe 2013), the total time of these loops is O(R|F| log |F|), where R = max_{α∈F} |α|. Throughout Algorithms 1 and 2, the total time spent in the third loop of refinePV is proportional to the size of the lifted graph. This is because the loop adds nodes and edges to the lifted graph in O(1), and each addition is accompanied by at most one deletion (if M̄^{(f,r)}_{K'} = 0 in line 19). Since the size of the lifted graph is at most the size of the ground graph (R|F|), the total time is dominated by the first two loops.

References

Berkholz, C.; Bonsma, P.; and Grohe, M. 2013. Tight lower and upper bounds for the complexity of canonical colour refinement. In European Symposium on Algorithms. Springer.
Broeck, G. V. d., and Niepert, M. 2014. Lifted probabilistic inference for asymmetric graphical models. arXiv preprint.
Bui, H. B.; Huynh, T. N.; and de Salvo Braz, R. 2012. Exact lifted inference with distinct soft evidence on every object. In AAAI.
Bui, H. H.; Huynh, T. N.; and Riedel, S. 2012. Automorphism groups of graphical models and lifted variational inference. arXiv preprint.
Bui, H. H.; Huynh, T. N.; and Sontag, D. 2014. Lifted tree-reweighted variational inference. arXiv preprint.
Globerson, A., and Jaakkola, T. S. 2008. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. In Advances in Neural Information Processing Systems.
Habeeb, H.; Anand, A.; Singla, P.; et al. 2017. Coarse-to-fine lifted MAP inference in computer vision. arXiv preprint.
Kersting, K.; Ahmadi, B.; and Natarajan, S. 2009. Counting belief propagation.
In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, AUAI Press. Kersting, K.; El Massaoudi, Y.; Hadiji, F.; and Ahmadi, B Informed lifting for message-passing. In AAAI. Mladeno, M.; Ahmadi, B.; and Kersting, K Lifted linear programming. In AISTATS, Mladeno, M., and Kersting, K Equitable partitions of concae free energies. In UAI, Mladeno, M.; Globerson, A.; and Kersting, K Lifted message passing as reparametrization of graphical models. In UAI, Ping, W.; Liu, Q.; and Ihler, A. T Decomposition bounds for marginal map. In Adances in Neural Information Processing Systems, Poole, D First-order probabilistic inference. In IJCAI, olume 3, Richardson, M., and Domingos, P Maro logic networs. Machine learning 62(1): Singla, P., and Domingos, P. M Lifted first-order belief propagation. In AAAI, olume 8, Singla, P.; Nath, A.; and Domingos, P. M Approximate lifting techniques for belief propagation. In AAAI, Van den Broec, G., and Darwiche, A On the complexity and approximation of binary eidence in lifted inference. In Adances in Neural Information Processing Systems, Venugopal, D., and Gogate, V Eidence-based clustering for scalable inference in maro logic. In Joint European Conference on Machine Learning and Knowledge Discoery in Databases, Springer. Wainwright, M. J., and Jordan, M. I Graphical models, exponential families, and ariational inference. Foundations and Trends R in Machine Learning 1(1 2): Wainwright, M. J.; Jaaola, T. S.; and Willsy, A. S A new class of upper bounds on the log partition function. IEEE Transactions on Information Theory 51(7):