Lifted Generalized Dual Decomposition

Nicholas Gallo, University of California Irvine, Irvine, CA
Alexander Ihler, University of California Irvine, Irvine, CA

Abstract

Many real-world problems, such as Markov Logic Networks (MLNs) with evidence, can be represented as a highly symmetric graphical model perturbed by additional potentials. In these models, variational inference approaches that exploit exact model symmetries are often forced to ground the entire problem, while methods that exploit approximate symmetries (such as by constructing an over-symmetric approximate model) offer no guarantees on solution quality. In this paper, we present a method based on a lifted variant of the generalized dual decomposition (GenDD) for marginal MAP inference which provides a principled way to exploit symmetric sub-structures in a graphical model. We develop a coarse-to-fine inference procedure that provides any-time upper bounds on the objective. The upper bound property of GenDD provides a principled way to guide the refinement process, providing good any-time performance and eventually arriving at the ground optimal solution.

Introduction

A central task in many application domains (Wainwright and Jordan 2008) is computing likelihoods and marginal probabilities over distributions defined by a graphical model. These tasks are, in general, intractable, and this has motivated the development of many approximate inference techniques. Of significant theoretical and practical interest are variational techniques, which formulate the inference problem as a constrained optimization problem. Within this class, methods that provide upper bounds on the partition function and give rise to convex optimization problems are particularly attractive, since convergence can be guaranteed and the quality of two bounds can be easily compared. This class contains the tree-reweighted bounds (Wainwright, Jaakkola, and Willsky 2005) and primal variants such as GenDD (Ping, Liu, and Ihler 2015), which we study here.

Recently, there has been growing interest in the development of lifted inference techniques, both exact (Bui, Huynh, and de Salvo Braz 2012; Poole 2003) and approximate (Bui, Huynh, and Riedel 2012; Mladenov, Ahmadi, and Kersting 2012; Mladenov and Kersting 2015; Bui, Huynh, and Sontag 2014; Mladenov, Globerson, and Kersting 2014), which exploit model symmetries. Most of these works take a well established variational approximation and identify exact problem symmetries either algorithmically (e.g., redundant message computations of loopy belief propagation (LBP) (Singla and Domingos 2008; Kersting, Ahmadi, and Natarajan 2009)) or via graph theoretic notions of symmetry (Bui, Huynh, and Riedel 2012; Mladenov, Ahmadi, and Kersting 2012; Mladenov and Kersting 2015; Bui, Huynh, and Sontag 2014; Mladenov, Globerson, and Kersting 2014). These methods, however, often have significant trouble dealing with models possessing highly symmetric substructures but few exact symmetries. Some works (Venugopal and Gogate 2014; Van den Broeck and Darwiche 2013) address this by generating a symmetric approximation to an MLN (Richardson and Domingos 2006) with evidence, but offer no guarantees on the quality of the approximation. (Van den Broeck and Niepert 2014) corrects this bias by using an over-symmetric approximation to construct a proposal distribution for a Metropolis-Hastings chain, which guarantees convergence to the true marginals but gives no guarantees on mixing time.
Other works approximate lifted BP by grouping messages with similar values (Kersting et al. 2010) or messages satisfying a desired structural property (Singla, Nath, and Domingos 2014). (Singla, Nath, and Domingos 2014) analyzes the errors the approximation induces on the true BP messages but, like BP, offers no guarantees on solution quality or convergence. Similar in spirit to our work is (Habeeb et al. 2017), which sequentially solves a set of relaxed MAP inference problems for computer vision tasks. Their method operates in a coarse-to-fine manner, solving at each level a MAP problem where groups of pixels are restricted to have the same value. This pixel grouping is relaxed and the finer MAP problem is initialized with the solution to the coarser problem, guaranteeing monotonic improvement of the solution.

Our paper provides a framework for exploiting approximate model symmetries, based on a lifted variant of GenDD (Ping, Liu, and Ihler 2015). In contrast to works that create an over-symmetric model approximation, our method imposes a set of over-symmetric restrictions on the variational cost-shifting updates (messages) associated with the GenDD problem. Guided by a measure of solution quality, these constraints are gradually relaxed, producing a sequence of problems of increasing accuracy but increasing cost. Our experimental results show the superiority of the objective-based refinement criterion and good anytime performance compared to methods that exploit exact symmetries.

Background

A Markov random field (MRF) over n discrete random variables X = [X_1 ... X_n] taking values x = [x_1 ... x_n] ∈ (X_1 × ... × X_n) has probability density function

$$p(X = x; \theta) = \exp\Big[ \sum_{\alpha \in F} \theta_\alpha(x_\alpha) - \Phi(\theta) \Big], \qquad \Phi(\theta) = \log \sum_{x \in X^n} \exp\Big[ \sum_{\alpha \in F} \theta_\alpha(x_\alpha) \Big],$$

where F represents the set of cliques of the distribution, with α ∈ F being a subset of the variables, associated with a potential table θ_α. Φ(θ) is the log partition function, which normalizes the distribution. Furthermore, denote the set of variable indices as V = {1 ... n}.

Generalized dual decomposition

Motivated by the intractability of computing Φ(θ), (Ping, Liu, and Ihler 2015) develops efficient upper bounds on Φ(θ) via a decomposition based on Hölder's inequality. Their bound generalizes the dual decomposition bound (Globerson and Jaakkola 2008) for the MAP problem, maintaining two key properties: (1) it decomposes into a sum of terms defined over cliques of the graphical model, and (2) a family of bounds can be indexed by a set of variational parameters, where finding the parameter setting yielding the tightest bound can be formulated as a convex optimization problem. We review their main results.

For any non-negative function f(x) and w ∈ R_{>0}, the power sum operator is defined as

$$\textstyle\sum_x^w f(x) = \Big[ \sum_x f(x)^{1/w} \Big]^w.$$

The power sum reduces to the standard sum when w = 1 and approaches max_x f(x) as w → 0+. We also define the vector power sum

$$\textstyle\sum_{\mathbf{x}}^{\mathbf{w}} F(\mathbf{x}) = \sum_{x_{|\mathbf{x}|}}^{w_{|\mathbf{x}|}} \cdots \sum_{x_1}^{w_1} F(\mathbf{x}),$$

where w is a vector of weights, x is a vector of random variables, and F is a non-negative function with |x| inputs.

The set of variational parameters δ = {δ_α^r | α ∈ F, r ∈ N^+_{|α|}} and w = {w_α^r | α ∈ F, r ∈ N^+_{|α|}}, where N^+_i = {1 ... i} for any positive integer i, reparameterize the distribution to provide a tighter bound. The bound of (Ping, Liu, and Ihler 2015) can be written as

$$\Phi(\theta) \le \sum_{v \in V} u_v(\delta, w) + \sum_{\alpha \in F} c_\alpha(\delta, w) \overset{\mathrm{def}}{=} L(\delta, w), \qquad (1)$$

where, letting w_α = [w_α^1 ... w_α^{|α|}], the clique terms are

$$c_\alpha(\delta, w) = \log \textstyle\sum_{x_\alpha}^{w_\alpha} \exp\Big[ \theta_\alpha(x_\alpha) - \sum_{r=1}^{|\alpha|} \delta_\alpha^r(x_{\alpha_r}) \Big] \qquad (2)$$

and the unary terms are

$$u_v(\delta, w) = \log \textstyle\sum_{x_v}^{w_v^0} \exp(\delta_v^0(x_v)), \qquad (3)$$

with δ_v^0(x_v) = Σ_{(α,r)∈N_v} δ_α^r(x_v) and w_v = 1 − Σ_{(α,r)∈N_v} w_α^r, where N_v = {(α, r) | α_r = v} are the neighbors of v. Furthermore, we define w_v^0 = max(0, w_v) and require w_α^r ≥ 0 for all (α, r).[1] We also require that the local elimination order of each clique power sum term (2) be consistent with a global elimination order o, meaning that for all i < j, o(α_i) < o(α_j). If this is violated for an α in the input, that α and its associated potential are permuted to obey this order.

Figure 1: Position-explicit factor graph associated with α_1 = (x_1, x_2), α_2 = (x_2, x_3). Factor cells (α_i, r) are associated with parameters (δ_{α_i}^r, w_{α_i}^r), and the unary terms are sums of neighboring cost-shifting variables: δ_1^0 = δ_{α_1}^1, δ_2^0 = δ_{α_1}^2 + δ_{α_2}^1, δ_3^0 = δ_{α_2}^2 (similarly for the weights).

Ground factor graphs

In order to facilitate development of our lifted inference procedure, we modify the standard factor graph depiction of graphical models to make each variable's position in its participating factors explicit. To this end, we draw a circular node associated with each variable v and a rectangle partitioned into |α| cells associated with each factor α. Position cell (α, r) is associated with the terms (δ_α^r, w_α^r) (labeled with just δ_α^r for compactness) and has an edge to its variable neighbor v = α_r, which is associated with the terms δ_v^0, w_v^0, u_v (labeled with just δ_v^0). A simple example is illustrated in Figure 1.[2]
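For concreteness, the following small Python/NumPy sketch (ours, not code from the paper; the function and variable names are hypothetical) evaluates the scalar power sum and a pairwise clique term as in equation (2): the potential is shifted by the two cost-shifting vectors, then the variables are eliminated sequentially with their respective weights.

```python
import numpy as np

def power_sum(f, w):
    # Scalar power sum: (sum_x f(x)^(1/w))^w.
    # Reduces to an ordinary sum at w = 1 and approaches max_x f(x) as w -> 0+.
    return np.sum(np.power(f, 1.0 / w)) ** w

def pairwise_clique_term(theta, d1, d2, w1, w2):
    # c_alpha of Eq. (2) for a factor over (x1, x2): shift the log-potential by the
    # cost-shifting vectors d1, d2, then apply the vector power sum, eliminating
    # x1 (weight w1) first and x2 (weight w2) second.
    A = np.exp(theta - d1[:, None] - d2[None, :])
    B = np.sum(np.power(A, 1.0 / w1), axis=0) ** w1   # power sum over x1
    Z = power_sum(B, w2)                              # power sum over x2
    return np.log(Z)

# Tiny usage example on a 2x2 log-potential with zero cost shifts:
theta = np.array([[1.2, -0.3], [0.0, 0.7]])
print(pairwise_clique_term(theta, np.zeros(2), np.zeros(2), 0.5, 0.5))
```

With both weights equal to 1 this is the ordinary log-sum-exp of the clique, and as the weights shrink toward zero it approaches the clique's max value, matching the limits of the power sum noted above.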
Exploiting symmetries

Lifted inference techniques identify a set of symmetries in the ground variational problem that can be exploited computationally. Exact lifted variational inference (Singla and Domingos 2008; Mladenov, Globerson, and Kersting 2014; Mladenov and Kersting 2015; Mladenov, Ahmadi, and Kersting 2012) identifies a stable partition (Berkholz, Bonsma, and Grohe 2013) of the graph as sufficient to characterize symmetries in the solution to the ground variational problem. In this paper, we impose a set of over-symmetric constraints on the ground variational parameters (δ, w), inducing symmetries in the objective (1) that will, in general, be coarser than those implied by a stable partition, yielding looser but computationally cheaper bounds.

[1] In (Ping, Liu, and Ihler 2015), w_v^0 is a free variable with restrictions w_v^0 ≥ 0 and w_v^0 + Σ_{(α,r)∈N_v} w_α^r = 1. In our formulation, this sum is always larger than or equal to 1 (providing a valid bound), with equality holding at the optimum.

[2] (Mladenov and Kersting 2015) used a similar representation where each position cell is its own node, connected to its factor node. For compactness, we prefer the notation presented here.

Parameter and objective symmetries

Consider a disjoint partition P_F = {F_1 ... F_{|P_F|}} of the ground factors F (F_i ∩ F_j = ∅ for i ≠ j, ∪_f F_f = F) where all factors in the same group are required to have the same potential function. That is, for each f there exists a representative factor θ̃_f (with associated scope size r̃_f and domains X̃_f^r for r ∈ N^+_{r̃_f}) such that for all α ∈ F_f, θ_α = θ̃_f (and |α| = r̃_f and X_{α_r} = X̃_f^r for r ∈ N^+_{r̃_f}). In this paper, we consider over-symmetric constraints on the variational parameters which force terms associated with factors in the same group to be identical. That is, we consider parameters in the set

$$S(P_F) = \Big\{ (\delta, w) \;\Big|\; \exists (\tilde\delta, \tilde w):\ \forall F_f \in P_F,\ \forall \alpha \in F_f,\ \forall r \in N^+_{|\alpha|},\ \ \delta_\alpha^r(\tilde x_f^r) = \tilde\delta_f^r(\tilde x_f^r),\ w_\alpha^r = \tilde w_f^r \Big\},$$

where δ̃ = {δ̃_f^r | F_f ∈ P_F, r ∈ N^+_{r̃_f}} and w̃ = {w̃_f^r | F_f ∈ P_F, r ∈ N^+_{r̃_f}} are restricted sets of parameters.

For any (δ, w) ∈ S(P_F), we identify symmetries in the clique terms (2) as c_α(δ, w) = c̃_f(δ̃, w̃) for all α ∈ F_f, where

$$\tilde c_f(\tilde\delta, \tilde w) \overset{\mathrm{def}}{=} \log \textstyle\sum_{\tilde x}^{\tilde w_f} \exp\Big[ \tilde\theta_f(\tilde x) - \sum_{r=1}^{\tilde r_f} \tilde\delta_f^r(\tilde x_f^r) \Big] \qquad (4)$$

and w̃_f = [w̃_f^1 ... w̃_f^{r̃_f}] and x̃ = [x̃_f^1 ... x̃_f^{r̃_f}].

To recognize the symmetries induced in the aggregated cost-shifting terms δ_v^0, w_v^0 (and hence u_v (3)), first let N_v^{(f,r)} = {α | α ∈ F_f, (α, r) ∈ N_v} represent the neighbors of v which contribute δ_α^r(x̃_f^r) = δ̃_f^r(x̃_f^r) to the sum. There are M_v^{(f,r)} = |N_v^{(f,r)}| such neighbors. Furthermore, define a disjoint partition P_V = {V_1 ... V_K} of V where all variables in the same group have the same domain (X_v = X̃_k for all v ∈ V_k) and the same neighbor counts given P_F. That is, there exist representative counts M̃_k^{(f,r)} such that M_v^{(f,r)} = M̃_k^{(f,r)} for all v ∈ V_k and all (f, r). Now, for all v ∈ V_k we have

$$\delta_v^0(\tilde x_k) = \tilde\delta_k^0(\tilde x_k) \overset{\mathrm{def}}{=} \sum_{(f,r) \in \tilde N_k} \tilde M_k^{(f,r)}\, \tilde\delta_f^r(\tilde x_k) \qquad (5)$$

for all values x̃_k, where Ñ_k = {(f, r) | M̃_k^{(f,r)} > 0} (and similarly, w_v = w̃_k, defined as 1 − Σ_{(f,r)∈Ñ_k} M̃_k^{(f,r)} w̃_f^r). These symmetries imply that u_v(δ, w) = ũ_k(δ̃, w̃) for all v ∈ V_k. Our objective can now be evaluated over the smaller set of representative parameters (δ̃, w̃), with symmetries fully specified by P = (P_F, P_V). We say that such a P is valid, and we have L(δ, w) = L̃(δ̃, w̃) where

$$\tilde L(\tilde\delta, \tilde w) \overset{\mathrm{def}}{=} \sum_{F_f \in P_F} |F_f|\, \tilde c_f(\tilde\delta, \tilde w) + \sum_{V_k \in P_V} |V_k|\, \tilde u_k(\tilde\delta, \tilde w). \qquad (6)$$

Lifted factor graphs

Associated with any valid partition P is a lifted graph which compactly illustrates the symmetries in the variational parameters and problem terms (1). A lifted factor graph has a super-node associated with each V_k and a super-factor with r̃_f position cells associated with each F_f. Position cell (f, r) is associated with δ̃_f^r, w̃_f^r (labeled with just δ̃_f^r) and has a super-edge labeled M̃_k^{(f,r)} with super-node k if M̃_k^{(f,r)} > 0. The set Ñ_k defined after (5) can be seen as the neighborhood of k in the lifted graph. We similarly define the neighborhood of (f, r) as Ñ^{(f,r)} = {k | M̃_k^{(f,r)} > 0}.

Example 1. Consider a model with factor graph depicted in Figure 2a (with elimination order o = (R_1, R_2, T_1, T_2)) and clique symmetries θ_{α_2} = θ_{α_3} = θ_{α_4} = θ_{α_5} = θ. Associated with the graph is the partition P_F = {F_g, F_y}, where F_g = {α_1} (green), F_y = {α_2, α_3, α_4, α_5} (yellow), and P_V = {V_p, V_r, V_b}, where V_p = {R_1} (purple), V_r = {R_2} (red), and V_b = {T_1, T_2} (blue).[3] The top half of the middle panel indicates the over-symmetric parameter constraints specified by P_F. The bottom half of that panel indicates the induced symmetries in the unary terms. For example, by examining the neighborhood of R_1 in the factor graph, we see that δ_{R_1}^0 = δ_{α_1}^1 + δ_{α_2}^1 + δ_{α_3}^1. Via the parameter symmetries δ_{α_2}^1 = δ_{α_3}^1 = δ̃_y^1 and δ_{α_1}^1 = δ̃_g^1, we see that this is equivalent to δ_{R_1}^0 = 2 δ̃_y^1 + δ̃_g^1.
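To illustrate how (5) and (6) combine, the sketch below (ours, not the paper's code; the container names c_tilde, count_F, count_V, M, delta_tilde, w_tilde are hypothetical) assembles the lifted bound from precomputed representative clique terms, aggregating the shared cost shifts and weights via the lifted neighbor counts and weighting each representative term by its group size.

```python
import numpy as np

def lifted_bound(c_tilde, count_F, count_V, M, delta_tilde, w_tilde, eps=1e-3):
    # c_tilde[f]          : representative clique term c~_f (scalar, e.g. from Eq. (4))
    # count_F[f]          : |F_f|, number of ground factors in group f
    # count_V[k]          : |V_k|, number of ground variables in group k
    # M[k][(f, r)]        : lifted neighbor counts M~_k^(f,r)
    # delta_tilde[(f, r)] : shared cost-shift vector delta~_f^r (numpy array over the domain)
    # w_tilde[(f, r)]     : shared weight w~_f^r
    L = sum(count_F[f] * c_tilde[f] for f in c_tilde)
    for k, nbrs in M.items():
        # Eq. (5): aggregate cost shifts and weights over the lifted neighborhood of k
        delta0 = sum(m * delta_tilde[fr] for fr, m in nbrs.items())
        w_k = 1.0 - sum(m * w_tilde[fr] for fr, m in nbrs.items())
        w0 = max(eps, w_k)            # w_k^0 = max(0, w_k), kept strictly positive here
        # Eq. (3): unary term u~_k = log of the power sum of exp(delta0) with weight w0
        u_k = w0 * np.log(np.sum(np.exp(delta0 / w0)))
        L += count_V[k] * u_k
    return L
```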
Gradient symmetries

Just as the over-symmetric parameter constraints induce symmetries in the ground objective function, they also induce (a different set of) symmetries in the gradient of the ground objective. These gradient symmetries will be used to derive the gradient of the lifted problem, to identify exact symmetries in the ground problem, and to guide our coarse-to-fine inference procedure.

First, notice that the parameter symmetries imply not only symmetries in the values of the clique (4) and unary (5) terms, but also in their gradients. That is, for any (δ, w) ∈ S(P_F), we have ∂c_α/∂δ_α^r(x̃_f^r) = ∂c̃_f/∂δ̃_f^r(x̃_f^r) and ∂u_v/∂δ_v^0(x̃_k) = ∂ũ_k/∂δ̃_k^0(x̃_k) for any α ∈ F_f, v ∈ V_k, and for all values of x̃_f^r and x̃_k.

Now, consider the set E_k^{(f,r)} = ∪_{v∈V_k} N_v^{(f,r)} of ground factors α ∈ F_f whose r-th position cell (α, r) has neighbor v = α_r ∈ V_k. For any α ∈ E_k^{(f,r)}, the parameter δ_α^r appears only in the clique term c_α in position r (multiplied by −1) and in the unary term u_v (multiplied by +1). We can therefore identify symmetries in the objective gradient as

$$\frac{\partial L}{\partial \delta_\alpha^r(\tilde x)} = \frac{\partial u_v}{\partial \delta_v^0(\tilde x)} - \frac{\partial c_\alpha}{\partial \delta_\alpha^r(\tilde x)} = \frac{\partial \tilde u_k}{\partial \tilde\delta_k^0(\tilde x)} - \frac{\partial \tilde c_f}{\partial \tilde\delta_f^r(\tilde x)} \overset{\mathrm{def}}{=} \bar g_{k;\delta}^{(f,r)}(\tilde x) \qquad (7)$$

for all values of x̃ (and similarly for the w gradients).

[3] The main text used integers to index factor and variable partitions; here we use letters, representing figure colors, for clarity.
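The gradient terms appearing in (7) can be computed from "weighted beliefs" of the power-sum terms. The sketch below (ours, a standard weighted-marginal computation consistent with the power-sum definitions above, not code from the paper) shows the pairwise clique case and the unary case; the ground gradient ḡ in (7) is then the difference of the unary piece and the clique piece.

```python
import numpy as np

def clique_belief(theta, d1, d2, w1, w2):
    # Weighted belief mu(x1, x2) of a pairwise clique term, eliminating x1 then x2.
    # With w1 = w2 = 1 this is the ordinary normalized clique marginal.
    A = np.exp(theta - d1[:, None] - d2[None, :])
    B = np.sum(np.power(A, 1.0 / w1), axis=0) ** w1
    Z = np.sum(np.power(B, 1.0 / w2)) ** w2
    mu = (B / Z) ** (1.0 / w2) * (A / B[None, :]) ** (1.0 / w1)
    return mu, np.log(Z)

def grad_c_wrt_d1(theta, d1, d2, w1, w2):
    # dc/d d1(x1) = -sum_{x2} mu(x1, x2): the clique piece of Eq. (7).
    mu, _ = clique_belief(theta, d1, d2, w1, w2)
    return -mu.sum(axis=1)

def grad_u_wrt_delta0(delta0, w0):
    # du/d delta0(x) = softmax(delta0 / w0)(x): the unary piece of Eq. (7).
    e = np.exp((delta0 - delta0.max()) / w0)
    return e / e.sum()
```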

Figure 2: (a) A valid, non-stable partition P specifies over-symmetric constraints on the variational parameters of the ground problem, yielding a looser bound. (b) A stable partition P specifies symmetries in the variational parameters that yield an exact solution to the ground problem. More details in Examples 1 and 2 of the text.

Lifted gradients and ground optimality

The gradients of the lifted objective (6) can be written using the gradient symmetries of the ground problem. Specifying the lifted gradients in this way allows us to identify partitions whose symmetric parameter constraints specify an exact solution to the ground problem. Since δ_α^r(x̃_f^r) = δ̃_f^r(x̃_f^r) for all α ∈ F_f (and all values of x̃_f^r), we can write, via (7),

$$\frac{\partial \tilde L}{\partial \tilde\delta_f^r(\tilde x_f^r)} = \sum_{\alpha \in F_f} \frac{\partial L}{\partial \delta_\alpha^r(\tilde x_f^r)} = \sum_{k \in \tilde N^{(f,r)}} |E_k^{(f,r)}|\, \bar g_{k;\delta}^{(f,r)}(\tilde x_f^r), \qquad (8)$$

where |E_k^{(f,r)}| = M̃_k^{(f,r)} |V_k| (and similarly for the w gradients). In the special case where (f, r) has one lifted neighbor, Ñ^{(f,r)} = {k}, we have F_f = E_k^{(f,r)} and (8) simplifies to ∂L̃/∂δ̃_f^r(x̃_f^r) = |F_f| ∂L/∂δ_α^r(x̃_f^r), meaning that ∂L/∂δ_α^r(x̃_f^r) = 0 when ∂L̃/∂δ̃_f^r(x̃_f^r) = 0. Now, if all of the gradients of the lifted problem are zero, and its lifted graph has |Ñ^{(f,r)}| = 1 for all (f, r), then all of the ground gradients are also zero. This means that a solution of the lifted problem provides a solution to the ground problem. A partition P whose lifted graph satisfies the property that |Ñ^{(f,r)}| = 1 for all (f, r) is called stable.

Example 2. If we replace P_F from Example 1 with P_F = {F_g, F_y, F_o}, where F_g = {α_1} (green), F_y = {α_2, α_3} (yellow), F_o = {α_4, α_5} (orange), then the partition is stable and S(P_F) contains a solution to the ground variational problem. This is illustrated in Figure 2b.

Optimizing the lifted objective

In this paper, we use a black-box procedure to perform global optimization utilizing the gradient of L̃. To ensure positivity of the weights, we define the parameters η̃ = {η̃_f^r | F_f ∈ P_F, r ∈ N^+_{r̃_f}}, substituting w̃_f^r with ε + exp(η̃_f^r) (and ∂L̃/∂η̃_f^r = (∂L̃/∂w̃_f^r) exp(η̃_f^r)). To ensure that the function is differentiable, we set w̃_k^0 = a(w̃_k), where a(x) = ε + t log(1 + exp(x/t)).[4] This enables unconstrained optimization over the η̃ and δ̃ parameters. The gradients are straightforward to derive from the gradients in (Ping, Liu, and Ihler 2015) and are omitted for space.

Coarse to fine approximation

We now develop a coarse-to-fine procedure that interleaves relaxations of the over-symmetric parameter constraints S(P_F) with optimization, eventually arriving at the coarsest stable partition P and a solution to the ground problem. While optimization touches only the lifted graph, relaxation steps must touch a subset of the ground factors and variables. To control this cost, we adapt to our work the key ideas from (Berkholz, Bonsma, and Grohe 2013), which presented an asymptotically optimal algorithm for generating the coarsest stable partition of a graph. This procedure is used as a pre-processing step in works on exact lifted variational inference such as (Mladenov, Globerson, and Kersting 2014).

[4] Any t > 0, ε > 0 bound the objective, since the substitution generates larger unary weights: a_t(w_v) > ε + max(0, w_v) > max(0, w_v).
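A minimal sketch (ours) of the reparameterization described in the "Optimizing the lifted objective" subsection above, reading the transcription's sign convention as w̃ = ε + exp(η̃); the constants match the experimental settings reported later in the paper.

```python
import numpy as np

EPS, T = 1e-3, 1e-2   # epsilon and t used in the paper's experiments

def clique_weight(eta):
    # w~_f^r = eps + exp(eta~_f^r); eta is unconstrained, so plain gradient
    # steps keep every clique weight strictly positive.
    return EPS + np.exp(eta)

def d_clique_weight(eta):
    # Chain-rule factor: dL/d eta = (dL/d w) * exp(eta).
    return np.exp(eta)

def a(w_k, eps=EPS, t=T):
    # Smoothed, differentiable surrogate for w_k^0 = max(0, w_k):
    # a(x) = eps + t * log(1 + exp(x / t)) > max(0, x), so the bound stays valid.
    return eps + t * np.logaddexp(0.0, w_k / t)
```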

Algorithm 1 Syntactic coarsest stable partition
1: Set P_V to group variables with identical X_v
2: Set F_f^0 for f ∈ N^+_{|F^0|} to group factors with identical θ
3: Q ← ∅;
4: for |F| = 1 to |F^0| do
5:    F_{|F|} ← F^0_{|F|}; refinePV(0);
6: end for
7: while Q ≠ ∅ do
8:    Choose any (f, r) ∈ Q, set Q ← Q \ (f, r)
9:    Choose a k ∈ Ñ^{(f,r)} where |E_k^{(f,r)}| ≤ ½|F_f|
10:   |F| ← |F| + 1;
11:   F_{|F|} ← E_k^{(f,r)}; F_f ← F_f \ F_{|F|};
12:   refinePV(f);
13: end while

Syntactic coarsest stable partition

Throughout our description, we maintain a valid partition and its associated lifted graph. Recall from the subsection on lifted gradients (and Figure 2b) that a valid partition is stable if and only if each position cell in its lifted graph has exactly one neighbor. Algorithm 1 maintains the set Q = {(f, r) | |Ñ^{(f,r)}| ≥ 2} of position cells that violate this property. The algorithm iteratively selects an (f, r) ∈ Q and one of its neighbors k ∈ Ñ^{(f,r)}. Ground factors in F_f are then split into two groups (line 11), based on whether their r-th position neighbor is a variable in V_k (recall from the section on gradient symmetries that this set is E_k^{(f,r)}) or not. In the appendix, we define the function refinePV, which updates the partition P_V to be consistent with the new neighbor counts M_v^{(f,r)} and M_v^{(|F|,r)} for all v, updating the lifted graph and Q as needed. The restriction of choosing k such that |E_k^{(f,r)}| ≤ ½|F_f| (line 9) is necessary to maintain the complexity results of (Berkholz, Bonsma, and Grohe 2013), as discussed in the appendix.

Coarse-to-fine algorithm

One simple way to construct a coarse-to-fine inference procedure is to interrupt the main loop of Algorithm 1 to perform lifted inference with relaxations and symmetries specified by the current partition P. This would yield any-time bounds and ensure that we never produce a model finer than the coarsest stable partition. In practice, however, superior results are obtained by choosing refinement operations based on their predicted decrease of the objective value. Algorithm 2 illustrates this idea, where S_f^r (detailed in the next section) judges the quality of a split of (f, r), cost(P) is the estimated cost of one inference iteration (we count the number of operations used to compute the lifted score (6)), and β controls how much refinement we allow between inference calls.

Algorithm 2 Coarse to fine inference
1: Execute lines 1 to 6 of Algorithm 1        // Initialize P
2: Ψ ← 0;
3: repeat
4:    if cost(P) > Ψ then
5:        [δ̃, w̃] = inference(P, δ̃, w̃)
6:        Set S_f^r via (9) ∀(f, r) ∈ Q
7:        Ψ ← cost(P) · β;
8:    end if
9:    (f*, r*) ← arg min_{(f,r)∈Q} S_f^r
10:   Set Ñ_1 ⊂ Ñ^{(f*,r*)} as the arg min of S_{f*}^{r*} via (9)
11:   |F| ← |F| + 1;
12:   F_{|F|} ← ∪_{k∈Ñ_1} E_k^{(f*,r*)}; F_{f*} ← F_{f*} \ F_{|F|};
13:   refinePV(f*)
14:   Set S_f^r via (9) for f ∈ {f*, |F|}, r ∈ N^+_{r̃_f}
15: until P is stable
16: [δ̃, w̃] = inference(P, δ̃, w̃)        // Stable P inference

Objective-based scoring function

We first define a metric to measure the quality of a proposed parameter partition; we then show how to optimize this metric efficiently, using only information contained in the lifted graph. For the set of ground parameters associated with an (f, r), we measure the error induced by the current parameter tying restrictions as the distance of the ground gradients to their projected gradient. Letting g_α^r be the set of ground gradients associated with an (α, r), this metric is

$$d(F_f, r) = \sum_{\alpha \in F_f} \| g_\alpha^r - \gamma \|_2^2, \qquad \text{with } \gamma = \Big( \sum_{\alpha \in F_f} g_\alpha^r \Big) \Big/ |F_f|.$$
We would like to find a split of F_f into F_{f_1} and F_{f_2} that minimizes d(F_{f_1}, r) + d(F_{f_2}, r) − d(F_f, r), which is the change in the projected gradient distance. By symmetry, this split groups together factors with the same g_α^r. Recalling from (7) that g_α^r = ḡ_{k;δ}^{(f,r)} for all α ∈ E_k^{(f,r)}, we write this grouping as F_{f_i} = ∪_{k∈Ñ_i} E_k^{(f,r)}, where Ñ_1 ⊂ Ñ^{(f,r)} and Ñ_2 = Ñ^{(f,r)} \ Ñ_1 (adopting the convention |F_{f_1}| ≤ |F_{f_2}|). We can now form a compact optimization problem to find the best split. For any Ñ ⊆ Ñ^{(f,r)} we have d(∪_{k∈Ñ} E_k^{(f,r)}, r) = d_f^r(Ñ), where

$$d_f^r(\tilde N) = \sum_{k \in \tilde N} |E_k^{(f,r)}|\, \| \bar g_k - \gamma \|_2^2 \qquad \text{and} \qquad \gamma = \Big( \sum_{k \in \tilde N} |E_k^{(f,r)}|\, \bar g_k \Big) \Big/ \Big( \sum_{k \in \tilde N} |E_k^{(f,r)}| \Big).$$

Letting P(Ñ^{(f,r)}) be the power set (all subsets) of Ñ^{(f,r)}, we solve

$$S_f^r = \min_{\tilde N_1 \in P(\tilde N^{(f,r)})} \ \sum_{i=1}^{2} d_f^r(\tilde N_i) - d_f^r(\tilde N^{(f,r)}). \qquad (9)$$

This is a weighted K-means (K = 2) problem with data ḡ_k and weights |E_k^{(f,r)}|. If we approximate this by using only one of the elements of ḡ_k (for example, only the gradient with respect to δ(1), as we use on binary problems in this paper's experiments), the optimum can be found exactly by sorting the ḡ_k and looping over all splits.
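Since the optimal weighted 2-means split of scalar data is contiguous in sorted order, the one-dimensional approximation just described can be solved exactly by a sort and a linear scan over split points. A sketch (ours; names are hypothetical):

```python
import numpy as np

def best_split_1d(g, e):
    # g[i]: scalar gradient summary for lifted neighbor k_i
    # e[i]: weight |E_k^(f,r)| (number of ground factors contributed by k_i)
    # Returns (score, original indices of the first group N_1) minimizing Eq. (9).
    order = np.argsort(g)
    g, e = np.asarray(g, float)[order], np.asarray(e, float)[order]

    def wsse(gs, ws):
        # Weighted within-group sum of squared distances to the weighted mean.
        mean = np.average(gs, weights=ws)
        return np.sum(ws * (gs - mean) ** 2)

    d_all = wsse(g, e)
    best_score, best_idx = np.inf, None
    for i in range(1, len(g)):            # optimal groups are contiguous after sorting
        score = wsse(g[:i], e[:i]) + wsse(g[i:], e[i:]) - d_all
        if score < best_score:
            best_score, best_idx = score, order[:i]
    return best_score, best_idx
```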

[Figure 3: Test models with distinct soft evidence (u and l terms). Each panel lists MLN weights and formulas: (a) Complete Graph, with a pairwise formula over V(x), V(y) of weight b and soft-evidence weights u_x on V(x); (b) Binary Collective Classification, with a formula of weight b over links L(x, y) and classes C(x), C(y), soft-evidence weights u_x on C(x), and link-evidence weights l_xy on L(x, y); (c) Clique-Cycle, with pairwise formulas of weight b connecting Q_1(x) and Q_2(y), Q_2(x) and Q_3(y), and Q_3(x) and Q_1(y), and soft-evidence weights u_x^i on Q_i(x), i ∈ {1, 2, 3}.]

[Figure 4: Comparison of optimization at the stable coloring (exact model symmetries) with our coarse-to-fine inference framework for three different splitting methods (C2F - Objective, C2F - Size, C2F - Syntactic), on each of the test models ((a) Complete, (b) Collective Classification, (c) Clique Cycle) with domain size d = 160 for the logical variables. Vertical black dashed lines indicate coarse-to-fine transitions for objective-based splitting (the C2F - Objective curve). Transitions for the other C2F curves occur at approximately the same positions.]

Experiments

This section provides an empirical illustration of our lifted GenDD algorithm. We demonstrate the superiority of our objective-based approach over purely syntactic approaches for generating a coarse-to-fine approximation sequence. We also demonstrate excellent any-time performance on MLNs with distinct soft evidence on every node, compared to exact lifted variational approaches which are forced to ground the entire problem.

Datasets and methodology

Datasets. Figure 3 shows the models we use (similar models without evidence were used in (Mladenov and Kersting 2015; Bui, Huynh, and Sontag 2014) to evaluate performance on marginalization tasks). In all models b was set to 2.5, and the u terms specify random evidence uniformly distributed on [−ζ, ζ], where ζ = log((1 − 10^{-2})/10^{-2}), so that the u terms act like probabilities between 0.01 and 0.99. For the collective classification problem, we also choose a random 10% of the (x, y) values associated with links L(x, y) and set l_xy to a random value on [0, 5.0]. This can be interpreted as the strength of a link between two web pages.

Parameter settings. In our experiments, we set ε = 10^{-3}, t = 10^{-2} (in the definition of a(x)) and perform L-BFGS black-box optimization with a rank-20 Hessian correction. For the coarse-to-fine procedure, it is important to balance performing inference with model refinement. We found it worked well to perform a small number of inference iterations (30), followed by a small amount of model refinement (setting β = 1.25 in Algorithm 2). This worked better than trying to diagnose inference convergence to determine the transition point. Future work will look at more principled methods of balancing inference work with refinement.

Timing methodology. In our experiments, we measure only the cost of performing inference, ignoring the cost of identifying symmetries. This is common in experiments on approximate lifted inference in other works (Mladenov, Globerson, and Kersting 2014; Van den Broeck and Niepert 2014; Venugopal and Gogate 2014) as well as exact lifted variational inference approaches (Bui, Huynh, and Riedel 2012; Bui, Huynh, and Sontag 2014; Mladenov and Kersting 2015), which assume problem symmetries are available before inference. Works such as (Singla, Nath, and Domingos 2014; Kersting et al. 2010) note that identifying (approximate) symmetries can be a bottleneck.
(Singla, Nath, and Domingos 2014) used a sparse hyper-cube based representation to specify approximate symmetries in MLNs; adapting lifted GenDD to leverage a similar representation is an interesting direction for future work.

Results

Figure 4 compares our coarse-to-fine procedure vs. performing inference on the stable coloring (the ground model in our setup) on each of our three test models. The blue curve ("C2F - Objective") performs objective-based splitting as per Algorithm 2, using the metric (9) to guide the splitting choice. We also show syntactic splitting ("C2F - Syntactic", orange) as per Algorithm 1, interleaving inference when the predicted inference cost exceeds a cost bound (then multiplying that bound by β as in Algorithm 2). "C2F - Size" (red) splits the largest super-factor in Q and chooses a random split that is as even as possible (adding random edges from Ñ^{(f,r)} to Ñ_2, stopping when Σ_{k∈Ñ_2} |E_k^{(f,r)}| > ½|F_f|, and setting Ñ_1 = Ñ^{(f,r)} \ Ñ_2 in line 10 of Algorithm 2), interleaving inference with refinement as in the other C2F methods.

[Figure 5: Scalability comparison. Panels (a)-(e) show bound vs. time on instances of the complete graph with evidence, varying the logical variable domain size d (in the MLN specification) from d = 20 to d = 320. C2F exhibits an increasing advantage as the model grows larger.]

The stable (here, ground) inference process is shown in green. We see that objective-based refinement significantly outperforms the others, with size-based refinement worse but still better than syntactic splitting. Objective-based splitting provides orders of magnitude speed-up over the stable coloring for similar quality results, and all methods provide better early performance than the stable coloring, which must touch the entire model to even provide an initial bound.

Figure 5 demonstrates the scalability of our approach. Plots (a)-(e) show the running time for our coarse-to-fine ("C2F - Objective", blue) method and the stable coloring method on instances of the complete graph, varying its size.

Conclusions

We presented a framework for lifting GenDD for approximate inference by imposing over-symmetric constraints on the optimization parameters to induce symmetry in the variational bound. We developed a coarse-to-fine procedure that, guided by the objective, provides high-quality any-time approximations even in models with no exact symmetry. Our method provides orders of magnitude speed-up on our benchmarks, with increasing advantage for larger models.

Acknowledgements

This work is sponsored in part by NSF grants IIS [...] and IIS [...], and by the United States Air Force under Contract Nos. FA [...] C-0011 and FA [...] C [...].

Appendix

Algorithm 3 refinePV
Input: f: if f ≠ 0, F_{|F|} was deleted from F_f before the call.
1: for r = 1 to r̃_{|F|} do
2:    for α ∈ F_{|F|} do        // Update ground neighbor lists
3:        v ← α_r;
4:        N_v^{(f,r)} ← N_v^{(f,r)} \ α;  M_v^{(f,r)} ← M_v^{(f,r)} − 1;
5:        N_v^{(|F|,r)} ← N_v^{(|F|,r)} ∪ α;  M_v^{(|F|,r)} ← M_v^{(|F|,r)} + 1;
6:    end for
7:    V_k^m ← ∅  ∀(k, m)        // New variable groupings
8:    for α ∈ F_{|F|} do
9:        v ← α_r;  k ← K_v;  m ← M_v^{(|F|,r)};
10:       V_k ← V_k \ v;  V_k^m ← V_k^m ∪ v;
11:   end for
12:   for (k, m) : V_k^m ≠ ∅ do        // Update lifted graph
13:       k' ← k;
14:       if m ≠ lastm(k) then        // New super-node
15:           K ← K + 1;  k' ← K;
16:           M̃_{k'}^{(f',r')} ← M̃_k^{(f',r')}  ∀(f', r') ∈ Ñ_k
17:       end if
18:       V_{k'} ← V_k^m;  K_v ← k'  (∀v ∈ V_{k'})
19:       M̃_{k'}^{(|F|,r)} ← m;  M̃_{k'}^{(f,r)} ← M̃_{k'}^{(f,r)} − m;
20:   end for
21: end for
Note: Updating M̃_k^{(f',r')} (for any k, f', r') changes Ñ_k, Ñ^{(f',r')}, and Q. For compactness, these changes are not shown.

Description of refinePV. After creating a new group F_{|F|} (possibly by splitting it from F_f), Algorithms 1 and 2 call refinePV to update P_V such that variables having identical neighbor counts M_v = [M_v^1, ..., M_v^{|F|}] are grouped together (where M_v^f denotes [M_v^{(f,1)} ... M_v^{(f,r̃_f)}] for all f). We assume that at entry, the neighbor counts M_v^f are correct for all v and f = 1 ... |F| − 1, and that P_V correctly groups variables by these counts, i.e., M_v^f = M̃_k^f for all v ∈ V_k. The first loop updates N_v^{(f,r)} and N_v^{(|F|,r)} and their counts. The second loop groups variables by their signature [K_v, M_v^{(|F|,r)}], creating the sets V_k^m = {v | K_v = k, m = M_v^{(|F|,r)}}, where K_v is the index of the group holding variable v (v ∈ V_k with k = K_v). This produces the same grouping as would be obtained by grouping by the full signature [M_v^1 ... M_v^{|F|−1}, M_v^{|F|}], because only the M_v^f and M_v^{(|F|,r)} counts change during the call. Note that the change to M_v^f does not affect our grouping since, for all v ∈ V_k^m, M_v^{(f,r)} is equal to its value before the call minus M_v^{(|F|,r)} (the updates on line 19 reflect this fact).
Lastly, the third loop adds the V_k^m sets to P_V and updates the quantities associated with the lifted graph (the M̃ and Ñ terms). V_k will be over-written with V_k^m, where m is the last (k, m) encountered in the loop. In all other cases, the group splits into a new super-node and its neighbors are copied (line 16).
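As a concrete illustration of the second loop's grouping step, here is a minimal sketch (ours, with hypothetical container names): each affected variable is bucketed by the pair (K_v, M_v^{(|F|,r)}), which by the argument above is equivalent to grouping by the full count signature.

```python
from collections import defaultdict

def group_by_signature(affected_vars, K, M_new):
    # Second loop of refinePV (a sketch): bucket each affected variable v by
    # (K[v], M_new[v]) -- its current super-node index and its new neighbor
    # count -- since only these components of the full signature change here.
    groups = defaultdict(set)
    for v in affected_vars:
        groups[(K[v], M_new[v])].add(v)
    return groups
```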

Complexity analysis

Each set is implemented as a hash table supporting O(1) addition and deletion (N_v^{(f,r)} stores pointers to factor scope vectors). Hence, each α ∈ F_{|F|} contributes O(|α|) time to the first two loops of refinePV. Throughout Algorithms 1 and 2, an α is moved to an F_{|F|} that is at most half the size of its current set. Thus, each α participates in at most log_2 |F| sets throughout either algorithm. Equivalent to the result of (Berkholz, Bonsma, and Grohe 2013), the total time of these loops is O(R |F| log |F|), where R = max_{α∈F} |α|. Throughout Algorithms 1 and 2, the total time spent in the third loop of refinePV is proportional to the size of the lifted graph. This is because the loop adds nodes and edges to the lifted graph in O(1) time, and each addition is accompanied by at most one deletion (if M̃_{k'}^{(f,r)} = 0 in line 19). Since the size of the lifted graph is at most the size of the ground graph (R |F|), the total time is dominated by the first two loops.

References

Berkholz, C.; Bonsma, P.; and Grohe, M. 2013. Tight lower and upper bounds for the complexity of canonical colour refinement. In European Symposium on Algorithms. Springer.
Broeck, G. V. d., and Niepert, M. 2014. Lifted probabilistic inference for asymmetric graphical models. arXiv preprint.
Bui, H. B.; Huynh, T. N.; and de Salvo Braz, R. 2012. Exact lifted inference with distinct soft evidence on every object. In AAAI.
Bui, H. H.; Huynh, T. N.; and Riedel, S. 2012. Automorphism groups of graphical models and lifted variational inference. arXiv preprint.
Bui, H. H.; Huynh, T. N.; and Sontag, D. 2014. Lifted tree-reweighted variational inference. arXiv preprint.
Globerson, A., and Jaakkola, T. S. 2008. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. In Advances in Neural Information Processing Systems.
Habeeb, H.; Anand, A.; Singla, P.; et al. 2017. Coarse-to-fine lifted MAP inference in computer vision. arXiv preprint.
Kersting, K.; Ahmadi, B.; and Natarajan, S. 2009. Counting belief propagation. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. AUAI Press.
Kersting, K.; El Massaoudi, Y.; Hadiji, F.; and Ahmadi, B. 2010. Informed lifting for message-passing. In AAAI.
Mladenov, M.; Ahmadi, B.; and Kersting, K. 2012. Lifted linear programming. In AISTATS.
Mladenov, M., and Kersting, K. 2015. Equitable partitions of concave free energies. In UAI.
Mladenov, M.; Globerson, A.; and Kersting, K. 2014. Lifted message passing as reparametrization of graphical models. In UAI.
Ping, W.; Liu, Q.; and Ihler, A. T. 2015. Decomposition bounds for marginal MAP. In Advances in Neural Information Processing Systems.
Poole, D. 2003. First-order probabilistic inference. In IJCAI, volume 3.
Richardson, M., and Domingos, P. 2006. Markov logic networks. Machine Learning 62(1).
Singla, P., and Domingos, P. M. 2008. Lifted first-order belief propagation. In AAAI, volume 8.
Singla, P.; Nath, A.; and Domingos, P. M. 2014. Approximate lifting techniques for belief propagation. In AAAI.
Van den Broeck, G., and Darwiche, A. 2013. On the complexity and approximation of binary evidence in lifted inference. In Advances in Neural Information Processing Systems.
Venugopal, D., and Gogate, V. 2014. Evidence-based clustering for scalable inference in Markov logic. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer.
Wainwright, M. J., and Jordan, M. I. 2008. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning 1(1-2).
Wainwright, M. J.; Jaakkola, T. S.; and Willsky, A. S. 2005. A new class of upper bounds on the log partition function. IEEE Transactions on Information Theory 51(7).


Computing Laboratory A GAME-BASED ABSTRACTION-REFINEMENT FRAMEWORK FOR MARKOV DECISION PROCESSES Computing Laboratory A GAME-BASED ABSTRACTION-REFINEMENT FRAMEWORK FOR MARKOV DECISION PROCESSES Mark Kattenbelt Marta Kwiatkowska Gethin Norman Daid Parker CL-RR-08-06 Oxford Uniersity Computing Laboratory

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Variational Inference IV: Variational Principle II Junming Yin Lecture 17, March 21, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3 X 3 Reading: X 4

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Concavity of reweighted Kikuchi approximation

Concavity of reweighted Kikuchi approximation Concavity of reweighted ikuchi approximation Po-Ling Loh Department of Statistics The Wharton School University of Pennsylvania loh@wharton.upenn.edu Andre Wibisono Computer Science Division University

More information

Feedback Loop between High Level Semantics and Low Level Vision

Feedback Loop between High Level Semantics and Low Level Vision Feedback Loop between High Level Semantics and Low Level Vision Varun K. Nagaraja Vlad I. Morariu Larry S. Davis {varun,morariu,lsd}@umiacs.umd.edu University of Maryland, College Park, MD, USA. Abstract.

More information

14 : Theory of Variational Inference: Inner and Outer Approximation

14 : Theory of Variational Inference: Inner and Outer Approximation 10-708: Probabilistic Graphical Models 10-708, Spring 2014 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Yu-Hsin Kuo, Amos Ng 1 Introduction Last lecture

More information

A-level Mathematics. MM03 Mark scheme June Version 1.0: Final

A-level Mathematics. MM03 Mark scheme June Version 1.0: Final -leel Mathematics MM0 Mark scheme 660 June 0 Version.0: Final Mark schemes are prepared by the Lead ssessment Writer and considered, together with the releant questions, by a panel of subject teachers.

More information

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18 CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$

More information

Convexifying the Bethe Free Energy

Convexifying the Bethe Free Energy 402 MESHI ET AL. Convexifying the Bethe Free Energy Ofer Meshi Ariel Jaimovich Amir Globerson Nir Friedman School of Computer Science and Engineering Hebrew University, Jerusalem, Israel 91904 meshi,arielj,gamir,nir@cs.huji.ac.il

More information

Directed and Undirected Graphical Models

Directed and Undirected Graphical Models Directed and Undirected Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Last Lecture Refresher Lecture Plan Directed

More information

Bayesian Networks: Representation, Variable Elimination

Bayesian Networks: Representation, Variable Elimination Bayesian Networks: Representation, Variable Elimination CS 6375: Machine Learning Class Notes Instructor: Vibhav Gogate The University of Texas at Dallas We can view a Bayesian network as a compact representation

More information

Introduction to Machine Learning Midterm Exam Solutions

Introduction to Machine Learning Midterm Exam Solutions 10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft

More information

STATISTICAL METHODS IN AI/ML Vibhav Gogate The University of Texas at Dallas. Bayesian networks: Representation

STATISTICAL METHODS IN AI/ML Vibhav Gogate The University of Texas at Dallas. Bayesian networks: Representation STATISTICAL METHODS IN AI/ML Vibhav Gogate The University of Texas at Dallas Bayesian networks: Representation Motivation Explicit representation of the joint distribution is unmanageable Computationally:

More information

Inference and Representation

Inference and Representation Inference and Representation David Sontag New York University Lecture 5, Sept. 30, 2014 David Sontag (NYU) Inference and Representation Lecture 5, Sept. 30, 2014 1 / 16 Today s lecture 1 Running-time of

More information

Residual migration in VTI media using anisotropy continuation

Residual migration in VTI media using anisotropy continuation Stanford Exploration Project, Report SERGEY, Noember 9, 2000, pages 671?? Residual migration in VTI media using anisotropy continuation Tariq Alkhalifah Sergey Fomel 1 ABSTRACT We introduce anisotropy

More information

Machine Learning Summer School

Machine Learning Summer School Machine Learning Summer School Lecture 3: Learning parameters and structure Zoubin Ghahramani zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ Department of Engineering University of Cambridge,

More information

Progress In Electromagnetics Research Symposium 2005, Hangzhou, China, August

Progress In Electromagnetics Research Symposium 2005, Hangzhou, China, August Progress In Electromagnetics Research Symposium 2005, Hangzhou, China, August 22-26 115 A Noel Technique for Localizing the Scatterer in Inerse Profiling of Two Dimensional Circularly Symmetric Dielectric

More information

Inferning with High Girth Graphical Models

Inferning with High Girth Graphical Models Uri Heinemann The Hebrew University of Jerusalem, Jerusalem, Israel Amir Globerson The Hebrew University of Jerusalem, Jerusalem, Israel URIHEI@CS.HUJI.AC.IL GAMIR@CS.HUJI.AC.IL Abstract Unsupervised learning

More information

Automorphism Groups of Graphical Models and Lifted Variational Inference

Automorphism Groups of Graphical Models and Lifted Variational Inference Automorphism Groups of Graphical Models and Lifted Variational Inference Hung Hai Bui Natural Language Understanding Lab Nuance Communications bui.h.hung@gmail.com Tuyen N. Huynh Artificial Intelligence

More information