Lifted Generalized Dual Decomposition
Nicholas Gallo, University of California Irvine, Irvine, CA
Alexander Ihler, University of California Irvine, Irvine, CA

Abstract

Many real-world problems, such as Markov Logic Networks (MLNs) with evidence, can be represented as a highly symmetric graphical model perturbed by additional potentials. In these models, variational inference approaches that exploit exact model symmetries are often forced to ground the entire problem, while methods that exploit approximate symmetries (such as by constructing an over-symmetric approximate model) offer no guarantees on solution quality. In this paper, we present a method based on a lifted variant of the generalized dual decomposition (GenDD) for marginal MAP inference, which provides a principled way to exploit symmetric sub-structures in a graphical model. We develop a coarse-to-fine inference procedure that provides any-time upper bounds on the objective. The upper bound property of GenDD provides a principled way to guide the refinement process, giving good any-time performance and eventually arriving at the ground optimal solution.

Introduction

A central task in many application domains (Wainwright and Jordan 2008) is computing likelihoods and marginal probabilities over distributions defined by a graphical model. These tasks are, in general, intractable, and this has motivated the development of many approximate inference techniques. Of significant theoretical and practical interest are variational techniques, which formulate the inference problem as a constrained optimization problem. Within this class, methods that provide upper bounds on the partition function and give rise to convex optimization problems are particularly attractive, since convergence can be guaranteed and the quality of two bounds can be easily compared. This class contains the tree-reweighted bounds (Wainwright, Jaakkola, and Willsky 2005) and primal variants such as GenDD (Ping, Liu, and Ihler 2015), which we study here.
Recently, there has been growing interest in the development of lifted inference techniques, both exact (Bui, Huynh, and de Salvo Braz 2012; Poole 2003) and approximate (Bui, Huynh, and Riedel 2012; Mladenov, Ahmadi, and Kersting 2012; Mladenov and Kersting 2015; Bui, Huynh, and Sontag 2014; Mladenov, Globerson, and Kersting 2014), which exploit model symmetries. [Copyright © 2018, Association for the Advancement of Artificial Intelligence. All rights reserved.] Most of these works take a well-established variational approximation and identify exact problem symmetries either algorithmically (e.g., redundant message computations of loopy belief propagation (LBP) (Singla and Domingos 2008; Kersting, Ahmadi, and Natarajan 2009)) or via graph-theoretic notions of symmetry (Bui, Huynh, and Riedel 2012; Mladenov, Ahmadi, and Kersting 2012; Mladenov and Kersting 2015; Bui, Huynh, and Sontag 2014; Mladenov, Globerson, and Kersting 2014). These methods, however, often have significant trouble dealing with models possessing highly symmetric substructures but few exact symmetries. Some works (Venugopal and Gogate 2014; Van den Broeck and Darwiche 2013) address this by generating a symmetric approximation to an MLN (Richardson and Domingos 2006) with evidence, but offer no guarantees on the quality of the approximation. (Van den Broeck and Niepert 2014) correct this bias by using an over-symmetric approximation to construct a proposal distribution for a Metropolis-Hastings chain, which guarantees convergence to the true marginals but gives no guarantees on mixing time. Other works approximate lifted BP by grouping messages with similar values (Kersting et al. 2010) or messages satisfying a desired structural property (Singla, Nath, and Domingos 2014). (Singla, Nath, and Domingos 2014) analyzes the errors the approximation induces on the true BP messages but, like BP, offers no guarantees on solution quality or convergence. Similar in spirit to our work is (Habeeb et al.
2017), which sequentially solves a set of relaxed MAP inference problems for computer vision tasks. Their method operates in a coarse-to-fine manner, solving at each level a MAP problem in which groups of pixels are restricted to have the same value. The pixel grouping is then relaxed, and the finer MAP problem is initialized with the solution to the coarser problem, guaranteeing monotonic improvement of the solution.

Our paper provides a framework for exploiting approximate model symmetries, based on a lifted variant of GenDD (Ping, Liu, and Ihler 2015). In contrast to works that create an over-symmetric model approximation, our method imposes a set of over-symmetric restrictions on the variational cost-shifting updates (messages) associated with the GenDD problem. Guided by a measure of solution quality, these constraints are gradually relaxed, producing a sequence of problems of increasing accuracy, but increasing cost. Our experimental results show the superiority of the objective-based refinement criterion, and good anytime performance compared to methods that exploit exact symmetries.

Background

A Markov random field (MRF) over n discrete random variables X = [X_1 ... X_n], taking values x = [x_1 ... x_n] ∈ (X_1 × ... × X_n), has probability density function

  p(X = x; θ) = exp[ Σ_{α∈F} θ_α(x_α) − Φ(θ) ],
  Φ(θ) = log Σ_{x∈X^n} exp[ Σ_{α∈F} θ_α(x_α) ],

where F represents the set of cliques of the distribution, with α ∈ F being a subset of the variables, associated with a potential table θ_α. Φ(θ) is the log partition function, which normalizes the distribution. Furthermore, denote the set of variable indices as V = {1 ... n}.

Generalized dual decomposition

Motivated by the intractability of computing Φ(θ), (Ping, Liu, and Ihler 2015) develops efficient upper bounds on Φ(θ) via a decomposition based on Hölder's inequality. Their bound generalizes the dual decomposition bound (Globerson and Jaakkola 2008) for the MAP problem, maintaining two key properties: (1) it decomposes into a sum of terms defined over cliques of the graphical model, and (2) a family of bounds can be indexed by a set of variational parameters, where finding the parameter setting yielding the tightest bound can be formulated as a convex optimization problem. We review their main results.

For any non-negative function f(x) and w ∈ R_{>0}, the power sum operator is defined as

  Σ^w_x f(x) = [ Σ_x f(x)^{1/w} ]^w.

The power sum reduces to the standard sum when w = 1 and approaches max_x f(x) as w → 0^+. We also define the vector power sum

  Σ^w_x F(x) = Σ^{w_k}_{x_k} ... Σ^{w_1}_{x_1} F(x),

where w is a vector of weights, x is a vector of random variables, and F is a non-negative function with k inputs.

The set of variational parameters δ = {δ^r_α | α ∈ F, r ∈ N^+_{|α|}} and w = {w^r_α | α ∈ F, r ∈ N^+_{|α|}}, where N^+_i = {1 ... i} for any positive integer i, reparameterizes the distribution to provide a tighter bound. The bound of (Ping, Liu, and Ihler 2015) can be written as

  Φ(θ) ≤ Σ_{v∈V} u_v(δ, w) + Σ_{α∈F} c_α(δ, w)  =:  L(δ, w),    (1)

where, letting w_α = [w^1_α ... w^{|α|}_α], the clique terms are

  c_α(δ, w) = log Σ^{w_α}_{x_α} exp[ θ_α(x_α) − Σ_{r=1}^{|α|} δ^r_α(x_{α_r}) ]    (2)

and the unary terms are

  u_v(δ, w) = log Σ^{w^0_v}_{x_v} exp( δ^0_v(x_v) ),    (3)

with δ^0_v(x_v) = Σ_{(α,r)∈N_v} δ^r_α(x_v) and w_v = 1 − Σ_{(α,r)∈N_v} w^r_α, where N_v = {(α, r) | α_r = v} are the neighbors of v. Furthermore, we define w^0_v = max(0, w_v) and require w^r_α ≥ 0 for all (α, r).[1] We also require that the local elimination order of each clique power sum term (2) be consistent with a global elimination order o, meaning that for all i < j, o(α_i) < o(α_j). If this is violated for an α in the input, that α and its associated potential are permuted to obey this order.

Figure 1: Position-explicit factor graph associated with α_1 = (x_1, x_2), α_2 = (x_2, x_3). Factor cells (α_i, r) are associated with parameters (δ^r_{α_i}, w^r_{α_i}) and unary terms are sums of neighboring cost-shifting variables: δ^0_1 = δ^1_{α_1}, δ^0_2 = δ^2_{α_1} + δ^1_{α_2}, δ^0_3 = δ^2_{α_2} (similarly for the weights).

Ground factor graphs

In order to facilitate development of our lifted inference procedure, we modify the standard factor graph depiction of graphical models to make each variable's position in its participating factors explicit. To this end, we draw a circular node associated with each variable v and a rectangle partitioned into |α| cells associated with each factor α. Position cell (α, r) is associated with terms (δ^r_α, w^r_α) (labeled with just δ^r_α for compactness) and has an edge with its variable neighbor v = α_r, which is associated with terms δ^0_v, w^0_v, u_v (labeled with just δ^0_v). A simple example is illustrated in Figure 1.[2]

Exploiting symmetries

Lifted inference techniques identify a set of symmetries in the ground variational problem that can be exploited computationally.
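Before turning to symmetries, the power sum operator from the Background section can be made concrete with a minimal sketch. The function name and the NumPy-based implementation are ours, not from the paper; the rescaling by the maximum is just a standard numerical-stability device.

```python
import numpy as np

def power_sum(f, w):
    """Power sum of a non-negative vector f with weight w > 0:
    (sum_x f(x)^(1/w))^w.  Reduces to the ordinary sum at w = 1
    and approaches max(f) as w -> 0+."""
    f = np.asarray(f, dtype=float)
    # rescale by the maximum for numerical stability
    m = float(np.max(f))
    if m == 0.0:
        return 0.0
    return m * float(np.sum((f / m) ** (1.0 / w)) ** w)

vals = [1.0, 2.0, 3.0]
print(power_sum(vals, 1.0))   # -> 6.0 (ordinary sum)
print(power_sum(vals, 1e-6))  # -> 3.0 (approaches the max)
```

The vector power sum in the text is then just this operator applied sequentially, eliminating one variable at a time with its own weight.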
Exact lifted variational inference (Singla and Domingos 2008; Mladenov, Globerson, and Kersting 2014; Mladenov and Kersting 2015; Mladenov, Ahmadi, and Kersting 2012) identifies a stable partition (Berkholz, Bonsma, and Grohe 2013) of the graph as sufficient to characterize symmetries in the solution to the ground variational problem. In this paper, we impose a set of over-symmetric constraints on the ground variational parameters (δ, w), inducing symmetries in the objective (1) that will, in general, be

[Footnote 1] In (Ping, Liu, and Ihler 2015), w^0_v is a free variable with restrictions w^0_v ≥ 0 and w^0_v + Σ_{(α,r)∈N_v} w^r_α = 1. In our formulation, this sum is always larger than or equal to 1 (providing a valid bound), with equality holding at the optimum.

[Footnote 2] (Mladenov and Kersting 2015) used a similar representation where each position cell is its own node, connected to its factor node. For compactness, we prefer the notation presented here.
coarser than those implied by a stable partition, yielding looser but computationally cheaper bounds.

Parameter and objective symmetries

Consider a disjoint partition P_F = {F_1 ... F_{|P_F|}} of the ground factors F (F_i ∩ F_j = ∅ for i ≠ j, ∪_f F_f = F) where all factors in the same partition are required to have the same potential function. That is, for each f there exists a representative factor θ̄_f (with associated scope size r̄_f and domains X̄^r_f for r ∈ N^+_{r̄_f}) such that for all α ∈ F_f, θ_α = θ̄_f (and |α| = r̄_f and X_{α_r} = X̄^r_f for r ∈ N^+_{r̄_f}).

In this paper, we consider over-symmetric constraints on the variational parameters which force terms associated with factors in the same partition to be identical. That is, we consider parameters in the set

  S(P_F) = { (δ, w) | ∃(δ̄, w̄) : ∀F_f ∈ P_F, ∀α ∈ F_f, ∀r ∈ N^+_{|α|}:  δ^r_α(x̄^r_f) = δ̄^r_f(x̄^r_f),  w^r_α = w̄^r_f },

where δ̄ = {δ̄^r_f | F_f ∈ P_F, r ∈ N^+_{r̄_f}} and w̄ = {w̄^r_f | F_f ∈ P_F, r ∈ N^+_{r̄_f}} are restricted sets of parameters.

For any (δ, w) ∈ S(P_F), we identify symmetries in the clique terms (2) as c_α(δ, w) = c̄_f(δ̄, w̄) for all α ∈ F_f, where

  c̄_f(δ̄, w̄) := log Σ^{w̄_f}_{x̄} exp[ θ̄_f(x̄) − Σ_{r=1}^{r̄_f} δ̄^r_f(x̄^r_f) ]    (4)

and w̄_f = [w̄^1_f ... w̄^{r̄_f}_f] and x̄ = [x̄^1_f ... x̄^{r̄_f}_f].

To recognize the symmetries induced in the aggregated cost-shifting δ^0_v, w^0_v (and hence u_v (3)) terms, first let N^{(f,r)}_v = {α | α ∈ F_f, (α, r) ∈ N_v} represent the neighbors of v which contribute δ^r_α(x_v) = δ̄^r_f(x_v) to the sum. There are M^{(f,r)}_v = |N^{(f,r)}_v| such neighbors. Furthermore, define a disjoint partition P_V = {V_1 ... V_K} of V where all variables in the same partition have the same domain (X_v = X̄_k for all v ∈ V_k) and the same neighbor counts given P_F. That is, there exist representative counts M̄^{(f,r)}_k such that M^{(f,r)}_v = M̄^{(f,r)}_k for all v ∈ V_k and all (f, r). Now, for all v ∈ V_k we have

  δ^0_v(x_v) = δ̄^0_k(x_v) := Σ_{(f,r)∈N̄_k} M̄^{(f,r)}_k δ̄^r_f(x_v)    (5)

for all values x_v, where N̄_k = {(f, r) | M̄^{(f,r)}_k > 0} (and similarly, w_v = w̄_k := 1 − Σ_{(f,r)∈N̄_k} M̄^{(f,r)}_k w̄^r_f). These symmetries imply that u_v(δ, w) = ū_k(δ̄, w̄) for all v ∈ V_k.
Our objective can now be evaluated over the smaller set of representative parameters (δ̄, w̄), with symmetries fully specified by P = (P_F, P_V). We say that such a P is valid, and we have L(δ, w) = L̄(δ̄, w̄) where

  L̄(δ̄, w̄) := Σ_{F_f∈P_F} |F_f| c̄_f(δ̄, w̄) + Σ_{V_k∈P_V} |V_k| ū_k(δ̄, w̄).    (6)

Lifted factor graphs

Associated with any valid partition P is a lifted graph which compactly illustrates the symmetries in the variational parameters and problem terms (1). A lifted factor graph has a super-node associated with each V_k and a super-factor with r̄_f position cells associated with each F_f. Position cell (f, r) is associated with δ̄^r_f, w̄^r_f (labeled with just δ̄^r_f) and has a super-edge labeled M̄^{(f,r)}_k with super-node k if M̄^{(f,r)}_k > 0. The term N̄_k defined after (5) can be seen as the neighborhood of k in the lifted graph. We similarly define the neighborhood of (f, r) as N̄^{(f,r)} = {k | M̄^{(f,r)}_k > 0}.

Example 1. Consider a model with factor graph depicted in Figure 2a (with elimination order o = (R_1, R_2, T_1, T_2)) and clique symmetries θ_{α_2} = θ_{α_3} = θ_{α_4} = θ_{α_5} = θ̄. Associated with the graph is the partition P_F = {F_g, F_y} where F_g = {α_1} (green), F_y = {α_2, α_3, α_4, α_5} (yellow), and P_V = {V_p, V_r, V_b} where V_p = {R_1} (purple), V_r = {R_2} (red), and V_b = {T_1, T_2} (blue).[3] The top half of the middle panel indicates the over-symmetric parameter constraints specified by P_F. The bottom half of that panel indicates the induced symmetries in the unary terms. For example, by examining the neighborhood of R_1 in the factor graph, we see that δ^0_{R_1} = δ^1_{α_1} + δ^1_{α_2} + δ^1_{α_3}. Via the parameter symmetries δ^1_{α_2} = δ^1_{α_3} = δ̄^1_y and δ^1_{α_1} = δ̄^1_g, we see that this is equivalent to δ^0_{R_1} = 2δ̄^1_y + δ̄^1_g.

Gradient symmetries

Just as the over-symmetric parameter constraints induce symmetries in the ground objective function, they also induce (a different set of) symmetries in the gradient of the ground objective.
These gradient symmetries will be used to derive the gradient of the lifted problem, to identify exact symmetries in the ground problem, and to guide our coarse-to-fine inference procedure.

First, notice that the parameter symmetries imply not only symmetries in the values of the clique (4) and unary (5) terms, but also in their gradients. That is, for any (δ, w) ∈ S(P_F), we have ∂c_α/∂δ^r_α(x̄^r_f) = ∂c̄_f/∂δ̄^r_f(x̄^r_f) and ∂u_v/∂δ^0_v(x̄_k) = ∂ū_k/∂δ̄^0_k(x̄_k) for any α ∈ F_f and v ∈ V_k, and for all values of x̄^r_f and x̄_k.

Now, consider the set E^{(f,r)}_v = N^{(f,r)}_v of ground factors α ∈ F_f whose r-th position cell (α, r) has neighbor v = α_r ∈ V_k. For any α ∈ E^{(f,r)}_v, the parameter δ^r_α appears only in clique term c_α in position r (multiplied by −1) and unary term u_v (multiplied by +1). We can therefore identify symmetries in the objective gradient as

  ∂L/∂δ^r_α(x) = ∂u_v/∂δ^0_v(x) − ∂c_α/∂δ^r_α(x) = ∂ū_k/∂δ̄^0_k(x) − ∂c̄_f/∂δ̄^r_f(x)  =:  ḡ_{k;δ^r_f}(x)    (7)

[Footnote 3] The main text used integers to index factor and variable partitions; here we use letters, representing figure colors, for clarity.
for all values of x (and similarly for the w gradients).

Figure 2: (a) A valid, non-stable partition P specifies over-symmetric constraints on the variational parameters of the ground problem, yielding a looser bound. (b) A stable partition P specifies symmetries in the variational parameters that yield an exact solution to the ground problem. More details in Examples 1 and 2 of the text.

Lifted gradients and ground optimality

The gradients of the lifted objective (6) can be written using the gradient symmetries of the ground problem. Specifying the lifted gradients in this way allows us to identify partitions whose symmetric parameter constraints specify an exact solution to the ground problem. Since, via (7), ∂L/∂δ^r_α(x̄^r_f) = ḡ_{k;δ^r_f}(x̄^r_f) for all α ∈ E^{(f,r)}_k (and all values of x̄^r_f), we can write

  ∂L̄/∂δ̄^r_f(x̄^r_f) = Σ_{α∈F_f} ∂L/∂δ^r_α(x̄^r_f) = Σ_{k∈N̄^{(f,r)}} |E^{(f,r)}_k| ḡ_{k;δ^r_f}(x̄^r_f),    (8)

where E^{(f,r)}_k = ∪_{v∈V_k} E^{(f,r)}_v (and similarly for the w gradients). In the special case where (f, r) has one lifted neighbor, N̄^{(f,r)} = {k}, we have F_f = E^{(f,r)}_k and (8) simplifies to ∂L̄/∂δ̄^r_f(x̄^r_f) = |F_f| ∂L/∂δ^r_α(x̄^r_f), meaning that ∂L/∂δ^r_α(x̄^r_f) = 0 when ∂L̄/∂δ̄^r_f(x̄^r_f) = 0. Now, if all of the gradients of the lifted problem are zero, and its lifted graph has |N̄^{(f,r)}| = 1 for all (f, r), then all of the ground gradients are also zero. This means that a solution of the lifted problem provides a solution to the ground problem. A partition P whose lifted graph satisfies the property that |N̄^{(f,r)}| = 1 for all (f, r) is called stable.

Example 2. If we replace P_F from Example 1 with P_F = {F_g, F_y, F_o} where F_g = {α_1} (green), F_y = {α_2, α_3} (yellow), and F_o = {α_4, α_5} (orange), then the partition is stable and S(P_F) contains a solution to the ground variational problem. This is illustrated in Figure 2b.

Optimizing the lifted objective

In this paper, we use a black-box procedure to perform global optimization utilizing the gradient of L̄.
To ensure positivity of the weights, we define the parameters η̄ = {η̄^r_f | F_f ∈ P_F, r ∈ N^+_{r̄_f}}, substituting w̄^r_f with ε + exp(η̄^r_f) (so that ∂L̄/∂η̄^r_f = (∂L̄/∂w̄^r_f) exp(η̄^r_f)). To ensure that the function is differentiable, we set w^0_v = a(w_v), where a(x) = ε + t log(1 + exp(x/t)).[4] This enables unconstrained optimization over the η̄ and δ̄ parameters. The gradients are straightforward to derive from the gradients in (Ping, Liu, and Ihler 2015) and are omitted for space.

[Footnote 4] Any t > 0, ε > 0 bound the objective, since this generates larger unary weights: a_t(w_v) > ε + max(0, w_v) > max(0, w_v).

Coarse-to-fine approximation

We now develop a coarse-to-fine procedure that interleaves relaxations of the over-symmetric parameter constraints S(P_F) with optimization, eventually arriving at the coarsest stable partition P and a solution to the ground problem. While optimization touches only the lifted graph, relaxation steps must touch a subset of the ground factors and variables. To control this cost, we adapt to our work the key ideas from (Berkholz, Bonsma, and Grohe 2013), which presented an asymptotically optimal algorithm for generating the coarsest stable partition of a graph. This procedure is
used as a pre-processing step in works on exact lifted variational inference such as (Mladenov, Globerson, and Kersting 2014).

Algorithm 1 Syntactic coarsest stable partition
1: Set P_V to group variables with identical X_v
2: Set F^0_f for f ∈ N^+_{F_0} to group factors with identical θ_α
3: Q ← ∅
4: for F = 1 to F_0 do
5:   F_F ← F^0_F; refinePV(0)
6: end for
7: while Q ≠ ∅ do
8:   Choose any (f, r) ∈ Q; set Q ← Q \ (f, r)
9:   Choose k ∈ N̄^{(f,r)} where |E^{(f,r)}_k| ≤ (1/2)|F_f|
10:  F ← F + 1
11:  F_F ← E^{(f,r)}_k; F_f ← F_f \ F_F
12:  refinePV(F)
13: end while

Syntactic coarsest stable partition

Throughout our description, we maintain a valid partition and its associated lifted graph. Recall from the subsection on lifted gradients (and Figure 2b) that a valid partition is stable if and only if each position cell in its lifted graph has exactly one neighbor. Algorithm 1 maintains the set Q = {(f, r) | |N̄^{(f,r)}| ≥ 2} of position cells that violate this property. The algorithm iteratively selects an (f, r) ∈ Q and one of its neighbors k ∈ N̄^{(f,r)}. Ground factors in F_f are then split into two groups (line 11), based on whether their r-th position neighbor is a variable in V_k (recall from the section on gradient symmetries that this is E^{(f,r)}_k), or not. In the appendix, we define the function refinePV, which updates the partition P_V to be consistent with the new neighbor counts M^{(f,r)}_v and M^{(F,r)}_v for all v, updating the lifted graph and Q as needed. The restriction of choosing k such that |E^{(f,r)}_k| ≤ (1/2)|F_f| (line 9) is necessary to maintain the complexity results of (Berkholz, Bonsma, and Grohe 2013), as discussed in the appendix.

Coarse-to-fine algorithm

One simple way to construct a coarse-to-fine inference procedure is to interrupt the main loop of Algorithm 1 to perform lifted inference with relaxations and symmetries specified by the current partition P. This would yield any-time bounds and ensure that we never produce a model finer than the coarsest stable partition.
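The grouping idea behind Algorithm 1 is classic color refinement. The sketch below is ours and is deliberately simplified: it refines only the variable side against fixed factor colors, and it makes no attempt at the asymptotically optimal bookkeeping of the paper's algorithm. All names are illustrative.

```python
from collections import Counter

def color_refine(var_neighbors, factor_colors, max_iters=100):
    """Iteratively split variable groups until variables in the same
    group have identical multisets of (factor-color, position)
    neighbor counts.  Illustrative only: the full coarsest-stable-
    partition algorithm also splits factor groups."""
    # var_neighbors: {v: [(factor_id, position), ...]}
    # factor_colors: {factor_id: color}
    var_color = {v: 0 for v in var_neighbors}
    for _ in range(max_iters):
        # signature = (own color, counts of neighboring cells by color/position)
        sig = {v: (var_color[v],
                   frozenset(Counter((factor_colors[f], r)
                                     for f, r in nbrs).items()))
               for v, nbrs in var_neighbors.items()}
        palette, new_color = {}, {}
        for v in sorted(var_neighbors):      # relabel with small integers
            if sig[v] not in palette:
                palette[sig[v]] = len(palette)
            new_color[v] = palette[sig[v]]
        if new_color == var_color:           # fixed point reached
            break
        var_color = new_color
    return var_color

# A star model: one center c shared by two identical factors a1, a2.
star = {"c": [("a1", 1), ("a2", 1)], "l1": [("a1", 2)], "l2": [("a2", 2)]}
colors = color_refine(star, {"a1": "t", "a2": "t"})
# l1 and l2 land in the same group; c is separated by its position counts
```

Position matters in the signature, mirroring the position-explicit factor graphs of the paper: a variable in slot 1 of a factor is never grouped with a variable in slot 2, even when the potentials match.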
In practice, however, superior results are obtained by choosing refinement operations based on their predicted decrease of the objective value. Algorithm 2 illustrates this idea, where S^r_f (detailed in the next section) judges the quality of a split of (f, r), cost(P) is the estimated cost of one inference iteration (we count the number of operations used to compute the lifted score (6)), and β controls how much refinement we allow between inference calls.

Algorithm 2 Coarse-to-fine inference
1: Execute lines 1 to 6 of Algorithm 1    ▷ Initialize P
2: Ψ ← 0
3: repeat
4:   if cost(P) > Ψ then
5:     [δ̄, w̄] = inference(P, δ̄, w̄)
6:     Set S^r_f via (9) ∀(f, r) ∈ Q
7:     Ψ ← cost(P) · β
8:   end if
9:   (f*, r*) ← arg min_Q S^r_f
10:  Set N̄_1 ⊂ N̄^{(f*,r*)} as the arg min of S^{r*}_{f*} via (9)
11:  F ← F + 1
12:  F_F ← ∪_{k∈N̄_1} E^{(f*,r*)}_k; F_{f*} ← F_{f*} \ F_F
13:  refinePV(F)
14:  Set S^r_f via (9) for f ∈ {f*, F}, r ∈ N^+_{r̄_f}
15: until P is stable
16: [δ̄, w̄] = inference(P, δ̄, w̄)    ▷ Stable P inference

Objective-based scoring function

We first define a metric to measure the quality of a proposed parameter partition; then we show how to optimize this metric efficiently, using only information contained in the lifted graph. For the set of ground parameters associated with an (f, r), we measure the error induced by the current parameter-tying restrictions as the distance of the ground gradients to their projected gradient. Letting g^r_α be the set of ground gradients associated with an (α, r), this metric is

  d(F_f, r) = Σ_{α∈F_f} ‖g^r_α − γ‖²₂,  with  γ = (Σ_{α∈F_f} g^r_α) / |F_f|.

We would like to find a split of F_f into F_{f_1} and F_{f_2} that minimizes d(F_{f_1}, r) + d(F_{f_2}, r) − d(F_f, r), which is the change in the projected gradient distance. By symmetry, this split groups together factors with the same g^r_α. Recalling from (7) that g^r_α = ḡ_{k;δ^r_f} for all α ∈ E^{(f,r)}_k, we write this grouping as F_{f_i} = ∪_{k∈N̄_i} E^{(f,r)}_k, where N̄_1 ⊂ N̄^{(f,r)} and N̄_2 = N̄^{(f,r)} \ N̄_1 (adopting the convention |F_{f_1}| ≤ |F_{f_2}|). We can now form a compact optimization problem to find the best split.
For any N̄' ⊆ N̄ = N̄^{(f,r)} we have d(∪_{k∈N̄'} E^{(f,r)}_k, r) = d̄^r_f(N̄'), where

  d̄^r_f(N̄') = Σ_{k∈N̄'} |E^{(f,r)}_k| ‖ḡ_k − γ̄‖²₂,  with  γ̄ = (Σ_{k∈N̄'} |E^{(f,r)}_k| ḡ_k) / (Σ_{k∈N̄'} |E^{(f,r)}_k|).

Letting P(N̄) be the power set (all subsets) of N̄, we solve

  S^r_f = min_{N̄_1 ∈ P(N̄)} Σ_{i=1}^{2} d̄^r_f(N̄_i) − d̄^r_f(N̄).    (9)

This is a weighted K-means (K = 2) problem with data ḡ_k and weights |E^{(f,r)}_k|. If we approximate this by using only one of the elements of ḡ_k (for example, only the gradient with respect to δ̄ at a single state, as we use on the binary problems in this paper's experiments), the optimum can be found exactly by sorting the ḡ_k and looping over all splits.
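The sort-and-scan solution to this one-dimensional weighted 2-means problem can be sketched as follows. The function names are ours; `counts` plays the role of the weights |E| and `g` the scalarized gradients.

```python
import numpy as np

def best_weighted_split(g, counts):
    """One-dimensional weighted 2-means by sorting: try every
    contiguous split of the sorted values and return the one that
    most decreases the weighted within-cluster sum of squares.
    Returns (score change, set of indices in the left group)."""
    order = np.argsort(g)
    gs = np.asarray(g, dtype=float)[order]
    ws = np.asarray(counts, dtype=float)[order]

    def wss(x, w):
        # weighted sum of squared distances to the weighted mean
        if w.sum() == 0.0:
            return 0.0
        mu = np.average(x, weights=w)
        return float(np.sum(w * (x - mu) ** 2))

    total = wss(gs, ws)
    best_score, best_i = 0.0, 0          # 0 means "no split"
    for i in range(1, len(gs)):
        change = wss(gs[:i], ws[:i]) + wss(gs[i:], ws[i:]) - total
        if change < best_score:
            best_score, best_i = change, i
    return best_score, {int(j) for j in order[:best_i]}

score, left = best_weighted_split([0.0, 10.0, 0.1, 9.9], [1, 1, 1, 1])
# left == {0, 2}: the two near-zero gradients are grouped together
```

Sorting works here because an optimal 2-means clustering in one dimension is always a contiguous interval of the sorted data, so only n − 1 splits need to be checked.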
Figure 3: Test models with distinct soft evidence (u and l terms).

(a) Complete Graph
  Weight b:    (x ≠ y)  V(x) ∧ V(y)
  Weight u_x:  V(x)

(b) Binary Collective Classification
  Weight b:    (x ≠ y)  L(x, y) ∧ (C(y) ⇔ C(x))
  Weight u_x:  C(x)
  Weight l_xy: (x ≠ y)  L(x, y)

(c) Clique-Cycle
  Weight b:    (x ≠ y)  Q_1(x) ∧ Q_2(y)
  Weight b:    (x ≠ y)  Q_2(x) ∧ Q_3(y)
  Weight b:    (x ≠ y)  Q_3(x) ∧ Q_1(y)
  Weight u^i_x: Q_i(x), i ∈ {1, 2, 3}

Figure 4: Comparison of optimization at the stable coloring (exact model symmetries) with our coarse-to-fine inference framework for three different splitting methods ("C2F - Objective", "C2F - Size", "C2F - Syntactic"), on each of the test models — (a) Complete, (b) Collective Classification, (c) Clique Cycle — with domain size d = 160 for the logical variables. Vertical black dashed lines indicate coarse-to-fine transitions for objective-based splitting (the "C2F - Objective" curve). Transitions for the other C2F curves occur at approximately the same positions.

Experiments

This section provides an empirical illustration of our lifted GenDD algorithm. We demonstrate the superiority of our objective-based approach over purely syntactic approaches for generating a coarse-to-fine approximation sequence. We also demonstrate excellent any-time performance on MLNs with distinct soft evidence on every node, compared to exact lifted variational approaches, which are forced to ground the entire problem.

Datasets and methodology

Datasets. Figure 3 shows the models we use (similar models without evidence were used in (Mladenov and Kersting 2015; Bui, Huynh, and Sontag 2014) to evaluate performance on marginalization tasks).
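The distinct soft evidence (the u terms of Figure 3) can be generated with a small sketch like the one below. The function name is ours, and we assume a unary weight u on a binary variable is read as log-odds, so that bounding |u| bounds the implied local probability.

```python
import math
import random

def soft_evidence(p_min=0.01):
    """Draw a random unary evidence weight u uniformly on [-zeta, zeta]
    with zeta = log((1 - p_min) / p_min).  Interpreting u as log-odds,
    sigmoid(u) then lies in [p_min, 1 - p_min]."""
    zeta = math.log((1.0 - p_min) / p_min)
    u = random.uniform(-zeta, zeta)
    p = 1.0 / (1.0 + math.exp(-u))  # implied local probability
    return u, p

u, p = soft_evidence()  # with the default, p falls in [0.01, 0.99]
```

With p_min = 0.01 this matches evidence acting like probabilities between 0.01 and 0.99: the extreme draws u = ±zeta map exactly to those endpoints under the sigmoid.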
In all models, b was set to 2.5 and the u terms specify random evidence uniformly distributed on [−ζ, ζ], where ζ = log((1 − 10⁻²)/10⁻²), so that the u terms act like probabilities between 0.01 and 0.99. For the collective classification problem, we also choose a random 10% of the (x, y) values associated with links L(x, y) and set l_xy to a random value on [0, 5.0]. This can be interpreted as the strength of a link between two web pages.

Parameter settings. In our experiments, we set ε = 10⁻³ and t = 10⁻² (in the definition of a(x)) and perform L-BFGS black-box optimization with a rank-20 Hessian correction. For the coarse-to-fine procedure, it is important to balance performing inference with model refinement. We found it worked well to perform a small number of inference iterations (30), followed by a small amount of model refinement (setting β = 1.25 in Algorithm 2). This worked better than trying to diagnose inference convergence to determine the transition point. Future work will look at more principled methods of balancing inference work with refinement.

Timing methodology. In our experiments, we measure only the cost of performing inference, ignoring the cost of identifying symmetries. This is common in experiments on approximate lifted inference in other works (Mladenov, Globerson, and Kersting 2014; Van den Broeck and Niepert 2014; Venugopal and Gogate 2014), as well as exact lifted variational inference approaches (Bui, Huynh, and Riedel 2012; Bui, Huynh, and Sontag 2014; Mladenov and Kersting 2015), which assume problem symmetries are available before inference. Works such as (Singla, Nath, and Domingos 2014; Kersting et al. 2010) note that identifying (approximate) symmetries can be a bottleneck. (Singla, Nath, and Domingos 2014) used a sparse hyper-cube based representation to specify approximate symmetries in MLNs; adapting lifted GenDD to leverage a similar representation is an interesting direction for future work.

Results

Figure 4 compares our coarse-to-fine procedure vs.
performing inference on the stable coloring (the ground model in our setup) on each of our three test models. The blue curve ("C2F - Objective") performs objective-based splitting as per Algorithm 2, using the metric (9) to guide the splitting choice. We also show syntactic splitting ("C2F - Syntactic", orange) as per Algorithm 1, interleaving inference when the predicted inference cost exceeds a cost bound (then multiplying that bound by β as in Algorithm 2). "C2F - Size" (red) splits the largest super-factor in Q and chooses a random split that is as even as possible (adding random edges from N̄ to
N̄_2, stopping when |∪_{k∈N̄_1} E^{(f,r)}_k| > (1/2)|F_f|, and setting N̄_1 = N̄ \ N̄_2 in line 10 of Algorithm 2), interleaving inference with refinement as in the other C2F methods. The stable (here, ground) inference process is shown in green. We see that objective-based refinement significantly outperforms the others, with size-based refinement worse, but still better than syntactic splitting. Objective-based splitting provides orders of magnitude speed-up over the stable coloring for similar-quality results, and all methods provide better early performance than the stable coloring, which must touch the entire model to even provide an initial bound.

Figure 5: Scalability comparison. Panels (a)-(e) show bound vs. time on instances of the complete graph with evidence, varying the logical variable domain size d (from d = 20 up to d = 320 in the MLN specification). C2F exhibits an increasing advantage as the model grows larger.

Figure 5 demonstrates the scalability of our approach. Plots (a)-(e) show the running time for our coarse-to-fine method ("C2F - Objective", blue) and the stable coloring method on instances of the complete graph, varying its size.

Conclusions

We presented a framework for lifting GenDD for approximate inference by imposing over-symmetric constraints on the optimization parameters to induce symmetry in the variational bound. We developed a coarse-to-fine procedure that, guided by the objective, provides high-quality any-time approximations even in models with no exact symmetry. Our method provides orders of magnitude speed-up on our benchmarks, with increasing advantage for larger models.

Acknowledgements

This work is sponsored in part by NSF grants IIS , IIS , and by the United States Air Force under Contract No. FA C-0011 and FA C.

Appendix

Description of refinePV.
After creating a new partition F_F (possibly via splitting it from F_f), Algorithms 1 and 2 call refinePV to update P_V such that variables having identical neighbor counts M_v = [M^{(1)}_v, ..., M^{(F)}_v] are grouped together (where M^{(f)}_v := [M^{(f,1)}_v ... M^{(f,r̄_f)}_v] for all f). We assume that at entry, the neighbor counts M^{(f)}_v are correct for all v and f = 1 ... F − 1, and that P_V correctly groups variables by these counts, i.e., M^{(f)}_v = M̄^{(f)}_{K_v} for all v, where K_v is the partition holding variable v (v ∈ V_{K_v}).

Algorithm 3 refinePV
Input: f (if f ≠ 0, F_F was deleted from F_f before the call)
1: for r = 1 to r̄_F do
2:   for α ∈ F_F do            ▷ Update ground neighbor lists
3:     v = α_r
4:     N^{(f,r)}_v ← N^{(f,r)}_v \ α;  M^{(f,r)}_v ← M^{(f,r)}_v − 1
5:     N^{(F,r)}_v ← N^{(F,r)}_v ∪ α;  M^{(F,r)}_v ← M^{(F,r)}_v + 1
6:   end for
7:   V^m_k ← ∅  ∀(k, m)        ▷ New variable groupings
8:   for α ∈ F_F do
9:     v = α_r;  k = K_v;  m = M^{(F,r)}_v
10:    V_k ← V_k \ v;  V^m_k ← V^m_k ∪ v
11:  end for
12:  for (k, m) : V^m_k ≠ ∅ do  ▷ Update lifted graph
13:    K' ← k
14:    if m ≠ last m(k) then    ▷ New super-node
15:      K ← K + 1;  K' ← K
16:      M̄^{(f',r')}_{K'} ← M̄^{(f',r')}_k  ∀(f', r') ∈ N̄_k
17:    end if
18:    V_{K'} ← V^m_k;  K_v ← K' (∀v ∈ V_{K'})
19:    M̄^{(F,r)}_{K'} ← m;  M̄^{(f,r)}_{K'} ← M̄^{(f,r)}_{K'} − m
20:  end for
21: end for
Note: Updating M̄^{(f',r')}_{K'} (for any K', f', r') changes the N̄_k, N̄^{(f',r')}, and Q sets. For compactness, these changes are not shown.

The first loop updates N^{(f,r)}_v and N^{(F,r)}_v and their counts. The second loop groups variables by their signature [K_v, M^{(F,r)}_v], creating the sets V^m_k = {v | K_v = k, M^{(F,r)}_v = m}. This produces the same grouping as would be obtained by grouping on the full signature [M^{(1)}_v ... M^{(F)}_v], because only the M^{(f,r)}_v and M^{(F,r)}_v counts change during the call. Note that the change to M^{(f,r)}_v does not affect our grouping, since for all v ∈ V^m_k, M^{(f,r)}_v is equal to its value before the call minus M^{(F,r)}_v (the updates on line 19 reflect this fact). Lastly, the third loop adds the V^m_k sets to P_V and updates quantities associated with the lifted graph (the M̄ and N̄ terms). V_k will be overwritten with V^m_k, where m is the last (k, m) encountered in the loop. In all other cases, V^m_k splits into a new super-node and its neighbors are copied (line 16).
Complexity analysis

Each set is implemented as a hash table supporting O(1) addition and deletion (N^{(f,r)}_v stores pointers to factor scope vectors). Hence, each α ∈ F_F contributes O(|α|) time to the first two loops of refinePV. Throughout Algorithms 1 and 2, an α is moved to an F_F that is at most half the size of its current set. Thus, each α participates in at most log₂|F| sets throughout either algorithm. Equivalent to the result of (Berkholz, Bonsma, and Grohe 2013), the total time of these loops is O(R|F| log |F|), where R = max_{α∈F} |α|. Throughout Algorithms 1 and 2, the total time spent in the third loop of refinePV is proportional to the size of the lifted graph. This is because the loop adds nodes and edges to the lifted graph in O(1), and each addition is accompanied by at most one deletion (if M̄^{(f,r)}_{K'} = 0 in line 19). Since the size of the lifted graph is at most the size of the ground graph (R|F|), the total time is dominated by the first two loops.

References

Berkholz, C.; Bonsma, P.; and Grohe, M. 2013. Tight lower and upper bounds for the complexity of canonical colour refinement. In European Symposium on Algorithms. Springer.
Broeck, G. V. d., and Niepert, M. 2014. Lifted probabilistic inference for asymmetric graphical models. arXiv preprint.
Bui, H. B.; Huynh, T. N.; and de Salvo Braz, R. 2012. Exact lifted inference with distinct soft evidence on every object. In AAAI.
Bui, H. H.; Huynh, T. N.; and Riedel, S. 2012. Automorphism groups of graphical models and lifted variational inference. arXiv preprint.
Bui, H. H.; Huynh, T. N.; and Sontag, D. 2014. Lifted tree-reweighted variational inference. arXiv preprint.
Globerson, A., and Jaakkola, T. S. 2008. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. In Advances in Neural Information Processing Systems.
Habeeb, H.; Anand, A.; Singla, P.; et al. 2017. Coarse-to-fine lifted MAP inference in computer vision. arXiv preprint.
Kersting, K.; Ahmadi, B.; and Natarajan, S. 2009. Counting belief propagation.
In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, AUAI Press. Kersting, K.; El Massaoudi, Y.; Hadiji, F.; and Ahmadi, B Informed lifting for message-passing. In AAAI. Mladeno, M.; Ahmadi, B.; and Kersting, K Lifted linear programming. In AISTATS, Mladeno, M., and Kersting, K Equitable partitions of concae free energies. In UAI, Mladeno, M.; Globerson, A.; and Kersting, K Lifted message passing as reparametrization of graphical models. In UAI, Ping, W.; Liu, Q.; and Ihler, A. T Decomposition bounds for marginal map. In Adances in Neural Information Processing Systems, Poole, D First-order probabilistic inference. In IJCAI, olume 3, Richardson, M., and Domingos, P Maro logic networs. Machine learning 62(1): Singla, P., and Domingos, P. M Lifted first-order belief propagation. In AAAI, olume 8, Singla, P.; Nath, A.; and Domingos, P. M Approximate lifting techniques for belief propagation. In AAAI, Van den Broec, G., and Darwiche, A On the complexity and approximation of binary eidence in lifted inference. In Adances in Neural Information Processing Systems, Venugopal, D., and Gogate, V Eidence-based clustering for scalable inference in maro logic. In Joint European Conference on Machine Learning and Knowledge Discoery in Databases, Springer. Wainwright, M. J., and Jordan, M. I Graphical models, exponential families, and ariational inference. Foundations and Trends R in Machine Learning 1(1 2): Wainwright, M. J.; Jaaola, T. S.; and Willsy, A. S A new class of upper bounds on the log partition function. IEEE Transactions on Information Theory 51(7):