Probabilistic Graphical Models. Theory of Variational Inference: Inner and Outer Approximation. Lecture 15, March 4, 2013

School of Computer Science
Probabilistic Graphical Models
Theory of Variational Inference: Inner and Outer Approximation
Junming Yin
Lecture 15, March 4, 2013
Reading: W & J Book Chapters

Roadmap
Two families of approximate inference algorithms:
- Loopy belief propagation (sum-product)
- Mean-field approximation
Are there connections between these two approaches? We will re-examine them from a unified point of view based on the variational principle:
- Loopy BP: outer approximation
- Mean field: inner approximation

Variational Methods
"Variational": a fancy name for optimization-based formulations, i.e., represent the quantity of interest as the solution to an optimization problem, then approximate the desired solution by relaxing/approximating the intractable optimization problem.
Examples:
- Courant-Fischer for eigenvalues: λ_max(A) = max_{‖x‖₂=1} xᵀAx
- Linear system of equations: Ax = b, A ≻ 0, x* = A⁻¹b; variational formulation: x* = arg min_x { ½ xᵀAx − bᵀx }; for a large system, apply the conjugate gradient method

Inference Problems in Graphical Models
Undirected graphical model (MRF): p(x) = (1/Z) ∏_{C∈C} ψ_C(x_C)
The quantities of interest:
- marginal distributions: p(x_i) = Σ_{x_j, j≠i} p(x)
- normalization constant (partition function): Z
Question: how can we represent these quantities in a variational form?
Use tools from (1) exponential families and (2) convex analysis.
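To make the linear-system example concrete, here is a small numerical sketch (NumPy/SciPy assumed; the matrix and right-hand side are arbitrary illustrative choices): the solution of Ax = b coincides with the minimizer of the quadratic ½xᵀAx − bᵀx, and conjugate gradients solves that optimization iteratively.

```python
# Variational view of solving Ax = b: the solution minimizes f(x) = 1/2 x^T A x - b^T x,
# and conjugate gradients approximately solves this optimization for large systems.
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
n = 50
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)      # symmetric positive definite
b = rng.standard_normal(n)

x_exact = np.linalg.solve(A, b)  # x* = A^{-1} b
x_cg, _ = cg(A, b)               # iterative variational solution

f = lambda x: 0.5 * x @ A @ x - b @ x
print(np.allclose(x_exact, x_cg, atol=1e-4))   # essentially the same solution
print(f(x_cg), f(x_exact))                     # nearly identical objective values
```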

Exponential Families
Canonical parameterization: p(x; θ) = exp{θᵀφ(x) − A(θ)}, with canonical parameters θ and sufficient statistics φ(x).
Log partition function (log normalization constant): A(θ) = log ∫ exp{θᵀφ(x)} dx; it is a convex function (Prop 3.1).

Graphical Models as Exponential Families
Undirected graphical model (MRF): p(x; θ) = (1/Z(θ)) ∏_{C∈C} ψ(x_C; θ_C)
MRF in exponential form: p(x; θ) = exp{ Σ_{C∈C} log ψ(x_C; θ_C) − log Z(θ) }
Each log ψ(x_C; θ_C) can be written in a linear form after a suitable parameterization.
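As a concrete, illustrative instance of the exponential-family view, the sketch below writes a tiny binary MRF as p(x; θ) ∝ exp{θᵀφ(x)} and computes A(θ) = log Z(θ) and a marginal by brute-force enumeration; the graph, parameters, and helper names are made up for the example (NumPy assumed).

```python
# A 3-node binary chain MRF in exponential-family form p(x; theta) ∝ exp{theta^T phi(x)},
# with the log partition function A(theta) computed by brute-force enumeration.
import itertools
import numpy as np

theta_node = np.array([0.5, -0.2, 0.1])        # theta_s, coefficient of x_s
theta_edge = {(0, 1): 1.0, (1, 2): -0.8}       # theta_st, coefficient of x_s * x_t

def log_potential(x):
    # theta^T phi(x), with sufficient statistics {x_s} and {x_s x_t}
    return (theta_node @ x) + sum(w * x[s] * x[t] for (s, t), w in theta_edge.items())

configs = [np.array(c) for c in itertools.product([0, 1], repeat=3)]
A = np.log(sum(np.exp(log_potential(x)) for x in configs))        # A(theta) = log Z(theta)
print("A(theta) =", A)

# marginal P(X_1 = 1), obtained by summing p(x; theta) over the other variables
p1 = sum(np.exp(log_potential(x) - A) for x in configs if x[1] == 1)
print("P(X_1 = 1) =", p1)
```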

Example: Gaussian MRF
Consider a zero-mean multivariate Gaussian distribution that respects the Markov property of a graph. The Hammersley-Clifford theorem states that the precision matrix Λ = Σ⁻¹ also respects the graph structure.
Gaussian MRF in exponential form: p(x) = exp{ ⟨Θ, xxᵀ⟩ − A(Θ) }, where Θ = −½Λ.
Sufficient statistics are {x_s², s ∈ V; x_s x_t, (s,t) ∈ E}.

Example: Discrete MRF
Node potentials θ_s(x_s) and edge potentials θ_st(x_s, x_t).
Indicators: I_j(x_s) = 1 if x_s = j, 0 otherwise.
Parameters: θ_s = {θ_s;j, j ∈ X_s}, θ_st = {θ_st;jk, (j,k) ∈ X_s × X_t}.
In exponential form:
p(x; θ) ∝ exp{ Σ_{s∈V} Σ_j θ_s;j I_j(x_s) + Σ_{(s,t)∈E} Σ_{j,k} θ_st;jk I_j(x_s) I_k(x_t) }

Why Exponential Families?
Computing the expectations of the sufficient statistics (the mean parameters) given the canonical parameters yields the marginals:
µ_s;j = E_p[I_j(X_s)] = P[X_s = j] for all j ∈ X_s
µ_st;jk = E_p[I_st;jk(X_s, X_t)] = P[X_s = j, X_t = k] for all (j,k) ∈ X_s × X_t
Computing the normalizer yields the log partition function: log Z(θ) = A(θ).

Computing the Mean Parameter: Bernoulli
A single Bernoulli random variable X: p(x; θ) = exp{θx − A(θ)}, x ∈ {0,1}, A(θ) = log(1 + e^θ).
Inference = computing the mean parameter:
µ(θ) = E_θ[X] = 1·p(x=1; θ) + 0·p(x=0; θ) = e^θ / (1 + e^θ)
We want to do it in a variational manner: cast the procedure of computing the mean (a summation) as an optimization-based formulation.

Conjugate Dual Function
Given any function f(θ), its conjugate dual function is:
f*(µ) = sup_θ { ⟨θ, µ⟩ − f(θ) }
(Figure: geometrically, −f*(µ) is the intercept at 0 of the supporting line to f with slope µ.)
The conjugate dual is always a convex function: it is the point-wise supremum of a family of linear functions.

Dual of the Dual is the Original
Under a technical condition on f (convex and lower semicontinuous), the dual of the dual is the function itself, f = (f*)*:
f(θ) = sup_µ { ⟨θ, µ⟩ − f*(µ) }
For the log partition function: A(θ) = sup_µ { ⟨θ, µ⟩ − A*(µ) }, θ ∈ Ω.
The dual variable µ has a natural interpretation as the mean parameters.

Computing the Mean Parameter: Bernoulli
The conjugate: A*(µ) := sup_{θ∈R} { µθ − log[1 + exp(θ)] }.
Stationary condition: µ = e^θ / (1 + e^θ)  (i.e., µ = ∇A(θ)).
If µ ∈ (0,1), the stationary condition has the solution θ(µ) = log[µ/(1−µ)], and A*(µ) = µ log µ + (1−µ) log(1−µ).
If µ ∉ [0,1], A*(µ) = +∞.
We have
A*(µ) = µ log µ + (1−µ) log(1−µ) if µ ∈ [0,1]; +∞ otherwise.
The variational form: A(θ) = max_{µ∈[0,1]} { µθ − A*(µ) }.
The optimum is achieved at µ(θ) = e^θ / (1 + e^θ). This is the mean!

Remark
The last few identities are not coincidental; they rely on a deep theory of general exponential families:
- The dual function is the negative entropy function
- The mean parameter is restricted to a constraint set
- Solving the optimization returns both the mean parameter and the log partition function
Next step: develop this framework for general exponential families/graphical models. However,
- computing the conjugate dual (negative entropy) is in general intractable
- the constraint set of mean parameters is hard to characterize
Hence we need approximations.
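A quick numerical check of the Bernoulli variational form above (NumPy/SciPy assumed; θ is an arbitrary illustrative value): maximizing µθ − A*(µ) over [0,1] recovers both A(θ) and the mean parameter.

```python
# Bernoulli variational principle: A(theta) = max_{mu in [0,1]} { mu*theta - A*(mu) },
# with the optimum attained at the mean parameter.
import numpy as np
from scipy.optimize import minimize_scalar

theta = 1.3
A = np.log(1.0 + np.exp(theta))                 # log partition function

def neg_entropy(mu):                            # A*(mu) = mu log mu + (1-mu) log(1-mu)
    mu = np.clip(mu, 1e-12, 1 - 1e-12)
    return mu * np.log(mu) + (1 - mu) * np.log(1 - mu)

# maximize mu*theta - A*(mu) over [0,1] (minimize its negation)
res = minimize_scalar(lambda mu: -(mu * theta - neg_entropy(mu)),
                      bounds=(0.0, 1.0), method="bounded")

print(res.x, np.exp(theta) / (1 + np.exp(theta)))   # optimizer equals the mean parameter
print(-res.fun, A)                                  # optimal value equals A(theta)
```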

Computation of the Conjugate Dual
Given an exponential family
p(x₁,...,x_m; θ) = exp{ Σ_{i=1}^d θ_i φ_i(x) − A(θ) }
the dual function is A*(µ) := sup_{θ∈Ω} { ⟨µ, θ⟩ − A(θ) }.
The stationary condition: µ − ∇A(θ) = 0.
Derivatives of A yield the mean parameters:
∂A/∂θ_i = E_θ[φ_i(X)] = ∫ φ_i(x) p(x; θ) dx
so the stationary condition becomes µ = E_θ[φ(X)].
Question: for which µ ∈ R^d does this have a solution θ(µ)?

Computation of the Conjugate Dual (continued)
Let's assume there is a solution θ(µ) such that µ = E_{θ(µ)}[φ(X)]. The dual then has the form
A*(µ) = ⟨θ(µ), µ⟩ − A(θ(µ))
      = E_{θ(µ)}[ ⟨θ(µ), φ(X)⟩ − A(θ(µ)) ]
      = E_{θ(µ)}[ log p(X; θ(µ)) ]
The entropy is defined as H(p(x)) = −∫ p(x) log p(x) dx.
So the dual is A*(µ) = −H(p(x; θ(µ))) whenever a solution θ(µ) exists.

Complexity of Computing the Conjugate Dual
The dual function is implicitly defined through µ = E_θ[φ(X)]:
- solving the inverse mapping for the canonical parameters θ(µ) is nontrivial
- evaluating the negative entropy requires high-dimensional integration (summation)
Question: for which µ ∈ R^d does a solution θ(µ) exist, i.e., what is the domain of A*(µ)? Answer: the µ in the marginal polytope!

Marginal Polytope
For any distribution p(x) and a set of sufficient statistics φ(x), define a vector of mean parameters
µ_i = E_p[φ_i(X)] = ∫ φ_i(x) p(x) dx
where p(x) is not necessarily in the exponential family.
The set of all realizable mean parameters:
M := {µ ∈ R^d | ∃ p s.t. E_p[φ(X)] = µ}
It is a convex set. For discrete exponential families, M is called the marginal polytope.

Convex Polytope
- Convex hull representation
- Half-plane representation
Minkowski-Weyl Theorem: any non-empty convex polytope can be characterized by a finite collection of linear inequality constraints.

Example: Two-node Ising Model
Sufficient statistics: φ(x) := (x_s, s ∈ V; x_s x_t, (s,t) ∈ E) ∈ R^{|V|+|E|}
Mean parameters: µ_s = E_p[X_s] = P[X_s = 1] for all s ∈ V, and µ_st = E_p[X_s X_t] = P[(X_s, X_t) = (1,1)] for all (s,t) ∈ E.
Two-node Ising model (X₁ — X₂):
- Convex hull representation: M = conv{(0,0,0), (1,0,0), (0,1,0), (1,1,1)}
- Half-plane representation: µ₁₂ ≥ 0, µ₁ ≥ µ₁₂, µ₂ ≥ µ₁₂, 1 + µ₁₂ − µ₁ − µ₂ ≥ 0
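A small sketch of the two-node Ising marginal polytope (NumPy assumed): the four vertices φ(x) listed above, and a check that they, and hence their convex combinations, satisfy the half-plane representation.

```python
# Marginal polytope of the two-node Ising model: convex hull of phi(x) over the
# four configurations, checked against the half-plane constraints above.
import itertools
import numpy as np

def phi(x1, x2):
    # sufficient statistics (x1, x2, x1*x2)
    return np.array([x1, x2, x1 * x2])

vertices = [phi(x1, x2) for x1, x2 in itertools.product([0, 1], repeat=2)]
print(vertices)   # (0,0,0), (0,1,0), (1,0,0), (1,1,1)

def in_half_planes(mu):
    mu1, mu2, mu12 = mu
    return (mu12 >= 0) and (mu1 >= mu12) and (mu2 >= mu12) and (1 + mu12 - mu1 - mu2 >= 0)

# every vertex, and hence every convex combination, satisfies the constraints
print(all(in_half_planes(v) for v in vertices))

w = np.random.dirichlet(np.ones(4))            # a random convex combination
mu = sum(wi * v for wi, v in zip(w, vertices))
print(mu, in_half_planes(mu))
```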

Marginal Polytope for General Graphs
- Still doable for a connected binary graph with 3 nodes: 16 constraints
- For tree graphical models, the number of half-planes (facet complexity) grows only linearly in the graph size
- General graphs? It is extremely hard to characterize the marginal polytope

Variational Principle (Theorem 3.4)
The dual function takes the form
A*(µ) = −H(p_{θ(µ)}) if µ ∈ M; +∞ if µ ∉ M
where θ(µ) satisfies µ = E_{θ(µ)}[φ(X)].
The log partition function has the variational form
A(θ) = sup_{µ∈M} { θᵀµ − A*(µ) }
For all θ ∈ Ω, the above optimization problem is attained uniquely at the µ(θ) ∈ M° that satisfies µ(θ) = E_θ[φ(X)].

Example: Two-node Ising Model
The distribution: p(x; θ) ∝ exp{θ₁x₁ + θ₂x₂ + θ₁₂x₁x₂}
Sufficient statistics: (x₁, x₂, x₁x₂)
The marginal polytope is characterized by µ₁₂ ≥ 0, µ₁ ≥ µ₁₂, µ₂ ≥ µ₁₂, 1 + µ₁₂ − µ₁ − µ₂ ≥ 0.
The dual has an explicit form:
A*(µ) = µ₁₂ log µ₁₂ + (µ₁ − µ₁₂) log(µ₁ − µ₁₂) + (µ₂ − µ₁₂) log(µ₂ − µ₁₂) + (1 + µ₁₂ − µ₁ − µ₂) log(1 + µ₁₂ − µ₁ − µ₂)
The variational problem:
A(θ) = max_{(µ₁,µ₂,µ₁₂)∈M} { θ₁µ₁ + θ₂µ₂ + θ₁₂µ₁₂ − A*(µ) }
The optimum is attained at
µ₁(θ) = [exp{θ₁} + exp{θ₁+θ₂+θ₁₂}] / [1 + exp{θ₁} + exp{θ₂} + exp{θ₁+θ₂+θ₁₂}]

Variational Principle
Exact variational formulation: A(θ) = sup_{µ∈M} { θᵀµ − A*(µ) }
- M: the marginal polytope, difficult to characterize
- A*: the negative entropy function, no explicit form
Two families of approximations:
- Mean field method: non-convex inner bound and exact form of the entropy
- Bethe approximation and loopy belief propagation: polyhedral outer bound and non-convex Bethe approximation
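The variational principle can be checked numerically for this example. The sketch below (NumPy/SciPy assumed; the specific θ values and the use of a generic constrained optimizer are illustrative choices) maximizes θᵀµ − A*(µ) over the half-plane representation of M and compares the optimum with the brute-force log Z and marginal.

```python
# Exact variational problem for the two-node Ising model: maximize <theta, mu> - A*(mu)
# over the marginal polytope; the optimum recovers log Z and the marginal mu_1(theta).
import numpy as np
from scipy.optimize import minimize

theta1, theta2, theta12 = 0.5, -0.3, 1.2

def xlogx(t):
    return 0.0 if t <= 0 else t * np.log(t)

def A_star(mu):                        # explicit negative entropy on the polytope
    m1, m2, m12 = mu
    return xlogx(m12) + xlogx(m1 - m12) + xlogx(m2 - m12) + xlogx(1 + m12 - m1 - m2)

def neg_objective(mu):
    m1, m2, m12 = mu
    return -(theta1 * m1 + theta2 * m2 + theta12 * m12 - A_star(mu))

cons = [{"type": "ineq", "fun": lambda mu: mu[2]},                      # mu12 >= 0
        {"type": "ineq", "fun": lambda mu: mu[0] - mu[2]},              # mu1 >= mu12
        {"type": "ineq", "fun": lambda mu: mu[1] - mu[2]},              # mu2 >= mu12
        {"type": "ineq", "fun": lambda mu: 1 + mu[2] - mu[0] - mu[1]}]  # 1 + mu12 - mu1 - mu2 >= 0

res = minimize(neg_objective, x0=np.array([0.5, 0.5, 0.25]), constraints=cons)

scores = {(x1, x2): np.exp(theta1*x1 + theta2*x2 + theta12*x1*x2)
          for x1 in (0, 1) for x2 in (0, 1)}
Z = sum(scores.values())
print(-res.fun, np.log(Z))                               # A(theta) vs. log Z
print(res.x[0], (scores[(1, 0)] + scores[(1, 1)]) / Z)   # mu_1(theta) vs. P(X1 = 1)
```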

Mean Field Approximation

Tractable Subgraphs
Definition: a subgraph F of the graph G is tractable if it is feasible to perform exact inference on it.
Example: (figure)

Mean Field Methods
For an exponential family with sufficient statistics φ defined on graph G, the set of realizable mean parameters is
M(G; φ) := {µ ∈ R^d | ∃ p s.t. E_p[φ(x)] = µ}
For a given tractable subgraph F, the subset of mean parameters of interest (written M_F(G) below) is
M(F; φ) := {τ ∈ R^d | τ = E_θ[φ(x)] for some θ ∈ Ω(F)}
Inner approximation: M(F; φ)° ⊆ M(G; φ)°
Mean field solves the relaxed problem
max_{τ∈M_F(G)} { ⟨τ, θ⟩ − A*_F(τ) }
where A*_F = A*|_{M_F(G)} is the exact dual function restricted to the set M_F(G).

Example: Naïve Mean Field for the Ising Model
Ising model in {0,1} representation:
p(x) ∝ exp{ Σ_{s∈V} θ_s x_s + Σ_{(s,t)∈E} θ_st x_s x_t }
Mean parameters: µ_s = E_p[X_s] = P[X_s = 1] for all s ∈ V, and µ_st = E_p[X_s X_t] = P[(X_s, X_t) = (1,1)] for all (s,t) ∈ E.
For the fully disconnected subgraph F,
M_F(G) := {τ ∈ R^{|V|+|E|} | 0 ≤ τ_s ≤ 1, s ∈ V; τ_st = τ_s τ_t, (s,t) ∈ E}
The dual decomposes into a sum, with one term per node:
A*_F(τ) = Σ_{s∈V} [τ_s log τ_s + (1 − τ_s) log(1 − τ_s)]

Example: Naïve Mean Field for the Ising Model (continued)
Mean field problem:
A(θ) ≥ max_{(τ₁,...,τ_m)∈[0,1]^m} { Σ_{s∈V} θ_s τ_s + Σ_{(s,t)∈E} θ_st τ_s τ_t − A*_F(τ) }
This is the same objective function as in the free-energy-based approach.
The naïve mean field update equations:
τ_s ← σ( θ_s + Σ_{t∈N(s)} θ_st τ_t )
It also yields a lower bound on the log partition function.

Geometry of Mean Field
Mean field optimization is always non-convex for any exponential family in which the state space is finite.
Recall the marginal polytope is a convex hull: M(G) = conv{φ(e); e ∈ X^m}.
M_F(G) contains all the extreme points; if it is a strict subset of M(G), then it must be non-convex.
Example: two-node Ising model
M_F(G) = {0 ≤ τ₁ ≤ 1, 0 ≤ τ₂ ≤ 1, τ₁₂ = τ₁τ₂}
It has a parabolic cross section along τ₁ = τ₂, hence it is non-convex.
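A minimal sketch of naive mean field for a small Ising model (NumPy assumed; the 3-node loop, parameter values, and helper names are illustrative): coordinate ascent with the update τ_s ← σ(θ_s + Σ_t θ_st τ_t), followed by a comparison of the resulting lower bound with the exact log partition function.

```python
# Naive mean field coordinate ascent for a binary Ising model in {0,1} representation;
# the resulting objective lower-bounds log Z.
import itertools
import numpy as np

theta_s = np.array([0.2, -0.5, 0.4])
theta_st = {(0, 1): 0.8, (1, 2): -0.6, (2, 0): 0.5}     # a 3-node loop

def neighbors(s):
    out = []
    for (a, b), w in theta_st.items():
        if a == s:
            out.append((b, w))
        elif b == s:
            out.append((a, w))
    return out

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

tau = np.full(3, 0.5)
for _ in range(100):                                    # coordinate ascent updates
    for s in range(3):
        tau[s] = sigmoid(theta_s[s] + sum(w * tau[t] for t, w in neighbors(s)))

entropy = lambda t: -(t * np.log(t) + (1 - t) * np.log(1 - t))
mf_bound = (theta_s @ tau
            + sum(w * tau[a] * tau[b] for (a, b), w in theta_st.items())
            + entropy(tau).sum())                       # <theta, tau> - A*_F(tau)

logZ = np.log(sum(np.exp(theta_s @ np.array(x)
                         + sum(w * x[a] * x[b] for (a, b), w in theta_st.items()))
                  for x in itertools.product([0, 1], repeat=3)))
print(mf_bound, logZ, mf_bound <= logZ + 1e-9)          # mean field bound vs. exact log Z
```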

Bethe Approximation and Sum-Product

Sum-Product / Belief Propagation Algorithm
Message passing rule:
M_ts(x_s) ← κ Σ_{x_t'} { ψ_st(x_s, x_t') ψ_t(x_t') ∏_{u∈N(t)\s} M_ut(x_t') }
Marginals:
µ_s(x_s) = κ ψ_s(x_s) ∏_{t∈N(s)} M_ts(x_s)
where κ again denotes a normalization constant.
Exact for trees, but approximate for loopy graphs (so-called loopy belief propagation).
Questions: How is the algorithm on trees related to the variational principle? What is the algorithm doing for graphs with cycles?
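For concreteness, here is a sketch of the sum-product updates on a 3-node binary chain (NumPy assumed; the potentials are arbitrary illustrative numbers). On a tree the resulting beliefs match the exact marginals, as the brute-force check confirms.

```python
# Sum-product message passing on a 3-node binary chain 0 - 1 - 2.
import numpy as np

psi_node = [np.array([1.0, 2.0]), np.array([1.0, 1.0]), np.array([2.0, 1.0])]   # psi_s(x_s)
psi_edge = {(0, 1): np.array([[2.0, 1.0], [1.0, 2.0]]),                          # psi_st(x_s, x_t)
            (1, 2): np.array([[1.0, 3.0], [3.0, 1.0]])}
adj = {0: [1], 1: [0, 2], 2: [1]}

def edge_pot(s, t):   # psi_st indexed [x_s, x_t]
    return psi_edge[(s, t)] if (s, t) in psi_edge else psi_edge[(t, s)].T

# messages M[t][s](x_s), initialized uniformly and updated until convergence
M = {t: {s: np.ones(2) / 2 for s in adj[t]} for t in adj}
for _ in range(10):
    for t in adj:
        for s in adj[t]:
            others = [M[u][t] for u in adj[t] if u != s]
            incoming = np.prod(others, axis=0) if others else np.ones(2)
            m = edge_pot(s, t) @ (psi_node[t] * incoming)     # sum over x_t
            M[t][s] = m / m.sum()

def belief(s):
    b = psi_node[s] * np.prod([M[t][s] for t in adj[s]], axis=0)
    return b / b.sum()

# brute-force check of the marginal of node 0
joint = np.einsum('i,j,k,ij,jk->ijk', psi_node[0], psi_node[1], psi_node[2],
                  psi_edge[(0, 1)], psi_edge[(1, 2)])
print(belief(0), joint.sum(axis=(1, 2)) / joint.sum())
```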

Tree Graphical Models
Discrete variables X_s ∈ {0, 1, ..., m_s − 1} on a tree T = (V, E).
Sufficient statistics:
I_j(x_s) for s ∈ V, j ∈ X_s
I_jk(x_s, x_t) for (s,t) ∈ E, (j,k) ∈ X_s × X_t
Exponential representation of the distribution:
p(x; θ) ∝ exp{ Σ_{s∈V} θ_s(x_s) + Σ_{(s,t)∈E} θ_st(x_s, x_t) }
where θ_s(x_s) := Σ_{j∈X_s} θ_s;j I_j(x_s) (and similarly for θ_st(x_s, x_t)).
Mean parameters are marginal probabilities:
µ_s;j = E_p[I_j(X_s)] = P[X_s = j], so µ_s(x_s) = Σ_{j∈X_s} µ_s;j I_j(x_s) = P(X_s = x_s)
µ_st;jk = E_p[I_st;jk(X_s, X_t)] = P[X_s = j, X_t = k], so µ_st(x_s, x_t) = Σ_{(j,k)} µ_st;jk I_jk(x_s, x_t) = P(X_s = x_s, X_t = x_t)

Marginal Polytope for Trees
Recall the marginal polytope for general graphs:
M(G) = {µ ∈ R^d | ∃ p with marginals µ_s;j, µ_st;jk}
By the junction tree theorem (see Prop. 2.1 & Prop. 4.1),
M(T) = { µ ≥ 0 | Σ_{x_s} µ_s(x_s) = 1, Σ_{x_t} µ_st(x_s, x_t) = µ_s(x_s) }
In particular, if µ ∈ M(T), then
p_µ(x) := ∏_{s∈V} µ_s(x_s) ∏_{(s,t)∈E} [ µ_st(x_s, x_t) / (µ_s(x_s) µ_t(x_t)) ]
has the corresponding marginals (with the convention 0/0 := 0 in case of zeros).
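The reparameterization p_µ above can be verified numerically on a tiny chain (NumPy assumed; the chain-structured distribution is an arbitrary illustrative example): building p_µ from the node and edge marginals recovers the original distribution, and hence its marginals.

```python
# Tree factorization p_mu(x) = prod_s mu_s(x_s) * prod_(s,t) mu_st(x_s,x_t)/(mu_s(x_s) mu_t(x_t))
# on a 3-node binary chain 0 - 1 - 2.
import numpy as np

rng = np.random.default_rng(2)
q = np.einsum('ij,jk->ijk', rng.random((2, 2)), rng.random((2, 2)))   # any chain distribution
q /= q.sum()

mu = [q.sum(axis=(1, 2)), q.sum(axis=(0, 2)), q.sum(axis=(0, 1))]     # node marginals
mu_01, mu_12 = q.sum(axis=2), q.sum(axis=0)                           # edge marginals

p_mu = np.einsum('i,j,k,ij,jk->ijk',
                 mu[0], mu[1], mu[2],
                 mu_01 / np.outer(mu[0], mu[1]),
                 mu_12 / np.outer(mu[1], mu[2]))

print(np.allclose(p_mu, q))                    # on a tree, p_mu recovers q itself
print(np.allclose(p_mu.sum(axis=2), mu_01))    # and hence matches the marginals
```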

Decomposition of Entropy for Trees
For trees, the entropy decomposes as
H(p(x; µ)) = −Σ_x p(x; µ) log p(x; µ)
           = −Σ_{s∈V} Σ_{x_s} µ_s(x_s) log µ_s(x_s)  [the node entropies H_s(µ_s)]
             − Σ_{(s,t)∈E} Σ_{x_s,x_t} µ_st(x_s, x_t) log [ µ_st(x_s,x_t) / (µ_s(x_s) µ_t(x_t)) ]  [the mutual-information/KL-divergence terms I_st(µ_st)]
           = Σ_{s∈V} H_s(µ_s) − Σ_{(s,t)∈E} I_st(µ_st)
The dual function has an explicit form: A*(µ) = −H(p(x; µ)).

Exact Variational Principle for Trees
Variational formulation:
A(θ) = max_{µ∈M(T)} { ⟨θ, µ⟩ + Σ_{s∈V} H_s(µ_s) − Σ_{(s,t)∈E} I_st(µ_st) }
Assign a Lagrange multiplier λ_ss for the normalization constraint C_ss(µ) := 1 − Σ_{x_s} µ_s(x_s) = 0, and λ_ts(x_s) for each marginalization constraint C_ts(x_s; µ) := µ_s(x_s) − Σ_{x_t} µ_st(x_s, x_t) = 0.
The Lagrangian has the form
L(µ, λ) = ⟨θ, µ⟩ + Σ_{s∈V} [H_s(µ_s)] − Σ_{(s,t)∈E} I_st(µ_st) + Σ_{s∈V} λ_ss C_ss(µ)
          + Σ_{(s,t)∈E} [ Σ_{x_t} λ_st(x_t) C_st(x_t; µ) + Σ_{x_s} λ_ts(x_s) C_ts(x_s; µ) ]
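A quick numerical check of the tree entropy decomposition (NumPy assumed; the chain distribution is an arbitrary illustrative example):

```python
# Verify H(p_mu) = sum_s H_s(mu_s) - sum_(s,t) I_st(mu_st) on a 3-node binary chain.
import numpy as np

rng = np.random.default_rng(1)
p = np.einsum('ij,jk->ijk', rng.random((2, 2)), rng.random((2, 2)))   # chain-structured p
p /= p.sum()

mu0, mu1, mu2 = p.sum(axis=(1, 2)), p.sum(axis=(0, 2)), p.sum(axis=(0, 1))   # node marginals
mu01, mu12 = p.sum(axis=2), p.sum(axis=0)                                    # edge marginals

H = lambda q: -(q * np.log(q)).sum()                              # entropy
def I(q_st, q_s, q_t):                                            # mutual information
    return (q_st * np.log(q_st / np.outer(q_s, q_t))).sum()

lhs = H(p)
rhs = H(mu0) + H(mu1) + H(mu2) - I(mu01, mu0, mu1) - I(mu12, mu1, mu2)
print(lhs, rhs, np.isclose(lhs, rhs))                             # identical for a tree
```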

Lagrangian Derivation
Taking the derivatives of the Lagrangian w.r.t. µ_s(x_s) and µ_st(x_s, x_t):
∂L/∂µ_s(x_s) = θ_s(x_s) − log µ_s(x_s) + Σ_{t∈N(s)} λ_ts(x_s) + C
∂L/∂µ_st(x_s, x_t) = θ_st(x_s, x_t) − log [ µ_st(x_s,x_t) / (µ_s(x_s)µ_t(x_t)) ] − λ_ts(x_s) − λ_st(x_t) + C
Setting them to zero yields
µ_s(x_s) ∝ exp{θ_s(x_s)} ∏_{t∈N(s)} exp{λ_ts(x_s)}   [with M_ts(x_s) := exp{λ_ts(x_s)}]
µ_st(x_s, x_t) ∝ exp{θ_s(x_s) + θ_t(x_t) + θ_st(x_s, x_t)} ∏_{u∈N(s)\t} exp{λ_us(x_s)} ∏_{v∈N(t)\s} exp{λ_vt(x_t)}

Lagrangian Derivation (continued)
Adjusting the Lagrange multipliers (messages) to enforce the marginalization constraint C_ts(x_s; µ) := µ_s(x_s) − Σ_{x_t} µ_st(x_s, x_t) = 0 yields
M_ts(x_s) ∝ Σ_{x_t} exp{θ_t(x_t) + θ_st(x_s, x_t)} ∏_{u∈N(t)\s} M_ut(x_t)
Conclusion: the message passing updates are a Lagrangian method for solving the stationarity conditions of the variational formulation.

BP on Arbitrary Graphs
Two main difficulties of the variational formulation A(θ) = sup_{µ∈M} { θᵀµ − A*(µ) }:
1. The marginal polytope M is hard to characterize, so let's use the tree-based outer bound
L(G) = { τ ≥ 0 | Σ_{x_s} τ_s(x_s) = 1, Σ_{x_t} τ_st(x_s, x_t) = τ_s(x_s) }
These locally consistent vectors τ are called pseudo-marginals.
2. The exact entropy −A*(µ) lacks an explicit form, so let's approximate it by the exact expression for trees:
−A*(τ) ≈ H_Bethe(τ) := Σ_{s∈V} H_s(τ_s) − Σ_{(s,t)∈E} I_st(τ_st)

Bethe Variational Problem (BVP)
Combining these two ingredients leads to the Bethe variational problem (BVP):
max_{τ∈L(G)} { ⟨θ, τ⟩ + Σ_{s∈V} H_s(τ_s) − Σ_{(s,t)∈E} I_st(τ_st) }
This is a simple structured problem (differentiable, and the constraint set is a simple convex polytope).
Loopy BP can be derived as an iterative method for solving a Lagrangian formulation of the BVP (Theorem 4.2); the proof is similar to the tree case.
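As an illustrative sketch (NumPy/SciPy assumed; the 3-cycle, parameter values, and the use of a generic constrained optimizer rather than loopy BP are my own choices, not the lecture's), the BVP can also be attacked directly: maximize ⟨θ, τ⟩ + H_Bethe(τ) over the local polytope L(G) for binary variables, and compare against the exact log partition function.

```python
# Bethe variational problem for a binary 3-cycle, solved with a generic constrained optimizer.
import itertools
import numpy as np
from scipy.optimize import minimize

E = [(0, 1), (1, 2), (2, 0)]
th_s = np.array([0.3, -0.2, 0.1])
th_st = np.array([1.0, 1.0, 1.0])                      # one coupling per edge in E

def xlogx(t):
    t = np.asarray(t, dtype=float)
    return np.where(t > 1e-12, t * np.log(np.clip(t, 1e-12, None)), 0.0)

def bethe_objective(v):                                # <theta, tau> + H_Bethe(tau)
    tau_s, tau_st = v[:3], v[3:]
    obj = th_s @ tau_s + th_st @ tau_st
    obj -= np.sum(xlogx(tau_s) + xlogx(1 - tau_s))     # node entropies H_s
    for e, (s, t) in enumerate(E):                     # minus mutual informations I_st
        joint = np.array([1 + tau_st[e] - tau_s[s] - tau_s[t],
                          tau_s[t] - tau_st[e], tau_s[s] - tau_st[e], tau_st[e]])
        marg = np.array([(1 - tau_s[s]) * (1 - tau_s[t]), (1 - tau_s[s]) * tau_s[t],
                         tau_s[s] * (1 - tau_s[t]), tau_s[s] * tau_s[t]])
        obj -= np.sum(xlogx(joint)) - joint @ np.log(np.clip(marg, 1e-12, None))
    return obj

# local consistency constraints L(G) for binary pseudo-marginals
cons = []
for e, (s, t) in enumerate(E):
    cons += [{"type": "ineq", "fun": lambda v, e=e: v[3 + e]},
             {"type": "ineq", "fun": lambda v, e=e, s=s: v[s] - v[3 + e]},
             {"type": "ineq", "fun": lambda v, e=e, t=t: v[t] - v[3 + e]},
             {"type": "ineq", "fun": lambda v, e=e, s=s, t=t: 1 + v[3 + e] - v[s] - v[t]}]

x0 = np.concatenate([np.full(3, 0.5), np.full(3, 0.3)])
res = minimize(lambda v: -bethe_objective(v), x0=x0, bounds=[(0, 1)] * 6, constraints=cons)

logZ = np.log(sum(np.exp(th_s @ np.array(x) + sum(th_st[e] * x[s] * x[t]
                                                  for e, (s, t) in enumerate(E)))
                  for x in itertools.product([0, 1], repeat=3)))
print("A_Bethe(theta) =", -res.fun, " log Z =", logZ)  # generally not equal on a loopy graph
```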

Geometry of BP
Consider a particular assignment of pseudo-marginals on the 3-node cycle (figure). One can easily verify that τ ∈ L(G); however, with a bit more work, τ ∉ M(G).
For any graph, M(G) ⊆ L(G) (L(G) is the tree-based outer bound); equality holds if and only if the graph is a tree.
Question: does the solution to the BVP ever fall into the gap? Yes: for any element of the outer bound L(G), it is possible to construct a distribution with it as a BP fixed point (Wainwright et al., 2003).

Inexactness of the Bethe Entropy Approximation
Consider a fully connected graph on 4 nodes with
µ_s(x_s) = [0.5, 0.5] for s = 1, 2, 3, 4
µ_st(x_s, x_t) = [[0.5, 0], [0, 0.5]] for all (s,t) ∈ E
It is globally valid (τ ∈ M(G)): it is realized by the distribution that places mass 1/2 on each of the configurations (0,0,0,0) and (1,1,1,1). But
H_Bethe(µ) = 4 log 2 − 6 log 2 = −2 log 2 < 0, whereas the true entropy is −A*(µ) = log 2 > 0.
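The numbers in this example can be reproduced directly (NumPy assumed):

```python
# Bethe entropy for the 4-node fully connected example: H_Bethe is negative even though
# the mean parameters are globally valid and the true entropy is log 2.
import itertools
import numpy as np

edges = list(itertools.combinations(range(4), 2))       # 6 edges, fully connected
mu_s = np.array([0.5, 0.5])                              # node marginals, all uniform
mu_st = np.array([[0.5, 0.0], [0.0, 0.5]])               # edge marginals, perfectly correlated

H_node = -(mu_s * np.log(mu_s)).sum()                    # = log 2
# mutual information I_st = sum mu_st log( mu_st / (mu_s mu_t) ), with 0 log 0 := 0
ratio = np.divide(mu_st, np.outer(mu_s, mu_s), out=np.ones_like(mu_st), where=mu_st > 0)
I_edge = (mu_st * np.log(ratio)).sum()                   # = log 2

H_bethe = len(mu_s.shape) * 0 + 4 * H_node - len(edges) * I_edge
print(H_bethe, -2 * np.log(2))                           # negative "entropy"
print(np.log(2))                                         # true entropy of the realizing distribution
```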

Remark
This connection provides a principled basis for applying the sum-product algorithm to loopy graphs. However:
- Although loopy BP always has a fixed point, there are no guarantees on the convergence of the algorithm on loopy graphs
- The Bethe variational problem is usually non-convex, so there are no guarantees of reaching the global optimum
- Generally, there is no guarantee that A_Bethe(θ) is a lower bound on A(θ)
Nevertheless, this connection and understanding suggest a number of avenues for improving upon the ordinary sum-product algorithm, via progressively better approximations to the entropy function and outer bounds on the marginal polytope (Kikuchi clustering).

Summary
- Variational methods in general turn inference into an optimization problem, via exponential families and convex duality
- The exact variational principle is intractable to solve; there are two distinct components to approximate:
  - an inner or outer bound on the marginal polytope
  - various approximations to the entropy function
- Mean field: non-convex inner bound and exact form of the entropy
- BP: polyhedral outer bound and non-convex Bethe approximation
- Kikuchi and variants: tighter polyhedral outer bounds and better entropy approximations (Yedidia et al., 2002)
