Graphical models and message-passing Part II: Marginals and likelihoods

Size: px
Start display at page:

Download "Graphical models and message-passing Part II: Marginals and likelihoods"

Transcription

1 Graphical models and message-passing Part II: Marginals and likelihoods Martin Wainwright UC Berkeley Departments of Statistics, and EECS Tutorial materials (slides, monograph, lecture notes) available at: wainwrig/kyoto12 September 3, 2012 Martin Wainwright (UC Berkeley) Graphical models and message-passing September 3, / 23

2 Graphs and factorization 2 ψ ψ v 1 5 ψ clique C is a fully connected subset of vertices compatibility function ψ C defined on variables x C = {x s,s C} factorization over all cliques p(x 1,...,x N ) = 1 Z ψ C (x C ). C C Martin Wainwright (UC Berkeley) Graphical models and message-passing September 3, / 23

3 Core computational challenges Given an undirected graphical model (Markov random field): How to efficiently compute? p(x 1,x 2,...,x N) = 1 Z most probable configuration (MAP estimate): Maximize : ψ C(x C) C C x = arg max p(x1,...,xn) = arg max ψ x XN C(x C). x X N C C the data likelihood or normalization constant Sum/integrate : Z = x X N C C ψ C(x C) marginal distributions at single sites, or subsets: Sum/integrate : p(x s = x s) = 1 Z x t, t s C C ψ C(x C) Martin Wainwright (UC Berkeley) Graphical models and message-passing September 3, / 23

4 1. Sum-product message-passing on trees Goal: Compute marginal distribution at node u on a tree: x = arg max exp(θ s (x s ) exp(θ st (x s,x t )) x X N. s V (s,t) E M 12 M x 1,x 2,x 3 p(x) = [ exp(θ 1 (x 1 )) { x 2 t 1,3 x t exp[θ t (x t )+θ 2t (x 2,x t )] }]

5 Putting together the pieces Sum-product is an exact algorithm for any tree. T w w T u u M ut M wt t s M ts M ts message from node t to s N(t) neighbors of node t M vt v T v Update: M ts(x s) { [ ] exp θ st(x s,x t) + θ t(x } t) M vt(x t) x t X t v N(t)\s Sum-marginals: p s(x s;θ) exp{θ s(x s)} t N(s) Mts(xs). Martin Wainwright (UC Berkeley) Graphical models and message-passing September 3, / 23

6 Summary: sum-product on trees converges in at most graph diameter # of iterations updating a single message is an O(m 2 ) operation overall algorithm requires O(Nm 2 ) operations upon convergence, yields the exact node and edge marginals: p s (x s ) e θs(xs) M us (x s ) u N(s) p st (x s,x t ) e θs(xs)+θt(xt)+θst(xs,xt) M us (x s ) M ut (x t ) u N(s) u N(t) messages can also be used to compute the partition function Z = e θs(xs) e θst(xs,xt). x 1,...,x N s V (s,t) E Martin Wainwright (UC Berkeley) Graphical models and message-passing September 3, / 23

7 2. Sum-product on graph with cycles as with max-product, a widely used heuristic with a long history: error-control coding: Gallager, 1963 artificial intelligence: Pearl, 1988 turbo decoding: Berroux et al., 1993 etc.. Martin Wainwright (UC Berkeley) Graphical models and message-passing September 3, / 23

8 2. Sum-product on graph with cycles as with max-product, a widely used heuristic with a long history: error-control coding: Gallager, 1963 artificial intelligence: Pearl, 1988 turbo decoding: Berroux et al., 1993 etc.. some concerns with sum-product with cycles: no convergence guarantees can have multiple fixed points final estimate of Z is not a lower/upper bound Martin Wainwright (UC Berkeley) Graphical models and message-passing September 3, / 23

9 2. Sum-product on graph with cycles as with max-product, a widely used heuristic with a long history: error-control coding: Gallager, 1963 artificial intelligence: Pearl, 1988 turbo decoding: Berroux et al., 1993 etc.. some concerns with sum-product with cycles: no convergence guarantees can have multiple fixed points final estimate of Z is not a lower/upper bound as before, can consider a broader class of reweighted sum-product algorithms Martin Wainwright (UC Berkeley) Graphical models and message-passing September 3, / 23

10 Tree-reweighted sum-product algorithms Message update from node t to node s: M ts(x s) κ { [ θst(x s,x exp t) ρ st x t X t }{{} reweighted edge ] + θ t(x t) reweighted messages {[ }} ] { ρ Mvt(x vt t) } v N(t)\s [ Mst(x t) ] (1 ρ ts ) }{{} opposite message. Properties: 1. Modified updates remain distributed and purely local over the graph. Messages are reweighted with ρ st [0,1]. 2. Key differences: Potential on edge (s,t) is rescaled by ρ st [0,1]. Update involves the reverse direction edge. 3. The choice ρ st = 1 for all edges (s,t) recovers standard update. Martin Wainwright (UC Berkeley) Graphical models and message-passing September 3, / 23

11 Bethe entropy approximation define local marginal distributions (e.g., for m = 3 states): µ s(x s) = µs(0) µ s(1) µst(0,0) µst(0,1) µst(0,2) µ st(x s,x t) = µ st(1,0) µ st(1,1) µ st(1,2) µ s(2) µ st(2,0) µ st(2,1) µ st(2,2) define node-based entropy and edge-based mutual information: Node-based entropy:h s (µ s ) = x s µ s (x s )logµ s (x s ) Mutual information:i st (µ st ) = x s,x t µ st (x s,x t )log µ st(x s,x t ) µ s (x s )µ t (x t ). ρ-reweighted Bethe entropy H Bethe (µ) = s V H s (µ s ) (s,t) E ρ st I st (µ st ), Martin Wainwright (UC Berkeley) Graphical models and message-passing September 3, / 23

12 Bethe entropy is exact for trees exact for trees, using the factorization: p(x;θ) = s V µ s (x s ) (s,t) E µ st (x s,x t ) µ s (x s )µ t (x t ) Martin Wainwright (UC Berkeley) Graphical models and message-passing September 3, / 23

13 Reweighted sum-product and Bethe variational principle Define the local constraint set L(G) = { τ s,τ st τ 0, x s τ s (x s ) = 1, x t τ st (x s,x t ) = τ s (x s ) }

14 Reweighted sum-product and Bethe variational principle Define the local constraint set L(G) = { τ s,τ st τ 0, x s τ s (x s ) = 1, x t τ st (x s,x t ) = τ s (x s ) } Theorem For any choice of positive edge weights ρ st > 0: (a) Fixed points of reweighted sum-product are stationary points of the Lagrangian associated with { A Bethe (θ;ρ) := max τ s, θ s + } τ st, θ st +H Bethe (τ;ρ). τ L(G) s V (s,t) E

15 Reweighted sum-product and Bethe variational principle Define the local constraint set L(G) = { τ s,τ st τ 0, x s τ s (x s ) = 1, x t τ st (x s,x t ) = τ s (x s ) } Theorem For any choice of positive edge weights ρ st > 0: (a) Fixed points of reweighted sum-product are stationary points of the Lagrangian associated with { A Bethe (θ;ρ) := max τ s, θ s + } τ st, θ st +H Bethe (τ;ρ). τ L(G) s V (s,t) E (b) For valid choices of edge weights {ρ st }, the fixed points are unique and moreover logz(θ) A Bethe (θ;ρ). In addition, reweighted sum-product converges with appropriate scheduling.

16 Lagrangian derivation of ordinary sum-product let s try to solve this problem by a (partial) Lagrangian formulation assign a Lagrange multiplier λ ts(x s) for each constraint C ts(x s) := τ s(x s) x t τ st(x s,x t) = 0 will enforce the normalization ( x s τ s(x s) = 1) and non-negativity constraints explicitly the Lagrangian takes the form: L(τ;λ) = θ, τ + s V H s(τ s) + (s,t) E (s,t) E(G) I st(τ st) [ λ st(x t)c st(x t)+ λ ] ts(x s)c ts(x s) x t x s Martin Wainwright (UC Berkeley) Graphical models and message-passing September 3, / 23

17 Lagrangian derivation (part II) taking derivatives of the Lagrangian w.r.t τ s and τ st yields L τ s(x s) L τ st(x s,x t) = θ s(x s) logτ s(x s)+ t N(s) λ ts(x s)+c = θ st(x s,x t) log τst(xs,xt) τ s(x s)τ t(x t) λts(xs) λst(xt)+c setting these partial derivatives to zero and simplifying: τ s(x s) exp { θ } s(x s) exp { λ } ts(x s) u N(s)\t t N(s) τ s(x s,x t) exp { θ } s(x s)+θ t(x t)+θ st(x s,x t) exp { λ } us(x s) exp { λ } vt(x t) v N(t)\s enforcing the constraint C ts(x s) = 0 on these representations yields the familiar update rule for the messages M ts(x s) = exp(λ ts(x s)): M ts(x s) exp { θ } t(x t)+θ st(x s,x t) M ut(x t) x t u N(t)\s Martin Wainwright (UC Berkeley) Graphical models and message-passing September 3, / 23

18 Convex combinations of trees Idea: Upper bound A(θ) := logz(θ) with a convex combination of tree-structured problems. θ = ρ(t 1 )θ(t 1 ) + ρ(t 2 )θ(t 2 ) + ρ(t 3 )θ(t 3 ) A(θ) ρ(t 1 )A(θ(T 1 )) + ρ(t 2 )A(θ(T 2 )) + ρ(t 3 )A(θ(T 3 )) ρ = {ρ(t)} probability distribution over spanning trees θ(t) tree-structured parameter vector Martin Wainwright (UC Berkeley) Graphical models and message-passing September 3, / 23

19 Finding the tightest upper bound Observation: For each fixed distribution ρ over spanning trees, there are many such upper bounds. Goal: Find the tightest such upper bound over all trees. Challenge: Number of spanning trees grows rapidly in graph size. Martin Wainwright (UC Berkeley) Graphical models and message-passing September 3, / 23

20 Finding the tightest upper bound Observation: For each fixed distribution ρ over spanning trees, there are many such upper bounds. Goal: Find the tightest such upper bound over all trees. Challenge: Number of spanning trees grows rapidly in graph size. Example: On the 2-D lattice: Grid size # trees Martin Wainwright (UC Berkeley) Graphical models and message-passing September 3, / 23

21 Finding the tightest upper bound Observation: For each fixed distribution ρ over spanning trees, there are many such upper bounds. Goal: Find the tightest such upper bound over all trees. Challenge: Number of spanning trees grows rapidly in graph size. By a suitable dual reformulation, problem can be avoided: Key duality relation: T { min = max µ, θ +HBethe (µ;ρ st ) }. ρ(t)θ(t)=θρ(t)a(θ(t)) µ L(G) Martin Wainwright (UC Berkeley) Graphical models and message-passing September 3, / 23

22 Edge appearance probabilities Experiment: What is the probability ρ e that a given edge e E belongs to a tree T drawn randomly under ρ? f f f f b b b b e e e e (a) Original (b) ρ(t 1 ) = 1 3 (c) ρ(t 2 ) = 1 3 (d) ρ(t 3 ) = 1 3 In this example: ρ b = 1; ρ e = 2 3 ; ρ f = 1 3. The vector ρ e = { ρ e e E } must belong to the spanning tree polytope. (Edmonds, 1971) Martin Wainwright (UC Berkeley) Graphical models and message-passing September 3, / 23

23 Why does entropy arise in the duality? Due to a deep correspondence between two problems: Maximum entropy density estimation Maximize entropy H(p) = x p(x 1,...,x N )logp(x 1,...,x N ) subject to expectation constraints of the form p(x)φ α (x) = µ α. x Martin Wainwright (UC Berkeley) Graphical models and message-passing September 3, / 23

24 Why does entropy arise in the duality? Due to a deep correspondence between two problems: Maximum entropy density estimation Maximize entropy H(p) = x p(x 1,...,x N )logp(x 1,...,x N ) subject to expectation constraints of the form p(x)φ α (x) = µ α. x Maximum likelihood in exponential family Maximize likelihood of parameterized densities p(x 1,...,x N ;θ) = exp { θ α φ α (x) A(θ) }. α Martin Wainwright (UC Berkeley) Graphical models and message-passing September 3, / 23

25 Conjugate dual functions conjugate duality is a fertile source of variational representations any function f can be used to define another function f as follows: f (v) := sup u R n { v, u f(u) }. easy to show that f is always a convex function how about taking the dual of the dual? I.e., what is (f )? when f is well-behaved (convex and lower semi-continuous), we have (f ) = f, or alternatively stated: { f(u) = sup u, v f (v) } v R n

26 Geometric view: Supporting hyperplanes Question: Given all hyperplanes in R n R with normal (v, 1), what is the intercept of the one that supports epi(f)? β f(u) Epigraph of f: epi(f) := {(u,β) R n+1 f(u) β}. Analytically, we require the smallest c R such that: c a c b v, u c a v, u c b u (v, 1) v, u c f(u) for all u R n By re-arranging, we find that this optimal c is the dual value: c = sup u R n { v, u f(u) }.

27 Example: Single Bernoulli Random variable X {0,1} yields exponential family of the form: p(x;θ) exp { θx } with A(θ) = log [ 1+exp(θ) ]. Let s compute the dual A { } (µ) := sup µθ log[1+exp(θ)]. θ R (Possible) stationary point: µ = exp(θ)/[1 + exp(θ)]. A(θ) A(θ) µ, θ A (µ) θ θ µ, θ c (a) Epigraph supported We find that: A (µ) = (b) Epigraph cannot be supported { µlogµ+(1 µ)log(1 µ) if µ [0,1] + otherwise.. Leads to the variational representation: A(θ) = max µ [0,1] { µ θ A (µ) }. Martin Wainwright (UC Berkeley) Graphical models and message-passing September 3, / 23

28 Geometry of Bethe variational problem µ int M(G) µ frac L(G) belief propagation uses a polyhedral outer approximation to M(G): for any graph, L(G) M(G). equality holds G is a tree. Natural question: Do BP fixed points ever fall outside of the marginal polytope M(G)? Martin Wainwright (UC Berkeley) Graphical models and message-passing September 3, / 23

29 Illustration: Globally inconsistent BP fixed points Consider the following assignment of pseudomarginals τ s,τ st : Locally consistent (pseudo)marginals ¼ ¼ ½ ¼ ¼ 2 ¼ ½ ¼ 1 ¼ ½ ¼ ¼ ¼ ¼ ¼ ½ ¼ ½ ¼ 3 ¼ ¼ ½ ¼ ¼ can verify that τ L(G), and that τ is a fixed point of belief propagation (with all constant messages) however, τ is globally inconsistent Note: More generally: for any τ in the interior of L(G), can construct a distribution with τ as a BP fixed point. Martin Wainwright (UC Berkeley) Graphical models and message-passing September 3, / 23

30 High-level perspective: A broad class of methods message-passing algorithms (e.g., mean field, belief propagation) are solving approximate versions of exact variational principle in exponential families there are two distinct components to approximations: (a) can use either inner or outer bounds to M (b) various approximations to entropy function A (µ) Refining one or both components yields better approximations: BP: polyhedral outer bound and non-convex Bethe approximation Kikuchi and variants: tighter polyhedral outer bounds and better entropy approximations (e.g.,yedidia et al., 2002) Expectation-propagation: better outer bounds and Bethe-like entropy approximations (Minka, 2002) Martin Wainwright (UC Berkeley) Graphical models and message-passing September 3, / 23

14 : Theory of Variational Inference: Inner and Outer Approximation

14 : Theory of Variational Inference: Inner and Outer Approximation 10-708: Probabilistic Graphical Models 10-708, Spring 2017 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Maria Ryskina, Yen-Chia Hsu 1 Introduction

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Variational Inference IV: Variational Principle II Junming Yin Lecture 17, March 21, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3 X 3 Reading: X 4

More information

Probabilistic Graphical Models. Theory of Variational Inference: Inner and Outer Approximation. Lecture 15, March 4, 2013

Probabilistic Graphical Models. Theory of Variational Inference: Inner and Outer Approximation. Lecture 15, March 4, 2013 School of Computer Science Probabilistic Graphical Models Theory of Variational Inference: Inner and Outer Approximation Junming Yin Lecture 15, March 4, 2013 Reading: W & J Book Chapters 1 Roadmap Two

More information

Graphical models and message-passing Part I: Basics and MAP computation

Graphical models and message-passing Part I: Basics and MAP computation Graphical models and message-passing Part I: Basics and MAP computation Martin Wainwright UC Berkeley Departments of Statistics, and EECS Tutorial materials (slides, monograph, lecture notes) available

More information

14 : Theory of Variational Inference: Inner and Outer Approximation

14 : Theory of Variational Inference: Inner and Outer Approximation 10-708: Probabilistic Graphical Models 10-708, Spring 2014 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Yu-Hsin Kuo, Amos Ng 1 Introduction Last lecture

More information

Probabilistic and Bayesian Machine Learning

Probabilistic and Bayesian Machine Learning Probabilistic and Bayesian Machine Learning Day 4: Expectation and Belief Propagation Yee Whye Teh ywteh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London http://www.gatsby.ucl.ac.uk/

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Variational Inference III: Variational Principle I Junming Yin Lecture 16, March 19, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3 X 3 Reading: X 4

More information

Exact MAP estimates by (hyper)tree agreement

Exact MAP estimates by (hyper)tree agreement Exact MAP estimates by (hyper)tree agreement Martin J. Wainwright, Department of EECS, UC Berkeley, Berkeley, CA 94720 martinw@eecs.berkeley.edu Tommi S. Jaakkola and Alan S. Willsky, Department of EECS,

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Variational Inference II: Mean Field Method and Variational Principle Junming Yin Lecture 15, March 7, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3

More information

Discrete Markov Random Fields the Inference story. Pradeep Ravikumar

Discrete Markov Random Fields the Inference story. Pradeep Ravikumar Discrete Markov Random Fields the Inference story Pradeep Ravikumar Graphical Models, The History How to model stochastic processes of the world? I want to model the world, and I like graphs... 2 Mid to

More information

Junction Tree, BP and Variational Methods

Junction Tree, BP and Variational Methods Junction Tree, BP and Variational Methods Adrian Weller MLSALT4 Lecture Feb 21, 2018 With thanks to David Sontag (MIT) and Tony Jebara (Columbia) for use of many slides and illustrations For more information,

More information

MAP estimation via agreement on (hyper)trees: Message-passing and linear programming approaches

MAP estimation via agreement on (hyper)trees: Message-passing and linear programming approaches MAP estimation via agreement on (hyper)trees: Message-passing and linear programming approaches Martin Wainwright Tommi Jaakkola Alan Willsky martinw@eecs.berkeley.edu tommi@ai.mit.edu willsky@mit.edu

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 12: Gaussian Belief Propagation, State Space Models and Kalman Filters Guest Kalman Filter Lecture by

More information

A new class of upper bounds on the log partition function

A new class of upper bounds on the log partition function Accepted to IEEE Transactions on Information Theory This version: March 25 A new class of upper bounds on the log partition function 1 M. J. Wainwright, T. S. Jaakkola and A. S. Willsky Abstract We introduce

More information

Variational Inference (11/04/13)

Variational Inference (11/04/13) STA561: Probabilistic machine learning Variational Inference (11/04/13) Lecturer: Barbara Engelhardt Scribes: Matt Dickenson, Alireza Samany, Tracy Schifeling 1 Introduction In this lecture we will further

More information

Approximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract)

Approximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract) Approximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract) Arthur Choi and Adnan Darwiche Computer Science Department University of California, Los Angeles

More information

Inconsistent parameter estimation in Markov random fields: Benefits in the computation-limited setting

Inconsistent parameter estimation in Markov random fields: Benefits in the computation-limited setting Inconsistent parameter estimation in Markov random fields: Benefits in the computation-limited setting Martin J. Wainwright Department of Statistics, and Department of Electrical Engineering and Computer

More information

13 : Variational Inference: Loopy Belief Propagation

13 : Variational Inference: Loopy Belief Propagation 10-708: Probabilistic Graphical Models 10-708, Spring 2014 13 : Variational Inference: Loopy Belief Propagation Lecturer: Eric P. Xing Scribes: Rajarshi Das, Zhengzhong Liu, Dishan Gupta 1 Introduction

More information

Dual Decomposition for Marginal Inference

Dual Decomposition for Marginal Inference Dual Decomposition for Marginal Inference Justin Domke Rochester Institute of echnology Rochester, NY 14623 Abstract We present a dual decomposition approach to the treereweighted belief propagation objective.

More information

Message-passing, convex relaxations, and all that. Graphical models and variational methods: Introduction

Message-passing, convex relaxations, and all that. Graphical models and variational methods: Introduction Graphical models and variational methods: Message-passing, convex relaxations, and all that Martin Wainwright Department of Statistics, and Department of Electrical Engineering and Computer Science, UC

More information

17 Variational Inference

17 Variational Inference Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms for Inference Fall 2014 17 Variational Inference Prompted by loopy graphs for which exact

More information

Concavity of reweighted Kikuchi approximation

Concavity of reweighted Kikuchi approximation Concavity of reweighted ikuchi approximation Po-Ling Loh Department of Statistics The Wharton School University of Pennsylvania loh@wharton.upenn.edu Andre Wibisono Computer Science Division University

More information

CSE 254: MAP estimation via agreement on (hyper)trees: Message-passing and linear programming approaches

CSE 254: MAP estimation via agreement on (hyper)trees: Message-passing and linear programming approaches CSE 254: MAP estimation via agreement on (hyper)trees: Message-passing and linear programming approaches A presentation by Evan Ettinger November 11, 2005 Outline Introduction Motivation and Background

More information

Semidefinite relaxations for approximate inference on graphs with cycles

Semidefinite relaxations for approximate inference on graphs with cycles Semidefinite relaxations for approximate inference on graphs with cycles Martin J. Wainwright Electrical Engineering and Computer Science UC Berkeley, Berkeley, CA 94720 wainwrig@eecs.berkeley.edu Michael

More information

Lecture 12: Binocular Stereo and Belief Propagation

Lecture 12: Binocular Stereo and Belief Propagation Lecture 12: Binocular Stereo and Belief Propagation A.L. Yuille March 11, 2012 1 Introduction Binocular Stereo is the process of estimating three-dimensional shape (stereo) from two eyes (binocular) or

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft

More information

Expectation Propagation in Factor Graphs: A Tutorial

Expectation Propagation in Factor Graphs: A Tutorial DRAFT: Version 0.1, 28 October 2005. Do not distribute. Expectation Propagation in Factor Graphs: A Tutorial Charles Sutton October 28, 2005 Abstract Expectation propagation is an important variational

More information

High dimensional Ising model selection

High dimensional Ising model selection High dimensional Ising model selection Pradeep Ravikumar UT Austin (based on work with John Lafferty, Martin Wainwright) Sparse Ising model US Senate 109th Congress Banerjee et al, 2008 Estimate a sparse

More information

Introduction to graphical models: Lecture III

Introduction to graphical models: Lecture III Introduction to graphical models: Lecture III Martin Wainwright UC Berkeley Departments of Statistics, and EECS Martin Wainwright (UC Berkeley) Some introductory lectures January 2013 1 / 25 Introduction

More information

arxiv: v1 [stat.ml] 26 Feb 2013

arxiv: v1 [stat.ml] 26 Feb 2013 Variational Algorithms for Marginal MAP Qiang Liu Donald Bren School of Information and Computer Sciences University of California, Irvine Irvine, CA, 92697-3425, USA qliu1@uci.edu arxiv:1302.6584v1 [stat.ml]

More information

13 : Variational Inference: Loopy Belief Propagation and Mean Field

13 : Variational Inference: Loopy Belief Propagation and Mean Field 10-708: Probabilistic Graphical Models 10-708, Spring 2012 13 : Variational Inference: Loopy Belief Propagation and Mean Field Lecturer: Eric P. Xing Scribes: Peter Schulam and William Wang 1 Introduction

More information

UNDERSTANDING BELIEF PROPOGATION AND ITS GENERALIZATIONS

UNDERSTANDING BELIEF PROPOGATION AND ITS GENERALIZATIONS UNDERSTANDING BELIEF PROPOGATION AND ITS GENERALIZATIONS JONATHAN YEDIDIA, WILLIAM FREEMAN, YAIR WEISS 2001 MERL TECH REPORT Kristin Branson and Ian Fasel June 11, 2003 1. Inference Inference problems

More information

Convergence Analysis of Reweighted Sum-Product Algorithms

Convergence Analysis of Reweighted Sum-Product Algorithms IEEE TRANSACTIONS ON SIGNAL PROCESSING, PAPER IDENTIFICATION NUMBER T-SP-06090-2007.R1 1 Convergence Analysis of Reweighted Sum-Product Algorithms Tanya Roosta, Student Member, IEEE, Martin J. Wainwright,

More information

On the optimality of tree-reweighted max-product message-passing

On the optimality of tree-reweighted max-product message-passing Appeared in Uncertainty on Artificial Intelligence, July 2005, Edinburgh, Scotland. On the optimality of tree-reweighted max-product message-passing Vladimir Kolmogorov Microsoft Research Cambridge, UK

More information

On the Convergence of the Convex Relaxation Method and Distributed Optimization of Bethe Free Energy

On the Convergence of the Convex Relaxation Method and Distributed Optimization of Bethe Free Energy On the Convergence of the Convex Relaxation Method and Distributed Optimization of Bethe Free Energy Ming Su Department of Electrical Engineering and Department of Statistics University of Washington,

More information

12 : Variational Inference I

12 : Variational Inference I 10-708: Probabilistic Graphical Models, Spring 2015 12 : Variational Inference I Lecturer: Eric P. Xing Scribes: Fattaneh Jabbari, Eric Lei, Evan Shapiro 1 Introduction Probabilistic inference is one of

More information

Tightening Fractional Covering Upper Bounds on the Partition Function for High-Order Region Graphs

Tightening Fractional Covering Upper Bounds on the Partition Function for High-Order Region Graphs Tightening Fractional Covering Upper Bounds on the Partition Function for High-Order Region Graphs Tamir Hazan TTI Chicago Chicago, IL 60637 Jian Peng TTI Chicago Chicago, IL 60637 Amnon Shashua The Hebrew

More information

Concavity of reweighted Kikuchi approximation

Concavity of reweighted Kikuchi approximation Concavity of reweighted Kikuchi approximation Po-Ling Loh Department of Statistics The Wharton School University of Pennsylvania loh@wharton.upenn.edu Andre Wibisono Computer Science Division University

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Lecture 9: Variational Inference Relaxations Volkan Cevher, Matthias Seeger Ecole Polytechnique Fédérale de Lausanne 24/10/2011 (EPFL) Graphical Models 24/10/2011 1 / 15

More information

Machine Learning for Data Science (CS4786) Lecture 24

Machine Learning for Data Science (CS4786) Lecture 24 Machine Learning for Data Science (CS4786) Lecture 24 Graphical Models: Approximate Inference Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2016sp/ BELIEF PROPAGATION OR MESSAGE PASSING Each

More information

Variational algorithms for marginal MAP

Variational algorithms for marginal MAP Variational algorithms for marginal MAP Alexander Ihler UC Irvine CIOG Workshop November 2011 Variational algorithms for marginal MAP Alexander Ihler UC Irvine CIOG Workshop November 2011 Work with Qiang

More information

13: Variational inference II

13: Variational inference II 10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational

More information

Fractional Belief Propagation

Fractional Belief Propagation Fractional Belief Propagation im iegerinck and Tom Heskes S, niversity of ijmegen Geert Grooteplein 21, 6525 EZ, ijmegen, the etherlands wimw,tom @snn.kun.nl Abstract e consider loopy belief propagation

More information

High-dimensional graphical model selection: Practical and information-theoretic limits

High-dimensional graphical model selection: Practical and information-theoretic limits 1 High-dimensional graphical model selection: Practical and information-theoretic limits Martin Wainwright Departments of Statistics, and EECS UC Berkeley, California, USA Based on joint work with: John

More information

Inference and Representation

Inference and Representation Inference and Representation David Sontag New York University Lecture 5, Sept. 30, 2014 David Sontag (NYU) Inference and Representation Lecture 5, Sept. 30, 2014 1 / 16 Today s lecture 1 Running-time of

More information

Dual Decomposition for Inference

Dual Decomposition for Inference Dual Decomposition for Inference Yunshu Liu ASPITRG Research Group 2014-05-06 References: [1]. D. Sontag, A. Globerson and T. Jaakkola, Introduction to Dual Decomposition for Inference, Optimization for

More information

Information Geometric view of Belief Propagation

Information Geometric view of Belief Propagation Information Geometric view of Belief Propagation Yunshu Liu 2013-10-17 References: [1]. Shiro Ikeda, Toshiyuki Tanaka and Shun-ichi Amari, Stochastic reasoning, Free energy and Information Geometry, Neural

More information

Information Projection Algorithms and Belief Propagation

Information Projection Algorithms and Belief Propagation χ 1 π(χ 1 ) Information Projection Algorithms and Belief Propagation Phil Regalia Department of Electrical Engineering and Computer Science Catholic University of America Washington, DC 20064 with J. M.

More information

Variational Algorithms for Marginal MAP

Variational Algorithms for Marginal MAP Variational Algorithms for Marginal MAP Qiang Liu Department of Computer Science University of California, Irvine Irvine, CA, 92697 qliu1@ics.uci.edu Alexander Ihler Department of Computer Science University

More information

Expectation Propagation Algorithm

Expectation Propagation Algorithm Expectation Propagation Algorithm 1 Shuang Wang School of Electrical and Computer Engineering University of Oklahoma, Tulsa, OK, 74135 Email: {shuangwang}@ou.edu This note contains three parts. First,

More information

3 : Representation of Undirected GM

3 : Representation of Undirected GM 10-708: Probabilistic Graphical Models 10-708, Spring 2016 3 : Representation of Undirected GM Lecturer: Eric P. Xing Scribes: Longqi Cai, Man-Chia Chang 1 MRF vs BN There are two types of graphical models:

More information

CS Lecture 19. Exponential Families & Expectation Propagation

CS Lecture 19. Exponential Families & Expectation Propagation CS 6347 Lecture 19 Exponential Families & Expectation Propagation Discrete State Spaces We have been focusing on the case of MRFs over discrete state spaces Probability distributions over discrete spaces

More information

Learning and Evaluating Boltzmann Machines

Learning and Evaluating Boltzmann Machines Department of Computer Science 6 King s College Rd, Toronto University of Toronto M5S 3G4, Canada http://learning.cs.toronto.edu fax: +1 416 978 1455 Copyright c Ruslan Salakhutdinov 2008. June 26, 2008

More information

High-dimensional graphical model selection: Practical and information-theoretic limits

High-dimensional graphical model selection: Practical and information-theoretic limits 1 High-dimensional graphical model selection: Practical and information-theoretic limits Martin Wainwright Departments of Statistics, and EECS UC Berkeley, California, USA Based on joint work with: John

More information

CS281A/Stat241A Lecture 19

CS281A/Stat241A Lecture 19 CS281A/Stat241A Lecture 19 p. 1/4 CS281A/Stat241A Lecture 19 Junction Tree Algorithm Peter Bartlett CS281A/Stat241A Lecture 19 p. 2/4 Announcements My office hours: Tuesday Nov 3 (today), 1-2pm, in 723

More information

Does Better Inference mean Better Learning?

Does Better Inference mean Better Learning? Does Better Inference mean Better Learning? Andrew E. Gelfand, Rina Dechter & Alexander Ihler Department of Computer Science University of California, Irvine {agelfand,dechter,ihler}@ics.uci.edu Abstract

More information

UC Berkeley Department of Electrical Engineering and Computer Science Department of Statistics. EECS 281A / STAT 241A Statistical Learning Theory

UC Berkeley Department of Electrical Engineering and Computer Science Department of Statistics. EECS 281A / STAT 241A Statistical Learning Theory UC Berkeley Department of Electrical Engineering and Computer Science Department of Statistics EECS 281A / STAT 241A Statistical Learning Theory Solutions to Problem Set 2 Fall 2011 Issued: Wednesday,

More information

Learning MN Parameters with Approximation. Sargur Srihari

Learning MN Parameters with Approximation. Sargur Srihari Learning MN Parameters with Approximation Sargur srihari@cedar.buffalo.edu 1 Topics Iterative exact learning of MN parameters Difficulty with exact methods Approximate methods Approximate Inference Belief

More information

Variational Inference. Sargur Srihari

Variational Inference. Sargur Srihari Variational Inference Sargur srihari@cedar.buffalo.edu 1 Plan of discussion We first describe inference with PGMs and the intractability of exact inference Then give a taxonomy of inference algorithms

More information

Linear Response for Approximate Inference

Linear Response for Approximate Inference Linear Response for Approximate Inference Max Welling Department of Computer Science University of Toronto Toronto M5S 3G4 Canada welling@cs.utoronto.ca Yee Whye Teh Computer Science Division University

More information

Convergence Analysis of Reweighted Sum-Product Algorithms

Convergence Analysis of Reweighted Sum-Product Algorithms Convergence Analysis of Reweighted Sum-Product Algorithms Tanya Roosta Martin J. Wainwright, Shankar Sastry Department of Electrical and Computer Sciences, and Department of Statistics UC Berkeley, Berkeley,

More information

CS Lecture 13. More Maximum Likelihood

CS Lecture 13. More Maximum Likelihood CS 6347 Lecture 13 More Maxiu Likelihood Recap Last tie: Introduction to axiu likelihood estiation MLE for Bayesian networks Optial CPTs correspond to epirical counts Today: MLE for CRFs 2 Maxiu Likelihood

More information

Bayesian Machine Learning - Lecture 7

Bayesian Machine Learning - Lecture 7 Bayesian Machine Learning - Lecture 7 Guido Sanguinetti Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh gsanguin@inf.ed.ac.uk March 4, 2015 Today s lecture 1

More information

Convexifying the Bethe Free Energy

Convexifying the Bethe Free Energy 402 MESHI ET AL. Convexifying the Bethe Free Energy Ofer Meshi Ariel Jaimovich Amir Globerson Nir Friedman School of Computer Science and Engineering Hebrew University, Jerusalem, Israel 91904 meshi,arielj,gamir,nir@cs.huji.ac.il

More information

Graphs with Cycles: A Summary of Martin Wainwright s Ph.D. Thesis

Graphs with Cycles: A Summary of Martin Wainwright s Ph.D. Thesis Graphs with Cycles: A Summary of Martin Wainwright s Ph.D. Thesis Constantine Caramanis November 6, 2002 This paper is a (partial) summary of chapters 2,5 and 7 of Martin Wainwright s Ph.D. Thesis. For

More information

Lecture 4 October 18th

Lecture 4 October 18th Directed and undirected graphical models Fall 2017 Lecture 4 October 18th Lecturer: Guillaume Obozinski Scribe: In this lecture, we will assume that all random variables are discrete, to keep notations

More information

Graphical Model Inference with Perfect Graphs

Graphical Model Inference with Perfect Graphs Graphical Model Inference with Perfect Graphs Tony Jebara Columbia University July 25, 2013 joint work with Adrian Weller Graphical models and Markov random fields We depict a graphical model G as a bipartite

More information

9 Forward-backward algorithm, sum-product on factor graphs

9 Forward-backward algorithm, sum-product on factor graphs Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 9 Forward-backward algorithm, sum-product on factor graphs The previous

More information

Structured Variational Inference

Structured Variational Inference Structured Variational Inference Sargur srihari@cedar.buffalo.edu 1 Topics 1. Structured Variational Approximations 1. The Mean Field Approximation 1. The Mean Field Energy 2. Maximizing the energy functional:

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models David Sontag New York University Lecture 6, March 7, 2013 David Sontag (NYU) Graphical Models Lecture 6, March 7, 2013 1 / 25 Today s lecture 1 Dual decomposition 2 MAP inference

More information

Belief Propagation, Information Projections, and Dykstra s Algorithm

Belief Propagation, Information Projections, and Dykstra s Algorithm Belief Propagation, Information Projections, and Dykstra s Algorithm John MacLaren Walsh, PhD Department of Electrical and Computer Engineering Drexel University Philadelphia, PA jwalsh@ece.drexel.edu

More information

Convergent Tree-reweighted Message Passing for Energy Minimization

Convergent Tree-reweighted Message Passing for Energy Minimization Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2006. Convergent Tree-reweighted Message Passing for Energy Minimization Vladimir Kolmogorov University College London,

More information

Chalmers Publication Library

Chalmers Publication Library Chalmers Publication Library Copyright Notice 20 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for

More information

Lagrangian Relaxation for MAP Estimation in Graphical Models

Lagrangian Relaxation for MAP Estimation in Graphical Models In Proceedings of The th Allerton Conference on Communication, Control and Computing, September, 007. Lagrangian Relaxation for MAP Estimation in Graphical Models Jason K. Johnson, Dmitry M. Malioutov

More information

Undirected graphical models

Undirected graphical models Undirected graphical models Semantics of probabilistic models over undirected graphs Parameters of undirected models Example applications COMP-652 and ECSE-608, February 16, 2017 1 Undirected graphical

More information

Inference in Bayesian Networks

Inference in Bayesian Networks Andrea Passerini passerini@disi.unitn.it Machine Learning Inference in graphical models Description Assume we have evidence e on the state of a subset of variables E in the model (i.e. Bayesian Network)

More information

Linear Response Algorithms for Approximate Inference in Graphical Models

Linear Response Algorithms for Approximate Inference in Graphical Models Linear Response Algorithms for Approximate Inference in Graphical Models Max Welling Department of Computer Science University of Toronto 10 King s College Road, Toronto M5S 3G4 Canada welling@cs.toronto.edu

More information

Bounding the Partition Function using Hölder s Inequality

Bounding the Partition Function using Hölder s Inequality Qiang Liu Alexander Ihler Department of Computer Science, University of California, Irvine, CA, 92697, USA qliu1@uci.edu ihler@ics.uci.edu Abstract We describe an algorithm for approximate inference in

More information

3 Undirected Graphical Models

3 Undirected Graphical Models Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 3 Undirected Graphical Models In this lecture, we discuss undirected

More information

Inference as Optimization

Inference as Optimization Inference as Optimization Sargur Srihari srihari@cedar.buffalo.edu 1 Topics in Inference as Optimization Overview Exact Inference revisited The Energy Functional Optimizing the Energy Functional 2 Exact

More information

Machine Learning 4771

Machine Learning 4771 Machine Learning 4771 Instructor: Tony Jebara Topic 16 Undirected Graphs Undirected Separation Inferring Marginals & Conditionals Moralization Junction Trees Triangulation Undirected Graphs Separation

More information

Expectation Maximization

Expectation Maximization Expectation Maximization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 /

More information

Preconditioner Approximations for Probabilistic Graphical Models

Preconditioner Approximations for Probabilistic Graphical Models Preconditioner Approximations for Probabilistic Graphical Models Pradeep Ravikumar John Lafferty School of Computer Science Carnegie Mellon University Abstract We present a family of approximation techniques

More information

CS Lecture 8 & 9. Lagrange Multipliers & Varitional Bounds

CS Lecture 8 & 9. Lagrange Multipliers & Varitional Bounds CS 6347 Lecture 8 & 9 Lagrange Multipliers & Varitional Bounds General Optimization subject to: min ff 0() R nn ff ii 0, h ii = 0, ii = 1,, mm ii = 1,, pp 2 General Optimization subject to: min ff 0()

More information

Learning discrete graphical models via generalized inverse covariance matrices

Learning discrete graphical models via generalized inverse covariance matrices Learning discrete graphical models via generalized inverse covariance matrices Duzhe Wang, Yiming Lv, Yongjoon Kim, Young Lee Department of Statistics University of Wisconsin-Madison {dwang282, lv23, ykim676,

More information

Chapter 8 Cluster Graph & Belief Propagation. Probabilistic Graphical Models 2016 Fall

Chapter 8 Cluster Graph & Belief Propagation. Probabilistic Graphical Models 2016 Fall Chapter 8 Cluster Graph & elief ropagation robabilistic Graphical Models 2016 Fall Outlines Variable Elimination 消元法 imple case: linear chain ayesian networks VE in complex graphs Inferences in HMMs and

More information

LINK scheduling algorithms based on CSMA have received

LINK scheduling algorithms based on CSMA have received Efficient CSMA using Regional Free Energy Approximations Peruru Subrahmanya Swamy, Venkata Pavan Kumar Bellam, Radha Krishna Ganti, and Krishna Jagannathan arxiv:.v [cs.ni] Feb Abstract CSMA Carrier Sense

More information

Sparse Graph Learning via Markov Random Fields

Sparse Graph Learning via Markov Random Fields Sparse Graph Learning via Markov Random Fields Xin Sui, Shao Tang Sep 23, 2016 Xin Sui, Shao Tang Sparse Graph Learning via Markov Random Fields Sep 23, 2016 1 / 36 Outline 1 Introduction to graph learning

More information

Graphical models. Sunita Sarawagi IIT Bombay

Graphical models. Sunita Sarawagi IIT Bombay 1 Graphical models Sunita Sarawagi IIT Bombay http://www.cse.iitb.ac.in/~sunita 2 Probabilistic modeling Given: several variables: x 1,... x n, n is large. Task: build a joint distribution function Pr(x

More information

Factor Graphs and Message Passing Algorithms Part 1: Introduction

Factor Graphs and Message Passing Algorithms Part 1: Introduction Factor Graphs and Message Passing Algorithms Part 1: Introduction Hans-Andrea Loeliger December 2007 1 The Two Basic Problems 1. Marginalization: Compute f k (x k ) f(x 1,..., x n ) x 1,..., x n except

More information

A brief introduction to Conditional Random Fields

A brief introduction to Conditional Random Fields A brief introduction to Conditional Random Fields Mark Johnson Macquarie University April, 2005, updated October 2010 1 Talk outline Graphical models Maximum likelihood and maximum conditional likelihood

More information

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014 Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Problem Set 3 Issued: Thursday, September 25, 2014 Due: Thursday,

More information

The Generalized Distributive Law and Free Energy Minimization

The Generalized Distributive Law and Free Energy Minimization The Generalized Distributive Law and Free Energy Minimization Srinivas M. Aji Robert J. McEliece Rainfinity, Inc. Department of Electrical Engineering 87 N. Raymond Ave. Suite 200 California Institute

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

More information

Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation

Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation Jason K. Johnson, Dmitry M. Malioutov and Alan S. Willsky Department of Electrical Engineering and Computer Science Massachusetts Institute

More information

Lecture 18 Generalized Belief Propagation and Free Energy Approximations

Lecture 18 Generalized Belief Propagation and Free Energy Approximations Lecture 18, Generalized Belief Propagation and Free Energy Approximations 1 Lecture 18 Generalized Belief Propagation and Free Energy Approximations In this lecture we talked about graphical models and

More information

6.867 Machine learning, lecture 23 (Jaakkola)

6.867 Machine learning, lecture 23 (Jaakkola) Lecture topics: Markov Random Fields Probabilistic inference Markov Random Fields We will briefly go over undirected graphical models or Markov Random Fields (MRFs) as they will be needed in the context

More information

DUAL OPTIMALITY FRAMEWORKS FOR EXPECTATION PROPAGATION. John MacLaren Walsh

DUAL OPTIMALITY FRAMEWORKS FOR EXPECTATION PROPAGATION. John MacLaren Walsh DUAL OPTIMALITY FAMEWOKS FO EPECTATION POPAGATION John MacLaren Walsh Cornell University School of Electrical Computer Engineering Ithaca, NY 4853 ABSTACT We show a duality relationship between two different

More information

On the Choice of Regions for Generalized Belief Propagation

On the Choice of Regions for Generalized Belief Propagation UAI 2004 WELLING 585 On the Choice of Regions for Generalized Belief Propagation Max Welling (welling@ics.uci.edu) School of Information and Computer Science University of California Irvine, CA 92697-3425

More information

ABSTRACT. Inference by Reparameterization using Neural Population Codes. Rajkumar Vasudeva Raju

ABSTRACT. Inference by Reparameterization using Neural Population Codes. Rajkumar Vasudeva Raju ABSTRACT Inference by Reparameterization using Neural Population Codes by Rajkumar Vasudeva Raju Behavioral experiments on humans and animals suggest that the brain performs probabilistic inference to

More information