Discrete Markov Random Fields: the Inference story. Pradeep Ravikumar


1 Discrete Markov Random Fields: the Inference story. Pradeep Ravikumar

2 Graphical Models, The History. How do we model stochastic processes of the world? I want to model the world, and I like graphs...

3 History, Mid to Late Twentieth Century. Pioneering work of Conspiracy Theorists: The System, it is all connected...

4 History. Late Twentieth Century: people realize that the existing scientific literature offers a marriage between probability theory and graph theory which can be used to model the world.

5 History. Common Misconception: called graphical models after Grafici Modeles, a sculptor protege of Da Vinci. In fact, called Graphical Models because they model stochastic systems using graphs.


7 Graphical Models. [figure: undirected graph over nodes $X_1, X_2, X_3, X_4$]


9 Graphical Models. [figure: the same graph] Separating set: $(X_2, X_3)$ disconnects $X_1$ and $X_4$.

10 Graphical Models. [figure: the same graph] Separating set: $(X_2, X_3)$ disconnects $X_1$ and $X_4$. Global Markov Property: $X_1 \perp X_4 \mid (X_2, X_3)$.

11 Graphical Models. $\mathrm{MP}(G)$: the set of all Markov properties, obtained by ranging over separating sets of $G$. $P$ is represented by $G$ iff $P$ satisfies $\mathrm{MP}(G)$. [figure: graph $G(X)$ surrounded by the family of distributions $P_1(X), \dots, P_4(X)$ it represents]

12 Hammersley and Clifford Theorem. A positive $P$ over $X$ satisfies $\mathrm{MP}(G)$ iff $P$ factorizes according to the cliques $C$ of $G$: $$P(X) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(X_C).$$ A specific member of the family is specified by weights over cliques. [figure: the family $P_1(X), \dots, P_4(X)$ around $G(X)$; a specific member such as $P_3$ is picked out by its clique weights $\{\psi^{(3)}_C(X_C)\}$]

13 Exponential Family. $$p(X) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(X_C) = \exp\Big( \sum_{C \in \mathcal{C}} \log \psi_C(X_C) - \log Z \Big).$$ Exponential family: $$p(x; \theta) = \exp\Big( \sum_{\alpha} \theta_\alpha \phi_\alpha(x) - \Psi(\theta) \Big),$$ where $\{\phi_\alpha\}$ are features, $\{\theta_\alpha\}$ are parameters, and $\Psi(\theta)$ is the log partition function.
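
To make the exponential-family form concrete, here is a minimal Python sketch, using a hypothetical three-node binary chain of my own (not from the slides), that scores configurations by $\theta^\top \phi(x)$ and computes $\Psi(\theta)$ by brute-force enumeration:

```python
# Minimal sketch: a toy pairwise MRF in exponential-family form.
# The chain x0 - x1 - x2 and all parameter values are made up for illustration.
import itertools
import numpy as np

theta_node = {0: np.array([0.1, -0.1]),
              1: np.array([0.0, 0.2]),
              2: np.array([-0.3, 0.3])}
theta_edge = {(0, 1): np.array([[0.5, -0.5], [-0.5, 0.5]]),
              (1, 2): np.array([[0.5, -0.5], [-0.5, 0.5]])}

def score(x):
    """theta . phi(x): sum of the node and edge parameters touched by x."""
    s = sum(theta_node[i][xi] for i, xi in enumerate(x))
    s += sum(th[x[i], x[j]] for (i, j), th in theta_edge.items())
    return s

# Psi(theta) = log sum_x exp(theta . phi(x)), feasible only for tiny models
scores = [score(x) for x in itertools.product([0, 1], repeat=3)]
Psi = np.log(np.sum(np.exp(scores)))
print("log partition function:", Psi)
```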

14 Inference. Answering queries about the graphical model probability distribution.

15 Inference. For an undirected model $p(x; \theta) = \exp\big( \sum_{\alpha \in I} \theta_\alpha \phi_\alpha(x) - \Psi(\theta) \big)$, the key inference problems are: computing the log partition function (normalization constant) $\Psi(\theta)$; marginals $p(x_A) = \sum_{x_v,\, v \notin A} p(x)$; and most probable configurations $x^* = \arg\max_x p(x \mid x_L)$. These problems are intractable in full generality.
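
All three queries can be answered by exhaustive enumeration on a toy model; the sketch below (my own two-variable example) does exactly that, which also makes plain why enumeration is hopeless once the number of variables grows:

```python
# Brute-force versions of the three inference queries on a made-up
# two-variable binary model: Psi(theta), a node marginal, and the MAP state.
import itertools
import numpy as np

theta = {(0,): np.array([0.3, -0.3]),
         (1,): np.array([0.0, 0.1]),
         (0, 1): np.array([[0.8, -0.8], [-0.8, 0.8]])}

def score(x):
    return theta[(0,)][x[0]] + theta[(1,)][x[1]] + theta[(0, 1)][x[0], x[1]]

X = list(itertools.product([0, 1], repeat=2))
w = np.array([np.exp(score(x)) for x in X])

Psi = np.log(w.sum())                          # log partition function
p = w / w.sum()
p_x0 = [p[[i for i, x in enumerate(X) if x[0] == j]].sum() for j in (0, 1)]
x_map = X[int(np.argmax(p))]                   # most probable configuration
print(Psi, p_x0, x_map)
```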

16 Log Partition Function. $$\log Z = \log \sum_x \prod_{\alpha \in I} \psi_\alpha(x_\alpha).$$

17 Variable Elimination. $$\log Z = \log \sum_x \prod_{\alpha \in I} \psi_\alpha(x_\alpha)$$ $$\sum_x \prod_\alpha \psi_\alpha(x_\alpha) = \sum_{\{x_j : j \neq i\}} \sum_{x_i} \prod_\alpha \psi_\alpha(x_\alpha) = \sum_{\{x_j : j \neq i\}} \prod_{\alpha \in C_{\setminus i}} \psi_\alpha(x_\alpha) \sum_{x_i} \prod_{\alpha \in C_i} \psi_\alpha(x_\alpha) = \sum_{\{x_j : j \neq i\}} \prod_{\alpha \in C_{\setminus i}} \psi_\alpha(x_\alpha)\, g(x_{j \sim i}),$$ where $C_i$ denotes the factors involving $x_i$ and $C_{\setminus i}$ the rest.

18 Variable Elimination. $$Z = \sum_{\{x_j : j \neq i\}} \prod_{\alpha \in C_{\setminus i}} \psi_\alpha(x_\alpha)\, g(x_{j \sim i}).$$ Continue to eliminate the other variables $x_j$. Is this a linear time method then?

19 Variable Elimination. $$Z = \sum_{\{x_j : j \neq i\}} \prod_{\alpha \in C_{\setminus i}} \psi_\alpha(x_\alpha)\, g(x_{j \sim i}).$$ Continue to eliminate the other variables $x_j$. Is this a linear time method then? No: $g(x_{j \sim i})$ depends on the variables $j$ which share a factor with $i$.
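
Here is one elimination step in code, assuming a binary chain $x_1 - x_2 - x_3$ with two made-up pairwise potentials: summing out $x_3$ first produces a message over $x_2$, exactly the $g$ term above, and the result agrees with brute-force summation.

```python
# One variable-elimination step on a toy binary chain x1 - x2 - x3.
import itertools
import numpy as np

psi12 = np.array([[1.0, 0.5], [0.5, 2.0]])   # psi(x1, x2)
psi23 = np.array([[1.5, 1.0], [1.0, 0.5]])   # psi(x2, x3)

# brute force: Z = sum over all (x1, x2, x3)
Z_brute = sum(psi12[x1, x2] * psi23[x2, x3]
              for x1, x2, x3 in itertools.product([0, 1], repeat=3))

# eliminate x3: g(x2) = sum_{x3} psi23(x2, x3), then sum what remains
g = psi23.sum(axis=1)
Z_elim = sum(psi12[x1, x2] * g[x2]
             for x1, x2 in itertools.product([0, 1], repeat=2))

assert np.isclose(Z_brute, Z_elim)
print("Z =", Z_elim)
```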

20 Variable Elimination. $$\log Z = \log \sum_x \prod_{\alpha \in I} \psi_\alpha(x_\alpha).$$ [figure: tree with root $x_7$, edges $\psi_{57}, \psi_{67}$ down to $x_5, x_6$, and $\psi_{15}, \psi_{25}, \psi_{36}, \psi_{46}$ down to leaves $x_1, \dots, x_4$]


22 Variable Elimination. $$\log Z = \log \sum_x \prod_{\alpha \in I} \psi_\alpha(x_\alpha).$$ [figure: the same tree after eliminating the leaves, with messages $m$ passed up to $x_5$ and $x_6$]


24 Variable Elimination. $$\log Z = \log \sum_x \prod_{\alpha \in I} \psi_\alpha(x_\alpha).$$ Exponential in tree-width. [figure: the tree again; eliminating the leaves leaves the terms $\sum_{x_5} \psi_{57} \big(\sum_{x_1} \psi_{15}\big)\big(\sum_{x_2} \psi_{25}\big)$ and $\sum_{x_6} \psi_{67} \big(\sum_{x_3} \psi_{36}\big)\big(\sum_{x_4} \psi_{46}\big)$]
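
To see where the exponential cost comes from, the rough sketch below (my own illustration, not from the slides) tracks the largest factor scope an elimination order creates; a factor over $k$ discrete variables stores a table exponential in $k$, and the best achievable worst-case scope over all orders is governed by the treewidth.

```python
# Track the largest intermediate factor scope under a given elimination order.
def max_factor_size(factors, order):
    """factors: collection of variable sets; returns the largest scope
    touched while eliminating variables in the given order."""
    factors = [set(f) for f in factors]
    worst = max(len(f) for f in factors)
    for v in order:
        touching = [f for f in factors if v in f]
        factors = [f for f in factors if v not in f]
        if touching:
            scope = set().union(*touching)
            worst = max(worst, len(scope))
            merged = scope - {v}
            if merged:
                factors.append(merged)
    return worst

star = [frozenset({0, i}) for i in range(1, 5)]   # hub 0 with leaves 1..4
print(max_factor_size(star, [1, 2, 3, 4, 0]))     # leaves first: scope 2
print(max_factor_size(star, [0, 1, 2, 3, 4]))     # hub first: scope 5
```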

25 Inference. $$p(x; \theta) = \exp\big(\theta^\top \phi(x) - A(\theta)\big),$$ where $A(\theta)$ is the log partition function.

26 Inference. $$A(\theta) = \log \sum_x \exp\big(\theta^\top \phi(x)\big).$$ With a variational parameter $\lambda$, seek bounds $B(\theta, \lambda)$ and $C(\theta, \lambda)$ with $$\sup_\lambda C(\theta, \lambda) \;\le\; A(\theta) \;\le\; \inf_\lambda B(\theta, \lambda).$$ Summing over configurations becomes optimization!

27 Inference. But... (there's always a but!) Is there a principled way to obtain parametrized bounds $B(\theta, \lambda)$ and $C(\theta, \lambda)$?

28 Fenchel Duality. Let $f(x)$ be a concave function and define $$f^*(\lambda) = \min_x \{\lambda^\top x - f(x)\}.$$ At the minimizer $x_\lambda$, the slope of the tangent to $f$ is $\lambda$, and $$\text{(intercept)} = \lambda^\top x_\lambda - f(x_\lambda) = f^*(\lambda),$$ so $f^*(\lambda)$ is the intercept of the line with slope $\lambda$ tangent to $f(x)$.

29 Fenchel Duality. $$f(x) = \min_\lambda \{\lambda^\top x - f^*(\lambda)\}.$$ Thus $g(x, \lambda) = \lambda^\top x - f^*(\lambda)$ is an upper bound on $f(x)$!
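
A quick numeric check of the bound, using my own example $f(x) = -x^2$: here the conjugate has the closed form $f^*(\lambda) = \min_x \{\lambda x - f(x)\} = -\lambda^2/4$, and the tangent-line bound is tight exactly at $\lambda = -2x$.

```python
# Verify that g(x, l) = l*x - f*(l) upper-bounds the concave f(x) = -x^2.
import numpy as np

f = lambda x: -x**2
f_star = lambda l: -l**2 / 4.0          # closed-form conjugate for this f
g = lambda x, l: l * x - f_star(l)      # tangent-line upper bound

x = 1.3
for l in np.linspace(-5, 5, 11):
    assert g(x, l) >= f(x) - 1e-12
print("tight at l = -2x:", np.isclose(g(x, -2 * x), f(x)))
```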

30 Fenchel Duality. Let us apply Fenchel duality to the log partition function! $A(\theta)$ is convex, so $$A^*(\mu) = \sup_\theta \big(\theta^\top \mu - A(\theta)\big), \qquad A(\theta) = \sup_\mu \big(\theta^\top \mu - A^*(\mu)\big).$$

31 Log Partition Function. Define the marginal polytope $$M = \Big\{\mu \in \mathbb{R}^d \;:\; \exists\, p(\cdot) \text{ s.t. } \sum_x \phi(x)\, p(x) = \mu\Big\}.$$ $M$ is the convex hull of $\{\phi(x)\}$.

32 Mean parameter mapping. Consider the mapping $\Lambda: \Theta \to M$, $$\Lambda(\theta) := \mathbb{E}_\theta[\phi(x)] = \sum_x \phi(x)\, p(x; \theta).$$ The mapping associates $\theta$ to mean parameters $\mu := \Lambda(\theta) \in M$. Conversely, for $\mu \in \mathrm{Int}(M)$, $\theta = \Lambda^{-1}(\mu)$ (unique if the exponential family is minimal).

33 Partition function conjugates. $$A^*(\mu) = \sup_\theta \big(\theta^\top \mu - A(\theta)\big), \qquad A(\theta) = \sup_\mu \big(\theta^\top \mu - A^*(\mu)\big).$$ The optimal arguments are given by $$\theta_\mu = \Lambda^{-1}(\mu), \qquad \mu_\theta = \Lambda(\theta).$$

34 Partition function conjugate. Properties of the Fenchel conjugate $A^*(\mu) = \sup_\theta (\theta^\top \mu - A(\theta))$: $A^*(\mu)$ is finite only for $\mu \in M$, and $A^*(\mu)$ is the negative entropy of the graphical model distribution with mean parameters $\mu$, or equivalently with parameters $\Lambda^{-1}(\mu)$!

35 Partition Function. $$A(\theta) = \sup_{\mu \in M} \big\{\theta^\top \mu - A^*(\mu)\big\}.$$ Hardness is due to two bottlenecks: $M$, a polytope with exponentially many vertices and no compact representation; and $A^*(\mu)$, an entropy computation. Approximate either or both!

36 Pairwise Graphical Models. [figure: edge $x_1 - x_2$ with potential $\theta_{12}\, \phi_{12}(x_1, x_2)$] Overcomplete potentials: $$\mathbb{I}_j(x_s) = \begin{cases} 1 & x_s = j \\ 0 & \text{otherwise} \end{cases} \qquad \mathbb{I}_{j,k}(x_s, x_t) = \begin{cases} 1 & x_s = j \text{ and } x_t = k \\ 0 & \text{otherwise.} \end{cases}$$ $$p(x \mid \theta) = \exp\Big( \sum_{s;j} \theta_{s;j}\, \mathbb{I}_j(x_s) + \sum_{s,t;j,k} \theta_{s,t;j,k}\, \mathbb{I}_{j,k}(x_s, x_t) - \Psi(\theta) \Big).$$

37 Overcomplete Representation; Mean Parameters. $$\mu_{s;j} := \mathbb{E}_\theta[\mathbb{I}_j(x_s)] = p(x_s = j; \theta), \qquad \mu_{s,t;j,k} := \mathbb{E}_\theta[\mathbb{I}_{j,k}(x_s, x_t)] = p(x_s = j, x_t = k; \theta).$$ Mean parameters are marginals! Define the following functional forms: $$\mu_s(x_s) = \sum_j \mu_{s;j}\, \mathbb{I}_j(x_s), \qquad \mu_{st}(x_s, x_t) = \sum_{j,k} \mu_{s,t;j,k}\, \mathbb{I}_{j,k}(x_s, x_t).$$
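
The claim that mean parameters are marginals can be checked directly by enumeration. The sketch below (my own single-edge binary model with overcomplete indicator features) computes $\mu = \mathbb{E}_\theta[\phi(x)]$ and confirms that its entries are node and edge marginal probabilities:

```python
# Mean parameters of an overcomplete single-edge model are marginals.
import itertools
import numpy as np

def phi(x1, x2):
    """Overcomplete features: node indicators for x1 and x2, then edge indicators."""
    f = np.zeros(8)
    f[x1] = 1.0                      # I_j(x1)
    f[2 + x2] = 1.0                  # I_j(x2)
    f[4 + 2 * x1 + x2] = 1.0         # I_jk(x1, x2)
    return f

theta = np.array([0.2, -0.2, 0.0, 0.1, 0.7, -0.7, -0.7, 0.7])  # made-up values
configs = list(itertools.product([0, 1], repeat=2))
w = np.array([np.exp(theta @ phi(*x)) for x in configs])
p = w / w.sum()

mu = sum(pi * phi(*x) for pi, x in zip(p, configs))   # E_theta[phi(x)]
assert np.isclose(mu[0], p[0] + p[1])                 # mu_{1;0} = p(x1 = 0)
print("node marginals:", mu[:4], "edge marginals:", mu[4:])
```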

38 Outer Polytope Approximations. $$\mathrm{LOCAL}(G) := \Big\{\mu \ge 0 \;:\; \sum_{x_s} \mu_s(x_s) = 1, \;\; \sum_{x_t} \mu_{st}(x_s, x_t) = \mu_s(x_s)\Big\}.$$

39 Inner Polytope Approximations. For the given graph $G$ and a subgraph $H$, let $$E(H) = \{\theta : \theta_{st} = \theta_{st}\, \mathbb{1}_{(s,t) \in H}\}, \qquad M(G; H) = \{\mu : \mu = \mathbb{E}_\theta[\phi(x)] \text{ for some } \theta \in E(H)\}.$$ $M(G; H) \subseteq M(G)$.

40 Entropy Approximations. Tree-structured distributions: $$p(x; \mu) = \prod_s \mu_s(x_s) \prod_{(s,t) \in E} \frac{\mu_{st}(x_s, x_t)}{\mu_s(x_s)\, \mu_t(x_t)}.$$ Define $$H_s(\mu_s) := -\sum_{x_s} \mu_s(x_s) \log \mu_s(x_s), \qquad I_{st}(\mu_{st}) := \sum_{x_s, x_t} \mu_{st}(x_s, x_t) \log \frac{\mu_{st}(x_s, x_t)}{\mu_s(x_s)\, \mu_t(x_t)}.$$

41 Tree-structured Entropy. $$A^*_{\mathrm{tree}}(\mu) = -\sum_{s \in V} H_s(\mu_s) + \sum_{(s,t) \in E} I_{st}(\mu_{st}).$$ A compact representation; can be used as an approximation.
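
A small numeric check of the decomposition, with made-up numbers: on a two-node tree the entropy $-A^*_{\mathrm{tree}}(\mu) = H_1 + H_2 - I_{12}$ matches the exact entropy of the joint, provided the marginals are consistent.

```python
# Tree entropy decomposition checked on a two-node tree.
import numpy as np

mu12 = np.array([[0.3, 0.1], [0.2, 0.4]])    # joint over (x1, x2)
mu1, mu2 = mu12.sum(axis=1), mu12.sum(axis=0)

H = lambda p: -np.sum(p * np.log(p))
I12 = np.sum(mu12 * np.log(mu12 / np.outer(mu1, mu2)))

print(np.isclose(H(mu1) + H(mu2) - I12, H(mu12)))    # True: exact on a tree
```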

42 Approximate Inference Techniques. Belief Propagation: polytope $\mathrm{LOCAL}(G)$, tree-structured entropy! Structured Mean Field: polytope $M(G; H)$, $H$-structured entropy. Mean Field: $H = H_0$, the completely disconnected graph (all variables independent).

43 Divergence Measure View. Given $p(x; \theta) \propto \exp\big(\theta^\top \phi(x)\big)$, we would like a more manageable surrogate distribution $q \in Q$: $$\min_{q \in Q} D\big(q(x) \,\|\, p(x; \theta)\big).$$

44 Divergence Measure View. $$\min_{q \in Q} D\big(q(x) \,\|\, p(x; \theta)\big)$$ $D(q \,\|\, p) = \mathrm{KL}(q \,\|\, p)$: Structured Mean Field, Belief Propagation. $D(p \,\|\, q) = \mathrm{KL}(p \,\|\, q)$: Expectation Propagation (look out for the talk on Continuous Markov Random Fields!). Typically the KL measure is approximated with energy approximations (Bethe free energy, Kikuchi free energy). (Ravikumar, Lafferty 05; Preconditioner Approximations): optimizing for a minimax criterion reduces the task to a generalized linear systems problem!
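
A tiny numeric sketch of the two divergence directions (my own tables): a fully factorized surrogate $q$ against a small target $p$, with $\mathrm{KL}(q \,\|\, p)$ being the mean-field direction and $\mathrm{KL}(p \,\|\, q)$ the expectation-propagation direction.

```python
# The two KL directions for a factorized surrogate on a 2x2 target.
import numpy as np

p = np.array([[0.3, 0.1], [0.2, 0.4]])        # target p(x1, x2)
q = np.outer([0.5, 0.5], [0.4, 0.6])          # factorized surrogate q

kl_qp = np.sum(q * np.log(q / p))             # mean field / BP direction
kl_pq = np.sum(p * np.log(p / q))             # EP direction
print("KL(q||p) =", kl_qp, " KL(p||q) =", kl_pq)
```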

46 Bounds on event probabilities. Doctor: So what is the lower bound on the diagnosis probability? Graphical Model: I don't know, but here is an approximate value. Doctor: =( Can we get upper and lower bounds on $p_\theta(X \in C)$ (instead of just approximate values)?

47 Bounds on event probabilities Classical Chernoff Bounds give useful estimates for i.i.d. random variables. Can they be extended to graphical models? [Ravikumar, Lafferty 04; Variational Chernoff Bounds] 47

48 Classical Chernoff Bounds. $$p_\theta(X \ge u) \le \frac{\mathbb{E}_\theta(X)}{u} \quad \text{(Markov inequality)}$$ $$p_\theta(X \ge u) = p_\theta\big(e^{\lambda X} \ge e^{\lambda u}\big) \le \mathbb{E}_\theta\big[e^{\lambda(X - u)}\big].$$ From this it follows that $$\log p_\theta(X \ge u) \le \inf_{\lambda \ge 0} \big( -\lambda u + \log \mathbb{E}_\theta[e^{\lambda X}] \big).$$ Bounds on the cumulant function $\mathbb{E}_\theta[e^{\lambda X}]$ yield the standard Chernoff bounds.

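
As a sanity check, the bound can be evaluated numerically. My own example below takes $X \sim \mathrm{Binomial}(20, 0.5)$ and $u = 15$, where the cumulant generating function has a closed form, and compares the optimized Chernoff bound with the exact tail:

```python
# Classical Chernoff bound for a Binomial tail, optimized over lambda >= 0.
import numpy as np
from math import comb

n, p, u = 20, 0.5, 15
cgf = lambda l: n * np.log(1 - p + p * np.exp(l))   # log E[e^{lX}]

ls = np.linspace(0, 5, 2001)
log_bound = np.min(-ls * u + cgf(ls))
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(u, n + 1))
print("Chernoff bound:", np.exp(log_bound), " exact tail:", exact)
```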

51 Generalized Chernoff Bounds. Event: $X \in C$. $$\mathbb{I}_C(X) = \begin{cases} 1 & X \in C \\ 0 & \text{otherwise} \end{cases}$$ If $\mathbb{I}_C(X) \le f_\lambda(X)$, then $p_\theta(X \in C) \le \mathbb{E}_\theta[f_\lambda]$.


53 Graphical Model Chernoff Bounds. With $f_\lambda = \exp(\langle \lambda, x \rangle + u)$, we get $$\log p_\theta(X \in C) \le \inf_\lambda \big( S_C(-\lambda) + \log \mathbb{E}_\theta[e^{\langle \lambda, X \rangle}] \big),$$ where $S_C(\lambda) = \sup_{x \in C} \langle x, \lambda \rangle$ is the support function of the set $C$. For an exponential model with sufficient statistic $\phi(x)$, the above becomes $$\log p_\theta(X \in C) \le \inf_\lambda \big( S_{C,\phi}(-\lambda) + \Phi(\theta + \lambda) - \Phi(\theta) \big),$$ where $\Phi$ is the log-partition function and $S_{C,\phi}(\lambda) = \sup_{x \in C} \langle \phi(x), \lambda \rangle$.


55 MAP Estimation. [figure: a four-node graph over $x_1, \dots, x_4$]

56 MAP Estimation. [figure: one joint configuration, with PROB = 0.01]

57 MAP Estimation. [figure: a different joint configuration, with PROB = 0.2]

58 MAP Estimation. [figure: the same configurations] Most Probable Configuration?

59 Polytope View. $$\max_x \theta^\top \phi(x) = \sup_{\mu \in M} \theta^\top \mu.$$

60 Outer Polytope Relaxations. $$\mathrm{LOCAL}(G) := \Big\{\mu \ge 0 \;:\; \sum_{x_s} \mu_s(x_s) = 1, \;\; \sum_{x_t} \mu_{st}(x_s, x_t) = \mu_s(x_s)\Big\}.$$

61 Outer Polytope Relaxations. $$\sup_{\mu \in M(G)} \theta^\top \mu \;\le\; \sup_{\mu \in \mathrm{LOCAL}(G)} \theta^\top \mu.$$ A Linear Program! (Chekuri, Khanna, Naor, Zosin 05; LP Formulation for Metric Labeling) (Wainwright, Jaakkola, Willsky 05; Tree-reweighted Max-Product, Dual of LP)
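
Here is the relaxation as runnable code, in a sketch of my own encoding for a single binary edge, solved with scipy's linprog. Since a single edge is a tree, the relaxation is tight and the LP value matches the best configuration score:

```python
# LP relaxation over LOCAL(G) for one binary edge, solved with scipy.
import numpy as np
from scipy.optimize import linprog

# variable order: [mu1(0), mu1(1), mu2(0), mu2(1),
#                  mu12(00), mu12(01), mu12(10), mu12(11)]
theta = np.array([0.2, -0.2, 0.1, -0.1, 1.0, -1.0, -1.0, 1.0])  # made-up

A_eq = np.array([
    [1, 1, 0, 0, 0, 0, 0, 0],    # sum_{x1} mu1(x1) = 1
    [0, 0, 1, 1, 0, 0, 0, 0],    # sum_{x2} mu2(x2) = 1
    [-1, 0, 0, 0, 1, 1, 0, 0],   # sum_{x2} mu12(0, x2) = mu1(0)
    [0, -1, 0, 0, 0, 0, 1, 1],   # sum_{x2} mu12(1, x2) = mu1(1)
    [0, 0, -1, 0, 1, 0, 1, 0],   # sum_{x1} mu12(x1, 0) = mu2(0)
    [0, 0, 0, -1, 0, 1, 0, 1],   # sum_{x1} mu12(x1, 1) = mu2(1)
])
b_eq = np.array([1, 1, 0, 0, 0, 0])

res = linprog(-theta, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))  # maximize theta.mu
print("LP value:", -res.fun, " mu:", np.round(res.x, 3))    # 1.3 at (x1,x2)=(0,0)
```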

62 Inner Polytope Approximations. If $M_I \subseteq M$ is any subset of the marginal polytope that includes all of the vertices, $$\max_x \langle \theta, \phi(x) \rangle = \sup_{\mu \in M_I} \langle \theta, \mu \rangle.$$

63 Inner Polytope Approximations. For the given graph $G$ and a subgraph $H$, let $$E(H) = \{\theta : \theta_{st} = \theta_{st}\, \mathbb{1}_{(s,t) \in H}\}, \qquad M(G; H) = \{\mu : \mu = \mathbb{E}_\theta[\phi(x)] \text{ for some } \theta \in E(H)\}.$$ $M(G; H) \subseteq M(G)$.

64 Inner Polytope Approximations, Mean Field parameters. $$M(G; H_0) = \{\mu \;:\; 0 \le \mu(s; j) \le 1, \;\; \mu(s, j; t, k) = \mu(s; j)\, \mu(t; k)\}.$$ Mean Field Relaxation: $$\sup_{\mu \in M(G; H_0)} \langle \theta, \mu \rangle = \sup_{\mu \in M(G; H_0)} \sum_{s;j} \theta_{s;j}\, \mu(s; j) + \sum_{st;jk} \theta_{s,j;t,k}\, \mu(s, j; t, k) = \sup_{\mu \in M(G; H_0)} \sum_{s;j} \theta_{s;j}\, \mu(s; j) + \sum_{st;jk} \theta_{s,j;t,k}\, \mu(s; j)\, \mu(t; k).$$ A Quadratic Program! (Ravikumar, Lafferty 06; Quadratic Relaxations for Metric Labeling and MAP in MRFs)
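
A sketch of one way the quadratic objective can be attacked, on a made-up three-node instance of my own: with all other $\mu(t;\cdot)$ held fixed, the objective is linear in $\mu(s;\cdot)$, so each coordinate update is attained at a vertex of the simplex.

```python
# Coordinate ascent on the mean-field QP objective for a toy pairwise model.
import numpy as np

theta_node = np.array([[0.2, -0.2], [0.1, -0.1], [-0.5, 0.5]])  # 3 nodes, 2 labels
edges = {(0, 1): np.array([[1.0, -1.0], [-1.0, 1.0]]),
         (1, 2): np.array([[1.0, -1.0], [-1.0, 1.0]])}

mu = np.full((3, 2), 0.5)                    # factorized marginals
for _ in range(10):
    for s in range(3):
        lin = theta_node[s].copy()           # linear coefficients for mu(s; .)
        for (a, b), th in edges.items():
            if a == s:
                lin += th @ mu[b]
            elif b == s:
                lin += th.T @ mu[a]
        mu[s] = np.eye(2)[np.argmax(lin)]    # optimum sits at a vertex
print("mean-field labels:", mu.argmax(axis=1))
```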

65 References.
Martin J. Wainwright and Michael I. Jordan (2003). Graphical models, exponential families, and variational inference.
M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul (1999). An introduction to variational methods for graphical models.
Pradeep Ravikumar and John Lafferty (2005). Preconditioner approximations for probabilistic graphical models.
Pradeep Ravikumar and John Lafferty (2004). Variational Chernoff bounds for graphical models.
C. Chekuri, S. Khanna, J. Naor, and L. Zosin (2005). A linear programming formulation and approximation algorithms for the metric labeling problem.
M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky (2005). MAP estimation via agreement on (hyper)trees: message-passing and linear programming approaches.
Pradeep Ravikumar and John Lafferty (2006). Quadratic programming relaxations for metric labeling and Markov random field MAP estimation.
