Probabilistic Graphical Models
Lecture 9: Variational Inference Relaxations
Volkan Cevher, Matthias Seeger
École Polytechnique Fédérale de Lausanne, 24/10/2011

Announcement
No assignment this week.
Deadline for the programming assignment: June 18 (next lecture).
bayesml09lecture@googlemail.com

Outline
1. Structured Mean Field (Variational Bayes)
2. Moment Parameters and the Marginal Polytope

Structured Mean Field (Variational Bayes): Variational Mean Field

$\log Z \ge \sup_{Q \in \mathcal{Q}} \{\mathrm{E}_Q[\Psi(x)] + \mathrm{H}[Q(x)]\}$

$\mathcal{Q}$: tractable subset of all distributions (factorization constraints),
$\mathcal{Q} = \{\, Q(x) = \prod_k Q_k(x_{S_k}),\ S_k \text{ disjoint} \,\}$

Tractable? For each $k$, let $N_k$ be the factor nodes $j$ connected to some $i \in S_k$ ($S_k \cap C_j \neq \emptyset$). The update
$Q_k(x_{S_k}) \propto \exp\Big( \sum_{j \in N_k} \mathrm{E}_{Q(x_{C_j \setminus S_k})}[\Psi_j(x_{C_j})] \Big)$
has to be tractable to handle. [F1]

$Q(x)$ completely factorized? Naive mean field. Anything more elaborate? Structured mean field.

Structured Mean Field (Variational Bayes): Factorial Hidden Markov Model [F2]

$S_1$ = uppermost chain. Update? [F3]

$Q(x_{S_1})$: Markov chain (variable single-node potentials).
Pairwise (transition) potentials of $Q(x_{S_k})$? Fixed up front!
Forward-backward computes the single-node marginals needed to update $Q(x_{S_1})$.
The implementation reduces to a single piece of HMM code, called with changing evidence potentials.
Not magic, but as expected: if this does not happen, you made a mistake.
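Illustration (not from the slides): a minimal sketch of the forward-backward subroutine such a chain update would call, with the transition potentials fixed and the evidence potentials swapped out at each structured mean field sweep. The state count, chain length, and random potentials below are made-up assumptions.

```python
import numpy as np

def forward_backward(log_trans, log_evid, log_init):
    """Single-node marginals of a discrete chain (log-domain forward-backward).

    log_trans : (K, K) log transition potentials, fixed up front
    log_evid  : (T, K) log evidence potentials; in structured mean field these
                are the expected emission potentials under the other chains' Q's
    log_init  : (K,)   log initial-state potentials
    """
    T, K = log_evid.shape
    alpha = np.zeros((T, K))                      # forward messages
    beta = np.zeros((T, K))                       # backward messages
    alpha[0] = log_init + log_evid[0]
    for t in range(1, T):
        m = alpha[t - 1][:, None] + log_trans     # sum out the previous state
        alpha[t] = log_evid[t] + np.logaddexp.reduce(m, axis=0)
    for t in range(T - 2, -1, -1):
        m = log_trans + (log_evid[t + 1] + beta[t + 1])[None, :]
        beta[t] = np.logaddexp.reduce(m, axis=1)
    log_marg = alpha + beta
    log_marg -= np.logaddexp.reduce(log_marg, axis=1, keepdims=True)
    return np.exp(log_marg)                       # rows are Q(x_t = k)

# Example call with arbitrary (illustrative) potentials: K = 2 states, T = 5 steps
rng = np.random.default_rng(0)
marg = forward_backward(rng.normal(size=(2, 2)),
                        rng.normal(size=(5, 2)),
                        np.zeros(2))
print(marg.sum(axis=1))                           # each row sums to 1
```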

Structured Mean Field (Variational Bayes): Variational Bayes

Another instance of the re-naming game: nothing else than structured mean field.
Often applied to $P(x, \theta \mid y)$ ($y$ observed, $x$ latent nuisance variables, $\theta$ latent parameters).

Expectation maximization:
$\max_\theta \log \int P(y, x \mid \theta)\, dx \;\ge\; \max_{\theta,\, Q(x)} \{\mathrm{E}_Q[\log P(y, x \mid \theta)] + \mathrm{H}[Q(x)]\}$

Variational Bayes:
$\log \iint P(y, x \mid \theta)\, dx\, d\theta \;\ge\; \max_{Q(\theta),\, Q(x)} \{\mathrm{E}_Q[\log P(y, x \mid \theta)] + \mathrm{H}[Q(x)] + \mathrm{H}[Q(\theta)]\}$

Factorization assumption: $Q(x, \theta) = Q(x)\, Q(\theta)$.

Easy to write generic code (a bit like MCMC Gibbs sampling).
Good approximation? Can do better today for almost any well-studied model.
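Illustration of the "easy to write generic code" point (a standard textbook example, not from the lecture): factorized $Q(\mu)Q(\tau)$ coordinate ascent for a univariate Gaussian with unknown mean $\mu$ and precision $\tau$ under a Normal-Gamma prior. The data, prior values, and iteration count below are made-up assumptions.

```python
import numpy as np

# Synthetic data (illustrative only): N draws from N(2, 0.5^2)
rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=0.5, size=100)
N, xbar = len(x), x.mean()

# Priors: mu ~ N(mu0, (lam0 * tau)^-1), tau ~ Gamma(a0, b0)
mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0

# Factorized posterior Q(mu) Q(tau) = N(mu | m, 1/lam) * Gamma(tau | a, b)
m, lam = xbar, 1.0
a = a0 + (N + 1) / 2.0           # fixed by the update equations
b = b0

for _ in range(50):              # coordinate ascent on the VB lower bound
    E_tau = a / b
    # Update Q(mu): precision-weighted combination of prior and data
    m = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam = (lam0 + N) * E_tau
    # Update Q(tau): expected squared residuals under Q(mu)
    E_sq = np.sum((x - m) ** 2) + N / lam + lam0 * ((m - mu0) ** 2 + 1.0 / lam)
    b = b0 + 0.5 * E_sq

print("E_Q[mu]  =", m)
print("E_Q[tau] =", a / b, "(data generated with precision 4.0)")
```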

Moment Parameters

$\log Z \ge \sup_{Q \in \mathcal{Q}} \{\mathrm{E}_Q[\Psi(x)] + \mathrm{H}[Q(x)]\}$,
$\mathcal{Q}$: tractable subset of all distributions (factorization constraints).

Seems like the whole story. What else could there be?

Consider log-linear models: $\Psi_j(x_{C_j}) = \theta_j^T f_j(x_{C_j})$, $\theta = (\theta_j)$. [F5]
Then $\mathrm{E}_Q[\Psi(x)] = \sum_j \theta_j^T \mu_j$, where $\mu_j := \mathrm{E}_Q[f_j(x_{C_j})]$, $\mu = (\mu_j)$.

Moment parameters: under mild assumptions on the $f_j(x_{C_j})$, just another way (instead of $\theta$) of parameterizing $P(x)$.
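Sanity check (my own illustration, not on the slide): for a tiny binary log-linear model, $\log Z$ and the moment parameters $\mu_j = \mathrm{E}_Q[f_j(x_{C_j})]$ can be computed by brute-force enumeration. The model size and weights are arbitrary assumptions.

```python
import itertools
import numpy as np

# Tiny pairwise log-linear model on a 3-node binary chain (weights made up):
#   Psi(x) = sum_i theta_i x_i + sum_(i,j) theta_ij x_i x_j
theta_node = np.array([0.3, -0.2, 0.5])
theta_edge = {(0, 1): 0.8, (1, 2): -0.4}

def psi(x):
    val = float(theta_node @ x)
    for (i, j), t in theta_edge.items():
        val += t * x[i] * x[j]
    return val

states = np.array(list(itertools.product([0, 1], repeat=3)))
logw = np.array([psi(x) for x in states])
logZ = np.logaddexp.reduce(logw)
P = np.exp(logw - logZ)

# Moment parameters: expectations of the sufficient statistics under P(x; theta)
mu_node = P @ states                                        # mu_i = E[x_i]
mu_edge = {e: float(P @ (states[:, e[0]] * states[:, e[1]]))
           for e in theta_edge}                             # mu_ij = E[x_i x_j]

print("log Z   =", logZ)
print("mu_node =", mu_node)
print("mu_edge =", mu_edge)
```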

Moment Parameters: Examples

$P(x; \theta) = Z^{-1} e^{\theta^T f(x)}$
$\theta$: natural parameters. $f(x)$: statistics, representation. $\mu = \mathrm{E}_\theta[f(x)]$: moment parameters.

Representation minimal: for every $z \ne 0$ there is an $x$ with $z^T (f(x)^T, 1)^T \ne 0$. Otherwise: representation overcomplete. [F6]

1. Multinomial on a graph with cliques $C_j$. [F6b]
Convenient overcomplete representation: the components of $f(x)$ are indicators on the cliques $C_j$, indicators on intersections of cliques, indicators on intersections of intersections, ...
Equality constraints for $\mu$: consistency on nonempty intersections; sum to one on the smallest intersections.

2. Gaussian MRF. [F6c]
Overcomplete representation: $f(x) = (x,\ -\mathrm{vec}(x x^T)/2)$, $\theta = (r,\ \mathrm{vec}(A))$.
Not minimal: $A$ is symmetric, and $a_{ij} = a_{ji} = 0$ for $\{i,j\} \notin E$.
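Illustration of the overcomplete multinomial representation (mine, not from the slides): two cliques $C_1 = \{0,1\}$, $C_2 = \{1,2\}$ sharing node 1; the clique moments must agree on the intersection and the intersection indicators must sum to one. The distribution below is a random made-up example.

```python
import itertools
import numpy as np

# Three binary variables, cliques C1 = {0,1} and C2 = {1,2}, intersection {1}.
# Overcomplete representation: indicators on each clique configuration and on
# the intersection variable x1.
pairs = list(itertools.product([0, 1], repeat=2))

def f(x):
    c1 = [float((x[0], x[1]) == s) for s in pairs]     # 4 indicators on C1
    c2 = [float((x[1], x[2]) == s) for s in pairs]     # 4 indicators on C2
    inter = [float(x[1] == v) for v in (0, 1)]         # 2 indicators on {1}
    return np.array(c1 + c2 + inter)

# Any joint distribution P induces moments mu = E_P[f(x)]
rng = np.random.default_rng(3)
states = list(itertools.product([0, 1], repeat=3))
P = rng.dirichlet(np.ones(len(states)))
mu = sum(p * f(x) for p, x in zip(P, states))
mu_c1, mu_c2, mu_int = mu[:4], mu[4:8], mu[8:]

# Consistency on the nonempty intersection {1}: both cliques agree on P(x1 = v)
for v in (0, 1):
    from_c1 = sum(mu_c1[k] for k, s in enumerate(pairs) if s[1] == v)
    from_c2 = sum(mu_c2[k] for k, s in enumerate(pairs) if s[0] == v)
    print(np.isclose(from_c1, mu_int[v]), np.isclose(from_c2, mu_int[v]))
# Sum to one on the smallest intersection
print(np.isclose(mu_int.sum(), 1.0))
```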

Variational Formulation of Bayesian Inference

$\log Z = \sup_Q \{\theta^T \mathrm{E}_Q[f(x)] + \mathrm{H}[Q(x)]\}$, $f(x) = [f_j(x_{C_j})]$

Transform to moment parameters:
$\log Z = \sup_{\mu \in \mathcal{M}} \{\theta^T \mu + \mathrm{H}[\mu]\}$,
$\mathcal{M} = \{ (\mu_j) \mid \mu_j = \mathrm{E}_Q[f_j(x_{C_j})] \text{ for some } Q(x) \}$: the marginal polytope. [F7]

What about the entropy? For $\mu \in \mathcal{M}$ the distribution $Q(x)$ is unique, so the entropy is a function of $\mu$.

Point of this exercise: $\mathcal{M}$ is a convex set of vectors, a more useful relaxation target than a set of distributions.

Close now: exponential families, Fenchel duality, maximum entropy. Full story: Wainwright, Jordan: Graphical Models, Exponential Families, and Variational Inference. Foundations and Trends in Machine Learning, 1(1-2), pp. 1-305.
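Numerical illustration of $\log Z = \sup_{\mu \in \mathcal{M}} \{\theta^T \mu + \mathrm{H}[\mu]\}$ (mine, not from the slides): for a tiny enumerable model, evaluating the objective at the true moments, with $\mathrm{H}[\mu]$ the entropy of $P(x; \theta)$, recovers $\log Z$ exactly. Model and weights are arbitrary assumptions.

```python
import itertools
import numpy as np

# Tiny binary chain; theta stacks node and edge weights (values made up)
theta = np.array([0.3, -0.2, 0.5, 0.8, -0.4])

def f(x):
    # Sufficient statistics: (x0, x1, x2, x0*x1, x1*x2)
    return np.array([x[0], x[1], x[2], x[0] * x[1], x[1] * x[2]], dtype=float)

states = list(itertools.product([0, 1], repeat=3))
F = np.array([f(x) for x in states])             # (8, 5) statistics matrix
logw = F @ theta
logZ = np.logaddexp.reduce(logw)
P = np.exp(logw - logZ)

mu = P @ F                                       # true moment parameters
H = -np.sum(P * (logw - logZ))                   # entropy of P(x; theta)

print("log Z        =", logZ)
print("theta@mu + H =", theta @ mu + H)          # identical: sup attained at mu
```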

Bayesian Inference is Convex Optimization

$\log Z = \sup_{\mu \in \mathcal{M}} \{\theta^T \mu + \mathrm{H}[\mu]\}$, with $\mathcal{M}$ convex and $\mathrm{H}[\mu]$ concave. [F8]

Marginal polytope $\mathcal{M}$: convex set.
Entropy $\mu \mapsto \mathrm{H}[\mu]$: concave function on $\mathcal{M}$. [F8b]
Posterior: the unique solution to a convex optimization problem.

But convex optimization can still be intractable: [F8c]
$\mathcal{M}$ can be hard to fence in.
$\theta \mapsto \mu$ can be hard to compute.
$\mathrm{H}[\mu]$ can be hard to compute.

Took some steps to get here. But worth it: there is a rich literature on relaxations of hard convex problems.

Variational Mean Field Revisited

$\log Z = \sup_{\mu \in \mathcal{M}} \{\theta^T \mu + \mathrm{H}[\mu]\}$
Have to approximate $\mathcal{M}$ and $\mathrm{H}[\mu]$. One way you already know...

$\mathcal{M} \supseteq \mathcal{M}_{\mathrm{NMF}} := \Big\{ \mu \;\Big|\; \mu_j = \sum_{x_{C_j}} \Big( \prod_{i \in C_j} Q(x_i) \Big) f_j(x_{C_j}) \text{ for some factorized } Q \Big\}$

Inner approximation, induced by fully factorized distributions.
The entropy decomposes just like the distribution: $\mathrm{H}[\mu] = \sum_i \mathrm{H}[\mu_i]$.

$\log Z \ge \sup_{\mu \in \mathcal{M}_{\mathrm{NMF}}} \{\theta^T \mu + \sum_i \mathrm{H}[\mu_i]\}$

Non-convex relaxation: $\mathcal{M}_{\mathrm{NMF}}$ is not convex. [F9]
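A naive mean field sketch (my own illustration, assuming a small binary pairwise model, not from the slides): coordinate ascent over $\mathcal{M}_{\mathrm{NMF}}$, with the resulting lower bound compared against the exact $\log Z$ from enumeration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n = 9                                              # 3x3 grid, x_i in {0, 1}
edges = [(i, i + 1) for i in range(n) if (i + 1) % 3 != 0] + \
        [(i, i + 3) for i in range(n - 3)]
theta_n = rng.normal(scale=0.5, size=n)
theta_e = {e: rng.normal(scale=0.5) for e in edges}

def score(x):
    return theta_n @ x + sum(t * x[a] * x[b] for (a, b), t in theta_e.items())

# Exact log Z by enumerating all 2^9 states
logZ = np.logaddexp.reduce([score(np.array(x))
                            for x in itertools.product([0, 1], repeat=n)])

# Naive mean field coordinate ascent: q_i = sigmoid(theta_i + sum_j theta_ij q_j)
sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))
q = np.full(n, 0.5)
for _ in range(200):
    for i in range(n):
        field = theta_n[i] + sum(t * q[b if i == a else a]
                                 for (a, b), t in theta_e.items() if i in (a, b))
        q[i] = sigmoid(field)

# Lower bound theta^T mu + sum_i H[mu_i], with pairwise moments mu_ab = q_a * q_b
H = -np.sum(q * np.log(q) + (1 - q) * np.log(1 - q))
bound = theta_n @ q + sum(t * q[a] * q[b] for (a, b), t in theta_e.items()) + H
print("mean field bound =", bound, "<= exact log Z =", logZ)
```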

The Marginal Polytope

$\mathcal{M} = \{ (\mu_j) \mid \mu_j = \mathrm{E}_Q[f_j(x_{C_j})] \text{ for some } Q(x) \}$

Multinomial on a graph, minimal representation. [F10]
$\mathcal{M}$ is a convex polytope: described by a finite number of inequalities. Complexity of $\mathcal{M}$: the number of inequalities.
Complexity of $\mathcal{M}$ $\approx$ complexity of exact inference [we'll see why].
$G$ a tree: $\mathcal{M}$ described by $O(n)$ inequalities [next lecture].
Many graphs $G$ with cycles: a polytope description of $\mathcal{M}$ is provably hard ($\mathrm{poly}(n)$ inequalities would imply P = NP).

Gaussian MRF, minimal representation (upper triangle of $A$). [F10b]
$\mathcal{M}$ is exactly characterized by $\Sigma = A^{-1} \succ 0$: a convex cone (not a polytope), tractable to describe.
$G$ a tree: $\mathcal{M}$ described by $O(n)$ inequalities.
General sparse $G$: approximate inference still of interest if the exact cost $O(n^3)$ is too high.
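Illustration of "fencing in" $\mathcal{M}$ (mine, not from the slides): for a tiny binary model, membership of a candidate moment vector in the marginal polytope can be decided by a linear feasibility program over the joint probability simplex. scipy's linprog is assumed to be available; the test vectors are made up.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Three binary variables; statistics f(x) = (x0, x1, x2, x0*x1, x1*x2)
states = np.array(list(itertools.product([0, 1], repeat=3)), dtype=float)
F = np.column_stack([states[:, 0], states[:, 1], states[:, 2],
                     states[:, 0] * states[:, 1],
                     states[:, 1] * states[:, 2]])            # (8, 5)

def in_marginal_polytope(mu):
    """Feasibility LP: is there a distribution P over the 8 states with F^T P = mu?"""
    A_eq = np.vstack([F.T, np.ones((1, len(states)))])        # match moments, sum to 1
    b_eq = np.concatenate([mu, [1.0]])
    res = linprog(c=np.zeros(len(states)), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, 1)] * len(states), method="highs")
    return res.success

print(in_marginal_polytope(np.array([0.5, 0.5, 0.5, 0.25, 0.25])))  # True (uniform)
print(in_marginal_polytope(np.array([0.5, 0.5, 0.5, 0.60, 0.25])))  # False: E[x0*x1] > E[x0]
```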

Wrap-Up

Structured mean field: $Q(x)$ is a product of tractable, disjoint factors.
Variational Bayes: another name for structured mean field.
Bayesian (marginal) inference is a convex optimization problem.
Variational approximations: inner / outer bounds to the marginal polytope.