Probabilistic Graphical Models
Lecture 9: Variational Inference Relaxations
Volkan Cevher, Matthias Seeger
École Polytechnique Fédérale de Lausanne, 24/10/2011
Announcement

No assignment this week.
Deadline for the programming assignment: June 18 (next lecture).
Contact: bayesml09lecture@googlemail.com
Outline

1 Structured Mean Field (Variational Bayes)
2 Moment Parameters and the Marginal Polytope
Structured Mean Field (Variational Bayes)

Variational Mean Field

$\log Z \ge \sup_{Q \in \mathcal{Q}} \left\{ \mathrm{E}_Q[\Psi(x)] + \mathrm{H}[Q(x)] \right\}$

$\mathcal{Q}$: tractable subset of all distributions (factorization constraints)

$\mathcal{Q} = \left\{ Q(x) = \prod_k Q_k(x_{S_k}) \right\}$, the $S_k$ disjoint

Tractable? For any $k$, let $N_k$ be the factor nodes $j$ connected to some $i \in S_k$ (that is, $S_k \cap C_j \ne \emptyset$). Then

$Q_k(x_{S_k}) \propto \exp\left( \sum_{j \in N_k} \mathrm{E}_{Q(x_{C_j \setminus S_k})}[\Psi_j(x_{C_j})] \right)$, tractable to handle.

$Q(x)$ completely factorized? Naive mean field.
Anything more elaborate? Structured mean field.
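To make the coordinate update concrete, here is a minimal sketch (not from the lecture) of naive mean field for a small Ising model, where each $Q_k$ is summarized by its mean and the update above reduces to a tanh formula. The problem size, couplings, and tolerance are illustrative assumptions.

```python
import numpy as np

# Naive mean field for a pairwise binary (Ising) MRF with x_i in {-1,+1}:
#   P(x) ∝ exp( sum_i theta_i x_i + sum_{i<j} w_ij x_i x_j ).
# The update Q_k ∝ exp(E_{Q_{-k}}[Psi]) reduces to m_k = tanh(theta_k + sum_j w_kj m_j),
# with m_i = E_Q[x_i] the mean-field mean of site i.

rng = np.random.default_rng(0)
n = 6                                    # illustrative problem size
theta = rng.normal(scale=0.5, size=n)    # single-node potentials
W = rng.normal(scale=0.3, size=(n, n))
W = np.triu(W, 1); W = W + W.T           # symmetric couplings, zero diagonal

m = np.zeros(n)                          # mean-field means, start uniform
for sweep in range(200):
    m_old = m.copy()
    for k in range(n):                   # coordinate ascent, one site at a time
        m[k] = np.tanh(theta[k] + W[k] @ m)
    if np.max(np.abs(m - m_old)) < 1e-8:
        break

print("mean-field marginals E_Q[x_i]:", np.round(m, 4))
```

Each sweep increases the lower bound on $\log Z$, which is why convergence of the means is a safe stopping criterion here.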
Factorial Hidden Markov Model

[Figure: factorial HMM, several parallel hidden Markov chains jointly emitting a single observation sequence]

$S_1$ = uppermost chain. Update?

$Q(x_{S_1})$: Markov chain (variable single-node potentials).
Pairwise (transition) potentials of $Q(x_{S_k})$? Fixed up front!
Forward-backward for single-node marginals to update $Q(x_{S_1})$.
Implementation reduces to a single piece of HMM code, called with changing evidence potentials.
Not magic, but as expected: if this does not happen, you made a mistake.
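A minimal sketch of the forward-backward routine that such an implementation calls, assuming a generic chain in log space; this is standard HMM code, not the lecture's. In the structured mean-field loop, `log_evid[t]` would be recomputed each update as the expectation of the observation potential over the other chains' current marginals.

```python
import numpy as np
from scipy.special import logsumexp

def forward_backward(log_trans, log_evid):
    """Single-node posterior marginals of one HMM chain.
    log_trans: (K, K) log transition matrix, log_trans[j, k] = log p(x_t=k | x_{t-1}=j).
    log_evid:  (T, K) log evidence potentials (here: mean-field expectations).
    Assumes a uniform initial state distribution, folded into log_evid[0]."""
    T, K = log_evid.shape
    alpha = np.zeros((T, K)); beta = np.zeros((T, K))
    alpha[0] = log_evid[0]
    for t in range(1, T):      # forward pass
        alpha[t] = log_evid[t] + logsumexp(alpha[t-1][:, None] + log_trans, axis=0)
    for t in range(T - 2, -1, -1):   # backward pass
        beta[t] = logsumexp(log_trans + (log_evid[t+1] + beta[t+1])[None, :], axis=1)
    log_post = alpha + beta
    log_post -= logsumexp(log_post, axis=1, keepdims=True)
    return np.exp(log_post)    # (T, K) marginals Q(x_t)

rng = np.random.default_rng(1)
T, K = 6, 3                    # illustrative chain length / state count
log_trans = np.log(rng.dirichlet(np.ones(K), size=K))
marg = forward_backward(log_trans, rng.normal(size=(T, K)))
print(marg.sum(axis=1))        # each row sums to one
```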
Variational Bayes

Another instance of the re-naming game: nothing other than structured mean field.
Often applied to $P(x, \theta \mid y)$ ($y$ observed, $x$ latent nuisance variables, $\theta$ latent parameters).

Expectation maximization:
$\max_\theta \log \int P(y, x \mid \theta)\, dx \;\Longleftrightarrow\; \max_{\theta,\, Q(x)} \left\{ \mathrm{E}_Q[\log P(y, x \mid \theta)] + \mathrm{H}[Q(x)] \right\}$

Variational Bayes:
$\log \iint P(y, x \mid \theta)\, dx\, d\theta \;\ge\; \max_{Q(\theta),\, Q(x)} \left\{ \mathrm{E}_Q[\log P(y, x \mid \theta)] + \mathrm{H}[Q(x)] + \mathrm{H}[Q(\theta)] \right\}$

Factorization assumption: $Q(x, \theta) = Q(x)\, Q(\theta)$

Easy to write generic code (a bit like MCMC Gibbs sampling).
Good approximation? Can do better today for almost any well-studied model.
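As one concrete instance of this recipe, here is a minimal sketch of VB coordinate ascent for the textbook conjugate Gaussian model with unknown mean and precision (the standard worked example, e.g. in Bishop's PRML, not a model from this lecture). The priors, data, and tolerance below are illustrative assumptions.

```python
import numpy as np

# VB for y_i ~ N(mu, 1/tau), prior mu ~ N(mu0, 1/(lam0*tau)), tau ~ Gamma(a0, b0),
# under the factorization Q(mu, tau) = Q(mu) Q(tau). Both coordinate updates
# are closed form: Q(mu) is Gaussian, Q(tau) is Gamma.
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=0.5, size=50)       # illustrative data
N, ybar = len(y), y.mean()
mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0            # illustrative priors

E_tau = a0 / b0                                   # initialize E_Q[tau]
for it in range(100):
    # Update Q(mu) = N(mN, 1/lamN) from E_{Q(tau)}[log P(y, mu, tau)]
    mN = (lam0 * mu0 + N * ybar) / (lam0 + N)
    lamN = (lam0 + N) * E_tau
    # Update Q(tau) = Gamma(aN, bN) from E_{Q(mu)}[log P(y, mu, tau)]
    E_mu, E_mu2 = mN, mN**2 + 1.0 / lamN          # first/second moments of Q(mu)
    aN = a0 + (N + 1) / 2.0
    bN = b0 + 0.5 * (np.sum(y**2) - 2 * E_mu * np.sum(y) + N * E_mu2
                     + lam0 * (E_mu2 - 2 * mu0 * E_mu + mu0**2))
    E_tau_new = aN / bN
    if abs(E_tau_new - E_tau) < 1e-10:
        break
    E_tau = E_tau_new

print(f"Q(mu) = N({mN:.3f}, {1/lamN:.5f}),  E_Q[tau] = {aN/bN:.3f}")
```

Note how generic the structure is: each factor's update is an expectation of the log joint under the other factor, which is exactly why "easy to write generic code" above.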
Moment Parameters

$\log Z \ge \sup_{Q \in \mathcal{Q}} \left\{ \mathrm{E}_Q[\Psi(x)] + \mathrm{H}[Q(x)] \right\}$, $\mathcal{Q}$: tractable subset of all distributions (factorization constraints)

Seems the whole story. What else could there be?

Consider log-linear models: $\Psi_j(x_{C_j}) = \theta_j^T f_j(x_{C_j})$, $\theta = (\theta_j)$

$\mathrm{E}_Q[\Psi(x)] = \sum_j \theta_j^T \mu_j$, $\mu_j := \mathrm{E}_Q[f_j(x_{C_j})]$, $\mu = (\mu_j)$

Moment parameters: under mild assumptions on the $f_j(x_{C_j})$, just another way (instead of $\theta$) of parameterizing $P(x)$.
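To make the change of coordinates $\theta \to \mu$ concrete, a small brute-force sketch (the model, cliques, and indicator features below are illustrative, not from the lecture): enumerate all states, normalize, and read off the clique moments.

```python
import numpy as np
from itertools import product

# Tiny log-linear model on x in {0,1}^3 with chain cliques {0,1} and {1,2}:
#   Psi_j(x_Cj) = theta_j^T f_j(x_Cj), f_j = one-hot indicator of the clique state.
# Map natural parameters theta -> moment parameters mu_j = E_Q[f_j] by enumeration.
rng = np.random.default_rng(0)
theta = {(0, 1): rng.normal(size=4), (1, 2): rng.normal(size=4)}  # 4 = |{0,1}^2|

def features(x, clique):                       # one-hot indicator f_j(x_Cj)
    f = np.zeros(4)
    f[2 * x[clique[0]] + x[clique[1]]] = 1.0
    return f

states = list(product([0, 1], repeat=3))
logp = np.array([sum(theta[c] @ features(x, c) for c in theta) for x in states])
p = np.exp(logp - logp.max()); p /= p.sum()    # normalizing divides out Z

mu = {c: sum(pi * features(x, c) for pi, x in zip(p, states)) for c in theta}
for c, m in mu.items():
    print(f"mu_{c} =", np.round(m, 4), " sums to", m.sum().round(4))
```

Each $\mu_j$ is just the clique marginal written as a vector; with indicator features it sums to one, a first glimpse of the constraints discussed next.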
Moment Parameters: Examples

$P(x; \theta) = Z^{-1} e^{\theta^T f(x)}$

$\theta$: natural parameters. $f(x)$: statistics, representation. $\mu = \mathrm{E}_\theta[f(x)]$: moment parameters.

Representation minimal: for every $z \ne 0$ there is an $x$ with $z^T (f(x)^T, 1)^T \ne 0$. Otherwise: representation overcomplete.

1. Multinomial on a graph with cliques $C_j$.
Convenient overcomplete representation. Components of $f(x)$: indicators on the cliques $C_j$, indicators on intersections of cliques, on intersections of those intersections, ...
Equality constraints for $\mu$: consistency on nonempty intersections; sum to one on the smallest intersections.

2. Gaussian MRF.
Overcomplete representation:
$f(x) = \begin{pmatrix} x \\ \mathrm{vec}(-x x^T / 2) \end{pmatrix}$, $\quad \theta = \begin{pmatrix} r \\ \mathrm{vec}(A) \end{pmatrix}$
Not minimal: $A$ symmetric; $a_{ij} = a_{ji} = 0$ for $\{i, j\} \notin E$.
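A small numpy sketch of the natural-to-moment map implied by the Gaussian representation, which goes through $\Sigma = A^{-1}$; the precision matrix and linear term below are illustrative assumptions.

```python
import numpy as np

# Gaussian MRF P(x) ∝ exp(r^T x - x^T A x / 2): natural parameters (r, A),
# moment parameters (E[x], E[x x^T]). The map goes through Sigma = A^{-1}.
A = np.array([[ 2.0, -0.5,  0.0],    # precision matrix; zero pattern = graph edges
              [-0.5,  2.0, -0.5],
              [ 0.0, -0.5,  2.0]])
r = np.array([1.0, 0.0, -1.0])

Sigma = np.linalg.inv(A)                 # covariance
mean = Sigma @ r                         # E[x]
second = Sigma + np.outer(mean, mean)    # E[x x^T]
print("E[x] =", np.round(mean, 4))
print("E[xx^T] =\n", np.round(second, 4))
```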
Variational Formulation of Bayesian Inference

$\log Z = \sup_Q \left\{ \theta^T \mathrm{E}_Q[f(x)] + \mathrm{H}[Q(x)] \right\}$, $\quad f(x) = [f_j(x_{C_j})]$

Transform to moment parameters:

$\log Z = \sup_{\mu \in \mathcal{M}} \left\{ \theta^T \mu + \mathrm{H}[\mu] \right\}$

$\mathcal{M} = \left\{ (\mu_j) \mid \mu_j = \mathrm{E}_Q[f_j(x_{C_j})] \text{ for some } Q(x) \right\}$: the marginal polytope.

What about the entropy? $\mu \in \mathcal{M} \Rightarrow Q(x)$ unique: the entropy is a function of $\mu$.

Point of this exercise: $\mathcal{M}$ is a convex set of vectors, a more useful relaxation target than a set of distributions.

Close now: exponential families, Fenchel duality, maximum entropy. Full story: Wainwright, Jordan: Graphical Models, Exponential Families, and Variational Inference. Foundations and Trends in Machine Learning, 1(1-2), pp. 1-305.
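For readers chasing the reference, a sketch of the duality statement the transformation rests on, in the notation of the Wainwright-Jordan monograph ($A(\theta)$ for $\log Z(\theta)$; the interior qualifier and the two displays below follow that source, not this lecture):

```latex
% log Z is a Fenchel conjugate: the (negative) entropy is the conjugate of A.
\[
  A(\theta) = \sup_{\mu \in \mathcal{M}} \left\{ \theta^T \mu - A^*(\mu) \right\},
  \qquad
  A^*(\mu) = -\mathrm{H}[\mu] \quad \text{for } \mu \in \mathcal{M}^{\circ},
\]
% and at the optimum the moments match the gradient of the log-partition function:
\[
  \mu^* \;=\; \mathrm{E}_{\theta}[f(x)] \;=\; \nabla_{\theta} A(\theta).
\]
```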
Bayesian Inference is Convex Optimization

$\log Z = \sup_{\mu \in \mathcal{M}} \{ \underbrace{\theta^T \mu}_{\mathcal{M} \text{ convex}} + \underbrace{\mathrm{H}[\mu]}_{\text{concave}} \}$

Marginal polytope $\mathcal{M}$: convex set.
Entropy $\mu \mapsto \mathrm{H}[\mu]$: concave function on $\mathcal{M}$.
Posterior: unique solution to a convex optimization problem.

Convex optimization can be intractable:
$\mathcal{M}$ can be hard to fence in.
The mapping $\theta \mapsto \mu$ can be hard to compute.
$\mathrm{H}[\mu]$ can be hard to compute.

Took some steps to get here. But worth it: rich literature on relaxations of hard convex problems.
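One case where the convex problem is solvable in closed form, as a sketch (assuming a single multinomial node with one-hot features, not a model from the lecture): $\mathcal{M}$ is the probability simplex, $\mathrm{H}[\mu]$ the usual entropy, the maximizer is the softmax of $\theta$, and the optimal value is $\mathrm{logsumexp}(\theta) = \log Z$.

```python
import numpy as np
from scipy.special import logsumexp, softmax

# Single multinomial variable, f(x) = one-hot indicators: M is the probability
# simplex, H[mu] = -sum mu log mu, and sup_{mu in M} {theta^T mu + H[mu]} is
# attained at mu* = softmax(theta), with optimal value logsumexp(theta).
rng = np.random.default_rng(0)
theta = rng.normal(size=5)

def objective(mu):
    return theta @ mu - np.sum(mu * np.log(mu))   # theta^T mu + H[mu]

mu_star = softmax(theta)
print("objective at softmax(theta):", objective(mu_star))
print("logsumexp(theta)           :", logsumexp(theta))
for _ in range(5):              # random interior points of the simplex score lower
    mu = rng.dirichlet(np.ones(5))
    assert objective(mu) <= objective(mu_star) + 1e-12
```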
Variational Mean Field Revisited

$\log Z = \sup_{\mu \in \mathcal{M}} \left\{ \theta^T \mu + \mathrm{H}[\mu] \right\}$

Have to approximate $\mathcal{M}$, $\mathrm{H}[\mu]$. One way you already know...

$\mathcal{M} \supset \mathcal{M}_{\mathrm{NMF}} := \left\{ \mu \,\middle|\, \mu_j = \sum_{x_{C_j}} \Big( \prod_{i \in C_j} Q(x_i) \Big) f_j(x_{C_j}) \right\}$

Inner approximation, induced by factorized distributions.
Entropy decomposes just as the distribution does: $\mathrm{H}[\mu] = \sum_i \mathrm{H}[\mu_i]$

$\log Z \ge \sup_{\mu \in \mathcal{M}_{\mathrm{NMF}}} \left\{ \theta^T \mu + \sum_i \mathrm{H}[\mu_i] \right\}$

Non-convex relaxation: $\mathcal{M}_{\mathrm{NMF}}$ is not convex.
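A quick numeric illustration of the non-convexity, under illustrative assumptions (two binary variables, features $(x_1, x_2, x_1 x_2)$; not an example from the lecture): factorized distributions force $\mu_{12} = \mu_1 \mu_2$, a curved surface, so midpoints of $\mathcal{M}_{\mathrm{NMF}}$ points generally leave the set.

```python
import numpy as np

# M_NMF is not convex: for two binary variables with f(x) = (x1, x2, x1*x2),
# a factorized Q gives mu12 = mu1 * mu2 (a curved surface inside M).
def nmf_moments(q1, q2):                 # q_i = Q(x_i = 1)
    return np.array([q1, q2, q1 * q2])

a = nmf_moments(0.9, 0.9)
b = nmf_moments(0.1, 0.1)
mid = 0.5 * (a + b)                      # convex combination of two NMF points
print("midpoint:", mid)
print("mu1*mu2 =", mid[0] * mid[1], " but mu12 =", mid[2])
# 0.25 != 0.41: the midpoint lies in the (convex) marginal polytope M,
# but in no product distribution, hence not in M_NMF.
```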
The Marginal Polytope

$\mathcal{M} = \left\{ (\mu_j) \mid \mu_j = \mathrm{E}_Q[f_j(x_{C_j})] \text{ for some } Q(x) \right\}$

Multinomial on a graph, minimal representation.
$\mathcal{M}$ is a convex polytope: described by a finite number of inequalities. Complexity of $\mathcal{M}$: number of inequalities.

Complexity of $\mathcal{M}$ $\approx$ complexity of exact inference [we'll see why]
$G$ tree: $\mathcal{M}$ described by $O(n)$ inequalities [next lecture]
Many graphs $G$ with cycles: polytope description of $\mathcal{M}$ provably hard (poly($n$) inequalities would imply P = NP)
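For concreteness, a sketch of what the $O(n)$ description looks like, assuming a pairwise multinomial model on a tree (these local consistency constraints, which for trees exactly describe $\mathcal{M}$, are developed next lecture):

```latex
% For a pairwise model on a tree, M is cut out by one small block of
% constraints per node and per edge, hence O(n) of them:
\[
  \mu_i(x_i) \ge 0, \qquad \sum_{x_i} \mu_i(x_i) = 1, \qquad
  \sum_{x_j} \mu_{ij}(x_i, x_j) = \mu_i(x_i)
  \quad \text{for all edges } (i, j) \in E .
\]
```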
Gaussian MRF, minimal representation (upper triangle of $A$).
$\mathcal{M}$ exactly characterized by $\Sigma = A^{-1} \succ 0$. Convex cone (not a polytope): tractable to describe.
$G$ tree: $\mathcal{M}$ described by $O(n)$ inequalities.
General sparse $G$: approximate inference still of interest if the exact $O(n^3)$ cost is too high.
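A minimal numpy check of the $\Sigma \succ 0$ characterization (the candidate moments below are illustrative): a pair $(m, S)$ standing for $(\mathrm{E}[x], \mathrm{E}[x x^T])$ is realizable by a Gaussian exactly when the implied covariance $S - m m^T$ is positive definite.

```python
import numpy as np

# Gaussian case: feasibility of candidate moments (m, S) ~ (E[x], E[x x^T])
# is a single semidefinite (conic) condition on S - m m^T -- tractable to
# check, unlike the discrete marginal polytope in general.
def feasible(m, S):
    cov = S - np.outer(m, m)
    return bool(np.all(np.linalg.eigvalsh(cov) > 0))

m = np.array([1.0, -1.0])
S_good = np.array([[2.0, -0.5], [-0.5, 2.0]]) + np.outer(m, m)
S_bad = np.outer(m, m)                 # zero covariance: on the boundary, infeasible
print(feasible(m, S_good), feasible(m, S_bad))   # True False
```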
Wrap-Up

Structured mean field: $Q(x)$ a product of tractable, disjoint factors.
Variational Bayes: another name for structured mean field.
Bayesian (marginal) inference is a convex optimization problem.
Variational approximations: inner / outer bounds to the marginal polytope.