Introduction to the Variational Bayesian Mean-Field Method

1 Introduction to the Variational Bayesian Mean-Field Method
David Benjamin, Broad DSDE Methods
May 11, 2016

2 What is variational Bayes?
VB replaces a complex probability distribution with a simpler one: $P_{\text{exact}}(x_1, x_2, \ldots) \approx P_{\text{tractable}}(x_1, x_2, \ldots)$.
Mean-field VB is the particular choice $P_{\text{exact}}(x_1, x_2, \ldots) \approx q_1(x_1)\, q_2(x_2) \cdots$.
VB exchanges inference for optimization: once we find the optimal $P_{\text{tractable}}$, inference is trivial.

3 Ising model
The Ising model is an infinite grid of $(+1)$ and $(-1)$ magnets that want to point the same way as their neighbors.
Energy $= -1$ for adjacent $\uparrow\uparrow$ or $\downarrow\downarrow$.
Energy $= +1$ for adjacent $\uparrow\downarrow$ or $\downarrow\uparrow$.
Energy of adjacent $s_1, s_2$ is $-s_1 s_2$.
Total energy $= -\sum_{\text{adjacent } i,j} s_i s_j$.

4 Statistical physics
$\text{Probability(state)} \propto e^{-\text{energy(state)}/\text{temperature}}$
Meaning: nature seeks low-energy states, but high temperatures allow high-energy states.

5 Phase transition of Ising model
$P(s_1, s_2, \ldots) \propto e^{\sum_{\text{adjacent } i,j} s_i s_j / T}$
Figure: Small $T$ forces $s_i = s_j$.
Figure: Large $T$ allows randomness.

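8 Mean-field for Ising model
$P(s_1, s_2, \ldots) \propto e^{\sum_{\text{adjacent } i,j} s_i s_j / T}$
Approximate $P(s_1, s_2, \ldots) \approx q_1(s_1)\, q_2(s_2) \cdots$
Conditionals are $P(s_i \mid \text{adjacent } s_j) \propto \exp\left(s_i \sum_{\text{adjacent } j} s_j / T\right)$.
Heuristic: $q_i(s_i) \propto \exp(s_i m_i / T)$, where $m_i \equiv \sum_{\text{adjacent } j} \langle s_j \rangle$.
For $q_j(s_j)$ of sites $j$ adjacent to site $i$ we need
$$\langle s_i \rangle = \frac{\sum_s s\, q_i(s)}{\sum_s q_i(s)} = \frac{e^{m_i/T} - e^{-m_i/T}}{e^{m_i/T} + e^{-m_i/T}} = \tanh(m_i/T)$$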

10 Mean-field for Ising model
Mean-field recipe
For each site $i$ calculate the mean field $m_i = \sum_{\text{adjacent } j} \langle s_j \rangle$.
For each site $i$ update $\langle s_i \rangle = \tanh(m_i / T)$.
Repeat until convergence.
Self-consistent solution: solve $\langle s \rangle = \tanh(4 \langle s \rangle / T)$.
Figure: Mean-field phase diagram of Ising model.
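The recipe can be tried numerically. The following is a minimal sketch, not code from the talk: it assumes a finite 16-by-16 square lattice with periodic boundaries in place of the infinite grid, updates all sites at once, and starts from a slightly positive magnetization. The function name `mean_field_ising` and its parameters are illustrative.

```python
import numpy as np

def mean_field_ising(T, n=16, iters=500, tol=1e-10, seed=0):
    """Iterate <s_i> = tanh(m_i / T) on an n-by-n periodic square lattice.

    Assumption: a finite periodic lattice stands in for the infinite grid.
    """
    rng = np.random.default_rng(seed)
    # Start slightly positive to break the symmetry between +1 and -1.
    s = 0.1 + 0.01 * rng.random((n, n))
    for _ in range(iters):
        # Mean field m_i: sum of <s_j> over the four nearest neighbours.
        m = (np.roll(s, 1, axis=0) + np.roll(s, -1, axis=0)
             + np.roll(s, 1, axis=1) + np.roll(s, -1, axis=1))
        s_new = np.tanh(m / T)
        converged = np.max(np.abs(s_new - s)) < tol
        s = s_new
        if converged:
            break
    return s

if __name__ == "__main__":
    for T in (2.0, 6.0):
        print(T, mean_field_ising(T).mean())
```

With four neighbours per site, a uniform solution must satisfy $\langle s \rangle = \tanh(4\langle s \rangle/T)$, so in this sketch the average magnetization should stay nonzero for $T < 4$ and decay to zero for $T > 4$.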

12 What did we do?
For $P(x_1, x_2, \ldots) \approx q_1(x_1)\, q_2(x_2) \cdots$ we used $e^{s_i s_j} \to e^{s_i \langle s_j \rangle}$.
Hypothesis: the rule we used was
$$q_i(x_i) \propto P(E_{q_1}[x_1], \ldots, E_{q_{i-1}}[x_{i-1}], x_i, \ldots)$$
Actually, no. The correct variational Bayes rule is
$$q_i(x_i) \propto \exp E_{q_1(x_1) \cdots q_{i-1}(x_{i-1})\, q_{i+1}(x_{i+1}) \cdots}[\ln P(x_1, x_2, \ldots)]$$
Let's see where this comes from...
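As a consistency check (a worked sketch, not taken from the slides), applying the correct rule to the Ising model reproduces the heuristic form of $q_i$. This works because $\ln P$ is linear in each spin and the factorized $q$ gives $E[s_j s_k] = \langle s_j \rangle \langle s_k \rangle$ for $j \neq k$:

```latex
\begin{align}
\ln P(s_1, s_2, \ldots) &= \frac{1}{T} \sum_{\text{adjacent } j,k} s_j s_k + \text{const} \\
E_{\prod_{j \neq i} q_j}\left[\ln P\right] &= \frac{s_i}{T} \sum_{j \text{ adjacent to } i} \langle s_j \rangle + (\text{terms independent of } s_i) \\
q_i(s_i) &\propto \exp\left(s_i m_i / T\right), \qquad m_i = \sum_{j \text{ adjacent to } i} \langle s_j \rangle
\end{align}
```

For models where $\ln P$ is not linear in the other variables, the expectation of $\ln P$ and $\ln P$ evaluated at the expectations differ, which is why the hypothesis fails in general.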

14 Derivation of variational Bayes
Variational Bayes starts with Jensen's inequality ($\ln$ is concave):
$$0 = \ln 1 = \ln \int P(x)\, dx = \ln \int q(x) \frac{P(x)}{q(x)}\, dx \geq \int q(x) \ln \frac{P(x)}{q(x)}\, dx$$
Equality iff $q(x) = P(x)$, so the best $q$ maximizes the RHS.
Penalizes large $q(x)$ when $P(x)$ is small (not vice-versa).
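The bound is easy to check numerically for discrete distributions. This is a small illustrative sketch: the helper name `lower_bound` and the three-state distributions are made up for the example, and all probabilities are assumed strictly positive.

```python
import numpy as np

def lower_bound(q, p):
    """Return sum_x q(x) * ln(p(x) / q(x)) for strictly positive q and p."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.sum(q * (np.log(p) - np.log(q))))

# Hypothetical target with one rare state.
p = np.array([0.5, 0.49, 0.01])

# q puts too much mass on the rare state (large q where P is small).
q_over = np.array([0.45, 0.44, 0.11])
# q puts too little mass on the rare state (small q where P is small).
q_under = np.array([0.505, 0.494, 0.001])

if __name__ == "__main__":
    print(lower_bound(p, p))        # bound is tight when q = P
    print(lower_bound(q_over, p))
    print(lower_bound(q_under, p))
```

In this example the bound drops much further for `q_over` than for `q_under`, which illustrates the asymmetry noted on the slide.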

17 Derivation of mean-field variational Bayes
To approximate $P(x) \approx q(x)$ we maximize
$$L[q] = \int q(x) \ln \frac{P(x)}{q(x)}\, dx = \int q(x) \left(\ln P(x) - \ln q(x)\right) dx$$
In the two-variable problem $P(x, y) \approx q_x(x)\, q_y(y)$,
$$L[q] = \iint q_x(x)\, q_y(y) \left(\ln P(x, y) - \ln q_x(x) - \ln q_y(y)\right) dx\, dy$$
Setting $\delta L / \delta q_x(x) = 0$ with the constraint $\int q_x(x)\, dx = 1$ gives
$$\int q_y(y) \ln P(x, y)\, dy - \ln q_x(x) = \text{const} \quad\Rightarrow\quad q_x(x) \propto \exp E_{q_y}[\ln P(x, y)]$$
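The functional-derivative step can be spelled out with a Lagrange multiplier. This is a sketch of the intermediate algebra; the multiplier $\mu$ is notation introduced here, not in the slides:

```latex
\begin{align}
\frac{\delta}{\delta q_x(x)} \left[ L[q] + \mu \left( \int q_x(x')\, dx' - 1 \right) \right]
  &= \int q_y(y) \left( \ln P(x, y) - \ln q_x(x) - \ln q_y(y) \right) dy - 1 + \mu = 0 \\
\int q_y(y) \ln P(x, y)\, dy - \ln q_x(x)
  &= 1 - \mu + \int q_y(y) \ln q_y(y)\, dy
\end{align}
```

The $-1$ comes from differentiating $q_x \ln q_x$ and using $\int q_y(y)\, dy = 1$. The right-hand side of the second line does not depend on $x$, which is the constant in the result; exponentiating and normalizing fixes $\mu$.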

18 Recipe for mean-field variational Bayes
To approximate $P(x_1, x_2, \ldots) \approx q_1(x_1)\, q_2(x_2) \cdots$
Initialize all $q_i(x_i)$.
For each $i$ update $q_i(x_i) \propto \exp E_{\prod_{j \neq i} q_j(x_j)}[\ln P(x_1, x_2, \ldots)]$.
Repeat until convergence.
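For two discrete variables the recipe reduces to a few matrix-vector products. The sketch below assumes the joint distribution is given as a strictly positive table `P[x, y]`; the function names `mean_field_2d` and `bound` are illustrative, not from the talk.

```python
import numpy as np

def mean_field_2d(P, iters=200):
    """Approximate a strictly positive joint table P[x, y] by q_x(x) * q_y(y)."""
    logP = np.log(P)
    nx, ny = P.shape
    # Initialize all q_i (here: uniform).
    q_x = np.full(nx, 1.0 / nx)
    q_y = np.full(ny, 1.0 / ny)
    for _ in range(iters):
        # q_x(x) proportional to exp E_{q_y}[ln P(x, y)]
        log_qx = logP @ q_y
        q_x = np.exp(log_qx - log_qx.max())
        q_x /= q_x.sum()
        # q_y(y) proportional to exp E_{q_x}[ln P(x, y)]
        log_qy = q_x @ logP
        q_y = np.exp(log_qy - log_qy.max())
        q_y /= q_y.sum()
    return q_x, q_y

def bound(P, q_x, q_y):
    """L[q] = sum over x, y of q(x, y) * (ln P(x, y) - ln q(x, y))."""
    q = np.outer(q_x, q_y)
    return float(np.sum(q * (np.log(P) - np.log(q))))

if __name__ == "__main__":
    P = np.array([[0.5, 0.1], [0.1, 0.3]])  # hypothetical correlated table
    q_x, q_y = mean_field_2d(P)
    print(q_x, q_y, bound(P, q_x, q_y))
```

If `P` is already a product of its marginals, the updates recover those marginals and the bound reaches zero; for a correlated table the bound stays strictly negative.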

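20 Canonical application: hierarchical models
Hyperparameter $\lambda$, parameters $\theta_j$ and data $x_{ij}$:
$$P(\lambda, \theta_1, \theta_2, \ldots \mid x_{11}, \ldots) \propto P(\lambda) \prod_j \left[ P(\theta_j \mid \lambda) \prod_i P(x_{ij} \mid \theta_j) \right]$$
Mean-field valid by Law of Large Numbers.
$$q(\lambda) \propto P(\lambda) \prod_j \exp E_{q(\theta_j)}[\ln P(\theta_j \mid \lambda)]$$
$$q(\theta_j) \propto \prod_i P(x_{ij} \mid \theta_j)\, \exp E_{q(\lambda)}[\ln P(\theta_j \mid \lambda)]$$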
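To make the two updates concrete, here is a sketch for one assumed toy model that is not in the slides: $\lambda \sim N(0, \tau^2)$, $\theta_j \sim N(\lambda, 1)$ and $x_{ij} \sim N(\theta_j, 1)$. With these choices both updates stay Gaussian, so each $q$ is tracked by a mean and a variance. The function name and the default `tau2` are illustrative.

```python
import numpy as np

def vb_hierarchical_normal(groups, tau2=100.0, iters=200):
    """Mean-field VB for an assumed toy model:
    lambda ~ N(0, tau2), theta_j ~ N(lambda, 1), x_ij ~ N(theta_j, 1).
    """
    J = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    sums = np.array([np.sum(g) for g in groups], dtype=float)
    lam_mean = 0.0
    for _ in range(iters):
        # q(theta_j): data likelihood times exp E_{q(lambda)}[ln P(theta_j | lambda)]
        theta_var = 1.0 / (n + 1.0)
        theta_mean = (sums + lam_mean) * theta_var
        # q(lambda): prior times prod_j exp E_{q(theta_j)}[ln P(theta_j | lambda)]
        lam_var = 1.0 / (1.0 / tau2 + J)
        lam_mean = theta_mean.sum() * lam_var
    return lam_mean, lam_var, theta_mean, theta_var

if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [4.0, 5.0], [0.5]]  # made-up observations
    print(vb_hierarchical_normal(data))
```

Because this toy posterior is jointly Gaussian, the converged mean-field means should match the exact posterior means, while the mean-field variances are no larger than the exact marginal variances.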

21 Observations
EM is a special case of mean-field VB in which we assume some $q_i(x_i)$ are infinitely narrow.
You don't have to completely maximize $L[q]$ at each step, just increase it, e.g. stochastic gradient VB.
You don't have to decompose completely: one $q$ can contain several $x_i$, e.g. an HMM.
Figure: $P(A, \pi, \phi, z_1, z_2, \ldots) \approx q_A(A)\, q_\pi(\pi)\, q_\phi(\phi)\, q_z(z_1, z_2, \ldots)$.
