Advanced Signal Processing 2, SE
Variational Bayesian Inference Techniques
Johann Steiner
Outline
- Introduction
- Sparse Signal Reconstruction
- Sparsity Priors
- Benefits of Sparse Bayesian Inference
- Variational Sparse Bayesian Inference
- Gaussian Bayesian Graphical Models
- Algorithms for Variational Sparse Bayesian Inference
- Double-loop Algorithms
- Variational Sparse Bayesian Reconstruction
- Re-weighted l1 Algorithm
- Properties of Automatic Relevance Determination
- Applications
- Sampling Optimization of Magnetic Resonance Imaging
- Source Localization and Group Sparsity Penalization
Introduction
- Bayesian probability: P(H|D) = P(D|H) P(H) / P(D)
- Signal reconstruction from noisy measurements is a core problem in signal processing.
- Sparse signal reconstruction
- Compressive sensing
Sparse Signal Reconstruction (1)
- Linear reconstruction problem:
  - measurements: y ∈ R^m
  - design matrix: X ∈ R^{m×n}
  - we seek the u ∈ R^n that minimizes the squared error ||y - Xu||_2^2
- Example: MRI reconstruction
- Ill-posed problem: many different u give zero error.
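The ill-posedness is easy to demonstrate numerically: with m < n, infinitely many u fit the measurements exactly. A minimal sketch (toy sizes and a random design matrix are my own choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 100                       # far fewer measurements than unknowns
X = rng.standard_normal((m, n))      # design matrix
u_true = np.zeros(n)
u_true[:5] = rng.standard_normal(5)  # a sparse ground truth
y = X @ u_true                       # noiseless measurements

# Minimum-norm least-squares solution: one of infinitely many u with zero error.
u_pinv = np.linalg.pinv(X) @ y
residual = np.linalg.norm(y - X @ u_pinv)

# Any null-space direction of X can be added without changing the error,
# so the squared error alone cannot identify u_true.
null_dir = np.linalg.svd(X)[2][-1]   # X @ null_dir ~ 0 since rank(X) = m < n
u_other = u_pinv + 10.0 * null_dir
residual_other = np.linalg.norm(y - X @ u_other)
```

Both `u_pinv` and `u_other` reproduce y perfectly while differing substantially, which is why a prior bias (next slide) is needed.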
Sparse Signal Reconstruction (2)
- Ideally, estimation should be biased towards known properties of the signal class.
- Apply derivative or wavelet filters B: s = Bu ∈ R^q.
- The responses s exhibit statistical sparsity.
- Regularized reconstruction problem:
  min_u ||y - Xu||_2^2 + λ R_{l_p}(u),  R_{l_p}(u) := ||Bu||_p^p = Σ_{i=1}^q |s_i|^p,  s = Bu,  0 < p ≤ 2   (1)
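For p = 1 and B = I, problem (1) is the lasso, solvable by iterative soft-thresholding (ISTA). A self-contained sketch (ISTA is a standard solver, not prescribed by the slides; sizes and the regularization weight `lam` are illustrative):

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(X, y, lam, n_iter=1000):
    """Iterative soft-thresholding for min_u ||y - Xu||_2^2 + lam * ||u||_1,
    i.e. problem (1) with p = 1 and B = I."""
    L = 2 * np.linalg.norm(X, 2) ** 2    # Lipschitz constant of the gradient
    u = np.zeros(X.shape[1])
    for _ in range(n_iter):
        u = soft_threshold(u + 2 * X.T @ (y - X @ u) / L, lam / L)
    return u

# Sparse recovery from an underdetermined, mildly noisy system
rng = np.random.default_rng(1)
m, n = 100, 200
X = rng.standard_normal((m, n)) / np.sqrt(m)
u_true = np.zeros(n)
u_true[rng.choice(n, 8, replace=False)] = rng.uniform(1.0, 3.0, 8)
y = X @ u_true + 0.01 * rng.standard_normal(m)
u_hat = ista(X, y, lam=0.05)
rel_err = np.linalg.norm(u_hat - u_true) / np.linalg.norm(u_true)
```

Despite m < n, the l1 bias recovers the sparse signal to good accuracy, in contrast to the unregularized problem.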
Sparse Signal Reconstruction (3)
- p = 1: the convex l1 reconstruction; recovery guarantees under the RIP (restricted isometry property).
- p < 1: problem (1) is non-convex — lp reconstruction.
- Probabilistic view: the sparse linear model (SLM).
Sparse Signal Reconstruction (4)
- Statistical sparsity of s: prior distribution P(u) ∝ Π_i t_i(s_i)
- Laplace potentials:
  t_i(s_i) ∝ e^{-τ_i |s_i|}   (2)
- Student's t sparsity potentials:
  t_i(s_i) ∝ (1 + τ_i s_i² / ν)^{-(ν+1)/2}   (3)
- General solution of the inference problem:
  P(u|y) = Z^{-1} N(y | Xu, σ²I) Π_{i=1}^q t_i(s_i),  s = Bu   (4)
- Z: partition function, Z = ∫ N(y | Xu, σ²I) Π_i t_i(s_i) du
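With Laplace potentials (2), the negative log of the unnormalized posterior (4) is the familiar l1-penalized least-squares criterion. A minimal sketch (the function name and toy values are my own):

```python
import numpy as np

def neg_log_posterior(u, X, y, B, tau, sigma2):
    """-log of the unnormalized posterior (4) with Laplace potentials (2):
    ||y - Xu||^2 / (2*sigma2) + sum_i tau_i * |s_i|, where s = B u."""
    s = B @ u
    return np.sum((y - X @ u) ** 2) / (2 * sigma2) + np.sum(tau * np.abs(s))

# Toy evaluation: X = B = I, y = 0, tau = 1, sigma2 = 1, u = (1, -2)
val = neg_log_posterior(np.array([1.0, -2.0]), np.eye(2), np.zeros(2),
                        np.eye(2), np.ones(2), 1.0)
# likelihood term (1 + 4)/2 = 2.5, prior term |1| + |-2| = 3  ->  5.5
```

MAP estimation minimizes this criterion; Bayesian inference instead integrates over it, which is where the partition function Z becomes the hard object.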
Sparsity Priors (1)
- Statistical and computational properties of SLM inference methods are determined by the choice of positive potentials t_i(s_i) in the prior.
- P(u) ∝ Π_i t_i(s_i): the potentials enforce sparsity.
- The statistical role of sparsity potentials is understood by inspecting the prior and posterior distributions they give rise to (next figure).
Sparsity Priors (2)
- Examples: (a) prior distributions with the same variance; (b) the corresponding posterior distributions.
Benefits of Sparse Bayesian Inference
- Can sparse estimators with better properties than MAP estimation (1) be obtained from P(u|y)?
- Example: a smooth nonlinear model f(·), densely sampled at n locations θ_i; sources are reconstructed from sensor readings y by sparse estimation with X = [f(θ_i)].
  - Convex l1 reconstruction tends to perform poorly.
  - Non-convex MAP reconstruction (minimizing -log P(u|y)) does not do well either.
- A Bayesian approach can alleviate these problems in many situations:
  - Compute the posterior mean instead of the mode, i.e. integrate instead of maximizing over P(u|y).
  - The mean is not exactly sparse; exact sparsity is obtained in a zero temperature limit.
Variational Sparse Bayesian Inference (1)
- The advantages of Bayesian inference could well be offset by its computational difficulty.
- While there is a large and diverse body of approximate Bayesian inference technology, until recently none of these methods, applied to sparse linear models, could match the computational efficiency and theoretical characterization of MAP.
- Bayesian inference in SLMs, integrating over the posterior (4), is intractable for two reasons coming together: P(u|y) is highly coupled (X is not block diagonal) and non-Gaussian.
- Two major classes of inference approximations: Markov chain Monte Carlo (MCMC) and variational relaxations.
Variational Sparse Bayesian Inference (2)
- Fit P(u|y) by a Gaussian distribution Q(u|y; γ), parameterized by γ, minimizing a divergence measure between P and Q.
- Exploit super-Gaussianity of the prior potentials:
  log Z ≥ max_{γ ≥ 0} log ∫ N(y | Xu, σ²I) e^{-s^T Γ^{-1} s / 2 - h(γ)/2} du,  s = Bu,  Γ := diag(γ)   (5)
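The super-Gaussianity behind (5) can be checked numerically for the Laplace potential: τ|s| = min_{γ>0} [s²/(2γ) + τ²γ/2], with minimizer γ* = |s|/τ, so e^{-τ|s|} is an upper envelope of scaled Gaussians in s. A grid-search verification (the specific values of τ, s, and the grid are illustrative):

```python
import numpy as np

tau, s = 1.5, 0.7
gammas = np.linspace(1e-4, 10.0, 200001)   # fine grid over gamma > 0

# Variational upper bound on tau*|s| for the Laplace potential
bound = s ** 2 / (2 * gammas) + tau ** 2 * gammas / 2

gamma_star = gammas[np.argmin(bound)]      # analytically, |s| / tau
gap = bound.min() - tau * abs(s)           # vanishes at the optimum (AM-GM)
```

Since each term e^{-s²/(2γ)} is Gaussian in s, plugging this representation into (4) yields the Gaussian integral in (5), which is tractable.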
Variational Sparse Bayesian Inference (3)
- Relate the variational inference problem (5) to MAP estimation directly: the latter is obtained from the former by replacing the integration over u with an optimization over u.
- Automatic relevance determination (ARD).
Variational Sparse Bayesian Inference (4)
- Automatic relevance determination (ARD): for sparse reconstruction, ARD is an attractive alternative to convex or non-convex MAP estimation.
- Variational sparse Bayesian inference (5) is a convex optimization problem if and only if MAP estimation is convex for the same model.
- The variational inference relaxation (5) is solved by double-loop algorithms, scaled up to very large models by reductions to convex reconstruction and Bayesian graphical model technology.
Gaussian Bayesian Graphical Models (1)
- What does it take to solve the variational problem (5)? Can we use MAP estimation technology, or do we need computations of a different kind?
- The gradient of
  log ∫ N(y | Xu, σ²I) e^{-s^T Γ^{-1} s / 2} du
  with respect to γ involves the terms (E_Q[s_i | y]² + Var_Q[s_i | y]) / 2: solving (5) requires Gaussian posterior means and variances.
- Approximate means and variances by Bayesian graphical model algorithms.
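For toy sizes these moments can be computed exactly by dense linear algebra, which makes the requirement concrete. A sketch assuming the standard Gaussian-relaxation precision matrix A = X^T X / σ² + B^T Γ^{-1} B (at scale, one would replace the dense inverse with the graphical-model approximations the slide refers to):

```python
import numpy as np

def gaussian_posterior_moments(X, y, B, gamma, sigma2):
    """Moments of the Gaussian Q(u|y) = N(u | mean_u, A^{-1}) implied by the
    relaxation: A = X^T X / sigma2 + B^T diag(1/gamma) B,
    mean_u = A^{-1} X^T y / sigma2. Returns marginal moments of s = B u."""
    A = X.T @ X / sigma2 + B.T @ (B / gamma[:, None])
    cov = np.linalg.inv(A)                        # dense inverse: toy sizes only
    mean_u = cov @ (X.T @ y) / sigma2
    mean_s = B @ mean_u
    var_s = np.einsum('ij,jk,ik->i', B, cov, B)   # diag(B A^{-1} B^T)
    return mean_s, var_s

# Toy check: X = B = I, gamma = 1, sigma2 = 1  ->  A = 2I
mean_s, var_s = gaussian_posterior_moments(np.eye(2), np.array([1.0, 2.0]),
                                           np.eye(2), np.ones(2), 1.0)
```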
Gaussian Bayesian Graphical Models (2)
- (Undirected) graphical model (on blackboard), [2], [3].
Algorithms for Variational Sparse Bayesian Inference (1)
- Efficient double-loop algorithms solve the variational relaxation (5) at large scales, and its convexity can be characterized.
- The following reformulation is used:
  min_{γ ≥ 0} min_u φ(u, γ),  φ(u, γ) := log|A| + σ^{-2} ||y - Xu||² + s^T Γ^{-1} s + h(γ)   (6)
  with A = σ^{-2} X^T X + B^T Γ^{-1} B and s = Bu.
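Objective (6) is straightforward to evaluate for small problems, which is useful for testing algorithm implementations against it. A sketch (the generic `h` argument stands in for the potential-dependent h(γ) term):

```python
import numpy as np

def phi(u, gamma, X, y, B, h, sigma2):
    """Objective (6): log|A| + ||y - Xu||^2 / sigma2 + s^T Gamma^{-1} s + h-term,
    with A = X^T X / sigma2 + B^T Gamma^{-1} B and s = B u."""
    A = X.T @ X / sigma2 + B.T @ (B / gamma[:, None])
    s = B @ u
    return (np.linalg.slogdet(A)[1]             # log|A|, numerically stable
            + np.sum((y - X @ u) ** 2) / sigma2
            + np.sum(s ** 2 / gamma)
            + np.sum(h(gamma)))

# Toy check: X = B = I, gamma = 1, sigma2 = 1, u = 0, h = 0
val = phi(np.zeros(2), np.ones(2), np.eye(2), np.array([1.0, 0.0]),
          np.eye(2), lambda g: 0.0 * g, 1.0)
# A = 2I, so val = 2*log(2) + 1
```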
Double-loop Algorithm (1)
- Joint minimization of (6) is difficult due to the coupled term log|A|.
- A concept known as concave-convex (or majorize-minimize) converts (6) into:
  min_{z ≥ 0} min_u σ^{-2} ||y - Xu||₂² - 2 Σ_{i=1}^q log t_i((z_i + s_i²)^{1/2}) - g₁(z)   (7)
Double-loop Algorithm (2)
The algorithm iterates between:
- Inner loop: minimization of (7) over u, which involves posterior mean calculations E_Q(s_i | y) as commonly used for MAP.
- Outer loop: updates for z.
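For Laplace potentials (2), the inner problem is a smoothed l1 penalty 2 Σ τ_i √(z_i + s_i²), and the outer z-update refits the Gaussian marginal variances of s. A small dense sketch under those assumptions (plain gradient descent and the dense inverse are simplifications for toy sizes; the step-size rule is my own conservative choice):

```python
import numpy as np

def double_loop(X, y, B, tau, sigma2, outer=5, inner=200):
    """Double-loop sketch for Laplace potentials. Inner loop: gradient descent
    on the smoothed penalized least-squares problem (7). Outer loop: refit z
    as the Gaussian marginal variances Var_Q[s_i | y]."""
    q = B.shape[0]
    u = np.zeros(X.shape[1])
    z = np.ones(q)
    for _ in range(outer):
        # conservative step: data-fit curvature plus a penalty curvature bound
        step = 1.0 / (2 * np.linalg.norm(X, 2) ** 2 / sigma2
                      + 2 * np.max(tau) / np.sqrt(np.min(z)))
        for _ in range(inner):                       # inner loop over u
            s = B @ u
            grad = (-2 * X.T @ (y - X @ u) / sigma2
                    + 2 * B.T @ (tau * s / np.sqrt(z + s ** 2)))
            u = u - step * grad
        s = B @ u
        gamma = np.sqrt(z + s ** 2) / tau            # optimal gamma (Laplace)
        A = X.T @ X / sigma2 + B.T @ (B / gamma[:, None])
        z = np.einsum('ij,jk,ik->i', B, np.linalg.inv(A), B)  # Var_Q[s_i|y]
    return u

# Toy run: X = B = I, weak penalty -> estimate close to y
y_toy = np.array([1.0, 2.0, 3.0])
u_hat = double_loop(np.eye(3), y_toy, np.eye(3), 0.1 * np.ones(3), 1.0)
```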
Double-loop Algorithm (3)
Variational Sparse Bayesian Reconstruction
- Bayesian inference can be used for sparse point reconstruction by computing the posterior mean in a zero temperature limit, where posterior mass is concentrated on exactly sparse points.
Re-weighted l1 Algorithm
- In the ARD zero temperature limit, an alternative to the double-loop algorithm above can be used, enjoying the same global convergence property plus some additional benefits:
  min_{z ≥ 0} min_u σ^{-2} ||y - Xu||₂² + 2 Σ_{i=1}^q z_i^{1/2} |s_i| - g₂(z)   (8)
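The structure of (8) — alternating a weighted l1 solve with a weight update — can be sketched with ISTA as the inner solver. Note the 1/(|u_i| + eps) weight rule below is the classical re-weighting heuristic standing in for the variance-derived weights of (8); it is my simplification, not the slide's exact update:

```python
import numpy as np

def reweighted_l1(X, y, lam, outer=5, inner=300, eps=1e-3):
    """Re-weighted l1 sketch (B = I): each outer iteration solves a weighted
    l1 problem by ISTA, then re-derives the weights from the solution."""
    n = X.shape[1]
    u = np.zeros(n)
    w = np.ones(n)                                # initial pass = plain lasso
    L = 2 * np.linalg.norm(X, 2) ** 2
    for _ in range(outer):
        for _ in range(inner):                    # weighted ISTA inner loop
            v = u + 2 * X.T @ (y - X @ u) / L
            u = np.sign(v) * np.maximum(np.abs(v) - lam * w / L, 0.0)
        w = 1.0 / (np.abs(u) + eps)               # re-weighting step
    return u

# Sparse recovery demo, same flavor as before
rng = np.random.default_rng(2)
m, n = 100, 200
X = rng.standard_normal((m, n)) / np.sqrt(m)
u_true = np.zeros(n)
u_true[rng.choice(n, 8, replace=False)] = rng.uniform(1.0, 3.0, 8)
y = X @ u_true + 0.01 * rng.standard_normal(m)
u_hat = reweighted_l1(X, y, lam=0.05)
rel_err = np.linalg.norm(u_hat - u_true) / np.linalg.norm(u_true)
```

Re-weighting reduces the shrinkage bias of plain l1 on large coefficients while suppressing small spurious ones.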
Properties of Automatic Relevance Determination
- ARD can offer substantial advantages over separable (convex or non-convex) MAP estimation when searching for maximally sparse solutions:
  min_u ||u||₀ = Σ_{i=1}^n I{u_i ≠ 0}  such that  y = Xu   (9)
  min_u R_VB(u)  such that  y = Xu   (10)
  R_VB(u) = min_{γ ≥ 0} [u^T Γ^{-1} u + log|σ²I + X Γ X^T|] = min_{z ≥ 0} [2 Σ_{i=1}^n z_i^{1/2} |u_i| - g₂(z)]   (11)
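In the scalar case (n = m = 1, X = 1), the ARD penalty of (11) reduces to min_γ [u²/γ + log(σ² + γ)], which can be evaluated by grid search and compared against separable penalties. A sketch (the grid and test values are my own):

```python
import numpy as np

def r_vb_scalar(u, sigma2, grid=None):
    """Scalar sketch of the ARD penalty (11) with n = m = 1, X = 1:
    min over gamma > 0 of u^2 / gamma + log(sigma2 + gamma), via grid search.
    Analytic minimizer: gamma* = (u^2 + sqrt(u^4 + 4 u^2 sigma2)) / 2."""
    if grid is None:
        grid = np.linspace(1e-6, 50.0, 500001)
    return (u ** 2 / grid + np.log(sigma2 + grid)).min()

# The penalty grows much more slowly than u^2 for large |u| (log-like),
# which is what drives ARD toward maximally sparse solutions.
pen_small = r_vb_scalar(0.5, 1.0)
pen_large = r_vb_scalar(2.0, 1.0)
```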
Applications
- Sampling Optimization of Magnetic Resonance Imaging
- Source Localization and Group Sparsity Penalization
Sampling Optimization of Magnetic Resonance Imaging (1)
- Setup: n = 131072, q ≈ 3n, m up to ¾ n, u complex-valued.
Sampling Optimization of Magnetic Resonance Imaging (3)
glm-ie: The Generalised Linear Models Inference & Estimation Toolbox
The glm-ie toolbox contains:
- scalable estimation routines for GLMs and SLMs
- scalable convex variational Bayesian inference relaxations
- MAP estimation
- Variational Bayesian inference
- Double-loop algorithm
- Nonlinear or group potentials
- Expectation propagation inference
http://mloss.org/software/view/269/ (last visited: 09.05.2011), from [6]
glm-ie: The Generalised Linear Models Inference & Estimation Toolbox
Problems:
- Only 32-bit machine support
- Based on C++ and Fortran 77 code (MEX)
- Additional software needed: L-BFGS-B (for solving large-scale nonlinear optimization problems)
- Examples exist, but are not documented offline
glm-ie: Example (from [6])
Bibliography
[1] M. W. Seeger, D. P. Wipf: "Variational Bayesian Inference Techniques", IEEE Signal Processing Magazine, Nov. 2010.
[2] T. P. Minka: "Expectation Propagation for Approximate Bayesian Inference", Statistics Dept., Carnegie Mellon University, Pittsburgh.
[3] T. P. Minka: "A family of algorithms for approximate Bayesian inference", Department of Electrical Engineering and Computer Science, MIT.
[4] M. J. Beal: "Variational Algorithms for Approximate Bayesian Inference", The Gatsby Computational Neuroscience Unit, University College London.
[6] H. Nickisch: "glm-ie: The Generalised Linear Models Inference & Estimation Toolbox", MPI for Biological Cybernetics, Tübingen, Germany.