Variational Bayesian Inference Techniques


Advanced Signal Processing 2 (SE): Variational Bayesian Inference Techniques. Johann Steiner

Outline
- Introduction
- Sparse Signal Reconstruction
- Sparsity Priors
- Benefits of Sparse Bayesian Inference
- Variational Sparse Bayesian Inference
- Gaussian Bayesian Graphical Models
- Algorithms for Variational Sparse Bayesian Inference
  - Double-loop Algorithms
  - Variational Sparse Bayesian Reconstruction
  - Re-weighted l1 Algorithm
- Properties of Automatic Relevance Determination
- Applications
  - Sampling Optimization of Magnetic Resonance Imaging
  - Source Localization and Group Sparsity Penalization

Introduction
Bayesian probability: P(H | D) = P(D | H) P(H) / P(D)
Signal reconstruction from noisy measurements is a core problem in signal processing: sparse signal reconstruction, compressive sensing.
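As a concrete reading of Bayes' rule above, a tiny numeric sketch (the hypothesis and the probabilities are invented for illustration):

```python
# Bayes' rule P(H | D) = P(D | H) P(H) / P(D) on made-up numbers:
# H = "this coefficient is non-zero", D = "a detector fires".
p_H = 0.1             # prior P(H)
p_D_given_H = 0.9     # likelihood P(D | H)
p_D_given_notH = 0.2  # false-alarm rate P(D | not H)

# Evidence P(D) via the law of total probability.
p_D = p_D_given_H * p_H + p_D_given_notH * (1 - p_H)

p_H_given_D = p_D_given_H * p_H / p_D
print(p_H_given_D)    # 0.09 / 0.27 = 1/3: the data raise P(H) from 0.1 to about 0.33
```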

Sparse Signal Reconstruction (1)
Linear reconstruction problem: measurements y ∈ ℝ^m, design matrix X ∈ ℝ^(m x n); we seek the u ∈ ℝ^n which minimizes the squared error ||y - X u||^2.
Example: MRI reconstruction.
Ill-posed problem: many different u give zero error.
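A small numpy sketch of why the underdetermined case is ill posed (sizes and data are made up): with m < n, infinitely many u achieve zero squared error, so the measurements alone cannot identify the signal.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 50                                          # fewer measurements than unknowns
X = rng.standard_normal((m, n))                        # design matrix
u_true = np.zeros(n); u_true[:3] = [2.0, -1.5, 0.7]    # sparse ground truth
y = X @ u_true                                         # noiseless measurements

# Minimum-norm solution: one of infinitely many u with zero residual.
u_pinv = np.linalg.pinv(X) @ y
# Any null-space direction of X can be added without changing the fit.
null_dir = np.linalg.svd(X)[2][-1]                     # right singular vector with zero singular value
u_other = u_pinv + 5.0 * null_dir

print(np.linalg.norm(y - X @ u_pinv))                  # ~ 0
print(np.linalg.norm(y - X @ u_other))                 # also ~ 0, although u differs a lot
```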

Sparse Signal Reconstruction (2)
Ideally, estimation should be biased towards known properties of the signal class.
Apply derivative or wavelet filters B: s = B u ∈ ℝ^q. The responses s exhibit statistical sparsity.
min_u ||y - X u||_2^2 + λ R_lp(u),   R_lp(u) := ||B u||_p^p = Σ_{i=1}^q |s_i|^p,   s = B u,   p ∈ (0, 2]   (1)
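As an illustration of the penalized reconstruction (1), a minimal sketch for the special case B = I and p = 1 (an l1 / LASSO problem), solved with plain iterative soft thresholding (ISTA); the regularization weight, step size, and problem sizes are illustrative choices, not values from the talk.

```python
import numpy as np

def soft_threshold(v, thr):
    return np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)

def ista_l1(X, y, lam, n_iter=500):
    """Minimize ||y - X u||_2^2 + lam * ||u||_1 by iterative soft thresholding."""
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    u = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ u - y)               # gradient of the squared-error term
        u = soft_threshold(u - step * grad, step * lam)
    return u

rng = np.random.default_rng(1)
m, n = 40, 100
X = rng.standard_normal((m, n)) / np.sqrt(m)
u_true = np.zeros(n); u_true[rng.choice(n, 5, replace=False)] = rng.standard_normal(5)
y = X @ u_true + 0.01 * rng.standard_normal(m)

u_hat = ista_l1(X, y, lam=0.05)
print(np.count_nonzero(np.abs(u_hat) > 1e-6))        # only a few coefficients survive
```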

Sparse Signal Reconstruction (3)
The l1 reconstruction (p = 1): problem (1) is convex; recovery guarantees rely on the restricted isometry property (RIP) of X.
The lp reconstruction (p < 1): problem (1) is non-convex.
Both can be viewed as MAP estimation in a probabilistic sparse linear model (SLM).

Sparse Signal Reconstruction (4)
Statistical sparsity of s, encoded by a prior P(u) ∝ Π_i t_i(s_i):
Laplace prior potentials: t_i(s_i) = exp(-τ_i |s_i|)   (2)
Student's t sparsity potentials: t_i(s_i) = (1 + s_i^2 / ν_i)^(-(ν_i + 1)/2)   (3)
General solution of the inference problem (the posterior):
P(u | y) = Z^{-1} N(y | X u, σ² I) Π_{i=1}^q t_i(s_i),   s = B u   (4)
Z is the partition function: Z = ∫ N(y | X u, σ² I) Π_i t_i(s_i) du.
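A short sketch of the potentials (2), (3) and the unnormalized posterior (4) in the reconstructed form given above; tau, nu, and the toy dimensions are assumptions made for illustration.

```python
import numpy as np

def laplace_pot(s, tau=1.0):
    """Laplace potential t_i(s_i) = exp(-tau * |s_i|), cf. (2)."""
    return np.exp(-tau * np.abs(s))

def student_t_pot(s, nu=3.0):
    """Student's t potential t_i(s_i) = (1 + s_i^2 / nu)^(-(nu + 1) / 2), cf. (3)."""
    return (1.0 + s ** 2 / nu) ** (-(nu + 1) / 2)

def unnormalized_posterior(u, y, X, B, sigma2, pot):
    """N(y | X u, sigma^2 I) * prod_i t_i(s_i) with s = B u, i.e. (4) without 1/Z."""
    r = y - X @ u
    gauss = np.exp(-0.5 * (r @ r) / sigma2) / (2 * np.pi * sigma2) ** (len(y) / 2)
    return gauss * np.prod(pot(B @ u))

rng = np.random.default_rng(2)
m, n = 5, 8
X, B = rng.standard_normal((m, n)), np.eye(n)
y, u = rng.standard_normal(m), rng.standard_normal(n)
print(unnormalized_posterior(u, y, X, B, sigma2=0.1, pot=laplace_pot))
print(unnormalized_posterior(u, y, X, B, sigma2=0.1, pot=student_t_pot))
```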

Sparsity Priors (1)
Statistical and computational properties of SLM inference methods are determined by the choice of the positive potentials t_i(s_i) in the prior: P(u) ∝ Π_i t_i(s_i) enforces sparsity.
The statistical role of sparsity potentials is understood by inspecting the prior and posterior distributions they give rise to (see the next figure).

Sparsity Priors (2)
Figure: (a) prior distributions with the same variance; (b) corresponding posterior distributions.

Benefits of Sparse Bayesian Inference
Can sparse estimators with better properties than MAP estimation (1) be obtained from P(u | y)?
Example: a smooth nonlinear model f(·) is densely sampled at n locations θ_i, and sources are reconstructed from sensor readings y by sparse estimation with X = [f(θ_i)]. Convex l1 reconstruction tends to perform poorly; non-convex MAP reconstruction (minimizing -log P(u | y)) does not do well either.
A Bayesian approach can alleviate these problems in many situations: compute the posterior mean instead of the mode, i.e. integrate over P(u | y) instead of maximizing it. The mean is not exactly sparse, which motivates the zero temperature limit.
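A one-dimensional sketch of the mean-versus-mode point: with a Laplace prior on a single coefficient, the MAP estimate can be exactly zero while the posterior mean stays small but nonzero (all numbers are illustrative; the mean is obtained by numerical integration on a grid).

```python
import numpy as np

# Scalar model: y = u + noise, noise ~ N(0, sigma2), Laplace prior exp(-tau * |u|).
y, sigma2, tau = 0.4, 0.25, 2.0

# MAP estimate: soft thresholding of y at sigma2 * tau.
u_map = np.sign(y) * max(abs(y) - sigma2 * tau, 0.0)

# Posterior mean: integrate u * P(u | y) on a grid (P(u | y) is known up to a constant).
u = np.linspace(-5, 5, 20001)
log_post = -0.5 * (y - u) ** 2 / sigma2 - tau * np.abs(u)
w = np.exp(log_post - log_post.max())
u_mean = np.sum(u * w) / np.sum(w)

print(u_map)    # exactly 0.0 here, since |y| = 0.4 < sigma2 * tau = 0.5
print(u_mean)   # small but nonzero: the mean is not exactly sparse
```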

Variational Sparse Bayesian Inference (1)
The advantages of Bayesian inference could well be offset by its computational difficulty. While there is a large and diverse body of approximate Bayesian inference technology, until recently none of these methods, applied to sparse linear models, could match the computational efficiency and theoretical characterization of MAP.
Bayesian inference in SLMs, integrating over the posterior (4), is intractable for two reasons coming together: P(u | y) is highly coupled (X is not block diagonal) and non-Gaussian.
Two major classes of inference approximations are Markov chain Monte Carlo (MCMC) and variational relaxations.

Variational Sparse Bayesian Inference (2)
Fit P(u | y) by a Gaussian distribution Q(u | y; γ), parameterized by γ, by minimizing a divergence measure between P and Q.
Exploit super-Gaussianity of the prior potentials:
log Z ≥ max_{γ ≻ 0} log ∫ N(y | X u, σ² I) exp( -(s^T Γ^{-1} s + h(γ)) / 2 ) du,   s = B u,   Γ := diag(γ)   (5)
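For the Laplace potential, the super-Gaussianity used in (5) takes the explicit form exp(-τ|s|) = max_{γ > 0} exp(-s²/(2γ) - τ²γ/2), i.e. h(γ) = τ²γ; a quick numerical check of this identity (values are arbitrary):

```python
import numpy as np

tau, s = 1.5, 0.8
gammas = np.linspace(1e-4, 10.0, 200001)

# Lower bound exp(-s^2/(2 gamma) - tau^2 gamma / 2) for each gamma;
# its maximum over gamma should equal the Laplace potential exp(-tau |s|).
bound = np.exp(-s ** 2 / (2 * gammas) - tau ** 2 * gammas / 2)

print(bound.max())              # ~ exp(-tau * |s|)
print(np.exp(-tau * abs(s)))    # exact potential value
print(gammas[bound.argmax()])   # maximizing gamma ~ |s| / tau
```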

Variational Sparse Bayesian Inference (3)
Relate the variational inference problem (5) to MAP estimation directly: the latter is obtained from the former by replacing the integration over u with an optimization over u.
Automatic relevance determination (ARD).

Variational Sparse Bayesian Inference (4)
Automatic relevance determination (ARD): for sparse reconstruction, ARD is an attractive alternative to convex or non-convex MAP estimation.
Variational sparse Bayesian inference (5) is a convex optimization problem if and only if MAP estimation is convex for the same model.
The variational inference relaxation (5) is solved by double-loop algorithms, scaled up to very large models by reductions to convex reconstruction and Bayesian graphical model technology.

Gaussian Bayesian Graphical Models (1)
What does it take to solve the variational problem (5)? Can we use MAP estimation technology, or do we need computations of a different kind?
Differentiating the Gaussian integral in (5) with respect to π_i = γ_i^{-1} gives
∂/∂π_i log ∫ N(y | X u, σ² I) exp( -s^T Γ^{-1} s / 2 ) du = -( E_Q[s_i | y]² + Var_Q[s_i | y] ) / 2,
so, beyond the posterior means used in MAP estimation, the marginal variances of Q are required.
Approximate means and variances by Bayesian graphical model algorithms.
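For fixed γ the approximation Q(u | y) is Gaussian, so E_Q[s_i | y] and Var_Q[s_i | y] are available in closed form; a dense small-scale sketch follows (a large-scale solver would use the graphical-model machinery instead of forming and inverting the precision matrix explicitly; sizes and values are made up).

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 10, 15
X = rng.standard_normal((m, n))
B = np.eye(n)                    # q = n and B = I, for simplicity
y = rng.standard_normal(m)
sigma2 = 0.1
gamma = np.full(n, 0.5)          # current variational parameters

# Gaussian approximation Q(u | y): precision A and mean.
A = X.T @ X / sigma2 + B.T @ (B / gamma[:, None])
cov = np.linalg.inv(A)           # fine at this toy size
mean_u = cov @ (X.T @ y) / sigma2

mean_s = B @ mean_u              # E_Q[s_i | y]
var_s = np.diag(B @ cov @ B.T)   # Var_Q[s_i | y]
print(mean_s[:3], var_s[:3])
```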

Gaussian Bayesian Graphical Models (2)
(Undirected) graphical model, developed on the blackboard; see [2], [3].

Algorithms for Variational Sparse Bayesian Inference (1)
Goal: efficient double-loop algorithms for solving the variational relaxation (5) at large scales, and a characterization of its convexity.
The following reformulation is used:
min_{γ ≻ 0} min_u φ(u, γ),   φ(u, γ) := log|A| + σ^{-2} ||y - X u||^2 + s^T Γ^{-1} s + h(γ)   (6)
where A = σ^{-2} X^T X + B^T Γ^{-1} B is the precision matrix of the Gaussian approximation Q(u | y).
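A direct evaluation of the reconstructed criterion (6), assuming Laplace potentials (so that h(γ) = τ² γ) and A = X^T X / σ² + B^T Γ^{-1} B as above; both choices are assumptions spelled out for this sketch.

```python
import numpy as np

def phi(u, gamma, X, B, y, sigma2, tau):
    """phi(u, gamma) = log|A| + ||y - X u||^2 / sigma2 + s^T Gamma^{-1} s + h(gamma),
    with A = X^T X / sigma2 + B^T Gamma^{-1} B and h(gamma) = tau^2 * gamma (Laplace)."""
    s = B @ u
    A = X.T @ X / sigma2 + B.T @ (B / gamma[:, None])
    _, logdet = np.linalg.slogdet(A)
    r = y - X @ u
    return logdet + (r @ r) / sigma2 + np.sum(s ** 2 / gamma) + np.sum(tau ** 2 * gamma)

rng = np.random.default_rng(4)
m, n = 8, 12
X, B = rng.standard_normal((m, n)), np.eye(n)
y, u = rng.standard_normal(m), rng.standard_normal(n)
print(phi(u, np.full(n, 1.0), X, B, y, sigma2=0.1, tau=1.0))
```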

Double-loop Algorithm (1)
Joint minimization of (6) is difficult due to the coupled term log|A|.
A concept known as concave-convex (or majorize-minimize) does the job: bounding the concave term log|A| by a function that is linear in γ^{-1} converts (6) into
min_{z ≻ 0} min_u σ^{-2} ||y - X u||_2^2 - 2 Σ_{i=1}^q log t_i( (z_i + s_i^2)^{1/2} ) + g_1(z)   (7)
where g_1(z) collects the u-independent terms of the bound.

Double-loop Algorithm (2)
The algorithm iterates between:
Inner loop: minimization of (7) over u, which involves posterior mean calculations E_Q[s_i | y] of the kind also used for MAP.
Outer loop: updates of z, which require the marginal variances Var_Q[s_i | y].

Double-loop Algorithm (3)
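A minimal sketch of the double-loop scheme described above, specialized to Laplace potentials and B = I (so s = u); with that choice the γ update has a closed form and the inner minimization becomes an IRLS-style sequence of ridge systems. It is an illustration of the loop structure, not the reference implementation.

```python
import numpy as np

def double_loop_laplace(X, y, sigma2, tau, n_outer=10, n_inner=20):
    """Double-loop sketch for Laplace potentials, B = I.
    Outer loop: refresh z_i = Var_Q[s_i | y] at the current gamma (Gaussian variances).
    Inner loop: minimize the decoupled bound (7) over u by IRLS steps; each step is a
    ridge / posterior-mean computation, and gamma_i = sqrt(z_i + u_i^2) / tau."""
    n = X.shape[1]
    u, gamma = np.zeros(n), np.ones(n)
    G = X.T @ X / sigma2
    b = X.T @ y / sigma2
    for _ in range(n_outer):
        # Outer loop: marginal variances of the current Gaussian approximation Q(u | y).
        z = np.diag(np.linalg.inv(G + np.diag(1.0 / gamma)))
        # Inner loop: min_u ||y - X u||^2 / sigma2 + 2 tau sum_i sqrt(z_i + u_i^2).
        for _ in range(n_inner):
            gamma = np.sqrt(z + u ** 2) / tau                 # closed-form gamma update
            u = np.linalg.solve(G + np.diag(1.0 / gamma), b)  # posterior-mean computation
    return u, gamma

rng = np.random.default_rng(5)
m, n = 30, 60
X = rng.standard_normal((m, n)) / np.sqrt(m)
u_true = np.zeros(n); u_true[:4] = [1.5, -2.0, 0.8, -1.2]
y = X @ u_true + 0.05 * rng.standard_normal(m)
u_hat, _ = double_loop_laplace(X, y, sigma2=0.05 ** 2, tau=5.0)
print(np.round(u_hat[:6], 2))   # approximate sparse estimate of u_true
```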

Variational Sparse Bayesian Reconstruction
Bayesian inference can be used for sparse point reconstruction by computing the posterior mean in a zero temperature limit, in which the posterior mass concentrates on exactly sparse points.

Re-weighted l1 Algorithm
In the ARD zero temperature limit we can use an alternative to the double-loop algorithm above, enjoying the same global convergence property but with some additional benefits:
min_{z ≻ 0} min_u σ^{-2} ||y - X u||_2^2 + 2 Σ_{i=1}^q z_i^{1/2} |s_i| + g_2(z)   (8)
where g_2(z) again collects the u-independent terms of the bound.
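A sketch of one common form of the ARD re-weighted l1 iteration matching the structure of (8), again with B = I: each outer step solves a weighted l1 problem and then refreshes the weights through the log-det term. The specific weight update w_i = (x_i^T (σ² I + X Γ X^T)^{-1} x_i)^{1/2} follows Wipf-style ARD analyses and is an assumption here, as are all constants.

```python
import numpy as np

def weighted_l1_solve(X, y, w, lam, n_iter=300):
    """argmin_u ||y - X u||_2^2 + lam * sum_i w_i |u_i|, via ISTA with per-coefficient weights."""
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)
    u = np.zeros(X.shape[1])
    for _ in range(n_iter):
        v = u - step * 2.0 * X.T @ (X @ u - y)
        u = np.sign(v) * np.maximum(np.abs(v) - step * lam * w, 0.0)
    return u

def ard_reweighted_l1(X, y, sigma2, n_outer=8):
    """Re-weighted l1 sketch (B = I): weighted l1 solve, then weight refresh via the log-det term."""
    m, n = X.shape
    w = np.ones(n)
    for _ in range(n_outer):
        u = weighted_l1_solve(X, y, w, lam=2.0 * sigma2)
        gamma = np.abs(u) / np.maximum(w, 1e-12)
        C = sigma2 * np.eye(m) + (X * gamma) @ X.T              # sigma^2 I + X Gamma X^T
        w = np.sqrt(np.sum(X * np.linalg.solve(C, X), axis=0))  # w_i = sqrt(x_i^T C^-1 x_i)
    return u

rng = np.random.default_rng(6)
m, n = 25, 80
X = rng.standard_normal((m, n)) / np.sqrt(m)
u_true = np.zeros(n); u_true[:3] = [1.0, -1.5, 2.0]
y = X @ u_true + 0.02 * rng.standard_normal(m)
print(np.round(ard_reweighted_l1(X, y, sigma2=1e-3)[:5], 2))    # first few coefficients of the estimate
```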

Properties of Automatic Relevance Determination
ARD can offer substantial advantages over separable (convex or non-convex) MAP estimation when searching for maximally sparse solutions:
min_u ||u||_0 = Σ_{i=1}^n I{u_i ≠ 0}   subject to   y = X u   (9)
min_u R_VB(u)   subject to   y = X u   (10)
R_VB(u) = min_{γ ≻ 0} [ u^T Γ^{-1} u + log|X Γ X^T| ] = min_{z ≻ 0} [ 2 Σ_{i=1}^n z_i^{1/2} |u_i| + g_2(z) ]   (11)

Applications
- Sampling Optimization of Magnetic Resonance Imaging
- Source Localization and Group Sparsity Penalization

Sampling Optimization of Magnetic Resonance Imaging (1)
Setup: n = 131072 unknowns, q ≈ 3n potentials, m up to (3/4) n measurements, u complex-valued.

Sampling Optimization of Magnetic Resonance Imaging (3)

glm-ie: The Generalised Linear Models Inference & Estimation Toolbox
The glm-ie toolbox contains scalable estimation routines for GLMs and SLMs and a scalable convex variational Bayesian inference relaxation:
- MAP estimation
- Variational Bayesian inference
- Double loop algorithm
- Non-linear or group potentials
- Expectation propagation inference
http://mloss.org/software/view/269/ (last visited: 09.05.2011), from [6]

glm-ie: The Generalised Linear Models Inference & Estimation Toolbox
Problems:
- Only 32-bit machine support
- Based on C++ and Fortran 77 code (MEX)
- Additional software needed: L-BFGS-B (for solving large-scale nonlinear optimization problems)
- Examples exist, but are not documented offline

glm-ie: Example from [6]

Bibliography
[1] M.W. Seeger, D.P. Wipf: "Variational Bayesian Inference Techniques", IEEE Signal Processing Magazine, Nov. 2010.
[2] T.P. Minka: "Expectation Propagation for Approximate Bayesian Inference", Statistics Dept., Carnegie Mellon University, Pittsburgh.
[3] T.P. Minka: "A Family of Algorithms for Approximate Bayesian Inference", Department of Electrical Engineering and Computer Science, MIT.
[4] M.J. Beal: "Variational Algorithms for Approximate Bayesian Inference", The Gatsby Computational Neuroscience Unit, University College London.
[6] H. Nickisch: "glm-ie: The Generalised Linear Models Inference & Estimation Toolbox", MPI for Biological Cybernetics, Tübingen, Germany.