Efficient Variational Inference in Large-Scale Bayesian Compressed Sensing

Efficient Variational Inference in Large-Scale Bayesian Compressed Sensing
George Papandreou and Alan Yuille, Department of Statistics, University of California, Los Angeles
ICCV Workshop on Information Theory in Computer Vision, November 13, 2011, Barcelona, Spain

Inverse Image Problems
Denoising, deblurring, inpainting.

The Sparse Linear Model
A hidden vector $x \in \mathbb{R}^N$ and noisy measurements $y \in \mathbb{R}^M$. Sparse linear model:
$P(x; \theta) \propto \prod_{k=1}^K t(g_k^T x), \qquad P(y \mid x; \theta) = N(y; Hx, \sigma^2 I).$
[Factor graph: sparsity factors $g_1, \ldots, g_K$ and measurement factors $h_1, \ldots, h_M$ attached to the variables $x_1, \ldots, x_N$.]
Sparsity directions: $s = Gx$, with $G = [g_1^T; \ldots; g_K^T]$.
Measurement directions: $H = [h_1^T; \ldots; h_M^T]$.
Sparse potential: $t(s)$, e.g., Laplacian $t(s_k) = e^{-\tau |s_k|}$.
Model parameters: $\theta = (G, H, \sigma^2)$.
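To make the notation concrete, here is a minimal sketch of the model on a toy 1-D signal; the circular blur for H, the finite-difference G, and all sizes and parameter values are illustrative assumptions, not taken from the talk.

```python
# Toy instantiation of the sparse linear model: H blurs the signal, G takes
# first-order differences (the sparsity directions), and t is a Laplacian
# potential.  All sizes and parameter values are illustrative assumptions.
import numpy as np

N, M = 64, 64                       # signal and measurement dimensions
rng = np.random.default_rng(0)

# Measurement directions H: rows h_m^T (here a small circular blur).
kernel = np.array([0.25, 0.5, 0.25])
H = np.zeros((M, N))
for m in range(M):
    for j, w in enumerate(kernel):
        H[m, (m + j - 1) % N] = w

# Sparsity directions G: rows g_k^T (circular first-order differences, s = Gx).
G = np.eye(N) - np.roll(np.eye(N), 1, axis=1)
K = G.shape[0]

sigma, tau = 0.05, 10.0             # noise level and Laplacian scale

def log_prior(x):
    """log P(x; theta) up to a constant: sum_k log t(s_k) with t(s) = exp(-tau*|s|)."""
    return -tau * np.sum(np.abs(G @ x))

def log_likelihood(y, x):
    """log N(y; Hx, sigma^2 I) up to a constant."""
    r = y - H @ x
    return -0.5 * np.sum(r**2) / sigma**2

# Simulate data: piecewise-constant ground truth, blurred and corrupted by noise.
x_true = np.repeat(rng.normal(size=8), N // 8)
y = H @ x_true + sigma * rng.normal(size=M)
print("unnormalized log posterior at x_true:",
      log_likelihood(y, x_true) + log_prior(x_true))
```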

Deterministic or Probabilistic Modeling?
Deterministic modeling: standard compressive sensing. Find the minimum-energy configuration; same as finding the posterior MAP.
Probabilistic modeling: Bayesian compressive sensing. Try to capture the full posterior distribution. Suitable for learning parameters by maximum likelihood (ML). Harder than just a point estimate.

Deterministic Modeling
MAP estimate as an optimization problem. The estimate is $\hat{x}_{\mathrm{MAP}} = \arg\min_x \phi_{\mathrm{MAP}}(x)$, where
$\phi_{\mathrm{MAP}}(x) = \sigma^{-2} \|y - Hx\|^2 - 2 \sum_{k=1}^K \log t(s_k), \qquad s_k = g_k^T x.$
Properties: modern optimization techniques allow us to find $\hat{x}_{\mathrm{MAP}}$ efficiently for large-scale problems.
How much do we trust the solution? What about error bars? Is the MAP best in terms of PSNR performance?
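As a rough illustration of this primitive, the sketch below minimizes a smoothed version of $\phi_{\mathrm{MAP}}$ with L-BFGS on a toy 1-D denoising problem. The $\sqrt{s^2 + \varepsilon}$ smoothing, the denoising setup (H = I), and all sizes are assumptions made here, not part of the talk.

```python
# Sketch: MAP estimation by minimizing a smoothed phi_MAP with L-BFGS.
# phi_MAP(x) = sigma^{-2}||y - Hx||^2 - 2 sum_k log t(s_k), Laplacian t,
# with |s| ~ sqrt(s^2 + eps) to keep the objective differentiable.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N = 64
H = np.eye(N)                                    # identity measurements (denoising)
G = np.eye(N) - np.roll(np.eye(N), 1, axis=1)    # finite-difference sparsity directions
sigma, tau, eps = 0.05, 10.0, 1e-8

x_true = np.repeat(rng.normal(size=8), N // 8)
y = H @ x_true + sigma * rng.normal(size=N)

def phi_map(x):
    r = y - H @ x
    s = G @ x
    return np.sum(r**2) / sigma**2 + 2.0 * tau * np.sum(np.sqrt(s**2 + eps))

def grad_phi_map(x):
    r = y - H @ x
    s = G @ x
    return -2.0 * H.T @ r / sigma**2 + 2.0 * tau * G.T @ (s / np.sqrt(s**2 + eps))

res = minimize(phi_map, y.copy(), jac=grad_phi_map, method="L-BFGS-B")
x_map = res.x
print("phi_MAP(y) =", phi_map(y), " phi_MAP(x_map) =", phi_map(x_map))
```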

Probabilistic Modeling
Work with the full posterior distribution
$P(x \mid y) \propto N(y; Hx, \sigma^2 I) \prod_{k=1}^K t(g_k^T x).$
[Figure from Seeger & Wipf, 10: posterior versus prior/measure.]

Probabilistic Modeling: Markov Chain Monte Carlo vs. Variational Bayes
Markov chain Monte Carlo: draw samples from the posterior. Typically model the prior with Gaussian mixtures and perform block Gibbs sampling. Very general, but can be slow and convergence is difficult to monitor. [Schmidt, Rao & Roth, 10], [Papandreou & Yuille, 10], ...
Variational Bayes: approximate the posterior distribution with a tractable parametric form. Systematic error, but often guaranteed convergence. [Attias, 99], [Girolami, 01], [Lewicki & Sejnowski, 00], [Palmer et al., 05], [Levin et al., 11], [Seeger & Nickisch, 11], ...

Variational Bounding
Approximate the posterior distribution with a Gaussian:
$Q(x \mid y) \propto N(y; Hx, \sigma^2 I)\, e^{-\frac{1}{2} s^T \Gamma^{-1} s} = N(x; \hat{x}_Q, A^{-1}),$
with $\hat{x}_Q = A^{-1} b$, $A = \sigma^{-2} H^T H + G^T \Gamma^{-1} G$, $\Gamma = \mathrm{diag}(\gamma)$, and $b = \sigma^{-2} H^T y$.
Suitable for super-Gaussian priors: $t(s_k) = \sup_{\gamma_k > 0} e^{-s_k^2 / (2\gamma_k) - h_k(\gamma_k)/2}.$
Optimization problem: find the variational parameters $\gamma$ that give the tightest fit.
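The slide leaves $h_k$ unspecified; for the Laplacian potential $t(s) = e^{-\tau |s|}$ a standard choice is $h(\gamma) = \tau^2 \gamma$, with the supremum attained at $\gamma = |s|/\tau$. That specific $h$ is an assumption added here (it is the usual scale-mixture representation, not stated on the slide); the snippet below checks it numerically.

```python
# Numeric check of the super-Gaussian bound for the Laplacian potential
# t(s) = exp(-tau*|s|): with h(gamma) = tau^2 * gamma (assumed here), the
# supremum over gamma > 0 of exp(-s^2/(2*gamma) - h(gamma)/2) recovers t(s),
# attained at gamma = |s|/tau.
import numpy as np

tau = 3.0
gammas = np.linspace(1e-6, 5.0, 200001)
for s in np.linspace(-2.0, 2.0, 9):
    bound = np.exp(-s**2 / (2.0 * gammas) - tau**2 * gammas / 2.0)
    assert np.isclose(bound.max(), np.exp(-tau * abs(s)), atol=1e-4)
print("Laplacian super-Gaussian bound verified on a grid of s values.")
```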

Variational Bounding: Double-Loop Algorithm
Outer loop (variance computation): compute $z = \mathrm{diag}(G A^{-1} G^T)$, i.e., the vector of variances $z_k = \mathrm{Var}_Q(s_k \mid y)$ along the sparsity directions $s_k = g_k^T x$.
Inner loop (smoothed estimation): obtain the variational mean $\hat{x}_Q = \arg\min_x \phi_Q(x; z)$, where
$\phi_Q(x; z) = \sigma^{-2} \|y - Hx\|^2 - 2 \sum_{k=1}^K \log t\big((s_k^2 + z_k)^{1/2}\big).$
Update the variational parameters: $\gamma_k^{-1} = -2\, \frac{d \log t(\sqrt{v})}{dv} \Big|_{v = \hat{s}_k^2 + z_k}.$
Convex if the standard MAP problem is convex. See [Seeger & Nickisch, 11].
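Below is a dense-matrix sketch of the double loop for the Laplacian potential. It computes the outer-loop variances $z$ exactly via $A^{-1}$, which is feasible only at toy size and is precisely the step the next slides replace by Lanczos or Monte-Carlo estimates; the sizes, parameters, and iteration counts are illustrative assumptions.

```python
# Dense-matrix sketch of the double-loop algorithm for the Laplacian potential.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N = 64
H = np.eye(N)                                    # denoising setup
G = np.eye(N) - np.roll(np.eye(N), 1, axis=1)    # finite differences
sigma, tau = 0.05, 10.0
x_true = np.repeat(rng.normal(size=8), N // 8)
y = H @ x_true + sigma * rng.normal(size=N)

gamma = np.ones(N)                               # variational parameters
x_hat = y.copy()
for outer in range(10):
    # Outer loop: variances z_k = Var_Q(s_k | y) along the sparsity directions.
    A = H.T @ H / sigma**2 + G.T @ np.diag(1.0 / gamma) @ G
    A_inv = np.linalg.inv(A)
    z = np.einsum('kn,nm,km->k', G, A_inv, G)    # diag(G A^{-1} G^T)

    # Inner loop: smoothed estimation of the variational mean.
    def phi_q(x):
        r, s = y - H @ x, G @ x
        return np.sum(r**2) / sigma**2 + 2.0 * tau * np.sum(np.sqrt(s**2 + z))

    def grad_phi_q(x):
        r, s = y - H @ x, G @ x
        return -2.0 * H.T @ r / sigma**2 + 2.0 * tau * G.T @ (s / np.sqrt(s**2 + z))

    x_hat = minimize(phi_q, x_hat, jac=grad_phi_q, method="L-BFGS-B").x

    # gamma_k^{-1} = -2 d log t(sqrt(v))/dv at v = s_hat_k^2 + z_k  (Laplacian case).
    s_hat = G @ x_hat
    gamma = np.sqrt(s_hat**2 + z) / tau

# At the inner-loop stationary point, x_hat solves A x = b with the updated gamma.
A = H.T @ H / sigma**2 + G.T @ np.diag(1.0 / gamma) @ G
b = H.T @ y / sigma**2
print("consistency check ||A x_hat - b|| =", np.linalg.norm(A @ x_hat - b))
```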

Variance Computation
Goal: estimate elements of $\Sigma = A^{-1}$, where $A = \sigma^{-2} H^T H + G^T \Gamma^{-1} G$.
Direct inversion is hopeless ($N \approx 10^6$).
Accurate and fast techniques exist for problems of special structure [Malioutov et al., 08].
Lanczos iteration (only matrix-vector multiplications required) [Schneider & Willsky, 01], [Seeger & Nickisch, 11].
This work: Monte-Carlo variance estimation.

Gaussian Sampling by Local Perturbations
[Factor-graph figure illustrating local noise injection at the measurement and sparsity factors.]
Gaussian MRF sampling by local noise injection:
1. Local perturbations: $\tilde{y} \sim N(0, \sigma^2 I)$ and $\beta \sim N(0, \Gamma^{-1})$.
2. Gaussian mode: solve $A \tilde{x} = \sigma^{-2} H^T \tilde{y} + G^T \beta$.
Then $\tilde{x} \sim N(0, A^{-1})$, where $A = \sigma^{-2} H^T H + G^T \Gamma^{-1} G$. [Papandreou & Yuille, 10]
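A small self-contained sketch of the perturb-and-solve sampler, with an empirical-covariance check against $A^{-1}$; the toy dimensions, the random $\gamma$, and the dense solve (in place of (P)CG) are assumptions for illustration.

```python
# Sketch of sampling from N(0, A^{-1}) by local perturbations: draw
# y_tilde ~ N(0, sigma^2 I) and beta ~ N(0, Gamma^{-1}), then solve
# A x_tilde = sigma^{-2} H^T y_tilde + G^T beta.
import numpy as np

rng = np.random.default_rng(0)
N = 32
H = np.eye(N)
G = np.eye(N) - np.roll(np.eye(N), 1, axis=1)
sigma = 0.1
gamma = rng.uniform(0.5, 2.0, size=N)            # Gamma = diag(gamma)

A = H.T @ H / sigma**2 + G.T @ np.diag(1.0 / gamma) @ G

def sample_zero_mean(num_samples):
    """Perturb-and-solve samples from N(0, A^{-1})."""
    samples = np.empty((num_samples, N))
    for i in range(num_samples):
        y_tilde = sigma * rng.normal(size=N)              # N(0, sigma^2 I)
        beta = rng.normal(size=N) / np.sqrt(gamma)        # N(0, Gamma^{-1})
        rhs = H.T @ y_tilde / sigma**2 + G.T @ beta
        samples[i] = np.linalg.solve(A, rhs)              # large scale: use (P)CG
    return samples

X = sample_zero_mean(20000)
emp_cov = X.T @ X / X.shape[0]
print("max |empirical cov - A^{-1}| =", np.abs(emp_cov - np.linalg.inv(A)).max())
```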

Monte-Carlo Variance Estimation
Let $x_i \sim N(0, A^{-1})$, with $i = 1, \ldots, N_s$. General-purpose Monte-Carlo variance estimator:
$\hat{\Sigma} = \frac{1}{N_s} \sum_{i=1}^{N_s} x_i x_i^T, \qquad \hat{z}_k = \frac{1}{N_s} \sum_{i=1}^{N_s} s_{k,i}^2,$
where $s_{k,i} = g_k^T x_i$.
Properties: the estimates have marginal distribution $\hat{z}_k / z_k \sim \frac{1}{N_s} \chi^2(N_s)$; unbiased, $E\{\hat{z}_k\} = z_k$; relative error $r = \mathrm{std}(\hat{z}_k) / z_k = \sqrt{2 / N_s}$.
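The sketch below illustrates the estimator and the $\sqrt{2/N_s}$ relative-error rule; for brevity the samples are drawn with numpy's multivariate_normal rather than with the perturbation sampler, and the sizes and $N_s$ are assumptions.

```python
# Sketch of the Monte-Carlo variance estimator z_hat_k = (1/N_s) sum_i s_{k,i}^2
# and of the sqrt(2/N_s) relative-error rule.
import numpy as np

rng = np.random.default_rng(0)
N, N_s = 32, 40
H = np.eye(N)
G = np.eye(N) - np.roll(np.eye(N), 1, axis=1)
sigma = 0.1
gamma = rng.uniform(0.5, 2.0, size=N)

A = H.T @ H / sigma**2 + G.T @ np.diag(1.0 / gamma) @ G
A_inv = np.linalg.inv(A)

x_samples = rng.multivariate_normal(np.zeros(N), A_inv, size=N_s)   # x_i ~ N(0, A^{-1})
s_samples = x_samples @ G.T                                          # s_{k,i} = g_k^T x_i
z_hat = np.mean(s_samples**2, axis=0)                                # Monte-Carlo estimate
z_exact = np.einsum('kn,nm,km->k', G, A_inv, G)                      # diag(G A^{-1} G^T)

print("std of z_hat/z over k:               ", np.std(z_hat / z_exact))
print("predicted relative error sqrt(2/N_s):", np.sqrt(2.0 / N_s))
```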

Monte-Carlo vs. Lanczos Variance Estimates
[Scatter plot of estimated variances $\hat{z}_k$ against the exact $z_k$ (both of order $10^{-3}$) for the sampling-based and Lanczos estimators.]

Application: Image Deconvolution
Measurement equation: $y = k * x = Hx$.
Non-blind deconvolution: known blur kernel $k$.
Blind deconvolution: unknown blur kernel $k$.

Blind Image Deconvolution
Blur kernel recovery by maximum likelihood. ML objective:
$\hat{k} = \arg\max_k P(y; k) = \arg\max_k \int P(y, x; k)\, dx.$
Variational ML: $\hat{k} = \arg\max_k Q(y; k)$.
Contrast with $\arg\max_k \big(\max_x P(x, y; k)\big)$. [Fergus et al., 06], [Levin et al., 09]

Variational EM for Maximum Likelihood
Find $k$ by maximizing $Q(y; k)$ [Girolami, 01], [Levin et al., 11].
E-step: given the current kernel estimate $k_t$, do variational Bayesian inference, i.e., fit $Q(x \mid y; k_t)$.
M-step: maximize w.r.t. $k$ the expected complete log-likelihood $E_{Q(x \mid y; k_t)}\{\log Q(x, y; k)\}$. Equivalently, minimize w.r.t. $k$
$E_{Q(x \mid y; k_t)}\Big\{\tfrac{1}{2} \|y - Hx\|^2\Big\} = \tfrac{1}{2} \mathrm{tr}\big((H^T H)(A^{-1} + \hat{x}\hat{x}^T)\big) - y^T H \hat{x} + \mathrm{const} = \tfrac{1}{2} k^T R_{xx} k - r_{xy}^T k + \mathrm{const}.$
The expected moments $R_{xx}$ are estimated by Gaussian sampling.
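Here is a hypothetical sketch of this M-step for a 1-D circular blur, writing $Hx = X(x)k$ and accumulating the moments from posterior samples. The stand-in samples, the unconstrained solve, and all sizes are assumptions; a real blind-deconvolution M-step would constrain $k$ (e.g. nonnegativity and sum-to-one) and alternate with the variational E-step.

```python
# Hypothetical M-step sketch for a 1-D circular blur: with X(x)[m, j] = x[(m - j) mod N],
# the expected data term is (1/2) k^T R_xx k - r_xy^T k + const,
# R_xx = E_Q[X^T X], r_xy = E_Q[X]^T y.
import numpy as np

rng = np.random.default_rng(0)
N, L, N_s = 64, 5, 200              # signal length, kernel support, #posterior samples

def conv_matrix(x, L):
    """Columns j = 0..L-1 hold x circularly shifted by j, so conv_matrix(x, L) @ k = k * x."""
    return np.stack([np.roll(x, j) for j in range(L)], axis=1)

# Stand-ins for the variational E-step output: posterior mean and samples.
x_mean = np.repeat(rng.normal(size=8), N // 8)
x_samples = x_mean + 0.05 * rng.normal(size=(N_s, N))   # would come from perturb-and-solve

k_true = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
y = conv_matrix(x_mean, L) @ k_true + 0.01 * rng.normal(size=N)

# Accumulate the expected moments from the samples and solve the quadratic for k.
R_xx = np.zeros((L, L))
for x_i in x_samples:
    X_i = conv_matrix(x_i, L)
    R_xx += X_i.T @ X_i / N_s
r_xy = conv_matrix(x_mean, L).T @ y                     # E_Q[X]^T y uses only the mean
k_new = np.linalg.solve(R_xx, r_xy)
print("updated kernel estimate:", np.round(k_new, 3))
```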

Summary of Computational Primitives
Smoothed estimation: obtain the variational mean $\hat{x}_Q = \arg\min_x \phi_Q(x; z)$, where
$\phi_Q(x; z) = \sigma^{-2} \|y - Hx\|^2 - 2 \sum_{k=1}^K \log t\big((s_k^2 + z_k)^{1/2}\big).$
Inner loop of variational inference.
Sparse linear system: $A x = b$, where $A = \sigma^{-2} H^T H + G^T \Gamma^{-1} G$. Used to estimate variances in the outer loop of variational inference and the moments $R_{xx}$ in blind image deconvolution.
Solve with preconditioned conjugate gradients.
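As a sketch of the second primitive, the snippet below solves $Ax = b$ with plain conjugate gradients, accessing $A$ only through matrix-vector products; the toy operators (identity H, circular differences G) and sizes are assumptions, and preconditioning is sketched after the next slide.

```python
# Sketch: solving the sparse linear system A x = b with conjugate gradients,
# using only matrix-vector products with A.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
N = 4096
sigma = 0.1
gamma = rng.uniform(0.5, 2.0, size=N)

def G_mv(x):   # s = Gx: circular first-order differences
    return x - np.roll(x, -1)

def Gt_mv(s):  # G^T s
    return s - np.roll(s, 1)

def A_mv(x):   # A x = sigma^{-2} H^T H x + G^T Gamma^{-1} G x  (H = I here)
    return x / sigma**2 + Gt_mv(G_mv(x) / gamma)

A_op = LinearOperator((N, N), matvec=A_mv)
b = rng.normal(size=N) / sigma**2                 # plays the role of sigma^{-2} H^T y
x_sol, info = cg(A_op, b, maxiter=500)
print("CG info:", info, " residual:", np.linalg.norm(A_mv(x_sol) - b))
```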

Efficient Circulant Preconditioning
Approximate $A = \sigma^{-2} H^T H + G^T \Gamma^{-1} G$ with $P = \sigma^{-2} H^T H + \bar{\gamma}^{-1} G^T G$, where $\bar{\gamma}^{-1} = \frac{1}{K} \sum_{k=1}^K \gamma_k^{-1}$ [Lefkimmiatis et al., 12].
Properties: thanks to the stationarity of $P$, DFT techniques apply.
Optimality: $P = \arg\min_{X \in \mathcal{C}} \|X - A\|$ over the class $\mathcal{C}$ of circulant matrices.
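A sketch of this preconditioner in 1-D: with circulant H and G, $P$ is diagonalized by the DFT, so applying $P^{-1}$ costs two FFTs, and it can be passed to SciPy's CG as the preconditioner M. The blur kernel, sizes, and parameter values are illustrative assumptions.

```python
# Circulant preconditioner sketch: P = sigma^{-2} H^T H + gamma_bar^{-1} G^T G
# is diagonal in the Fourier basis, so P^{-1} v is elementwise division there.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
N = 4096
sigma = 0.1
gamma = rng.uniform(0.5, 2.0, size=N)
gamma_bar_inv = np.mean(1.0 / gamma)             # (1/K) sum_k gamma_k^{-1}

h = np.zeros(N); h[:3] = [0.25, 0.5, 0.25]       # circular blur kernel (defines H)
g = np.zeros(N); g[0], g[-1] = 1.0, -1.0         # circular difference kernel (defines G)
h_hat, g_hat = np.fft.fft(h), np.fft.fft(g)

def conv(k_hat, x):      # circular convolution via the FFT
    return np.real(np.fft.ifft(k_hat * np.fft.fft(x)))

def A_mv(x):             # A x = sigma^{-2} H^T H x + G^T Gamma^{-1} G x
    Hx = conv(h_hat, x)
    Gx = conv(g_hat, x)
    return conv(np.conj(h_hat), Hx) / sigma**2 + conv(np.conj(g_hat), Gx / gamma)

# Eigenvalues of P in the Fourier basis; applying P^{-1} is elementwise division.
p_hat = np.abs(h_hat)**2 / sigma**2 + gamma_bar_inv * np.abs(g_hat)**2
def P_inv_mv(v):
    return np.real(np.fft.ifft(np.fft.fft(v) / p_hat))

A_op = LinearOperator((N, N), matvec=A_mv)
M_op = LinearOperator((N, N), matvec=P_inv_mv)
b = conv(np.conj(h_hat), rng.normal(size=N)) / sigma**2
x_pcg, info = cg(A_op, b, M=M_op, maxiter=200)
print("PCG info:", info, " residual:", np.linalg.norm(A_mv(x_pcg) - b))
```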

Effect of Preconditioner
[Semi-log convergence plot: residual versus iteration (0 to 120) for plain CG and preconditioned CG (PCG).]

Non-Blind Image Deblurring Example
[Figure panels: ground truth; our result (PSNR = 31.93 dB); blurred input (PSNR = 22.57 dB); VB standard deviation map.]

Blind Image Deblurring Example
[Figure panels: ground truth; our result (PSNR = 27.54 dB); blurred input (PSNR = 22.57 dB); blur kernel.]

Summary
Main points: variational Bayesian inference using standard optimization primitives; scalable to large-scale problems.
Open question: Monte-Carlo or Variational?
Our software is integrated in the glm-ie open-source toolbox.
THANK YOU!