
Sparsity Constraints in Bayesian Inversion. Inverse Days conference in Jyväskylä, Finland. Felix Lucka, 19.12.2012

Sparsity Constraints in Inverse Problems. Current trend in high-dimensional inverse problems: sparsity constraints. Compressed sensing: high-quality reconstructions from a small amount of data if a sparse basis/dictionary is known a priori (e.g., wavelets). Total variation (TV) imaging: sparsity constraints on the gradient of the unknowns. [Figure: (a) 20 min, EM; (b) 5 sec, EM; (c) 5 sec, Bregman EM-TV. Thanks to Jahn Müller for these images!]

Sparsity Constraints in the Bayesian Approach. Sparsity as a-priori information is encoded into the prior distribution $p_{\text{prior}}(u)$: 1. Turning the functionals used in variational regularization directly into priors, e.g., L1-type priors: convenient, as the prior is log-concave; the MAP estimate is sparse, but the prior itself is not sparse. 2. Hierarchical Bayesian modeling (HBM): sparsity is introduced at the hyperparameter level. It relies on a slightly different concept of sparsity, and the resulting implicit priors over the unknowns are usually not log-concave. [Figure: prior densities (a) $\exp(-\tfrac{1}{2}\|u\|_2^2)$, (b) $\exp(-\|u\|_1)$, (c) $(1 + u^2/3)^{-2}$]

Outline: Introduction: Sparsity Constraints in Inverse Problems. Sparse Bayesian Inversion with L1-type Priors (Introduction to L1-type Priors, Fast Posterior Sampling, Take Home Messages I). Sparse Bayesian Inversion by Hierarchical Modeling in EEG/MEG (Introduction to EEG/MEG Source Reconstruction, Sparsity Promoting Hierarchical Bayesian Modeling, Take Home Messages II).

Likelihood: $\propto \exp\left(-\tfrac{1}{2\sigma^2}\|f - Ku\|_2^2\right)$. Prior: $\propto \exp(-\lambda\|u\|_1)$ ($\lambda$ chosen via the discrepancy principle). Posterior: $\propto \exp\left(-\tfrac{1}{2\sigma^2}\|f - Ku\|_2^2 - \lambda\|u\|_1\right)$.
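As a concrete illustration (not code from the talk), here is a minimal Python sketch of evaluating the negative log-posterior of this model up to an additive constant; the function name and the arguments K, f, sigma, lam are illustrative and assumed to be given as NumPy arrays/scalars.

```python
import numpy as np

def neg_log_posterior(u, K, f, sigma, lam):
    """-log p_post(u|f) up to a constant: ||f - K u||_2^2 / (2 sigma^2) + lam * ||u||_1."""
    residual = f - K @ u
    return 0.5 / sigma**2 * np.dot(residual, residual) + lam * np.abs(u).sum()
```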

L1-type Priors in Bayesian Inversion. The influence on the MAP estimate is well understood (boring). The influence on other quantities (CM estimate, credible regions, histograms, more sophisticated statistics...): interesting. An interesting scenario for the comparison between variational regularization theory and Bayesian inference! M. Lassas and S. Siltanen, 2004. Can one use total variation prior for edge-preserving Bayesian inversion? M. Lassas, E. Saksman, and S. Siltanen, 2009. Discretization-invariant Bayesian inversion and Besov space priors. V. Kolehmainen, M. Lassas, K. Niinimäki, and S. Siltanen, 2012. Sparsity-promoting Bayesian inversion. S. Comelli, 2011. A Novel Class of Priors for Edge-Preserving Methods in Bayesian Inversion. Master thesis, supervised by V. Capasso and M. Burger.

Sample- and Integration-based Inference for Sparse Bayesian Inversion. The computation of many Bayesian quantities relies on drawing independent samples from the posterior. The most widely used black-box MCMC sampling algorithm is Metropolis-Hastings (MH) [Metropolis et al., 1953; Hastings, 1970]. It breaks down for ill-posed inverse problems with sparsity constraints and large $n$; examining $n \to \infty$ is interesting. This talk summarizes partial results from: F. Lucka, 2012. Fast Markov chain Monte Carlo sampling for sparse Bayesian inference in high-dimensional inverse problems using L1-type priors. Inverse Problems, 28(12). arXiv:1206.0262v2.

Own Contributions. Single-component Gibbs sampling for posteriors arising from L1-type priors: fast and explicit computation of the 1-dim. conditional densities; exact and robust sampling from these 1-dim. densities; further acceleration by stochastic over-relaxation techniques. Detailed comparison of the most basic variants of MH and Gibbs sampling. Scenarios with up to 261,121 unknowns. [Figure: (a) Metropolis-Hastings scheme; (b) Gibbs scheme]
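To make the two Gibbs ingredients above concrete, the following is a hedged Python sketch (not the implementation behind the reported results) of one random-scan single-component sweep for the posterior $\propto \exp(-\tfrac{1}{2\sigma^2}\|f-Ku\|_2^2 - \lambda\|u\|_1)$: each 1-dim. conditional is an explicit piecewise Gaussian and can be sampled exactly via truncated normals. Function and argument names are illustrative.

```python
import numpy as np
from scipy.special import log_ndtr          # log of the standard normal CDF
from scipy.stats import truncnorm

def gibbs_sweep(u, K, f, sigma, lam, rng):
    """One random-scan single-component Gibbs sweep for the L1-type posterior."""
    n = u.size
    r = f - K @ u                            # current residual
    for j in rng.permutation(n):
        k_j = K[:, j]
        r_j = r + k_j * u[j]                 # residual with component j removed
        a = np.dot(k_j, k_j) / sigma**2      # quadratic coefficient of the conditional
        b = np.dot(k_j, r_j) / sigma**2      # linear coefficient
        s = 1.0 / np.sqrt(a)
        mu_pos, mu_neg = (b - lam) / a, (b + lam) / a
        # log-masses of the two half-Gaussian pieces (u_j >= 0 and u_j < 0)
        log_w_pos = 0.5 * a * mu_pos**2 + log_ndtr(mu_pos / s)
        log_w_neg = 0.5 * a * mu_neg**2 + log_ndtr(-mu_neg / s)
        p_pos = np.exp(log_w_pos - np.logaddexp(log_w_pos, log_w_neg))
        if rng.random() < p_pos:             # sample exactly from the chosen piece
            u_new = truncnorm.rvs((0.0 - mu_pos) / s, np.inf,
                                  loc=mu_pos, scale=s, random_state=rng)
        else:
            u_new = truncnorm.rvs(-np.inf, (0.0 - mu_neg) / s,
                                  loc=mu_neg, scale=s, random_state=rng)
        r = r_j - k_j * u_new                # update residual and component
        u[j] = u_new
    return u
```

Repeating such sweeps yields the chain whose autocorrelation behaviour is compared with Metropolis-Hastings in the following slides.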

Evaluating Sampling Efficiency by Autocorrelation Functions. Sampling efficiency: fast generation of independent samples. $R(\tau) \in [0,1]$: average correlation between samples $x_i$ and $x_{i+\tau}$ w.r.t. a test function (rescaled by the computation time per sample: $R(\tau) \to R^*(t)$). Rapid decay of $R(\tau)$ / $R^*(t)$ means the samples get uncorrelated fast! [Figure: (c) stochastic processes with correlation lengths 4, 16, 32 and (d) their autocorrelation functions]
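A small sketch (assumed interface, not from the talk) of estimating the empirical autocorrelation $R(\tau)$ of a chain with respect to a scalar test function g; rescaling the lag axis by the computation time per sample then gives $R^*(t)$.

```python
import numpy as np

def autocorrelation(samples, g, max_lag):
    """samples: array of shape (n_samples, n); g: test function R^n -> R."""
    x = np.array([g(s) for s in samples])
    x = x - x.mean()
    var = np.dot(x, x) / x.size
    return np.array([np.dot(x[:x.size - tau], x[tau:]) / (x.size * var)
                     for tau in range(max_lag + 1)])
```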

Total Variation Deblurring Example in 1D (from Lassas & Siltanen, 2004). Model of a charge-coupled device (CCD) in 1D. Unknown light intensity $\tilde u : [0,1] \to \mathbb{R}_+$, the indicator function of $[\tfrac13, \tfrac23]$. It is integrated over $k = 30$ CCD pixels covering $[\tfrac{1}{k+2}, \tfrac{k+1}{k+2}] \subset [0,1]$, and noise is added. $\tilde u$ is reconstructed on a regular $n$-dim. grid; $D$ is the forward finite difference operator with Neumann boundary condition. $p_{\text{post}}(u \mid f) \propto \exp\left(-\tfrac{1}{2\sigma^2}\|f - Ku\|_2^2 - \lambda\|Du\|_1\right)$ [Figure: (a) the unknown function $\tilde u(t)$; (b) the measurement data $f$]
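For illustration, a rough sketch of how the discretized 1D model could be set up. The pixel layout and the Neumann-type handling of the difference operator follow the reconstruction above and are assumptions; any blur kernel of the original model is omitted.

```python
import numpy as np

def ccd_operator(n, k=30):
    """Map grid values of u on an n-point grid of [0,1] to k CCD pixel averages."""
    t = (np.arange(n) + 0.5) / n                          # grid midpoints
    edges = np.linspace(1.0 / (k + 2), (k + 1.0) / (k + 2), k + 1)  # pixel boundaries
    K = np.zeros((k, n))
    for j in range(k):
        inside = (t >= edges[j]) & (t < edges[j + 1])
        K[j, inside] = 1.0 / max(inside.sum(), 1)         # average over the pixel
    return K

def forward_difference(n):
    """Forward differences (D u)_i = u_{i+1} - u_i, last row zero (Neumann-type)."""
    D = -np.eye(n) + np.eye(n, k=1)
    D[-1, :] = 0.0
    return D
```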

Total Variation Deblurring Example in 1D (from Lassas & Siltanen, 2004). [Figure: temporal autocorrelation plots $R^*(t)$ for $n = 63$; curves for MH and Gibbs with $\lambda = 100, 200, 400$.]

Total Variation Deblurring Example in 1D (from Lassas & Siltanen, 2004). [Figure: temporal autocorrelation plots $R^*(t)$; curves for MH Iso and RnGibbs with $(n, \lambda) = (127, 280), (255, 400), (511, 560), (1023, 800)$.]

Total Variation Deblurring Example in 1D (from Lassas & Siltanen, 2004). The new sampler can be used to address theoretical questions. Lassas & Siltanen, 2004: for $\lambda_n \propto \sqrt{n+1}$, the TV prior converges to a smoothness prior in the limit $n \to \infty$. MH sampling to compute the CM estimate for $n = 63, 255, 1023, 4095$: even after a month of computation time, only partly satisfying results. [Figure: CM estimates computed with the Gibbs sampler on a comparable CPU for $n = 63$ (K = 500,000, K₀ = 200, t = 21 min), $n = 255$ (K = 100,000, K₀ = 50, 16 min), $n = 1023$ (K = 50,000, K₀ = 20, 45 min), $n = 4095$ (K = 10,000, K₀ = 20, 34 min), $n = 16383$ (K = 5,000, K₀ = 20, 1 h 51 min), $n = 65535$ (K = 1,000, K₀ = 20, 3 h 26 min).]

Image Deblurring Example in 2D. Unknown function $\tilde u$, measurement data $m$; Gaussian blurring kernel; relative noise level of 10%; reconstruction using $n = 511 \times 511 = 261{,}121$ unknowns.

Image Deblurring Example in 2D. [Figure: CM estimates by the MH sampler after (a) 1 h, (b) 5 h, (c) 20 h of computation time.]

Image Deblurring Example in 2D. [Figure: CM estimates by the Gibbs sampler after (a) 1 h, (b) 5 h, (c) 20 h of computation time.]

Take Home Messages, Part I. High-dimensional inverse problems with sparsity constraints pose specific challenges for MCMC. MH and Gibbs sampling may show very different performance. Contrary to common belief, MCMC is not in general slow, nor does it necessarily scale badly with increasing dimension! MCMC for inverse problems is still far less developed than optimization. Sample-based inference in high dimensions is feasible ($n > 10^6$ works). CAUTION: these results do not generalize! MH and Gibbs sampling both have pros and cons!

Outline: Introduction: Sparsity Constraints in Inverse Problems. Sparse Bayesian Inversion with L1-type Priors (Introduction to L1-type Priors, Fast Posterior Sampling, Take Home Messages I). Sparse Bayesian Inversion by Hierarchical Modeling in EEG/MEG (Introduction to EEG/MEG Source Reconstruction, Sparsity Promoting Hierarchical Bayesian Modeling, Take Home Messages II).

Cooperation with... Dr. Sampsa Pursiainen, Department of Mathematics and Systems Analysis, Aalto University, Finland; Prof. Dr. Martin Burger, Institute for Applied Mathematics, University of Münster, Germany; PD Dr. Carsten Wolters, Institute for Biomagnetism and Biosignalanalysis, University of Münster, Germany.

Background of this Part of the Talk. Felix Lucka. Hierarchical Bayesian Approaches to the Inverse Problem of EEG/MEG Current Density Reconstruction. Diploma thesis in mathematics, University of Münster, March 2011. Felix Lucka, Sampsa Pursiainen, Martin Burger, Carsten H. Wolters. Hierarchical Bayesian Inference for the EEG Inverse Problem using Realistic FE Head Models: Depth Localization and Source Separation for Focal Primary Currents. NeuroImage, 61(4), 2012.

Source Reconstruction by Electroencephalography (EEG) and Magnetoencephalography (MEG). Aim: reconstruction of brain activity by non-invasive measurement of the induced electromagnetic fields (bioelectromagnetism) outside of the skull. [Images: Wikimedia Commons]

One Challenge of Source Reconstruction: The Inverse Problem. (Presumably) under-determined. Unfortunately, not just that: it is also severely ill-conditioned, with special spatial characteristics, and the signal is contaminated by a complex spatio-temporal mixture of external and internal noise and nuisance sources.

Discretization Approach: Current Density Reconstruction (CDR). The continuous (ion current) vector field is discretized on a spatial grid with 3 orthogonal elementary sources at each node (in general, more sophisticated schemes are used). $f = Ku + \varepsilon$ $\Rightarrow$ $p_{\text{li}}(f \mid u) \propto \exp\left(-\tfrac12\|\Sigma_\varepsilon^{-1/2}(f - Ku)\|_2^2\right)$.

Tasks and Problems for EEG/MEG in Presurgical Epilepsy Diagnosis. EEG/MEG in epileptic focus localization: focal epilepsy is believed to originate from networks of focal sources, active in inter-ictal spikes. Task 1: determine the number of focal sources (multifocal epilepsy?). Task 2: determine the location and extent of the sources. Problems of established CDR methods: depth bias (reconstruction of deeper sources too close to the surface) and masking (near-surface sources mask deep-lying ones).

Depth Bias: Illustration. One deep-lying reference source (blue cone) and the minimum norm estimate: $u_{\text{MNE}} = \arg\min_u \left\{ \|\Sigma_\varepsilon^{-1/2}(f - Ku)\|_2^2 + \lambda \|u\|_2^2 \right\}$
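A minimal sketch of computing this minimum norm estimate from its normal equations, assuming a leadfield matrix K, data f, noise covariance Sigma_eps and regularization parameter lam (all names are illustrative).

```python
import numpy as np

def minimum_norm_estimate(K, f, Sigma_eps, lam):
    """u_MNE = argmin ||Sigma_eps^{-1/2}(f - K u)||_2^2 + lam * ||u||_2^2."""
    W = np.linalg.inv(Sigma_eps)                       # noise precision
    A = K.T @ W @ K + lam * np.eye(K.shape[1])         # normal-equation matrix
    return np.linalg.solve(A, K.T @ W @ f)
```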

Depth Bias: Illustration. Reweighted minimum norm estimate (sLORETA, Pascual-Marqui, 2002).

Masking: Illustration Reference sources.

Masking: Illustration MNE result and reference sources (green cones).

Masking: Illustration. sLORETA result and reference sources (green cones).

Basic Ideas of Sparsity Promoting HBM. Gaussian prior with a fixed, uniform diagonal covariance matrix: $p_{\text{pr}}(u) = \mathcal{N}(0, \gamma\,\mathrm{Id})$. Advantages: the posterior is Gaussian, a simple and well-understood analytical structure. Problems: Gaussian variables have a characteristic scale given by the variance (not scale invariant); all sources have variance $\gamma$, so similar amplitudes are likely and sparse solutions are very unlikely.

Basic Ideas of Sparsity Promoting HBM. Gaussian prior with a flexible, individual diagonal covariance matrix: $p_{\text{pr}}(u \mid \gamma) = \mathcal{N}(0, \mathrm{diag}[\gamma_1, \dots, \gamma_n])$. Let the data determine the $\gamma_i$ (hyperparameters). Bayesian inference: the $\gamma$ are random variables as well; their prior distribution $p_{\text{hypr}}(\gamma)$ is called the hyperprior. Sparsity constraints are encoded into the hyperprior.

Basic Ideas of Sparsity Promoting HBM. Gaussian prior with a flexible, individual diagonal covariance matrix, $p_{\text{pr}}(u \mid \gamma) = \mathcal{N}(0, \mathrm{diag}[\gamma_1, \dots, \gamma_n])$, and a heavy-tailed, non-informative, i.e., scale-invariant prior on the $\gamma_i$, e.g., $p_{\text{hypr}}(\gamma_i) \propto \gamma_i^{-1}$, to promote sparsity on the hyperparameter level. Problem: improper prior, improper posterior.

Basic Ideas of Sparsity Promoting HBM. Gaussian prior with a flexible, individual diagonal covariance matrix, $p_{\text{pr}}(u \mid \gamma) = \mathcal{N}(0, \mathrm{diag}[\gamma_1, \dots, \gamma_n])$, and a heavy-tailed, weakly-informative prior on the $\gamma_i$, e.g., the inverse gamma distribution $p_{\text{hypr}}(\gamma_i) \propto \gamma_i^{-(1+\alpha)} \exp\left(-\beta/\gamma_i\right)$, to promote sparsity on the hyperparameter level. Advantage: conjugate hyperprior, computationally convenient. By the direct correspondence, we get sparsity over the primary unknowns $u$ as well. Generalization: replace $\mathrm{diag}[\gamma_1, \dots, \gamma_n]$ by $\sum_i \gamma_i C_i$ (Gaussian scale mixture models), with the $C_i$ modeling complex spatio-temporal covariance structures.
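A hedged illustration (not from the talk) of the sparsity mechanism: drawing from the implicit prior on $u$ by first sampling $\gamma_i \sim \mathrm{InvGamma}(\alpha, \beta)$ and then $u_i \mid \gamma_i \sim \mathcal{N}(0, \gamma_i)$. For small $\alpha, \beta$ the heavy tails let a few components dominate.

```python
import numpy as np

def sample_hierarchical_prior(n, alpha, beta, rng):
    """Draw u from the implicit prior induced by the Gaussian/inverse-gamma hierarchy."""
    gamma = 1.0 / rng.gamma(shape=alpha, scale=1.0 / beta, size=n)  # gamma_i ~ InvGamma(alpha, beta)
    return rng.normal(0.0, np.sqrt(gamma))                          # u_i | gamma_i ~ N(0, gamma_i)
```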

Some Basic Features of Sparsity Promoting HBM. Posterior: $p_{\text{post}}(u, \gamma \mid f) \propto \exp\left(-\tfrac12\|\Sigma_\varepsilon^{-1/2}(f - Ku)\|_2^2 - \sum_{i=1}^{k}\left(\frac{\tfrac12 u_i^2 + \beta}{\gamma_i} + \left(\alpha + \tfrac52\right)\ln\gamma_i\right)\right)$. It is Gaussian with respect to $u$ and factorizes over the $\gamma_i$. The energy is non-convex w.r.t. $(u, \gamma)$ (the posterior is multimodal).
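As an illustration of the blocked Gibbs sampler used for the Full-CM results below, here is a simplified Python sketch of one iteration, assuming one scalar unknown per node (the talk uses three dipole components per node, which is where the exponent $\alpha + \tfrac52$ comes from; for scalars it becomes $\alpha + \tfrac32$). This is not the code used for the results shown here.

```python
import numpy as np

def blocked_gibbs_step(u, K, f, Sigma_eps_inv, alpha, beta, rng):
    """One blocked Gibbs iteration for the (scalar-per-node) HBM posterior."""
    n = u.size
    # 1) gamma_i | u_i ~ InvGamma(alpha + 1/2, beta + u_i^2 / 2)
    gamma = 1.0 / rng.gamma(shape=alpha + 0.5,
                            scale=1.0 / (beta + 0.5 * u**2))
    # 2) u | gamma, f ~ N(m, C) with C^{-1} = K^T Sigma^{-1} K + diag(1/gamma)
    prec = K.T @ Sigma_eps_inv @ K + np.diag(1.0 / gamma)
    L = np.linalg.cholesky(prec)                       # prec = L L^T
    m = np.linalg.solve(prec, K.T @ Sigma_eps_inv @ f) # posterior mean
    z = rng.standard_normal(n)
    u_new = m + np.linalg.solve(L.T, z)                # L^{-T} z has covariance prec^{-1}
    return u_new, gamma
```

Iterating this step and averaging the u draws after burn-in would give a Full-CM estimate in this simplified setting.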

Depth Bias: Full-CM. Computed by a blocked Gibbs sampler.

Depth Bias: Full-MAP. Computed by alternating optimization initialized at the CM estimate.

Masking: Full-CM Result. Computed by a blocked Gibbs sampler.

Masking: Full-MAP Result. Computed by alternating optimization initialized at the CM estimate.
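The Full-MAP estimates above were obtained by alternating optimization initialized at the CM estimate. Under the same scalar-per-node simplification as in the Gibbs sketch (so $\alpha + \tfrac32$ in place of the talk's $\alpha + \tfrac52$), one alternating step could look as follows; this is an assumption-laden sketch, not the paper's algorithm.

```python
import numpy as np

def alternating_map_step(u, K, f, Sigma_eps_inv, alpha, beta):
    """One alternating-minimization step for the (scalar-per-node) MAP energy."""
    gamma = (0.5 * u**2 + beta) / (alpha + 1.5)         # closed-form gamma-update
    A = K.T @ Sigma_eps_inv @ K + np.diag(1.0 / gamma)  # weighted Tikhonov solve for u
    u_new = np.linalg.solve(A, K.T @ Sigma_eps_inv @ f)
    return u_new, gamma
```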

Contributions of our Studies. Our work elaborates on: Daniela Calvetti, Harri Hakula, Sampsa Pursiainen, Erkki Somersalo, 2009. Conditionally Gaussian hypermodels for cerebral source localization. Implementation of Full-MAP and Full-CM inference for HBM with realistic, high-resolution finite element (FE) head models. Improved algorithms for Full-MAP estimation. Examination of general properties, parameter choices, etc. Systematic examination of the performance concerning depth bias and masking in simulation studies. Introduction of suitable performance measures for the validation of simulation studies (Wasserstein distances).

Systematic Studies: Summary. Results for Full-MAP and Full-CM estimation: good performance in all validation measures; no depth bias; good results w.r.t. orientation, amplitude, and spatial extent. The Full-MAP estimate (by our algorithm) gives the best results in every aspect examined. Full results: Felix Lucka, Sampsa Pursiainen, Martin Burger, Carsten H. Wolters. Hierarchical Bayesian Inference for the EEG Inverse Problem using Realistic FE Head Models: Depth Localization and Source Separation for Focal Primary Currents. NeuroImage, 61(4), 2012.

Take Home Messages: Hierarchical Bayesian Modeling. A current trend in all areas of Bayesian inference. Extension of the prior model by hyperparameters $\gamma$ and hyperpriors. Gaussian w.r.t. $u$, factorization w.r.t. $\gamma$, sparse hyperprior. An alternative formulation of sparse Bayesian inversion with promising, interesting features for EEG/MEG source reconstruction: no depth bias, good source separation. The energy is non-convex (multimodal posterior), but the support can be inferred from the CM estimate.

Thank you for your attention! F. Lucka, 2012. Fast Markov chain Monte Carlo sampling for sparse Bayesian inference in high-dimensional inverse problems using L1-type priors. Inverse Problems, 28(12), 2012; arXiv:1206.0262v2. F. Lucka, Sampsa Pursiainen, Martin Burger, Carsten H. Wolters, 2012. Hierarchical Bayesian Inference for the EEG Inverse Problem using Realistic FE Head Models: Depth Localization and Source Separation for Focal Primary Currents. NeuroImage, 61(4), 2012.