Introduction to Bayesian methods in inverse problems

Ville Kolehmainen, Department of Applied Physics, University of Eastern Finland, Kuopio, Finland. March 4, 2013, Manchester, UK.

Contents: Introduction; Bayesian inversion; Computation of integration-based estimates; Numerical examples.

What are inverse problems? Consider measurement problems: estimate the quantity of interest $x \in \mathbb{R}^n$ from a (noisy) measurement of $A(x) \in \mathbb{R}^d$, where $A$ is a known mapping. Inverse problems are characterized as those measurement problems which are ill-posed:
1. The solution is non-unique (e.g., fewer measurements than unknowns).
2. The solution is unstable w.r.t. measurement noise and modelling errors (e.g., model reduction, inaccurately known nuisance parameters in the model $A(x)$).

An introductory example: Consider 2D deconvolution (image deblurring). Given a noisy and blurred image $m = Ax + e$, $m \in \mathbb{R}^n$, the objective is to reconstruct the original image $x \in \mathbb{R}^n$. The forward model $x \mapsto Ax$ implements a discrete convolution (here the convolution kernel is a Gaussian blurring kernel with a standard deviation of 6 pixels).
[Figure: Left, original image $x$; Right, observed image $m = Ax + e$.]

Example (cont.): The matrix $A$ has a trivial null-space, $\mathrm{null}(A) = \{0\}$, so the solution is unique (i.e., $(A^TA)^{-1}$ exists). However, the problem is unstable: $A$ has a numerically "unclear" null-space, i.e., $\|Aw\| = \|\sum_k \sigma_k \langle w, v_k\rangle u_k\| \approx 0$ for certain $\|w\| = 1$. The figure shows the least squares (LS) solution
$$x_{LS} = \arg\min_x \|m - Ax\|^2 \;\Rightarrow\; x_{LS} = (A^TA)^{-1}A^Tm.$$
[Figure: Left, true image $x$; Right, LS solution $x_{LS}$.]
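This instability is easy to reproduce numerically. The following is a minimal sketch (my illustration, not from the slides), using a 1D convolution analogue with hypothetical parameter values: a Gaussian blur matrix, a piecewise-constant signal, and a tiny amount of noise are enough to make the naive LS solution useless.

```python
import numpy as np

# Illustrative 1D analogue (hypothetical setup, not the slides' 2D problem):
# Gaussian blurring matrix, a piecewise-constant signal, and small noise.
n = 100
t = np.arange(n)
A = np.exp(-0.5 * ((t[:, None] - t[None, :]) / 2.0) ** 2)  # blur std = 2 pixels
A /= A.sum(axis=1, keepdims=True)                          # rows sum to one

x_true = np.zeros(n)
x_true[30:50] = 1.0
rng = np.random.default_rng(0)
m = A @ x_true + 1e-3 * rng.standard_normal(n)             # tiny noise

# A is square and invertible, so the LS solution equals A^{-1} m ...
x_ls = np.linalg.solve(A, m)

# ... but it is useless: the noise is amplified by roughly 1 / sigma_min(A)
print("cond(A) =", np.linalg.cond(A))
print("relative LS error =", np.linalg.norm(x_ls - x_true) / np.linalg.norm(x_true))
```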

Solutions: Regularization. The ill-posed problem is replaced by a well-posed approximation, whose solution is hopefully close to the true solution. These are typically modifications of the associated LS problem $\min_x \|m - Ax\|^2$. Examples of methods: Tikhonov regularization, truncated iterations. Consider the LS problem
$$x_{LS} = \arg\min_x \|m - Ax\|_2^2 \;\Rightarrow\; \underbrace{(A^TA)}_{B}\, x_{LS} = A^Tm.$$
Uniqueness: $B^{-1}$ exists if $\mathrm{null}(A) = \{0\}$. Stability of the solution?

Regularization example (cont.): In Tikhonov regularization, the LS problem is replaced with
$$x_{TIK} = \arg\min_x \{\|m - Ax\|_2^2 + \alpha\|x\|_2^2\} \;\Rightarrow\; \underbrace{(A^TA + \alpha I)}_{B}\, x_{TIK} = A^Tm.$$
Uniqueness: $\alpha > 0 \Rightarrow \mathrm{null}(B) = \{0\} \Rightarrow B^{-1}$ exists. Stability of the solution is guaranteed by choosing $\alpha$ large enough. Regularization poses an (implicit) prior about the solution; these assumptions are sometimes well hidden.
[Figure: Left, true image $x$; Right, Tikhonov solution $x_{TIK}$.]
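Continuing the illustrative 1D sketch above (again my assumption, reusing `A`, `m` and `x_true` from the previous block), the Tikhonov solution is a single well-conditioned linear solve:

```python
# Continuing the illustrative 1D sketch above: one well-conditioned solve.
alpha = 1e-2                                   # hand-picked regularization parameter
B = A.T @ A + alpha * np.eye(A.shape[1])       # null(B) = {0}, and B is well conditioned
x_tik = np.linalg.solve(B, A.T @ m)
print("relative Tikhonov error =",
      np.linalg.norm(x_tik - x_true) / np.linalg.norm(x_true))
```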

Solutions: Statistical (Bayesian) inversion. The inverse problem is recast as a problem of Bayesian inference. The key idea is to extract estimates and assess uncertainty about the unknowns based on: the measured data, a model of the measurement process, and a model of a priori information. Ill-posedness is removed by explicit use of prior models. Model uncertainties and model reduction are handled systematically (approximation error theory).
[Figure: Left, true image $x$; Right, Bayesian estimate $x_{CM}$.]

Bayesian inversion. All variables in the model are considered as random variables. The randomness reflects our uncertainty about their actual values, and the degree of uncertainty is coded in the probability distribution models of these variables. The complete model of the inverse problem is the posterior probability distribution
$$\pi(x \mid m) = \frac{\pi_{pr}(x)\, \pi(m \mid x)}{\pi(m)}, \qquad m = m_{\mathrm{observed}}, \qquad (1)$$
where $\pi(m \mid x)$ is the likelihood density (model of the measurement process), $\pi_{pr}(x)$ is the prior density (model for a priori information), and $\pi(m)$ is a normalization constant.

The posterior density model is a function on an $n$-dimensional space, $\pi(x \mid m): \mathbb{R}^n \to \mathbb{R}_+$, where $n$ is usually large. To obtain a "practical solution", the posterior model is summarized by estimates that answer questions such as: What is the most probable value of $x$ over $\pi(x \mid m)$? What is the mean of $x$ over $\pi(x \mid m)$? In what interval do the values of $x$ lie with 90% (posterior) probability?

Typical point estimates. Maximum a posteriori (MAP) estimate:
$$x_{MAP} = \arg\max_x \pi(x \mid m).$$
Conditional mean (CM) estimate:
$$x_{CM} = \int_{\mathbb{R}^n} x\, \pi(x \mid m)\, dx.$$
Covariance:
$$\Gamma_{x \mid m} = \int_{\mathbb{R}^n} (x - x_{CM})(x - x_{CM})^T\, \pi(x \mid m)\, dx.$$

Confidence intervals: Given $0 < \tau < 100$, compute $a_k$ and $b_k$ s.t.
$$\int_{-\infty}^{a_k} \pi_k(x_k)\, dx_k = \int_{b_k}^{\infty} \pi_k(x_k)\, dx_k = \frac{100 - \tau}{200},$$
where $\pi_k(x_k)$ is the marginal density
$$\pi_k(x_k) = \int_{\mathbb{R}^{n-1}} \pi(x \mid m)\, dx_1 \cdots dx_{k-1}\, dx_{k+1} \cdots dx_n.$$
The interval $I_k(\tau) = [a_k, b_k]$ contains $\tau\%$ of the mass of the marginal density.
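In practice such marginal intervals are read off from posterior samples. A minimal sketch (my illustration, not from the slides), assuming `samples` is an $N \times n$ array of draws from $\pi(x \mid m)$:

```python
import numpy as np

def marginal_interval(samples, k, tau=90.0):
    """Componentwise interval [a_k, b_k] containing tau % of the marginal
    posterior mass of component k, estimated from samples (shape N x n)."""
    lo = (100.0 - tau) / 2.0
    hi = 100.0 - lo
    a_k, b_k = np.percentile(samples[:, k], [lo, hi])
    return a_k, b_k

# usage with dummy samples standing in for an MCMC ensemble
rng = np.random.default_rng(1)
samples = rng.normal(size=(5000, 3))
print(marginal_interval(samples, k=0, tau=90.0))
```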

Computation of the integration-based estimates. Many estimators are of the form
$$\mathbb{E}[f(x)] = \int_{\mathbb{R}^n} f(x)\, \pi(x \mid m)\, dx.$$
Examples: $f(x) = x \Rightarrow x_{CM}$; $f(x) = (x - x_{CM})(x - x_{CM})^T \Rightarrow \Gamma_{x \mid m}$; etc. Analytical evaluation is in most cases impossible, and traditional numerical quadratures are not applicable when $n$ is large (the number of points needed is unreasonably large, and the support of $\pi(x \mid m)$ may not be well known). This leads to Monte Carlo integration.

Monte Carlo integration:
1. Draw an ensemble $\{x^{(k)},\, k = 1, 2, \ldots, N\}$ of i.i.d. samples from $\pi(x)$.
2. Estimate
$$\int_{\mathbb{R}^n} f(x)\, \pi(x)\, dx \approx \frac{1}{N} \sum_{k=1}^{N} f(x^{(k)}).$$
Convergence (law of large numbers):
$$\lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} f(x^{(k)}) = \int_{\mathbb{R}^n} f(x)\, \pi(x)\, dx \quad \text{(a.s.)}$$
The variance of the estimator $\bar f = \frac{1}{N}\sum_{k=1}^{N} f(x^{(k)})$ reduces as $1/N$.

Simple example of Monte Carlo integration. Let $x \in \mathbb{R}^2$ and
$$\pi(x) = \begin{cases} \tfrac{1}{4}, & x \in G, \quad G = [-1,1] \times [-1,1], \\ 0, & x \notin G. \end{cases}$$
The task is to compute the integral
$$I = \int_{\mathbb{R}^2} \chi_D(x)\, \pi(x)\, dx,$$
where $D$ is the unit disk in $\mathbb{R}^2$. Using $N = 5000$ samples, we get the estimate $\hat I = 0.7814$ (true value $\pi/4 \approx 0.7854$).
[Figure: the $N = 5000$ Monte Carlo samples in $G = [-1,1]^2$.]
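A minimal sketch of this computation (my illustration; the variable names and random seed are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5000

# i.i.d. samples from the uniform density pi(x) = 1/4 on G = [-1,1]^2
x = rng.uniform(-1.0, 1.0, size=(N, 2))

# Monte Carlo estimate of the integral of chi_D * pi, i.e. the fraction
# of samples falling inside the unit disk D
inside = (x[:, 0] ** 2 + x[:, 1] ** 2) <= 1.0
I_hat = inside.mean()

print(I_hat, np.pi / 4)  # estimate vs. true value 0.7853...
```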

MCMC. Direct sampling of the posterior is usually not possible, which leads to Markov chain Monte Carlo (MCMC). MCMC is Monte Carlo integration using Markov chains (dependent samples):
1. Draw $\{x^{(k)},\, k = 1, 2, \ldots, N\} \sim \pi(x)$ by simulating a Markov chain (with equilibrium distribution $\pi(x)$).
2. Estimate
$$\int_{\mathbb{R}^n} f(x)\, \pi(x)\, dx \approx \frac{1}{N} \sum_{k=1}^{N} f(x^{(k)}).$$
The variance of the estimator reduces as $\tau/N$, where $\tau$ is the integrated autocorrelation time of the chain. Algorithms for MCMC: 1. Metropolis-Hastings algorithm, 2. Gibbs sampler.

Metropolis-Hastings algorithm. Generation of the ensemble $\{x^{(k)},\, k = 1, \ldots, N\} \sim \pi(x)$ using the Metropolis-Hastings algorithm:
1. Pick an initial value $x^{(1)} \in \mathbb{R}^n$ and set $l = 1$.
2. Set $x = x^{(l)}$.
3. Draw a candidate sample $x'$ from the proposal density $q(x, x')$ and compute the acceptance factor
$$\alpha(x, x') = \min\left(1,\ \frac{\pi(x')\, q(x', x)}{\pi(x)\, q(x, x')}\right).$$
4. Draw $t \in [0, 1]$ from the uniform probability density ($t \sim \mathrm{uni}(0, 1)$).
5. If $\alpha(x, x') \geq t$, set $x^{(l+1)} = x'$, else set $x^{(l+1)} = x$. Increment $l \leftarrow l + 1$.
6. When $l = N$ stop, else repeat from step 2.
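A minimal random-walk Metropolis sketch of these steps (my illustration, not from the slides; the Gaussian target, proposal and step size are assumptions). With a symmetric proposal, $q(x', x) = q(x, x')$, the proposal ratio in step 3 cancels:

```python
import numpy as np

def log_target(x):
    # unnormalized log-density; a standard Gaussian as a stand-in for log pi(x|m)
    return -0.5 * np.dot(x, x)

def metropolis(log_pi, x0, n_samples, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_pi(x)
    chain = np.empty((n_samples, x.size))
    for l in range(n_samples):
        x_prop = x + step * rng.standard_normal(x.size)  # symmetric random-walk proposal
        lp_prop = log_pi(x_prop)
        # acceptance probability min(1, pi(x')/pi(x)), evaluated in log-space
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = x_prop, lp_prop
        chain[l] = x
    return chain

chain = metropolis(log_target, x0=np.zeros(2), n_samples=20000)
print(chain.mean(axis=0))  # should be close to zero
```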

The normalization constant of $\pi$ does not need to be known. There is great flexibility in choosing the proposal density $q(x, x')$; almost any density would do the job (eventually). However, the choice of $q(x, x')$ is a crucial part of successful MCMC: it determines the efficiency (autocorrelation time $\tau$) of the algorithm, and $q(x, x')$ should be such that $\tau$ is as small as possible. There are no systematic methods for choosing $q(x, x')$; it is based more on intuition and art.

The updating may proceed: all unknowns at a time, a single component $x_j$ at a time, or a block of unknowns at a time. The order of updating (in single-component and blockwise updating) may be random (random scan) or systematic. The proposal can be a mixture of several densities,
$$q(x, x') = \sum_{i=1}^{t} p_i\, q_i(x, x'), \qquad \sum_{i=1}^{t} p_i = 1.$$
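A mixture proposal can be implemented by first drawing the move type with probability $p_i$ and then proposing from the corresponding kernel. A hedged sketch (my illustration; the two move types, step size and target are assumptions) combining a random-scan single-component move with a full-vector move, both symmetric so the proposal ratio cancels:

```python
import numpy as np

def log_target(x):
    # stand-in unnormalized log-density (assumption for illustration)
    return -0.5 * np.dot(x, x)

def mixture_mh(log_pi, x0, n_samples, p_single=0.7, step=0.5, seed=0):
    """Mixture of two symmetric move types:
    move 1: update one randomly chosen component (random scan),
    move 2: update all components at once."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    lp = log_pi(x)
    chain = np.empty((n_samples, x.size))
    for l in range(n_samples):
        x_prop = x.copy()
        if rng.uniform() < p_single:                  # move 1, probability p_1
            j = rng.integers(x.size)
            x_prop[j] += step * rng.standard_normal()
        else:                                         # move 2, probability p_2
            x_prop += step * rng.standard_normal(x.size)
        lp_prop = log_pi(x_prop)
        # both moves are symmetric, so the acceptance is the ratio of targets
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = x_prop, lp_prop
        chain[l] = x
    return chain

chain = mixture_mh(log_target, np.zeros(3), 20000)
print(chain.mean(axis=0))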

Proposal parameters are usually calibrated by pilot runs, aiming to find the parameters giving the best efficiency (minimal $\tau$). Determining the "burn-in": trial runs (e.g., running several chains and checking that they are consistent); often determined from the plot of the posterior trace $\{\pi(x^{(k)}),\, k = 1, \ldots, N\}$ (see Figure).
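The integrated autocorrelation time $\tau$ used in this tuning can be estimated from the chain itself. A rough sketch (my illustration; the fixed summation window is a simple heuristic, not from the slides):

```python
import numpy as np

def integrated_autocorr_time(f_chain, window=None):
    """Estimate tau = 1 + 2 * sum_s rho(s) for a scalar chain f(x^(k))."""
    f = np.asarray(f_chain, dtype=float)
    f = f - f.mean()
    n = f.size
    acf = np.correlate(f, f, mode="full")[n - 1:] / (f.var() * n)
    if window is None:
        window = n // 50  # crude cutoff; in practice an adaptive window is used
    return 1.0 + 2.0 * acf[1:window].sum()

# usage: an AR(1) chain with coefficient rho has tau = (1 + rho) / (1 - rho)
rng = np.random.default_rng(2)
rho = 0.9
x = np.zeros(10000)
for k in range(1, x.size):
    x[k] = rho * x[k - 1] + rng.standard_normal()
print(integrated_autocorr_time(x))  # roughly (1 + 0.9) / (1 - 0.9) = 19
```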

Numerical examples. In summary, the Bayesian solution of an inverse problem consists of the following steps:
1. Construct the likelihood model $\pi(m \mid x)$. This step includes the development of the model $x \mapsto A(x)$ and modelling the measurement noise and modelling errors.
2. Construct the prior model $\pi_{pr}(x)$.
3. Develop computational methods for the summary statistics.
We consider the following numerical examples: 1. 2D deconvolution (intro example continued), 2. EIT with experimental data.

Example 1: 2D deconvolution. We are given a blurred and noisy image $m = Ax + e$, $m \in \mathbb{R}^n$. Assume that we know a priori that the true solution is a binary image representing some text.
[Figure: Left, true image $x$; Right, noisy, blurred image $m = Ax + e$.]

Likelihood model $\pi(m \mid x)$: The joint density factors as
$$\pi(m, x, e) = \pi(m \mid x, e)\, \pi(e \mid x)\, \pi(x) = \pi(m, e \mid x)\, \pi(x).$$
In the case $m = Ax + e$, we have $\pi(m \mid x, e) = \delta(m - Ax - e)$, and
$$\pi(m \mid x) = \int \pi(m, e \mid x)\, de = \int \delta(m - Ax - e)\, \pi(e \mid x)\, de = \pi_{e \mid x}(m - Ax \mid x).$$

In the (usual) case of mutually independent $x$ and $e$, we have $\pi_{e \mid x}(e \mid x) = \pi_e(e)$ and
$$\pi(m \mid x) = \pi_e(m - Ax).$$
The model of the noise: $\pi(e) = \mathcal{N}(0, \Gamma_e)$, giving
$$\pi(m \mid x) \propto \exp\left(-\tfrac{1}{2}\, \|L_e(m - Ax)\|^2\right),$$
where $L_e^T L_e = \Gamma_e^{-1}$. Remark: we have assumed that the (numerical) model $Ax$ is exact (i.e., no modelling errors).
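A minimal sketch of evaluating this Gaussian log-likelihood (my illustration; `A`, `m` and `Gamma_e` are assumed to be given as NumPy arrays):

```python
import numpy as np

def gaussian_loglike(x, A, m, Gamma_e):
    """Unnormalized log pi(m|x) = -0.5 * ||L_e (m - A x)||^2,
    with L_e^T L_e = Gamma_e^{-1} from a Cholesky factorization."""
    C = np.linalg.cholesky(np.linalg.inv(Gamma_e))  # C C^T = Gamma_e^{-1}
    L_e = C.T                                       # so L_e^T L_e = Gamma_e^{-1}
    r = L_e @ (m - A @ x)
    return -0.5 * r @ r

# usage with a tiny synthetic problem
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
x_true = rng.standard_normal(n)
Gamma_e = 0.01 * np.eye(n)
m = A @ x_true + rng.multivariate_normal(np.zeros(n), Gamma_e)
print(gaussian_loglike(x_true, A, m, Gamma_e))
```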

Prior model $\pi_{pr}(x)$. The prior knowledge: $x$ can take the values $x_j \in \{0, 1\}$; the case $x_j = 0$ is black (background) and $x_j = 1$ is white (text). The pixels with value $x_j = 1$ are known to be clustered in "blocky structures" which have short boundaries. We model the prior knowledge with an Ising prior
$$\pi_{pr}(x) \propto \exp\left(\alpha \sum_{i=1}^{n} \sum_{j \in N_i} \delta(x_i, x_j)\right),$$
where $\delta$ is the Kronecker delta
$$\delta(x_i, x_j) = \begin{cases} 1, & x_i = x_j, \\ 0, & x_i \neq x_j. \end{cases}$$
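A minimal sketch of the Ising prior log-density on a 2D pixel lattice (my illustration; the 4-nearest-neighbour neighbourhood $N_i$ is an assumption):

```python
import numpy as np

def ising_logprior(x_img, alpha):
    """log pi_pr(x) up to a constant: alpha * (number of equal 4-neighbour
    pairs) for a binary image x_img (2D array of 0/1). Each pair is counted
    once; the double-counted sum in the slide only rescales alpha."""
    equal_horiz = (x_img[:, 1:] == x_img[:, :-1]).sum()
    equal_vert = (x_img[1:, :] == x_img[:-1, :]).sum()
    return alpha * (equal_horiz + equal_vert)

# usage
x_img = np.zeros((50, 50), dtype=int)
x_img[10:20, 15:35] = 1  # a blocky region of "text" pixels
print(ising_logprior(x_img, alpha=2.0))
```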

Posterior model $\pi(x \mid m)$: The posterior model for the deblurring problem is
$$\pi(x \mid m) \propto \exp\left(-\tfrac{1}{2}\, \|L_e(m - Ax)\|_2^2 + \alpha \sum_{i=1}^{n} \sum_{j \in N_i} \delta(x_i, x_j)\right).$$
The posterior is explored with the Metropolis-Hastings algorithm.
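Combining the two sketches above gives an unnormalized log-posterior that a single-pixel-flip Metropolis sampler can use directly; a pixel flip is a symmetric proposal, so the acceptance probability is just the posterior ratio. This is my illustration of the setup (assuming `A` acts on the flattened image and `L_e` is precomputed), not the authors' code:

```python
import numpy as np

def log_posterior(x_img, A, m, L_e, alpha):
    """Unnormalized log pi(x|m) for the binary deblurring problem."""
    r = L_e @ (m - A @ x_img.ravel())
    equal = ((x_img[:, 1:] == x_img[:, :-1]).sum()
             + (x_img[1:, :] == x_img[:-1, :]).sum())
    return -0.5 * r @ r + alpha * equal

def single_flip_metropolis(x_img, A, m, L_e, alpha, n_iter, seed=0):
    rng = np.random.default_rng(seed)
    x = x_img.copy()
    lp = log_posterior(x, A, m, L_e, alpha)
    for _ in range(n_iter):
        i, j = rng.integers(x.shape[0]), rng.integers(x.shape[1])
        x[i, j] = 1 - x[i, j]                     # propose flipping one pixel
        lp_new = log_posterior(x, A, m, L_e, alpha)
        if np.log(rng.uniform()) < lp_new - lp:   # symmetric proposal: posterior ratio
            lp = lp_new
        else:
            x[i, j] = 1 - x[i, j]                 # reject: flip the pixel back
    return x
# (In practice one would update lp incrementally instead of recomputing it.)
```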

Metropolis-Hastings proposal. The proposal is a mixture $q(x, x') = \sum_{i=1}^{2} \xi_i\, q_i(x, x')$ of two move types.
Move 1: choose the update pixel $x_i$ by drawing the index $i$ with uniform probability $1/n$ and change the value of $x_i$.
Move 2: Let $N(x)$ denote the set of active edges in image $x$ (an edge $l_{ij}$ connects pixels $x_i$ and $x_j$ in the lattice; it is active if $x_i \neq x_j$). Pick an active update edge with probability $1/|N(x)|$, pick one of the two pixels related to the chosen edge with probability $1/2$, and change the value of the chosen pixel.

Results. [Figure: Left, true image $x$; Middle, Tikhonov regularized solution $x_{TIK} = \arg\min_x \{\|m - Ax\|_2^2 + \alpha\|x\|_2^2\}$; Right, CM estimate $x_{CM} = \int_{\mathbb{R}^n} x\, \pi(x \mid m)\, dx$.]

Results. [Figure: Left, $x_{CM}$; Middle, posterior variances $\mathrm{diag}(\Gamma_{x \mid m})$; Right, a sample $x$ from $\pi(x \mid m)$.]

Example 2: "tank EIT". Joint work with Geoff Nicholls (Oxford) and Colin Fox (U. Otago, NZ). Data measured with the Kuopio EIT device: 16 electrodes, adjacent current patterns. Target: plastic cylinders in salt water.
[Figure: measured voltages.]

Measurement model:
$$V = U(\sigma) + e, \qquad e \sim \mathcal{N}(\bar e, \Gamma_e),$$
where $V$ denotes the measured voltages and $U(\sigma)$ is the FEM-based forward map. $e$ and $\sigma$ are modelled as mutually independent; $\bar e$ and $\Gamma_e$ are estimated by repeated measurements. Posterior model:
$$\pi(\sigma \mid V) \propto \exp\left\{-\tfrac{1}{2}\,(V - U(\sigma))^T \Gamma_e^{-1} (V - U(\sigma))\right\} \pi_{pr}(\sigma).$$
We compute results with three different prior models $\pi_{pr}(\sigma)$. Sampling is carried out with the Metropolis-Hastings algorithm.

Prior models $\pi_{pr}(\sigma)$.
1. Smoothness prior:
$$\pi_{pr}(\sigma) \propto \pi_+(\sigma)\, \exp\left(-\alpha \sum_{i=1}^{n} H_i(\sigma)\right), \qquad H_i(\sigma) = \sum_{j \in N_i} |\sigma_i - \sigma_j|^2.$$
2. Material type prior (Fox & Nicholls, 1998): The possible material types $\{1, 2, \ldots, C\}$ inside the body are known, but the segmentation into these materials is unknown. The material distribution is represented by an (auxiliary) discrete-valued random field $\tau$ (pixel values $\tau_j \in \{1, 2, \ldots, C\}$). The possible values of the conductivity $\sigma(\tau)$ for the different materials are known only approximately; a Gaussian prior is used for the material conductivities. The different material types are assumed to be clustered in blocky structures (e.g., organs, etc.).

Prior model $\pi_{pr}(\sigma) = \pi_{pr}(\sigma \mid \tau)\, \pi_{pr}(\tau)$, where
$$\pi_{pr}(\sigma \mid \tau) \propto \pi_+(\sigma)\, \exp\left(-\alpha \sum_{i=1}^{n} G_i(\sigma)\right) \prod_{i=1}^{n} \pi(\sigma_i \mid \tau_i),$$
with $\pi(\sigma_i \mid \tau_i) = \mathcal{N}(\eta(\tau_i), \xi(\tau_i)^2)$ and
$$G_i(\sigma) = \sum_{j \in \{k\,:\, k \in N_i \text{ and } \tau_k = \tau_i\}} (\sigma_i - \sigma_j)^2.$$
$\pi_{pr}(\tau)$ is an Ising prior:
$$\pi_{pr}(\tau) \propto \exp\left(\beta \sum_{i=1}^{n} T_i(\tau)\right), \qquad T_i(\tau) = \sum_{j \in N_i} \delta(\tau_i, \tau_j), \qquad (2)$$
where $\delta(\tau_i, \tau_j)$ is the Kronecker delta function.

3. Circle prior: The domain $\Omega$, with known (constant) background conductivity $\sigma_{bg}$, is assumed to contain an unknown number of circular inclusions (i.e., the dimension of the state is not fixed). The inclusions have known contrast; the sizes and locations of the inclusions are unknown. For the number $N$ of circular inclusions we write a point-process prior:
$$\pi_{pr}(\sigma) = \beta^N \exp(-\beta A)\, \delta_{\mathcal{A}}(\sigma),$$
where $\beta = 1/A$ ($A$ is the area of the domain $\Omega$), $\mathcal{A}$ is the set of feasible circle configurations (the circle inclusions are disjoint), and $\delta_{\mathcal{A}}$ is the indicator function of $\mathcal{A}$.

CM estimates with the different priors: smoothness prior (top right), material type prior (bottom left), and circle prior (bottom right).