Introduction to Bayesian methods in inverse problems
Ville Kolehmainen
Department of Applied Physics, University of Eastern Finland, Kuopio, Finland
March 4, 2013, Manchester, UK.
Contents
Introduction
Bayesian inversion
Computation of integration based estimates
Numerical examples
What are inverse problems?
Consider measurement problems: estimate the quantity of interest x ∈ R^n from a (noisy) measurement of A(x) ∈ R^d, where A is a known mapping. Inverse problems are characterized as those measurement problems which are ill-posed:
1. The problem is non-unique (e.g., fewer measurements than unknowns).
2. The solution is unstable w.r.t. measurement noise and modelling errors (e.g., model reduction, inaccurately known nuisance parameters in the model A(x)).
An introductory example: Consider 2D deconvolution (image deblurring). Given a noisy and blurred image m = Ax + e, m ∈ R^n, the objective is to reconstruct the original image x ∈ R^n. The forward model x ↦ Ax implements discrete convolution (here the convolution kernel is a Gaussian blurring kernel with a std of 6 pixels).
Left: original image x. Right: observed image m = Ax + e.
Example (cont.) The matrix A has a trivial null-space, null(A) = {0}, so the solution is unique (i.e., (A^T A)^{-1} exists). However, the problem is unstable: A has a "numerical" null-space, i.e., Aw = Σ_k σ_k ⟨w, v_k⟩ u_k ≈ 0 for certain w with ‖w‖ = 1. The figure shows the least squares (LS) solution
x_LS = arg min_x ‖m − Ax‖^2,  x_LS = (A^T A)^{-1} A^T m.
Left: true image x. Right: LS solution x_LS.
Solutions: Regularization. The ill-posed problem is replaced by a well-posed approximation, whose solution is hopefully close to the true solution. Typically these are modifications of the associated LS problem ‖m − Ax‖^2. Examples of methods: Tikhonov regularization, truncated iterations. Consider the LS problem
x_LS = arg min_x ‖m − Ax‖^2  ⇔  (A^T A) x_LS = A^T m,  with B := A^T A.
Uniqueness: B^{-1} exists if null(A) = {0}. Stability of the solution?
Regularization Example (cont.): In Tikhonov regularization, the LS problem is replaced with
x_TIK = arg min_x { ‖m − Ax‖^2 + α ‖x‖_2^2 }  ⇔  (A^T A + αI) x_TIK = A^T m,  with B := A^T A + αI.
Uniqueness: α > 0 ⇒ null(B) = {0} ⇒ B^{-1} exists. Stability of the solution is guaranteed by choosing α large enough. Regularization poses an (implicit) prior about the solution. These assumptions are sometimes well hidden.
Left: true image x. Right: Tikhonov solution x_TIK.
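The Tikhonov normal equations above can be sketched numerically. The following 1-D deconvolution toy is my own illustration (the signal, kernel width of 2 pixels, noise level, and α are illustrative choices, not the lecture's 2-D setup): the plain LS solution amplifies the noise, while the regularized one stays stable.

```python
import numpy as np

# Build a 1-D Gaussian blur matrix A (illustrative std of 2 pixels).
rng = np.random.default_rng(0)
n = 40
t = np.arange(n)
A = np.exp(-0.5 * ((t[:, None] - t[None, :]) / 2.0) ** 2)
A /= A.sum(axis=1, keepdims=True)          # normalize rows

x_true = np.zeros(n)
x_true[10:20] = 1.0                        # a "blocky" true signal
m = A @ x_true + 0.01 * rng.standard_normal(n)   # noisy blurred data

# LS: (A^T A) x = A^T m  -- unique but unstable (noise is amplified)
x_ls = np.linalg.solve(A.T @ A, A.T @ m)

# Tikhonov: (A^T A + alpha I) x = A^T m  -- stable for alpha > 0
alpha = 1e-2
x_tik = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ m)

err_ls = np.linalg.norm(x_ls - x_true)
err_tik = np.linalg.norm(x_tik - x_true)
print(err_ls, err_tik)
```

Note how adding αI to A^T A bounds the smallest eigenvalue of the system matrix away from zero, which is exactly the stability argument on this slide.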
Solutions: Statistical (Bayesian) inversion. The inverse problem is recast as a problem of Bayesian inference. The key idea is to extract estimates and assess uncertainty about the unknowns based on:
Measured data
Model of the measurement process
Model of a priori information
Ill-posedness is removed by explicit use of prior models! Systematic handling of model uncertainties and model reduction (→ approximation error theory).
Left: true image x. Right: Bayesian estimate x_CM.
Bayesian inversion. All variables in the model are considered as random variables. The randomness reflects our uncertainty of their actual values. The degree of uncertainty is coded in the probability distribution models of these variables. The complete model of the inverse problem is the posterior probability distribution
π(x | m) = π_pr(x) π(m | x) / π(m),   m = m_observed,   (1)
where π(m | x) is the likelihood density (the model of the measurement process), π_pr(x) is the prior density (the model for a priori information), and π(m) is a normalization constant.
The posterior density model is a function on an n-dimensional space, π(x | m) : R^n → R_+, where n is usually large. To obtain a "practical solution", the posterior model is summarized by estimates that answer questions such as: What is the most probable value of x over π(x | m)? What is the mean of x over π(x | m)? In what interval do the values of x lie with 90% (posterior) probability?
Typical point estimates
Maximum a posteriori (MAP) estimate: x_MAP = arg max_x π(x | m).
Conditional mean (CM) estimate: x_CM = ∫_{R^n} x π(x | m) dx.
Covariance: Γ_{x|m} = ∫_{R^n} (x − x_CM)(x − x_CM)^T π(x | m) dx.
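Given samples from the posterior, the CM estimate and the covariance are simply the sample mean and sample covariance. A minimal sketch, using a stand-in 2-D Gaussian as the "posterior" (the mean, covariance, and sample size are my own illustrative choices):

```python
import numpy as np

# Stand-in "posterior" samples: a 2-D Gaussian with known moments.
rng = np.random.default_rng(3)
true_cov = np.array([[2.0, 0.5], [0.5, 1.0]])
samples = rng.multivariate_normal(mean=[1.0, -1.0], cov=true_cov, size=100_000)

x_cm = samples.mean(axis=0)            # Monte Carlo estimate of x_CM
Gamma = np.cov(samples, rowvar=False)  # Monte Carlo estimate of Gamma_{x|m}
print(x_cm, Gamma)
```

The MAP estimate, by contrast, is an optimization problem rather than an integral, which is why it is often cheaper to compute than the CM estimate.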
Confidence intervals: Given 0 < τ < 100, compute a_k and b_k s.t.
∫_{−∞}^{a_k} π_k(x_k) dx_k = ∫_{b_k}^{∞} π_k(x_k) dx_k = (100 − τ)/200,
where π_k(x_k) is the marginal density
π_k(x_k) = ∫_{R^{n−1}} π(x | m) dx_1 ⋯ dx_{k−1} dx_{k+1} ⋯ dx_n.
The interval I_k(τ) = [a_k, b_k] contains τ% of the mass of the marginal density.
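In practice a_k and b_k are estimated from posterior samples as empirical quantiles of the k-th marginal. A sketch (the helper name and the stand-in samples are my own, not from the lecture):

```python
import numpy as np

def marginal_interval(samples, k, tau=90.0):
    """Estimate I_k(tau) from draws of pi(x|m); samples is an (N, n) array."""
    p = (100.0 - tau) / 200.0                 # tail mass on each side
    a_k = np.quantile(samples[:, k], p)
    b_k = np.quantile(samples[:, k], 1.0 - p)
    return a_k, b_k

rng = np.random.default_rng(1)
draws = rng.standard_normal((100_000, 2))     # stand-in "posterior" samples
a, b = marginal_interval(draws, k=0, tau=90.0)
print(a, b)   # close to the exact N(0,1) values -1.645 and 1.645
```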
Computation of the integration based estimates
Many estimators are of the form
E[f(x)] = ∫_{R^n} f(x) π(x | m) dx.
Examples: f(x) = x gives x_CM; f(x) = (x − x_CM)(x − x_CM)^T gives Γ_{x|m}; etc. Analytical evaluation is in most cases impossible. Traditional numerical quadratures are not applicable when n is large (the number of points needed is unreasonably large, and the support of π(x | m) may not be well known) → Monte Carlo integration.
Monte Carlo integration
1. Draw an ensemble {x^(k), k = 1, 2, ..., N} of i.i.d. samples from π(x).
2. Estimate ∫_{R^n} f(x) π(x) dx ≈ (1/N) Σ_{k=1}^N f(x^(k)).
Convergence (law of large numbers):
lim_{N→∞} (1/N) Σ_{k=1}^N f(x^(k)) = ∫_{R^n} f(x) π(x) dx  (a.s.)
The variance of the estimator f̄ = (1/N) Σ_{k=1}^N f(x^(k)) reduces as 1/N.
Simple example of Monte Carlo integration
Let x ∈ R^2 and
π(x) = 1/4 for x ∈ G, G = [−1, 1] × [−1, 1];  π(x) = 0 for x ∉ G.
The task is to compute the integral ∫_{R^2} χ_D(x) π(x) dx, where D is the unit disk in R^2. Using N = 5000 samples, we get the estimate 0.7814 (true value π/4 ≈ 0.7854).
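This toy example is easy to reproduce as code: draw uniform samples on G (where π(x) = 1/4) and average the indicator of the unit disk. The sample size here is my own choice, larger than the slide's 5000 to make the estimate tighter.

```python
import numpy as np

rng = np.random.default_rng(2013)
N = 200_000
x = rng.uniform(-1.0, 1.0, size=(N, 2))   # i.i.d. samples from pi(x) on G
inside = (x ** 2).sum(axis=1) <= 1.0      # chi_D(x): inside the unit disk?
estimate = inside.mean()                  # (1/N) sum_k chi_D(x^(k))
print(estimate, np.pi / 4)                # estimate should be near 0.7854
```

Since the estimator variance decays as 1/N, quadrupling N halves the typical error, independently of the dimension of x.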
MCMC
Direct sampling of the posterior is usually not possible → Markov chain Monte Carlo (MCMC). MCMC is Monte Carlo integration using Markov chains (dependent samples):
1. Draw {x^(k), k = 1, 2, ..., N} ~ π(x) by simulating a Markov chain (with equilibrium distribution π(x)).
2. Estimate ∫_{R^n} f(x) π(x) dx ≈ (1/N) Σ_{k=1}^N f(x^(k)).
The variance of the estimator reduces as τ/N, where τ is the integrated autocorrelation time of the chain.
Algorithms for MCMC:
1. Metropolis–Hastings algorithm
2. Gibbs sampler
Metropolis–Hastings algorithm
Generation of the ensemble {x^(k), k = 1, ..., N} ~ π(x) using the Metropolis–Hastings algorithm:
1. Pick an initial value x^(1) ∈ R^n and set l = 1.
2. Set x = x^(l).
3. Draw a candidate sample x' from the proposal density, x' ~ q(x, x'), and compute the acceptance factor
α(x, x') = min( 1, [π(x') q(x', x)] / [π(x) q(x, x')] ).
4. Draw t ∈ [0, 1] from the uniform probability density (t ~ uni(0, 1)).
5. If α(x, x') ≥ t, set x^(l+1) = x'; else set x^(l+1) = x. Increment l ← l + 1.
6. When l = N stop; else repeat from step 2.
The normalization constant of π does not need to be known. There is great flexibility in choosing the proposal density q(x, x'); almost any density would do the job (eventually). However, the choice of q(x, x') is a crucial part of successful MCMC: it determines the efficiency (autocorrelation time τ) of the algorithm, so q(x, x') should be such that τ is as small as possible. There are no systematic methods for choosing q(x, x'); the choice is based more on intuition and art.
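The steps above can be sketched on a toy target. This is my own minimal random-walk Metropolis example (the 1-D standard normal target, step size, and chain length are illustrative choices, not the lecture's): the Gaussian proposal is symmetric, so the q-ratio cancels from the acceptance factor, and only the unnormalized π(x')/π(x) is needed.

```python
import numpy as np

def log_pi(x):
    return -0.5 * x ** 2            # unnormalized log-target: N(0, 1)

rng = np.random.default_rng(4)
N = 50_000
step = 2.4                          # proposal std, roughly tuned for this target
chain = np.empty(N)
x = 0.0                             # step 1: initial value
for l in range(N):
    x_prop = x + step * rng.standard_normal()          # step 3: candidate from q
    alpha = min(1.0, np.exp(log_pi(x_prop) - log_pi(x)))
    if rng.uniform() <= alpha:      # steps 4-5: accept or reject
        x = x_prop
    chain[l] = x                    # a rejection repeats the current state

print(chain.mean(), chain.var())
```

Tuning `step` is exactly the proposal-calibration problem the slide mentions: too small and the chain moves slowly, too large and almost every candidate is rejected; both inflate τ.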
The updating may proceed:
All unknowns at a time
A single component x_j at a time
A block of unknowns at a time
The order of updating (in single-component and blockwise updating) may be:
Elements chosen at random (random scan)
A systematic order
The proposal can be a mixture of several densities:
q(x, x') = Σ_{i=1}^t p_i q_i(x, x'),  Σ_{i=1}^t p_i = 1.
Proposal parameters are usually calibrated by pilot runs, aiming at finding the parameters that give the best efficiency (minimal τ). Determining the "burn-in": trial runs (e.g., running several chains and checking that they are consistent). Often determined from the plot of the posterior trace {π(x^(k)), k = 1, ..., N} (see Figure).
Numerical examples
In summary, the Bayesian solution of an inverse problem consists of the following steps:
1. Construct the likelihood model π(m | x). This step includes the development of the forward model x ↦ A(x) and modelling the measurement noise and modelling errors.
2. Construct the prior model π_pr(x).
3. Develop computational methods for the summary statistics.
We consider the following numerical examples:
1. 2D deconvolution (intro example continued)
2. EIT with experimental data
Example 1: 2D deconvolution
We are given a blurred and noisy image m = Ax + e, m ∈ R^n. Assume that we know a priori that the true solution is a binary image representing some text.
Left: true image x. Right: noisy, blurred image m = Ax + e.
Likelihood model π(m | x): Joint density
π(m, x, e) = π(m | x, e) π(e | x) π(x) = π(m, e | x) π(x).
In the case m = Ax + e, we have π(m | x, e) = δ(m − Ax − e), and
π(m | x) = ∫ π(m, e | x) de = ∫ δ(m − Ax − e) π(e | x) de = π_{e|x}(m − Ax | x).
In the (usual) case of mutually independent x and e, we have π_{e|x}(e | x) = π_e(e) and π(m | x) = π_e(m − Ax). The model of the noise: π(e) = N(0, Γ_e), so
π(m | x) ∝ exp( −(1/2) ‖L_e(m − Ax)‖^2 ),
where L_e^T L_e = Γ_e^{-1}. Remark: we have assumed that the (numerical) model Ax is exact (i.e., no modelling errors).
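The whitened form of the likelihood above translates directly into code. A sketch under the slide's noise model e ~ N(0, Γ_e) (the matrix sizes and values are my own illustrative choices): with L_e^T L_e = Γ_e^{-1}, the log-likelihood is −(1/2)‖L_e(m − Ax)‖² up to an additive constant.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5
A = rng.standard_normal((n, n))                       # stand-in forward matrix
Gamma_e = 0.1 * np.eye(n)                             # noise covariance
# Cholesky factor of Gamma_e^{-1}: L_e^T L_e = Gamma_e^{-1}
L_e = np.linalg.cholesky(np.linalg.inv(Gamma_e)).T

def log_likelihood(x, m):
    r = L_e @ (m - A @ x)                             # whitened residual
    return -0.5 * r @ r                               # log pi(m|x) + const

x = rng.standard_normal(n)
m = A @ x                                             # noiseless data for the check
print(log_likelihood(x, m))                           # 0.0 at the noiseless optimum
```

Whitening with L_e is also what makes the posterior on the next slides a plain sum of squares plus the prior term.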
Prior model π_pr(x)
The prior knowledge: x can take the values x_j ∈ {0, 1}; the case x_j = 0 is black (background), the case x_j = 1 is white (text). The pixels with value x_j = 1 are known to be clustered in "blocky structures" which have short boundaries. We model the prior knowledge with the Ising prior
π_pr(x) ∝ exp( α Σ_{i=1}^n Σ_{j∈N_i} δ(x_i, x_j) ),
where δ is the Kronecker delta: δ(x_i, x_j) = 1 if x_i = x_j, and 0 if x_i ≠ x_j.
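The unnormalized log of this Ising prior is easy to evaluate for a binary image. A sketch (the 4-nearest-pixel neighborhood N_i, the image sizes, and α are my own illustrative choices): blocky images score higher than fragmented ones.

```python
import numpy as np

def ising_log_prior(img, alpha=1.0):
    """alpha * sum_i sum_{j in N_i} delta(x_i, x_j), up to the log-normalization."""
    # Count equal-valued neighbor pairs along rows and columns; each
    # unordered pair {i, j} appears once here, i.e. half of the double sum.
    eq_h = (img[:, 1:] == img[:, :-1]).sum()
    eq_v = (img[1:, :] == img[:-1, :]).sum()
    return alpha * 2 * (eq_h + eq_v)   # the double sum counts each pair twice

blocky = np.zeros((8, 8), dtype=int)
blocky[2:6, 2:6] = 1                               # one white block: short boundary
checker = np.indices((8, 8)).sum(axis=0) % 2       # checkerboard: maximal boundary
print(ising_log_prior(blocky), ising_log_prior(checker))
```

The prior thus rewards exactly the "short boundaries" the slide describes: every neighbor pair with equal values adds to the exponent.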
Posterior model π(x | m): The posterior model for the deblurring problem is
π(x | m) ∝ exp( −(1/2) ‖L_e(m − Ax)‖_2^2 + α Σ_{i=1}^n Σ_{j∈N_i} δ(x_i, x_j) ).
The posterior is explored with the Metropolis–Hastings algorithm.
Metropolis–Hastings proposal
The proposal is a mixture q(x, x') = Σ_{i=1}^2 ξ_i q_i(x, x') of two move types.
Move 1: Choose an update pixel x_i by drawing the index i with uniform probability 1/n and change the value of x_i.
Move 2: Let N(x) denote the set of active edges in image x (edge l_ij connects pixels x_i and x_j in the lattice; it is active if x_i ≠ x_j). Pick an active update edge with probability 1/|N(x)|, pick one of the two pixels related to the chosen edge with probability 1/2, and change the value of the chosen pixel.
Results
Left: true image x. Middle: Tikhonov regularized solution x_TIK = arg min_x { ‖m − Ax‖_2^2 + α ‖x‖_2^2 }. Right: CM estimate x_CM = ∫_{R^n} x π(x | m) dx.
Results
Left: x_CM. Middle: posterior variances diag(Γ_{x|m}). Right: a sample x from π(x | m).
Example 2: "tank EIT"
Joint work with Geoff Nicholls (Oxford) & Colin Fox (U. Otago, NZ). Data measured with the Kuopio EIT device: 16 electrodes, adjacent current patterns. Target: plastic cylinders in salt water.
Figure: measured voltages.
Measurement model
V = U(σ) + e,  e ~ N(ē, Γ_e),
where V denotes the measured voltages and U(σ) is the FEM-based forward map. e and σ are modelled as mutually independent; ē and Γ_e are estimated from repeated measurements. Posterior model:
π(σ | V) ∝ exp( −(1/2) (V − U(σ))^T Γ_e^{-1} (V − U(σ)) ) π_pr(σ).
We compute results with three different prior models π_pr(σ). Sampling is carried out with the Metropolis–Hastings algorithm.
Prior models π_pr(σ)
1. SMOOTHNESS PRIOR:
π_pr(σ) ∝ π_+(σ) exp( −α Σ_{i=1}^n H_i(σ) ),  H_i(σ) = Σ_{j∈N_i} ‖σ_i − σ_j‖^2.
2. MATERIAL TYPE PRIOR (FOX & NICHOLLS, 1998): The possible material types {1, 2, ..., C} inside the body are known, but the segmentation into these materials is unknown. The material distribution is represented by an (auxiliary) discrete-valued random field τ (pixel values τ_j ∈ {1, 2, ..., C}). The possible values of the conductivity σ(τ) for the different materials are known only approximately: a Gaussian prior is used for the material conductivities. The different material types are assumed to be clustered in blocky structures (e.g., organs, etc.).
The prior model is π_pr(σ) = π_pr(σ | τ) π_pr(τ), where
π_pr(σ | τ) ∝ π_+(σ) exp( −α Σ_{i=1}^n G_i(σ) ) Π_{i=1}^n π(σ_i | τ_i),
with π(σ_i | τ_i) = N(η(τ_i), ξ(τ_i)^2) and G_i(σ) = Σ_{j ∈ {k : k ∈ N_i and τ_k = τ_i}} (σ_i − σ_j)^2.
π_pr(τ) is an Ising prior:
π_pr(τ) ∝ exp( β Σ_{i=1}^n T_i(τ) ),  T_i(τ) = Σ_{j∈N_i} δ(τ_i, τ_j),   (2)
where δ(τ_i, τ_j) is the Kronecker delta function.
3. CIRCLE PRIOR: The domain Ω with known (constant) background conductivity σ_bg is assumed to contain an unknown number of circular inclusions (i.e., the dimension of the state is not fixed). The inclusions have known contrast; the sizes and locations of the inclusions are unknown. For the number N of circular inclusions we write a point process prior:
π_pr(σ) = β^N exp(−β|Ω|) δ(σ ∈ A),  β = 1/|Ω|,
where |Ω| is the area of the domain Ω, A is the set of feasible circle configurations (the circle inclusions are disjoint), and δ is the indicator function.
CM estimates with the different priors: smoothness prior (top right), material type prior (bottom left), and circle prior (bottom right).