Bayesian Methods and Uncertainty Quantification for Nonlinear Inverse Problems


Bayesian Methods and Uncertainty Quantification for Nonlinear Inverse Problems John Bardsley, University of Montana. Collaborators: H. Haario, J. Kaipio, M. Laine, Y. Marzouk, A. Seppänen, A. Solonen, Z. Wang. Technical University of Denmark, December 2016.

Outline
- Nonlinear inverse problems setup
- Randomize-then-Optimize (RTO)
- Test cases: small-parameter examples; electrical impedance tomography; ℓ₁ priors, i.e., TV and Besov priors

Now Consider a Nonlinear Statistical Model
Now assume the nonlinear, Gaussian statistical model
y = A(x) + ε,
where y ∈ R^m is the vector of observations; x ∈ R^n is the vector of unknown parameters; A : R^n → R^m is nonlinear; and ε ~ N(0, λ⁻¹ I_m), i.e., ε is i.i.d. Gaussian with mean 0 and variance λ⁻¹.

Toy example
Consider the following nonlinear, two-parameter, pre-whitened model:
y_i = x_1(1 − exp(−x_2 t_i)) + ε_i, ε_i ~ N(0, σ²), i = 1, 2, 3, 4, 5,
with t_i = 2i − 1, σ = 0.136, and y = [0.076, 0.258, 0.369, 0.492, 0.559].
GOAL: estimate a probability distribution for x = (x_1, x_2).
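The model above can be evaluated directly. Below is a minimal sketch of its unnormalized log-posterior under a flat prior; the leading zeros in the data and σ, lost in the PDF extraction, are restored here as an assumption:

```python
import numpy as np

# Toy model; leading zeros lost in extraction are restored (an assumption).
t = np.array([1.0, 3.0, 5.0, 7.0, 9.0])               # t_i = 2i - 1
y = np.array([0.076, 0.258, 0.369, 0.492, 0.559])     # data from the slide
sigma = 0.136                                          # noise standard deviation

def log_post(x1, x2):
    # Unnormalized log p(x1, x2 | y) under a flat prior.
    resid = y - x1 * (1.0 - np.exp(-x2 * t))
    return -0.5 * np.sum(resid**2) / sigma**2
```

Evaluating `log_post` on a grid over (x_1, x_2) reproduces the kind of posterior surface shown on the next slide.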

Toy example continued: the Bayesian posterior p(x_1, x_2 | y)
Marginal densities:
p(x_1|y) = ∫ p(x_1, x_2|y) dx_2,  p(x_2|y) = ∫ p(x_1, x_2|y) dx_1.
[Figure: marginal posterior densities of x_1 and x_2 (DRAM, RTO, corrected RTO) and a scatter plot of the joint posterior p(x_1, x_2|y).]

Compute Samples Using Markov Chain Monte Carlo
Markov chain Monte Carlo (MCMC) is a framework for sampling from a (potentially un-normalized) probability distribution. Some classical MCMC algorithms:
- Gibbs sampling (talk 1: for sampling from p(x, λ, δ|y))
- Metropolis-Hastings
- Adaptive Metropolis (talk 1: for sampling from p(λ, δ|y))
For inverse problems the posterior is high-dimensional and harder to explore with classical algorithms: chains become more correlated and sampling becomes inefficient.

Metropolis-Hastings
Definitions:
- p(x|y): posterior (target) density
- x^k: random variable of the Markov chain at step k
- q(x*|x^k): proposal density given x^k
- x*: random variable from the proposal
A chain of samples {x^0, x^1, …} is generated by:
1. Start at x^0.
2. For k = 1, 2, …, K:
 2.1 sample x* ~ q(x*|x^{k−1});
 2.2 calculate α = min{ [p(x*|y) q(x^{k−1}|x*)] / [p(x^{k−1}|y) q(x*|x^{k−1})], 1 };
 2.3 set x^k = x* with probability α, and x^k = x^{k−1} with probability 1 − α.
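Steps 2.1–2.3 above take only a few lines of code. A minimal sketch with a symmetric Gaussian random-walk proposal, so the q-terms cancel in α; the function and variable names are our own:

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_steps, step=0.5, seed=0):
    # Random-walk Metropolis-Hastings.  The Gaussian proposal is symmetric,
    # q(x*|x) = q(x|x*), so the proposal densities cancel in alpha.
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    logp = log_target(x)
    chain = np.empty((n_steps, x.size))
    for k in range(n_steps):
        x_star = x + step * rng.standard_normal(x.size)   # 2.1: sample x* ~ q(.|x^{k-1})
        logp_star = log_target(x_star)
        if np.log(rng.random()) < logp_star - logp:       # 2.2-2.3: accept w.p. alpha
            x, logp = x_star, logp_star
        chain[k] = x                                      # else keep x^{k-1}
    return chain

# Example: target is a standard normal, known only up to its normalizing constant.
chain = metropolis_hastings(lambda x: -0.5 * float(x @ x), x0=[0.0], n_steps=20000, step=1.0)
```

Note that only the unnormalized log-density is needed, since the normalizing constant cancels in α.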

Metropolis-Hastings Demonstration: http://chifeng.scripts.mit.edu/stuff/mcmc-demo/

Randomize-then-Optimize (RTO): defines a proposal q
Assumption: RTO requires the posterior to have least squares form, i.e.,
p(x|y) ∝ exp(−(1/2)‖Ā(x) − ȳ‖²).
Given that the likelihood function has the form
p(y|x) ∝ exp(−(λ/2)‖A(x) − y‖²),
for which priors p(x) will the posterior density p(x|y) ∝ p(y|x)p(x) have least squares form?

Test Case 1: Uniform prior
In small-parameter cases, it is often true that p(y|x) = 0 for x ∉ Ω. Then we can choose as a prior p(x) defined by x ~ U(Ω), where U denotes the multivariate uniform distribution. Then p(x) = constant on Ω, and we have
p(x|y) ∝ p(y|x)p(x) ∝ exp(−(1/2)‖A(x) − y‖²).
Thus we can use RTO to sample from p(x|y).

Test Case 2: Gaussian prior
When a Gaussian prior is chosen,
p(x) ∝ exp(−(1/2)(x − x_0)ᵀ L (x − x_0)),
the posterior can also be written in least squares form:
p(x|y) ∝ p(y|x)p(x) ∝ exp(−(1/2)‖A(x) − y‖² − (1/2)(x − x_0)ᵀ L (x − x_0))
 = exp(−(1/2)‖[A(x); L^{1/2}x] − [y; L^{1/2}x_0]‖²) := exp(−(1/2)‖Ā(x) − ȳ‖²).
Thus we can use RTO to sample from p(x|y).

Extension of optimization-based approach to nonlinear problems: randomized maximum likelihood
Recall that when Ā is linear, we can sample from p(x|y) via
x* = arg min_ψ ‖Ā(ψ) − (ȳ + ε)‖², ε ~ N(0, I_{m+n}).
Comment: for nonlinear models, this is called randomized maximum likelihood.
Problem: it is an open question what the probability density of x* is.

Extension to nonlinear problems
As in the linear case, we create a nonlinear mapping
x = F⁻¹(v), v ~ N(Qᵀȳ, I_n).
What are Q and F? First, define x_MAP = arg min_x ‖Ā(x) − ȳ‖²; then first-order optimality yields
J(x_MAP)ᵀ(Ā(x_MAP) − ȳ) = 0,
where J denotes the Jacobian of Ā.

So x_MAP is a solution of the nonlinear equation
J(x_MAP)ᵀ Ā(x) = J(x_MAP)ᵀ ȳ.
QR-rewrite: this equation can be equivalently expressed as
Qᵀ Ā(x) = Qᵀ ȳ,
where J(x_MAP) = QR is the thin QR factorization of J(x_MAP).
Nonlinear mapping: define F := QᵀĀ and
x = F⁻¹(Qᵀ(ȳ + ε)), ε ~ N(0, I_{m+n}),
i.e., x = F⁻¹(v), v ~ N(Qᵀȳ, I_n).

RTO: use optimization to compute x = F⁻¹(v)
Compute a sample x* from the RTO proposal q(x):
1. Randomize: draw v ~ N(Qᵀȳ, I_n);
2. Optimize: solve x* = arg min_ψ ‖F(ψ) − v‖²;
3. Reject x* when v is not in the range of F.
Comment: steps 1 and 2 can be equivalently expressed as
x* = arg min_ψ ‖Qᵀ(Ā(ψ) − (ȳ + ε))‖², ε ~ N(0, I_{m+n}).

PDF for x = F⁻¹(v), v ~ N(Qᵀȳ, I_n)
First, v ~ N(Qᵀȳ, I_n) implies p_v(v) ∝ exp(−(1/2)‖v − Qᵀȳ‖²).
Next, we need (d/dx)F(x) ∈ R^{n×n} to be invertible. Then
q(x) ∝ |det((d/dx)F(x))| p_v(F(x))
 = |det(QᵀJ(x))| exp(−(1/2)‖Qᵀ(Ā(x) − ȳ)‖²)
 = |det(QᵀJ(x))| exp((1/2)‖Q̃ᵀ(Ā(x) − ȳ)‖²) exp(−(1/2)‖Ā(x) − ȳ‖²)
 = c(x) p(x|y),
where the columns of Q̃ are orthonormal and C(Q̃) = C(Q)^⊥.

Theorem (RTO probability density)
Let Ā : R^n → R^{m+n}, ȳ ∈ R^{m+n}, and assume:
- Ā is continuously differentiable;
- J(x) ∈ R^{(m+n)×n} has rank n for every x;
- QᵀJ(x) is invertible for all relevant x.
Then the random variable x = F⁻¹(v), v ~ N(Qᵀȳ, I_n), has probability density function q(x) ∝ c(x)p(x|y), where
c(x) = |det(QᵀJ(x))| exp((1/2)‖Q̃ᵀ(ȳ − Ā(x))‖²),
and the columns of Q̃ are orthonormal with C(Q̃) = C(Q)^⊥.

RTO Metropolis-Hastings
Definitions:
- p(x|y): posterior (target) density
- x^k: random variable of the Markov chain at step k
- q(x*): RTO (independence) proposal density
- x*: random variable from the proposal
A chain of samples {x^0, x^1, …} is generated by:
1. Start at x^0.
2. For k = 1, 2, …, K:
 2.1 sample x* ~ q(x*) from the RTO proposal density;
 2.2 calculate α = min{ [p(x*|y) q(x^{k−1})] / [p(x^{k−1}|y) q(x*)], 1 };
 2.3 set x^k = x* with probability α, and x^k = x^{k−1} with probability 1 − α.

Metropolis-Hastings using RTO
Given x^{k−1} and proposal x* ~ q(x), accept with probability
r = min(1, [p(x*|y) q(x^{k−1})] / [p(x^{k−1}|y) q(x*)])
 = min(1, [p(x*|y) c(x^{k−1}) p(x^{k−1}|y)] / [p(x^{k−1}|y) c(x*) p(x*|y)])
 = min(1, c(x^{k−1})/c(x*)),
where recall that c(x) = |det(QᵀJ(x))| exp((1/2)‖Q̃ᵀ(ȳ − Ā(x))‖²).

Metropolis-Hastings using RTO, Cont.
The RTO Metropolis-Hastings Algorithm
1. Choose x^0 = x_MAP and number of samples N. Set k = 1.
2. Compute an RTO sample x* ~ q(x*).
3. Compute the acceptance probability r = min(1, c(x^{k−1})/c(x*)).
4. With probability r, set x^k = x*; else set x^k = x^{k−1}.
5. If k < N, set k = k + 1 and return to Step 2.
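The whole pipeline fits in a short script for a scalar problem y = f(x) + ε with a standard-normal prior, where Ā(x) = [x, f(x)]ᵀ and ȳ = [0, y]ᵀ. A sketch under our own illustrative choices (the forward model f, its derivative, and all names are assumptions, not from the slides); Q̃ spans the orthogonal complement of C(Q):

```python
import numpy as np
from scipy.optimize import least_squares

# Scalar toy problem y = f(x) + eps with a standard-normal prior on x, so
# Abar(x) = [x, f(x)], ybar = [0, y_obs].  f and all names are our own choices.
f  = lambda x: x + 0.3 * x**3          # monotone, so F = Q^T Abar is invertible
df = lambda x: 1.0 + 0.9 * x**2
y_obs = 1.0
Abar = lambda x: np.array([x, f(x)])
Jac  = lambda x: np.array([[1.0], [df(x)]])   # Jacobian of Abar, (m+n) x n = 2 x 1
ybar = np.array([0.0, y_obs])

x_map = least_squares(lambda z: Abar(z[0]) - ybar, x0=[0.0]).x[0]
Q, _ = np.linalg.qr(Jac(x_map))                 # thin QR factor at the MAP point
Qtil = np.array([[-Q[1, 0]], [Q[0, 0]]])        # spans the orthogonal complement of C(Q)

def log_c(x):
    # log c(x) = log|det(Q^T J(x))| + (1/2)||Qtil^T (ybar - Abar(x))||^2
    return (np.log(abs((Q.T @ Jac(x))[0, 0]))
            + 0.5 * np.sum((Qtil.T @ (ybar - Abar(x)))**2))

def rto_propose(rng):
    v = Q.T @ (ybar + rng.standard_normal(2))   # randomize: v ~ N(Q^T ybar, I_n)
    return least_squares(lambda z: Q.T @ Abar(z[0]) - v, x0=[x_map]).x[0]  # optimize

rng = np.random.default_rng(0)
x, chain = x_map, []
for _ in range(500):
    x_star = rto_propose(rng)
    if np.log(rng.random()) < log_c(x) - log_c(x_star):   # accept w.p. min(1, c(x)/c(x*))
        x = x_star
    chain.append(x)
chain = np.array(chain)
```

Because f is monotone here, every optimization has a unique solution with zero residual, so no proposals need to be rejected for falling outside the range of F.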

Understanding RTO (thanks to Zheng Wang)
Consider the simple, scalar inverse problem
y = f(x) + ε (observation = forward model + noise), x ~ N(0, 1), ε ~ N(0, 1).
The posterior is
p(x|y) ∝ exp(−(1/2)(f(x) − y)²) exp(−(1/2)x²)
 ∝ exp(−(1/2)‖[x, f(x)]ᵀ − [0, y]ᵀ‖²) = exp(−(1/2)‖Ā(x) − ȳ‖²),
with Ā(x) = [x, f(x)]ᵀ and ȳ = [0, y]ᵀ.

Understanding RTO
Least-squares form: p(x|y) ∝ exp(−(1/2)‖[x, f(x)]ᵀ − [0, y]ᵀ‖²).
p(x|y) is the height of the path Ā(x) = [x, f(x)]ᵀ on the Gaussian N([0, y]ᵀ, I₂).
[Figure: the path Ā(x) overlaid on the two-dimensional Gaussian, with the resulting posterior density.]

Understanding RTO
Algorithm's task: sample from the posterior.
[Animation frames illustrating RTO sampling.]

Randomize-then-optimize
Generate RTO samples {x^k}:
1. Compute x_MAP.
2. Compute Q = J(x_MAP)/‖J(x_MAP)‖.
3. For k = 1, 2, …, K:
 3.1 sample ξ ~ N(ȳ, I₂);
 3.2 compute x^k = arg min_x ‖Qᵀ(Ā(x) − ξ)‖².
RTO proposal density:
q(x^k) ∝ |QᵀJ(x^k)| exp(−(1/2)(Qᵀ(Ā(x^k) − ȳ))²).
[Figure: RTO proposal density compared with the posterior.]

Uniform prior test cases
Choose prior p(x) defined by x ~ U(Ω), where U is a multivariate uniform distribution on Ω. Then p(x) = constant on Ω, and we have
p(x|y) ∝ exp(−(1/2)‖A(x) − y‖²).
Thus we can use RTO to sample from p(x|y).

BOD, Good: A(x_1, x_2) = x_1(1 − exp(−x_2 t))
- t = 20 linearly spaced observations on [1, 9];
- y = A(x_1, x_2) + ε, where ε ~ N(0, σ²I) with σ = 0.1;
- [x_1, x_2] = [1, 0.1]ᵀ.
[Figure: marginal posteriors for x_1 and x_2 and joint samples, comparing DRAM, RTO, and corrected RTO.]

BOD, Bad: A(x_1, x_2) = x_1(1 − exp(−x_2 t))
- t = 20 linearly spaced observations on [1, 5];
- y = A(x_1, x_2) + ε, where ε ~ N(0, σ²I) with σ = 0.1;
- [x_1, x_2] = [1, 0.1]ᵀ.
[Figure: marginal posteriors and joint samples, comparing DRAM, RTO, and corrected RTO; RTO and DRAM sample clouds.]

MONOD: A(x_1, x_2) = x_1 t/(x_2 + t)
- t = [28, 55, 83, 110, 138, 225, 375]ᵀ
- y = [0.053, 0.060, 0.112, 0.105, 0.099, 0.122, 0.125]ᵀ.
[Figure: marginal posteriors for x_1 and x_2 and joint samples (RTO, corrected RTO, DRAM).]

Autocorrelation plots for x_1 for Good and Bad BOD
[Figure: DRAM and RTO chain traces and autocorrelation functions for x_1 in the good and bad BOD cases.]

Gaussian prior test case
When a Gaussian prior is chosen,
p(x) ∝ exp(−(1/2)(x − x_0)ᵀ L (x − x_0)),
the posterior can be written in least squares form:
p(x|y) ∝ exp(−(1/2)‖A(x) − y‖² − (1/2)(x − x_0)ᵀ L (x − x_0))
 = exp(−(1/2)‖[A(x); L^{1/2}x] − [y; L^{1/2}x_0]‖²) := exp(−(1/2)‖Ā(x) − ȳ‖²).
Thus we can use RTO to sample from p(x|y).

Electrical Impedance Tomography (Seppänen, Solonen, Haario, Kaipio)
The complete electrode model:
∇·(x∇φ) = 0, r ∈ Ω,
φ + z_l x ∂φ/∂n = y_l, r ∈ e_l, l = 1, …, L,
∫_{e_l} x ∂φ/∂n ds = I_l, l = 1, …, L,
x ∂φ/∂n = 0, r ∈ ∂Ω \ ∪_{l=1}^L e_l,
where x = x(r) and φ = φ(r) are the electrical conductivity and potential; r ∈ Ω is the spatial coordinate; e_l is the area under the l-th electrode; z_l is the contact impedance between the l-th electrode and the object; y_l and I_l are the amplitudes of the electrode potential and current; n is the outward unit normal; and L is the number of electrodes.

EIT, Forward/Inverse Problem (image by Siltanen)
Left: current I and electrode potential y; right: conductivity x.
Forward problem: given the conductivity x, compute y = f(x) + ε. Evaluating f(x) requires solving a complicated PDE.
Inverse problem: given y, construct the posterior density p(x|y).

RTO Metropolis-Hastings applied to EIT example
True conductivity = realization from smoothness prior.
[Figure: truth & conditional mean (upper); 99% c.i.'s & profiles of all of the above (lower). Conductivity γ (mS/cm) versus r (cm).]

RTO Metropolis-Hastings applied to EIT example
True conductivity = internal structure #1.
[Figure: truth & conditional mean (upper); 99% c.i.'s & profiles of all of the above (lower). Conductivity γ (mS/cm) versus r (cm).]

RTO Metropolis-Hastings applied to EIT example
True conductivity = internal structure #2.
[Figure: truth & conditional mean (upper); 99% c.i.'s & profiles of all of the above (lower). Conductivity γ (mS/cm) versus r (cm).]

Laplace (Total Variation and Besov) Priors
Finally, we consider the ℓ₁ prior case:
p(x) ∝ exp(−δ‖Dx‖₁),
where D is an invertible matrix. The posterior then takes the form
p(x|y) ∝ exp(−(1/2)‖A(x) − y‖² − δ‖Dx‖₁).
Note that total variation in one dimension and the Besov B^s_{1,1}-space priors in one and higher dimensions have this form. But p(x|y) does not have least squares form.

Prior Transformation for ℓ₁ Priors
Main idea: transform the problem to one that RTO can solve.
- Define a map S : u ↦ x between a reference parameter u and the physical parameter x.
- Choose the mapping so that the prior on u is Gaussian.
- Sample from the transformed posterior, in u, using RTO, then transform the samples back.

The One-Dimensional Transformation
The prior transformation is analytic and is defined by
x = S(u) := F⁻¹_{p(x)}(Φ(u)),
where F⁻¹_{p(x)} is the inverse CDF of the ℓ₁-type prior p(x) and Φ is the CDF of a standard Gaussian. Then the posterior density p(x|y) can be expressed in terms of u:
p(S(u)|y) ∝ exp(−(1/2)(f(S(u)) − y)² − (1/2)u²) = exp(−(1/2)‖[f(S(u)); u] − [y; 0]‖²).
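For a Laplace prior this transformation is available in closed form via library CDFs. A minimal sketch; the rate parameter δ = 2 is our own illustrative choice:

```python
import numpy as np
from scipy import stats

# One-dimensional prior transformation x = S(u) = F^{-1}_{p(x)}(Phi(u)) for a
# Laplace prior p(x) ~ exp(-delta*|x|); delta = 2 is an illustrative choice.
delta = 2.0

def S(u):
    return stats.laplace.ppf(stats.norm.cdf(u), scale=1.0 / delta)

# Sanity check: pushing standard-normal samples through S yields Laplace samples.
rng = np.random.default_rng(0)
u = rng.standard_normal(100_000)
x = S(u)
```

The empirical mean and standard deviation of `x` match the Laplace(0, 1/δ) values 0 and √2/δ, confirming that the map carries the Gaussian reference measure to the prior.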

Prior Transformation: 1D Laplace
[Figure: the transformation x = g(u) carrying p(u) ∝ exp(−(1/2)u²) to p(x) ∝ exp(−λ|x|).]
For multiple independent x_i, the transformation is repeated componentwise.

2D Laplace Prior
[Figure: prior, likelihood, and posterior for a linear model with a Gaussian prior, a linear model with an ℓ₁-type prior, and the transformed problem.]
The transformation moves complexity from the prior to the likelihood.

Laplace Priors in Higher Dimensions
1. Define a change of variables Dx = S(u) such that the transformed prior is a standard Gaussian, i.e.,
p(D⁻¹S(u)) ∝ exp(−(1/2)‖u‖₂²).
2. Sample from the transformed posterior, with respect to u,
p(D⁻¹S(u)|y) ∝ p(y|D⁻¹S(u)) p(D⁻¹S(u)).
3. Transform the samples back via x = D⁻¹S(u).

Test Case 3, ℓ₁-type priors: High-Dimensional Problems
The transformed posterior, with D an invertible matrix, takes the form
p(D⁻¹S(u)|y) ∝ exp(−(1/2)‖A(D⁻¹S(u)) − y‖² − (1/2)‖u‖²)
 = exp(−(1/2)‖[A(D⁻¹S(u)); u] − [y; 0]‖²),
where S(u) = (S(u_1), …, S(u_n)) as defined above.
p(D⁻¹S(u)|y) is in least squares form with respect to u, so we can apply RTO!

RTO Metropolis-Hastings to Sample from p(D⁻¹S(u)|y)
The RTO Metropolis-Hastings Algorithm
1. Choose u^0 = u_MAP = arg max_u p(D⁻¹S(u)|y) and number of samples N. Set k = 1.
2. Compute an RTO sample u* ~ q(u*).
3. Compute the acceptance probability r = min(1, c(u^{k−1})/c(u*)).
4. With probability r, set u^k = u*; else set u^k = u^{k−1}.
5. If k < N, set k = k + 1 and return to Step 2.

Deconvolution of a Square Pulse w/ TV Prior
[Figure: true signal and measurements.]
x ∈ R^63, y ∈ R^32, and
p(x|y) ∝ exp(−(λ/2)‖Ax − y‖² − δ‖Dx‖₁).
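This posterior is easy to write down once A and D are fixed. A sketch using the sizes from the slide; the Gaussian blur matrix, the invertible first-difference D, and the hyperparameter values λ, δ are our own illustrative choices:

```python
import numpy as np

# Sizes from the slide (x in R^63, y in R^32); the blur kernel and the
# hyperparameters lam, delta are our own illustrative choices.
n, m = 63, 32
lam, delta = 100.0, 10.0

s = np.linspace(0.0, 1.0, n)
t = np.linspace(0.0, 1.0, m)
A = np.exp(-0.5 * ((t[:, None] - s[None, :]) / 0.05) ** 2)
A /= A.sum(axis=1, keepdims=True)          # m x n convolution/observation matrix
D = np.eye(n) - np.eye(n, k=-1)            # invertible first-difference ("TV") operator

def log_post(x, y):
    # log p(x|y) = -(lam/2)||Ax - y||^2 - delta*||Dx||_1 + const
    r = A @ x - y
    return -0.5 * lam * (r @ r) - delta * np.abs(D @ x).sum()

x_true = ((s > 0.3) & (s < 0.6)).astype(float)   # square pulse
y = A @ x_true                                    # noiseless data for a sanity check
```

With D lower bidiagonal, ‖Dx‖₁ penalizes the jumps of x, so the log-posterior prefers piecewise-constant reconstructions; sampling it is what the prior transformation plus RTO accomplishes.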

Deconvolution of a Square Pulse w/ TV Prior
[Figure: two MCMC chain traces plotted against the number of forward-model evaluations (3.9–4.0 ×10^6).]

2D elliptic PDE inverse problem
−∇·(exp(x(t))∇y(t)) = h(t), t ∈ [0, 1]²,
with boundary conditions exp(x(t))∇y(t)·n(t) = 0. After discretization, this defines the model y = A(x).
[Figure: the true field x, the source h, and the data y.]

2D PDE inverse problem: mean and STD
Use RTO-MH to sample from the transformed posterior
p(D⁻¹S(u)|y) ∝ exp(−(λ/2)‖A(D⁻¹S(u)) − y‖₂² − (1/2)‖u‖₂²),
where D is a wavelet transform matrix, then transform the samples back via x = D⁻¹S(u).
[Figure: conditional mean and standard deviation of the reconstructed field.]

2D PDE inverse problem: Samples

Conclusions/Takeaways
- The development of computationally efficient MCMC methods for nonlinear inverse problems is challenging.
- RTO was presented as a proposal mechanism within Metropolis-Hastings.
- RTO was described in some detail and then tested on several examples, including EIT and ℓ₁ priors such as total variation.