Dimension-Independent Likelihood-Informed (DILI) MCMC


Transcription:

Dimension-Independent Likelihood-Informed (DILI) MCMC
Tiangang Cui¹, Kody Law², Youssef Marzouk¹
¹ Massachusetts Institute of Technology, ² Oak Ridge National Laboratory
USC UQ Summer School, August 2015

Inverse Problems

Data and parameter are related through the forward model:

    y = F(u) + e,   y ∈ R^{N_y},   u ∈ H,   F : H → R^{N_y},

where F is the forward model (a PDE) and e collects observation/model errors.
- Data y are limited in number, noisy, and indirect.
- The parameter u is often a function, discretized on some mesh.
- The forward map is continuous, bounded, and first-order differentiable.

Infinite-Dimensional Bayesian Inference

Assume Gaussian observation noise, e ~ N(0, Γ_obs).

Data-misfit function:   Φ(u; y) = (1/2) ‖y − F(u)‖²_{Γ_obs}
Likelihood function:    L(y|u) ∝ exp(−Φ(u; y))

Posterior measure, as a density w.r.t. the prior:

    dµ^y/dµ_0 (u) ∝ L(y|u),   µ_0 = N(m_0, Γ_pr),

where Γ_pr is a trace-class operator, so µ_0(H) = 1.

Goal: sample the posterior µ^y using the intrinsic low-dimensional structure of inverse problems.
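In a finite-dimensional discretization, the misfit and the unnormalized log-posterior on this slide can be coded directly. A minimal sketch (not the authors' code; F, Gobs_inv, Gpr_inv, and m0 are assumed user-supplied):

import numpy as np

def data_misfit(u, y, F, Gobs_inv):
    # Phi(u; y) = 0.5 * ||y - F(u)||^2 in the Gamma_obs-weighted norm
    r = y - F(u)
    return 0.5 * r @ Gobs_inv @ r

def log_posterior(u, y, F, Gobs_inv, Gpr_inv, m0):
    # Unnormalized log posterior: -Phi(u; y) - 0.5 ||u - m0||^2_{Gamma_pr}
    du = u - m0
    return -data_misfit(u, y, F, Gobs_inv) - 0.5 * du @ Gpr_inv @ du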

MCMC Sampling

Autocorrelations of different samplers versus parameter dimension:

[Figure: autocorrelation vs. lag for random-walk and MALA proposals at increasing parameter dimensions; mixing degrades as the dimension N_u grows.]

Random walk: O(N_u).   MALA: O(N_u^{1/3}).

Standard MCMC is not dimension-independent. Look at the infinite-dimensional limit!

MCMC Sampling: Metropolis-Hastings

Given a proposal q(u, du′):

Transition measure:      ν(du, du′) = µ^y(du) q(u, du′)
Reversed measure:        ν^T(du, du′) = µ^y(du′) q(u′, du)
Acceptance probability:  α(u, u′) = min{1, dν^T/dν (u, u′)}

A well-defined MCMC for functions requires absolute continuity, ν^T ≪ ν. Many MCMC methods defined in the finite-dimensional setting have ν^T ⊥ ν in the function-space limit.

The preconditioned Crank-Nicolson (pCN) proposal

    u′ = (1 − b²)^{1/2} u + b ξ,   ξ ~ N(0, Γ_pr),

satisfies ν^T ≪ ν (Beskos et al. 2008; Stuart 2010; Cotter et al. 2013).

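A minimal pCN Metropolis-Hastings loop, sketched in Python under the assumptions above (phi evaluates the data misfit Φ; L is any factor with Γ_pr = L Lᵀ; all names are illustrative). The acceptance ratio involves only the misfit, which is why the sampler survives the infinite-dimensional limit:

import numpy as np

def pcn_mcmc(u0, phi, L, b=0.2, n_steps=10_000, rng=None):
    # pCN proposal: u' = sqrt(1 - b^2) u + b xi, xi ~ N(0, Gamma_pr)
    rng = rng or np.random.default_rng()
    u, phi_u = u0.copy(), phi(u0)
    samples = []
    for _ in range(n_steps):
        xi = L @ rng.standard_normal(u.size)
        u_new = np.sqrt(1.0 - b**2) * u + b * xi
        phi_new = phi(u_new)
        # alpha = min(1, exp(Phi(u) - Phi(u')))
        if np.log(rng.uniform()) < phi_u - phi_new:
            u, phi_u = u_new, phi_new
        samples.append(u.copy())
    return np.array(samples)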

MCMC Sampling

Autocorrelations of different samplers versus parameter dimension:

[Figure: autocorrelation vs. lag for random walk, MALA, and preconditioned Crank-Nicolson at increasing parameter dimensions; the pCN curves are insensitive to dimension.]

Random walk: O(N_u).   MALA: O(N_u^{1/3}).   pCN: O(1).

Likelihood Information

pCN proposal: u′ = (1 − b²)^{1/2} u + b ξ, ξ ~ N(0, Γ_pr).

The pCN proposal is isotropic w.r.t. the prior Γ_pr, but the likelihood constrains the variability of the posterior in some directions. What will happen to pCN?

Consider the linear example (Law 2014):

    y = u_1 + e,   e ~ N(0, σ²),   u = (u_1, u_2) ~ N(0, I).

[Figure: prior and posterior contours in the (x_1, x_2) plane; the posterior is tightly constrained in x_1.]

Likelihood Information

[Figure: prior/posterior contours, a CN trace plot over MCMC iterations, and CN samples overlaid on the prior and posterior; the chain mixes slowly in the data-constrained direction x_1.]

For the pCN/CN proposal, the summed sample autocorrelation satisfies

    Σ_{n≥1} corr(u^{(0)}, u^{(n)}) ≥ const/σ².

Problem: µ^y is anisotropic w.r.t. µ_0.

Likelihood Information

[Figure: prior/posterior contours and the CN trace plot, as before.]

To adapt to this anisotropy, consider an alternative likelihood-informed proposal: a Crank-Nicolson update with direction-dependent scales,

    u′ = D_A u + D_B ξ,   ξ ~ N(0, I),

with diagonal D_A, D_B chosen to be likelihood-informed in u_1 (step matched to the posterior scale) and prior-informed in u_2 (step matched to the prior scale).

Likelihood Information

[Figure: trace plots over MCMC iterations and sample scatter plots against the prior and posterior for the CN and LI proposals; the LI chain decorrelates much faster in the data-constrained direction.]

Likelihood Information

[Figure: prior/posterior contours and trace plots for the LI and CN proposals.]

Messages:
- The performance of pCN is characterized by the data-dominated directions.
- We want proposals that adapt to the likelihood information.
- In function space this leads to operator-weighted proposals.

Likelihood Information

How does data information impact the parameters?
1. Limited information is carried in the data (e.g., sensor quality, amount of data, ...).
2. The forward model filters the parameters (ill-posedness).
3. The prior smooths (e.g., through its correlation structure).

We first look at a linear example:

    y = F u + e,   e ~ N(0, Γ_obs),   µ_0 = N(0, Γ_pr).

This leads to a Gaussian posterior N(m_y, Γ_pos).

Data Information

Posterior covariance: Γ_pos = (Γ_pr^{-1} + H)^{-1}, where H is the data-misfit Hessian.

Woodbury identity: with H = F* Γ_obs^{-1} F,

    Γ_pos = Γ_pr − Γ_pr F* Γ_y^{-1} F Γ_pr,   where Γ_y = F Γ_pr F* + Γ_obs.

The low dimensionality lies in the change from prior to posterior:

    Γ_pos ≈ Γ_pr − K_r,   rank(K_r) ≤ r.
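The Woodbury identity is easy to sanity-check numerically in finite dimensions (illustrative sizes; the SPD matrices are randomly generated here):

import numpy as np

rng = np.random.default_rng(0)
Nu, Ny = 50, 10
F = rng.standard_normal((Ny, Nu))
M = rng.standard_normal((Nu, Nu))
Gpr = M @ M.T + Nu * np.eye(Nu)          # an SPD prior covariance
Gobs = 0.1 * np.eye(Ny)                  # observation noise covariance

H = F.T @ np.linalg.inv(Gobs) @ F                     # data-misfit Hessian
Gpos_direct = np.linalg.inv(np.linalg.inv(Gpr) + H)
Gy = F @ Gpr @ F.T + Gobs
Gpos_woodbury = Gpr - Gpr @ F.T @ np.linalg.inv(Gy) @ F @ Gpr
assert np.allclose(Gpos_direct, Gpos_woodbury)        # the two forms agree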

Likelihood-Informed Subspace

Theorem (optimal approximation, Spantini et al. 2014): the eigendecomposition of the prior-preconditioned Hessian,

    Γ_pr^{1/2} H Γ_pr^{1/2} z_i = λ_i z_i,   λ_i > λ_{i+1},

provides the optimal basis {Γ_pr^{1/2} z_i, i = 1, ..., r} in terms of the information update from prior to posterior:

    Γ_pos ≈ Γ_pr − Σ_{i=1}^r [λ_i/(1 + λ_i)] (Γ_pr^{1/2} z_i)(Γ_pr^{1/2} z_i)*,

where

    Γ_pr^{1/2} H Γ_pr^{1/2} = Γ_pr^{1/2} F* Γ_obs^{-1} F Γ_pr^{1/2}.

Noisy data, an ill-posed forward operator, and a smoothing prior are integrated together.
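Continuing the finite-dimensional sketch above (Gpr and H as in the previous snippet; the rank r is a choice), the optimal low-rank update can be formed directly from this eigendecomposition:

import numpy as np
from scipy.linalg import sqrtm, eigh

Gpr_half = np.real(sqrtm(Gpr))            # Gamma_pr^{1/2}
Hpp = Gpr_half @ H @ Gpr_half             # prior-preconditioned Hessian
lam, Z = eigh(Hpp)                        # symmetric eigendecomposition
lam, Z = lam[::-1], Z[:, ::-1]            # sort eigenvalues descending
r = 5
W = Gpr_half @ Z[:, :r]                   # directions Gamma_pr^{1/2} z_i
# rank-r posterior covariance approximation from the theorem
Gpos_r = Gpr - W * (lam[:r] / (1 + lam[:r])) @ W.T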

Likelihood-Informed Subspace

For a nonlinear forward model F(u) or non-Gaussian noise e, the idea behind the algorithm is to combine locally important directions, averaged over the posterior, to yield a global reduced basis:

    S = ∫ Γ_pr^{1/2} H(u) Γ_pr^{1/2} µ^y(du) ≈ (1/m) Σ_{i=1}^m Γ_pr^{1/2} H(u_i) Γ_pr^{1/2} = Ψ Λ Ψ*.

Use the Gauss-Newton Hessian, or the Fisher information (non-Gaussian noise), for H(u).
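A sketch of this globalized construction, assuming a user-supplied jacobian(u) routine for ∇F and a set of (approximate) posterior samples:

import numpy as np

def build_global_lis(samples, jacobian, Gobs_inv, Gpr_half, r):
    # Monte Carlo estimate of S = E_posterior[Gpr^{1/2} H(u) Gpr^{1/2}]
    n = Gpr_half.shape[0]
    S = np.zeros((n, n))
    for u in samples:                     # (approximate) posterior samples
        J = jacobian(u)                   # Jacobian of F at u
        H_u = J.T @ Gobs_inv @ J          # Gauss-Newton Hessian at u
        S += Gpr_half @ H_u @ Gpr_half
    S /= len(samples)
    lam, Psi = np.linalg.eigh(S)
    idx = np.argsort(lam)[::-1][:r]
    return Psi[:, idx], lam[idx]          # LIS basis Psi_r and eigenvalues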

Operator-Weighted Proposals

The likelihood-informed subspace spanned by the basis Γ_pr^{1/2} Ψ_r captures the update from prior to posterior. [Γ_pr^{1/2} Ψ_r, Γ_pr^{1/2} Ψ_⊥] forms a complete orthogonal system w.r.t. the prior Γ_pr:

    u = Γ_pr^{1/2} Ψ_r v_r  (constrained by data)  +  Γ_pr^{1/2} Ψ_⊥ v_⊥  (prior).

Prescribe different scales to v_r and v_⊥ (see the sketch below):
- v_r: smaller time steps, gradient information, local geometry, ...
- v_⊥: a homogeneous Crank-Nicolson step.
This leads to operator-weighted proposals.
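The coordinate split itself is just a projection in whitened variables; a minimal sketch (Psi_r as computed above, Gpr_half_inv = Γ_pr^{-1/2}):

import numpy as np

def split_coordinates(u, Gpr_half_inv, Psi_r):
    v = Gpr_half_inv @ u          # whitened coordinates v = Gpr^{-1/2} u
    v_r = Psi_r.T @ v             # likelihood-informed block (data-constrained)
    v_perp = v - Psi_r @ v_r      # prior-dominated complement
    return v_r, v_perp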

Operator-Weighted Proposals

Operator-weighted proposals:

    u′ = Γ_pr^{1/2} A Γ_pr^{-1/2} u − (Γ_pr^{1/2} G Γ_pr^{1/2}) D_u Φ(u; y) + Γ_pr^{1/2} B ξ,   ξ ~ N(0, I),

where A, B, and G are commuting, bounded, self-adjoint operators. Given

    Trace((A² + B² − I)²) < ∞

and other mild technical conditions, we have ν^T ≪ ν (and ν ≪ ν^T). Thus the operator-weighted proposal is well defined in the function-space setting (Cui, Law & Marzouk 2014).
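One proposal step, written directly from the display above (a sketch in whitened coordinates; A, B, G are represented here as plain symmetric matrices, although in practice they are low-rank-plus-scaled-identity operators built from the LIS):

import numpy as np

def operator_weighted_proposal(u, grad_phi, Gpr_half, Gpr_half_inv, A, B, G, rng):
    # u' = Gpr^{1/2} A Gpr^{-1/2} u - (Gpr^{1/2} G Gpr^{1/2}) grad Phi(u; y)
    #      + Gpr^{1/2} B xi,   xi ~ N(0, I)
    v = Gpr_half_inv @ u                  # whiten the current state
    xi = rng.standard_normal(v.size)
    v_new = A @ v - G @ (Gpr_half @ grad_phi(u)) + B @ xi
    return Gpr_half @ v_new               # map back to the original coordinates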

Examples

Split the operators:

    A = A_r + A_⊥,   B = B_r + B_⊥,   G = G_r + G_⊥.

LI-Langevin:

    A_r = Ψ_r D_{A_r} Ψ_r*,   B_r = Ψ_r D_{B_r} Ψ_r*,   G_r = Ψ_r D_{G_r} Ψ_r*,
    D_{A_r} = I_r − Δt_r D_r,   D_{B_r} = (2 Δt_r D_r)^{1/2},   D_{G_r} = Δt_r D_r,
    A_⊥ = a_⊥ (I − Ψ_r Ψ_r*),   B_⊥ = b_⊥ (I − Ψ_r Ψ_r*),   G_⊥ = 0.

Metropolis-within-Gibbs, alternating on v_r and v_⊥:

    v_r update:  A_r = Ψ_r (D_{A_r} − I_r) Ψ_r* + I,   B_r = Ψ_r D_{B_r} Ψ_r*,   G_r = Ψ_r D_{G_r} Ψ_r*,
    v_⊥ update:  A_⊥ = Ψ_r Ψ_r* + a_⊥ (I − Ψ_r Ψ_r*),   B_⊥ = b_⊥ (I − Ψ_r Ψ_r*),   G_⊥ = 0.

Example: Conditioned Diffusion

Path reconstruction of a Brownian-motion-driven SDE:

    dp_t = f(p_t) dt + du_t,   f(p) = θ p (1 − p²)/(1 + p²),   p_0 = 0.

[Figure: true path, noisy observations, posterior mean, and 0.05/0.95 posterior quantiles of p_t over time.]
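The forward model can be simulated by Euler-Maruyama; a sketch with assumed values θ = 10 and T = 10 (the slide does not fix them):

import numpy as np

def simulate_path(theta=10.0, T=10.0, n_steps=1000, rng=None):
    # Euler-Maruyama for dp = f(p) dt + du, with the slide's drift f
    rng = rng or np.random.default_rng()
    dt = T / n_steps
    p = np.zeros(n_steps + 1)                 # initial condition p_0 = 0
    for k in range(n_steps):
        drift = theta * p[k] * (1.0 - p[k]**2) / (1.0 + p[k]**2)
        p[k + 1] = p[k] + drift * dt + np.sqrt(dt) * rng.standard_normal()
    return p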

Example: Autocorrelations

[Figure: (a) trace plot of the likelihood for MGLI-Langevin; (b) trace plot for PCN-RW; (c) autocorrelations for MGLI-Langevin, MGLI-Prior, LI-Langevin, LI-Prior, H-Langevin, and PCN-RW; plus lag-1 autocorrelations of the parameters projected onto the KL basis of the prior, comparing H-Langevin with MGLI-Langevin across the components of v.]

H-Langevin: explicit discretization of the Langevin SDE, preconditioned by the Hessian at the MAP.

Example: Autocorrelations

Operators built from a single Hessian vs. the integrated Hessian:

[Figure: (a) autocorrelation vs. lag and (b) lag-1 autocorrelation across the components of v, comparing operators built at the MAP (MAP-LIS) with adaptively integrated operators (Adapt-LIS).]

Example: Elliptic PDE

Recover the transmissivity κ(s) from partial observations of the potential p(s):

    −∇·(κ(s) ∇p(s)) = f(s).

[Figure: (a)-(c) problem setup over the unit square: observation data at two signal-to-noise levels and the true transmissivity field.]
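As a stand-in for the 2-D solver, a 1-D finite-difference version of the same parameter-to-state map shows the structure of the forward model (illustrative only; the example in the talk is 2-D):

import numpy as np

def elliptic_solve_1d(kappa_face, f_interior, h):
    # Solve -(kappa p')' = f on a uniform 1-D grid with p = 0 at both ends.
    # kappa_face: kappa at the n cell faces; f_interior: f at the n-1
    # interior nodes. Returns p at the interior nodes.
    m = kappa_face.size - 1
    A = np.zeros((m, m))
    for i in range(m):
        A[i, i] = (kappa_face[i] + kappa_face[i + 1]) / h**2
        if i > 0:
            A[i, i - 1] = -kappa_face[i] / h**2
        if i < m - 1:
            A[i, i + 1] = -kappa_face[i + 1] / h**2
    return np.linalg.solve(A, f_interior)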

Example: Likelihood-Informed Subspace

[Figure: the five leading basis vectors (Index 1-5) of the likelihood-informed subspace, computed at several grid resolutions.]

Example: Autocorrelations

[Figure: (a) trace plot of the likelihood for MGLI-Langevin; (b) trace plot for PCN-RW; (c) autocorrelations for MGLI-Langevin, MGLI-Prior, LI-Langevin, LI-Prior, H-Langevin, and PCN-RW; plus lag-1 autocorrelations of the parameters projected onto the KL basis of the prior at the two signal-to-noise levels, comparing H-Langevin with MGLI-Langevin.]

H-Langevin: explicit discretization of the Langevin SDE, preconditioned by the Hessian at the MAP.

Conclusions

- Dimension-independent MCMC using operator-weighted proposals.
- Operators are designed by identifying the likelihood-informed directions.
- Demonstrated efficiency on numerical examples.
- Future work: hyperparameters, optimal operators, parallelization, extensions to local operators.
- DILI ideas in transport maps.
- FastFInS package (contact tcui@mit.edu): FastFInS only needs the forward model and the adjoint model. Applications, bigger models.

More info: T. Cui, K. Law, Y. Marzouk, Dimension-independent likelihood-informed MCMC, arXiv:1411.3688.

T. Cui and Y. Marzouk acknowledge financial support from the DOE Applied Mathematics Program, Awards DE-FG02-08ER2585 and DE-SC0009297, as part of the DiaMonD Multifaceted Mathematics Integrated Capability Center. K. Law is a member of the SRI-UQ Center at KAUST.