Dimension-Independent Likelihood-Informed (DILI) MCMC


Transcription:

Dimension-Independent Likelihood-Informed (DILI) MCMC
Tiangang Cui¹, Kody Law², Youssef Marzouk¹
¹ Massachusetts Institute of Technology, ² Oak Ridge National Laboratory
USC UQ Summer School, August 2015

Inverse Problems

Data and parameter are related through the forward model:

    y = F(u) + e,   y ∈ R^{N_y},   u ∈ H,   F : H → R^{N_y},

where F is the forward model (a PDE) and e collects observation/model errors.
- Data y are limited in number, noisy, and indirect.
- The parameter u is often a function, discretized on some mesh.
- The forward map is continuous, bounded, and first-order differentiable.

Infinite-Dimensional Bayesian Inference

Assume Gaussian observation noise, e ~ N(0, Γ_obs).

Data-misfit function:   Φ(u; y) = (1/2) ‖y − F(u)‖²_{Γ_obs}
Likelihood function:    L(y|u) ∝ exp(−Φ(u; y))

Posterior measure, as a density w.r.t. the prior:

    dµ^y/dµ_0 (u) ∝ L(y|u),   µ_0 = N(m_0, Γ_pr),

where Γ_pr is a trace-class operator, so µ_0(H) = 1.

Goal: sample the posterior µ^y using the intrinsic low-dimensional structure of inverse problems.
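In a finite-dimensional discretization, the misfit and the unnormalized log-posterior on this slide can be coded directly. A minimal sketch (not the authors' code; F, Gobs_inv, Gpr_inv, and m0 are assumed user-supplied):

import numpy as np

def data_misfit(u, y, F, Gobs_inv):
    # Phi(u; y) = 0.5 * ||y - F(u)||^2 in the Gamma_obs-weighted norm
    r = y - F(u)
    return 0.5 * r @ Gobs_inv @ r

def log_posterior(u, y, F, Gobs_inv, Gpr_inv, m0):
    # Unnormalized log posterior: -Phi(u; y) - 0.5 ||u - m0||^2_{Gamma_pr}
    du = u - m0
    return -data_misfit(u, y, F, Gobs_inv) - 0.5 * du @ Gpr_inv @ du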

MCMC Sampling

Autocorrelations of different samplers versus parameter dimension:

[Figure: autocorrelation vs. lag for random-walk and MALA proposals at increasing parameter dimensions; mixing degrades as the dimension N_u grows.]

Random walk: O(N_u).   MALA: O(N_u^{1/3}).

Standard MCMC is not dimension-independent. Look at the infinite-dimensional limit!

MCMC Sampling: Metropolis-Hastings

Given a proposal q(u, du′):

Transition measure:      ν(du, du′) = µ^y(du) q(u, du′)
Reversed measure:        ν^T(du, du′) = µ^y(du′) q(u′, du)
Acceptance probability:  α(u, u′) = min{1, dν^T/dν (u, u′)}

A well-defined MCMC for functions requires absolute continuity, ν^T ≪ ν. Many MCMC methods defined in the finite-dimensional setting have ν^T ⊥ ν in the function-space limit.

The preconditioned Crank-Nicolson (pCN) proposal

    u′ = (1 − b²)^{1/2} u + b ξ,   ξ ~ N(0, Γ_pr),

satisfies ν^T ≪ ν (Beskos et al. 2008; Stuart 2010; Cotter et al. 2013).

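A minimal pCN Metropolis-Hastings loop, sketched in Python under the assumptions above (phi evaluates the data misfit Φ; L is any factor with Γ_pr = L Lᵀ; all names are illustrative). The acceptance ratio involves only the misfit, which is why the sampler survives the infinite-dimensional limit:

import numpy as np

def pcn_mcmc(u0, phi, L, b=0.2, n_steps=10_000, rng=None):
    # pCN proposal: u' = sqrt(1 - b^2) u + b xi, xi ~ N(0, Gamma_pr)
    rng = rng or np.random.default_rng()
    u, phi_u = u0.copy(), phi(u0)
    samples = []
    for _ in range(n_steps):
        xi = L @ rng.standard_normal(u.size)
        u_new = np.sqrt(1.0 - b**2) * u + b * xi
        phi_new = phi(u_new)
        # alpha = min(1, exp(Phi(u) - Phi(u')))
        if np.log(rng.uniform()) < phi_u - phi_new:
            u, phi_u = u_new, phi_new
        samples.append(u.copy())
    return np.array(samples)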

MCMC Sampling

Autocorrelations of different samplers versus parameter dimension:

[Figure: autocorrelation vs. lag for random walk, MALA, and preconditioned Crank-Nicolson at increasing parameter dimensions; the pCN curves are insensitive to dimension.]

Random walk: O(N_u).   MALA: O(N_u^{1/3}).   pCN: O(1).

Likelihood Information

pCN proposal: u′ = (1 − b²)^{1/2} u + b ξ, ξ ~ N(0, Γ_pr).

The pCN proposal is isotropic w.r.t. the prior Γ_pr, but the likelihood constrains the variability of the posterior in some directions. What will happen to pCN?

Consider the linear example (Law 2014):

    y = u_1 + e,   e ~ N(0, σ²),   u = (u_1, u_2) ~ N(0, I).

[Figure: prior and posterior contours in the (x_1, x_2) plane; the posterior is tightly constrained in x_1.]

Likelihood Information

[Figure: prior/posterior contours, a CN trace plot over MCMC iterations, and CN samples overlaid on the prior and posterior; the chain mixes slowly in the data-constrained direction x_1.]

For the pCN/CN proposal, the summed sample autocorrelation satisfies

    Σ_{n≥1} corr(u^{(0)}, u^{(n)}) ≥ const/σ².

Problem: µ^y is anisotropic w.r.t. µ_0.

Likelihood Information

[Figure: prior/posterior contours and the CN trace plot, as before.]

To adapt to this anisotropy, consider an alternative likelihood-informed proposal: a Crank-Nicolson update with direction-dependent scales,

    u′ = D_A u + D_B ξ,   ξ ~ N(0, I),

with diagonal D_A, D_B chosen to be likelihood-informed in u_1 (step matched to the posterior scale) and prior-informed in u_2 (step matched to the prior scale).

Likelihood Information

[Figure: trace plots over MCMC iterations and sample scatter plots against the prior and posterior for the CN and LI proposals; the LI chain decorrelates much faster in the data-constrained direction.]

Likelihood Information

[Figure: prior/posterior contours and trace plots for the LI and CN proposals.]

Messages:
- The performance of pCN is characterized by the data-dominated directions.
- We want proposals that adapt to the likelihood information.
- In function space this leads to operator-weighted proposals.

Likelihood Information

How does data information impact the parameters?
1. Limited information is carried in the data (e.g., sensor quality, amount of data, ...).
2. The forward model filters the parameters (ill-posedness).
3. The prior smooths (e.g., through its correlation structure).

We first look at a linear example:

    y = F u + e,   e ~ N(0, Γ_obs),   µ_0 = N(0, Γ_pr).

This leads to a Gaussian posterior N(m_y, Γ_pos).

Data Information

Posterior covariance: Γ_pos = (Γ_pr^{-1} + H)^{-1}, where H is the data-misfit Hessian.

Woodbury identity: with H = F* Γ_obs^{-1} F,

    Γ_pos = Γ_pr − Γ_pr F* Γ_y^{-1} F Γ_pr,   where Γ_y = F Γ_pr F* + Γ_obs.

The low dimensionality lies in the change from prior to posterior:

    Γ_pos ≈ Γ_pr − K_r,   rank(K_r) ≤ r.
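The Woodbury identity is easy to sanity-check numerically in finite dimensions (illustrative sizes; the SPD matrices are randomly generated here):

import numpy as np

rng = np.random.default_rng(0)
Nu, Ny = 50, 10
F = rng.standard_normal((Ny, Nu))
M = rng.standard_normal((Nu, Nu))
Gpr = M @ M.T + Nu * np.eye(Nu)          # an SPD prior covariance
Gobs = 0.1 * np.eye(Ny)                  # observation noise covariance

H = F.T @ np.linalg.inv(Gobs) @ F                     # data-misfit Hessian
Gpos_direct = np.linalg.inv(np.linalg.inv(Gpr) + H)
Gy = F @ Gpr @ F.T + Gobs
Gpos_woodbury = Gpr - Gpr @ F.T @ np.linalg.inv(Gy) @ F @ Gpr
assert np.allclose(Gpos_direct, Gpos_woodbury)        # the two forms agree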

Likelihood-Informed Subspace

Theorem (optimal approximation, Spantini et al. 2014): the eigendecomposition of the prior-preconditioned Hessian,

    Γ_pr^{1/2} H Γ_pr^{1/2} z_i = λ_i z_i,   λ_i > λ_{i+1},

provides the optimal basis {Γ_pr^{1/2} z_i, i = 1, ..., r} in terms of the information update from prior to posterior:

    Γ_pos ≈ Γ_pr − Σ_{i=1}^r [λ_i/(1 + λ_i)] (Γ_pr^{1/2} z_i)(Γ_pr^{1/2} z_i)*,

where

    Γ_pr^{1/2} H Γ_pr^{1/2} = Γ_pr^{1/2} F* Γ_obs^{-1} F Γ_pr^{1/2}.

Noisy data, an ill-posed forward operator, and a smoothing prior are integrated together.
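Continuing the finite-dimensional sketch above (Gpr and H as in the previous snippet; the rank r is a choice), the optimal low-rank update can be formed directly from this eigendecomposition:

import numpy as np
from scipy.linalg import sqrtm, eigh

Gpr_half = np.real(sqrtm(Gpr))            # Gamma_pr^{1/2}
Hpp = Gpr_half @ H @ Gpr_half             # prior-preconditioned Hessian
lam, Z = eigh(Hpp)                        # symmetric eigendecomposition
lam, Z = lam[::-1], Z[:, ::-1]            # sort eigenvalues descending
r = 5
W = Gpr_half @ Z[:, :r]                   # directions Gamma_pr^{1/2} z_i
# rank-r posterior covariance approximation from the theorem
Gpos_r = Gpr - W * (lam[:r] / (1 + lam[:r])) @ W.T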

Likelihood-Informed Subspace

For a nonlinear forward model F(u) or non-Gaussian noise e, the idea behind the algorithm is to combine locally important directions, averaged over the posterior, to yield a global reduced basis:

    S = ∫ Γ_pr^{1/2} H(u) Γ_pr^{1/2} µ^y(du) ≈ (1/m) Σ_{i=1}^m Γ_pr^{1/2} H(u_i) Γ_pr^{1/2} = Ψ Λ Ψ*.

Use the Gauss-Newton Hessian, or the Fisher information (non-Gaussian noise), for H(u).
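A sketch of this globalized construction, assuming a user-supplied jacobian(u) routine for ∇F and a set of (approximate) posterior samples:

import numpy as np

def build_global_lis(samples, jacobian, Gobs_inv, Gpr_half, r):
    # Monte Carlo estimate of S = E_posterior[Gpr^{1/2} H(u) Gpr^{1/2}]
    n = Gpr_half.shape[0]
    S = np.zeros((n, n))
    for u in samples:                     # (approximate) posterior samples
        J = jacobian(u)                   # Jacobian of F at u
        H_u = J.T @ Gobs_inv @ J          # Gauss-Newton Hessian at u
        S += Gpr_half @ H_u @ Gpr_half
    S /= len(samples)
    lam, Psi = np.linalg.eigh(S)
    idx = np.argsort(lam)[::-1][:r]
    return Psi[:, idx], lam[idx]          # LIS basis Psi_r and eigenvalues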

Operator-Weighted Proposals

The likelihood-informed subspace spanned by the basis Γ_pr^{1/2} Ψ_r captures the update from prior to posterior. [Γ_pr^{1/2} Ψ_r, Γ_pr^{1/2} Ψ_⊥] forms a complete orthogonal system w.r.t. the prior Γ_pr:

    u = Γ_pr^{1/2} Ψ_r v_r  (constrained by data)  +  Γ_pr^{1/2} Ψ_⊥ v_⊥  (prior).

Prescribe different scales to v_r and v_⊥ (see the sketch below):
- v_r: smaller time steps, gradient information, local geometry, ...
- v_⊥: a homogeneous Crank-Nicolson step.
This leads to operator-weighted proposals.
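The coordinate split itself is just a projection in whitened variables; a minimal sketch (Psi_r as computed above, Gpr_half_inv = Γ_pr^{-1/2}):

import numpy as np

def split_coordinates(u, Gpr_half_inv, Psi_r):
    v = Gpr_half_inv @ u          # whitened coordinates v = Gpr^{-1/2} u
    v_r = Psi_r.T @ v             # likelihood-informed block (data-constrained)
    v_perp = v - Psi_r @ v_r      # prior-dominated complement
    return v_r, v_perp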

Operator-Weighted Proposals

Operator-weighted proposals:

    u′ = Γ_pr^{1/2} A Γ_pr^{-1/2} u − (Γ_pr^{1/2} G Γ_pr^{1/2}) D_u Φ(u; y) + Γ_pr^{1/2} B ξ,   ξ ~ N(0, I),

where A, B, and G are commuting, bounded, self-adjoint operators. Given

    Trace((A² + B² − I)²) < ∞

and other mild technical conditions, we have ν^T ≪ ν (and ν ≪ ν^T). Thus the operator-weighted proposal is well defined in the function-space setting (Cui, Law & Marzouk 2014).
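One proposal step, written directly from the display above (a sketch in whitened coordinates; A, B, G are represented here as plain symmetric matrices, although in practice they are low-rank-plus-scaled-identity operators built from the LIS):

import numpy as np

def operator_weighted_proposal(u, grad_phi, Gpr_half, Gpr_half_inv, A, B, G, rng):
    # u' = Gpr^{1/2} A Gpr^{-1/2} u - (Gpr^{1/2} G Gpr^{1/2}) grad Phi(u; y)
    #      + Gpr^{1/2} B xi,   xi ~ N(0, I)
    v = Gpr_half_inv @ u                  # whiten the current state
    xi = rng.standard_normal(v.size)
    v_new = A @ v - G @ (Gpr_half @ grad_phi(u)) + B @ xi
    return Gpr_half @ v_new               # map back to the original coordinates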

Examples

Split the operators:

    A = A_r + A_⊥,   B = B_r + B_⊥,   G = G_r + G_⊥.

LI-Langevin:

    A_r = Ψ_r D_{A_r} Ψ_r*,   B_r = Ψ_r D_{B_r} Ψ_r*,   G_r = Ψ_r D_{G_r} Ψ_r*,
    D_{A_r} = I_r − Δt_r D_r,   D_{B_r} = (2 Δt_r D_r)^{1/2},   D_{G_r} = Δt_r D_r,
    A_⊥ = a_⊥ (I − Ψ_r Ψ_r*),   B_⊥ = b_⊥ (I − Ψ_r Ψ_r*),   G_⊥ = 0.

Metropolis-within-Gibbs, alternating on v_r and v_⊥:

    v_r update:  A_r = Ψ_r (D_{A_r} − I_r) Ψ_r* + I,   B_r = Ψ_r D_{B_r} Ψ_r*,   G_r = Ψ_r D_{G_r} Ψ_r*,
    v_⊥ update:  A_⊥ = Ψ_r Ψ_r* + a_⊥ (I − Ψ_r Ψ_r*),   B_⊥ = b_⊥ (I − Ψ_r Ψ_r*),   G_⊥ = 0.

Example: Conditioned Diffusion

Path reconstruction of a Brownian-motion-driven SDE:

    dp_t = f(p_t) dt + du_t,   f(p) = θ p (1 − p²)/(1 + p²),   p_0 = 0.

[Figure: true path, noisy observations, posterior mean, and 0.05/0.95 posterior quantiles of p_t over time.]
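The forward model can be simulated by Euler-Maruyama; a sketch with assumed values θ = 10 and T = 10 (the slide does not fix them):

import numpy as np

def simulate_path(theta=10.0, T=10.0, n_steps=1000, rng=None):
    # Euler-Maruyama for dp = f(p) dt + du, with the slide's drift f
    rng = rng or np.random.default_rng()
    dt = T / n_steps
    p = np.zeros(n_steps + 1)                 # initial condition p_0 = 0
    for k in range(n_steps):
        drift = theta * p[k] * (1.0 - p[k]**2) / (1.0 + p[k]**2)
        p[k + 1] = p[k] + drift * dt + np.sqrt(dt) * rng.standard_normal()
    return p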

Example: Autocorrelations

[Figure: (a) trace plot of the likelihood for MGLI-Langevin; (b) trace plot for PCN-RW; (c) autocorrelations for MGLI-Langevin, MGLI-Prior, LI-Langevin, LI-Prior, H-Langevin, and PCN-RW; plus lag-1 autocorrelations of the parameters projected onto the KL basis of the prior, comparing H-Langevin with MGLI-Langevin across the components of v.]

H-Langevin: explicit discretization of the Langevin SDE, preconditioned by the Hessian at the MAP.

Example: Autocorrelations

Operators built from a single Hessian vs. the integrated Hessian:

[Figure: (a) autocorrelation vs. lag and (b) lag-1 autocorrelation across the components of v, comparing operators built at the MAP (MAP-LIS) with adaptively integrated operators (Adapt-LIS).]

Example: Elliptic PDE

Recover the transmissivity κ(s) from partial observations of the potential p(s):

    −∇·(κ(s) ∇p(s)) = f(s).

[Figure: (a)-(c) problem setup over the unit square: observation data at two signal-to-noise levels and the true transmissivity field.]
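As a stand-in for the 2-D solver, a 1-D finite-difference version of the same parameter-to-state map shows the structure of the forward model (illustrative only; the example in the talk is 2-D):

import numpy as np

def elliptic_solve_1d(kappa_face, f_interior, h):
    # Solve -(kappa p')' = f on a uniform 1-D grid with p = 0 at both ends.
    # kappa_face: kappa at the n cell faces; f_interior: f at the n-1
    # interior nodes. Returns p at the interior nodes.
    m = kappa_face.size - 1
    A = np.zeros((m, m))
    for i in range(m):
        A[i, i] = (kappa_face[i] + kappa_face[i + 1]) / h**2
        if i > 0:
            A[i, i - 1] = -kappa_face[i] / h**2
        if i < m - 1:
            A[i, i + 1] = -kappa_face[i + 1] / h**2
    return np.linalg.solve(A, f_interior)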

Example: Likelihood-Informed Subspace

[Figure: the five leading basis vectors (Index 1-5) of the likelihood-informed subspace, computed at several grid resolutions.]

Example: Autocorrelations

[Figure: (a) trace plot of the likelihood for MGLI-Langevin; (b) trace plot for PCN-RW; (c) autocorrelations for MGLI-Langevin, MGLI-Prior, LI-Langevin, LI-Prior, H-Langevin, and PCN-RW; plus lag-1 autocorrelations of the parameters projected onto the KL basis of the prior at the two signal-to-noise levels, comparing H-Langevin with MGLI-Langevin.]

H-Langevin: explicit discretization of the Langevin SDE, preconditioned by the Hessian at the MAP.

Conclusions

- Dimension-independent MCMC using operator-weighted proposals.
- Operators are designed by identifying the likelihood-informed directions.
- Demonstrated efficiency on numerical examples.
- Future work: hyperparameters, optimal operators, parallelization, extensions to local operators.
- DILI ideas in transport maps.
- FastFInS package (contact tcui@mit.edu): FastFInS only needs the forward model and the adjoint model. Applications, bigger models.

More info: T. Cui, K. Law, Y. Marzouk, Dimension-independent likelihood-informed MCMC, arXiv:1411.3688.

T. Cui and Y. Marzouk acknowledge financial support from the DOE Applied Mathematics Program, Awards DE-FG02-08ER2585 and DE-SC0009297, as part of the DiaMonD Multifaceted Mathematics Integrated Capability Center. K. Law is a member of the SRI-UQ Center at KAUST.