Multilevel Sequential² Monte Carlo for Bayesian Inverse Problems
Jonas Latz
Technische Universität München, Fakultät für Mathematik, Lehrstuhl für Numerische Mathematik
jonas.latz@tum.de
November 22nd 2017 — Seminar Stochastics, Statistics and Numerical Analysis, Universität Mannheim
Joint work with
- Elisabeth Ullmann (Lehrstuhl für Numerische Mathematik, Fakultät für Mathematik, TUM)
- Iason Papaioannou (Engineering Risk Analysis, Bau Geo Umwelt, TUM)

Acknowledgments: This work was supported by the Deutsche Forschungsgemeinschaft (DFG) and TU München (TUM) through the International Graduate School of Science and Engineering (IGSSE) at TUM within the project 10.02 BAYES. The computing resources were provided by the Leibniz-Rechenzentrum (LRZ) der Bayerischen Akademie der Wissenschaften.
Outline
- Motivation
- Sequential Monte Carlo Samplers
- Multilevel Sequential² Monte Carlo
- Numerical Experiments
- Conclusions
Motivation

Given an inverse problem: find θ ∈ X such that

  G(θ) + η = y,   (IP)

where G : X → Y is the forward response operator, θ ∈ X is the unknown parameter, η ~ N(0, Γ) is observational noise and y ∈ Y is the observed data.
Motivation

Example: typically G = O ∘ G̃, where G̃ is the solution operator θ ↦ p of the PDE

  −∇·(exp(θ)∇p) = f  (on D),
  p = 0  (on ∂D),   (PDE)

and O is the observation operator mapping p ↦ (p(x_i) : i = 1, …, N_obs) ∈ Y.
Motivation

We approach (IP) in the Bayesian way.

Assume θ ∈ L²(Ω, A, P; X) with prior θ ~ µ_0 := N(m_0, C_0).   (Prior)

Find the posterior µ^y := P(θ ∈ · | G(θ) + η = y),   (BIP)

given by

  dµ^y/dµ_0 (θ) ∝ L(y|θ) := exp(−½ ‖Γ^{−1/2}(G(θ) − y)‖²_Y).   (Bayes' Rule)
Typical Approach: Importance Sampling

Task: integrate a quantity of interest Q : X → R w.r.t. µ^y.

Idea: consider the following identity:

  E_{µ^y}[Q] = ∫_X Q dµ^y = ∫_X Q (dµ^y/dµ_0) dµ_0 = E_{µ_0}[Q · dµ^y/dµ_0].

- Integrals w.r.t. µ^y can be expressed as integrals w.r.t. µ_0.
- We can use vanilla Monte Carlo to approximate the integral using µ_0-distributed particles.
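The identity above is exactly what self-normalised importance sampling exploits. A minimal sketch on a hypothetical scalar problem (the Gaussian prior N(0, 3²), identity forward map and unit noise are illustrative assumptions, chosen so that the exact posterior mean 1.8 is known in closed form — this is not the PDE example from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative): scalar parameter, forward map G(theta) = theta,
# noise standard deviation gamma = 1, observed datum y = 2.
y, gamma = 2.0, 1.0

def log_likelihood(theta):
    return -0.5 * ((theta - y) / gamma) ** 2

# Particles drawn from the prior mu_0 = N(0, 3^2).
theta = rng.normal(0.0, 3.0, size=100_000)

# Self-normalised weights w_j proportional to dmu^y/dmu_0(theta_j) = L(y | theta_j).
logw = log_likelihood(theta)
w = np.exp(logw - logw.max())
w /= w.sum()

# E_{mu^y}[Q] ~= sum_j w_j Q(theta_j); here Q(theta) = theta (posterior mean).
post_mean = float(np.sum(w * theta))
# Conjugate Gaussian reference: exact posterior mean = y * 9 / (9 + 1) = 1.8.
```

Because the prior here is wider than the posterior, most weight concentrates on a fraction of the particles — the degeneracy that tempering (next section) is designed to control.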
Typical Approach: Importance Sampling

(Figure: two density plots on [−15, 15], illustrating the reweighting from the prior µ_0 to the posterior µ^y.)
Outline
- Motivation
- Sequential Monte Carlo Samplers
  - Sequential Monte Carlo (with Tempering)
  - Multilevel Bridging
- Multilevel Sequential² Monte Carlo
- Numerical Experiments
- Conclusions
Sequential Monte Carlo Samplers

Task: sample from a sequence of measures µ_0, µ_1, …, µ_K, where
- we can sample from µ_0,
- µ_k and µ_0 are equivalent (k ∈ {1, …, K}).

Idea: apply importance sampling sequentially to update µ_k → µ_{k+1}, where µ_{k+1} ≈ µ_k. (Del Moral et al. 2006) [3], [2]
Tempering

Employ a sequence µ_0, …, µ_K, where
- µ_0 is the prior distribution,
- µ_K = µ^y is the posterior distribution,
- µ_k is accessible by importance sampling from µ_{k−1} with a small number of samples.

Use a tempering of the likelihood (β_k ∈ [0, 1], β_0 = 0, β_K = 1):

  dµ_k/dµ_0 (θ) ∝ L(y|θ)^{β_k}.   (Neal 2001, Beskos et al. 2015) [6]

The (β_k : k = 1, …, K) can be determined a priori or on the fly.
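For a single tempering update µ_{k−1} → µ_k, the incremental importance weights are proportional to L(y|θ_j)^{β_k − β_{k−1}}. A minimal sketch (the particle log-likelihood values passed in are placeholder inputs, assumed to come from whatever forward model is in use):

```python
import numpy as np

def tempering_weights(loglik, beta_prev, beta_next):
    """Normalised incremental weights for mu_{k-1} -> mu_k:
    w_j proportional to L(y | theta_j)**(beta_next - beta_prev)."""
    logw = (beta_next - beta_prev) * np.asarray(loglik)
    w = np.exp(logw - logw.max())  # subtract the max for numerical stability
    return w / w.sum()
```

Working with log-likelihoods throughout avoids underflow when the likelihood values span many orders of magnitude, which is the typical situation for small noise levels.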
Tempering

(Figure: tempered densities on [−15, 15], interpolating between the prior and the posterior.)
Outline
- Motivation
- Sequential Monte Carlo Samplers
  - Sequential Monte Carlo (with Tempering)
  - Multilevel Bridging
- Multilevel Sequential² Monte Carlo
- Numerical Experiments
- Conclusions
(Multilevel) Bridging

In realistic problems we approximate G_l ≈ G, where l ∈ {1, …, N_L} reflects the discretisation complexity and approximation accuracy of G.

Task: increase the complexity of µ^y_l by updating directly µ^y_l → µ^y_{l+1}.

SMC sampler from (Koutsourelakis 2009, Del Moral et al. 2006) [4], [3]:

  dµ_k/dµ_0 (θ) ∝ L_l(y|θ)^{1−ζ_k} L_{l+1}(y|θ)^{ζ_k},

where µ_0 is the underlying prior distribution. The (ζ_k : k = 1, …, K) can be determined a priori or on the fly.
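The bridging update has the same incremental structure as tempering: the weights for one step are proportional to (L_{l+1}(y|θ_j)/L_l(y|θ_j))^{ζ_k − ζ_{k−1}}. A sketch under the same placeholder-input assumption (per-particle log-likelihoods on both levels are supplied externally):

```python
import numpy as np

def bridging_weights(loglik_l, loglik_l1, zeta_prev, zeta_next):
    """Normalised incremental weights for one bridging step:
    w_j proportional to (L_{l+1}(y|theta_j) / L_l(y|theta_j))**(zeta_next - zeta_prev)."""
    logdiff = np.asarray(loglik_l1) - np.asarray(loglik_l)
    logw = (zeta_next - zeta_prev) * logdiff
    w = np.exp(logw - logw.max())  # stabilise before normalising
    return w / w.sum()
```

If the two levels agree perfectly at the particles, the weights are uniform and the bridging step is free in the statistical sense — this is the intuition behind the adaptive strategy later in the talk.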
(Multilevel) Bridging

Figure: Bridging between two distributions µ and µ̃.
(Diagram: inverse temperature β_k, k ∈ {1, …, K}, plotted against discretisation level l ∈ {1, …, N_L}. SMC with tempering moves vertically from the prior µ_0 (β_0 = 0) through β_1, β_2, …, β_{K−1} to the target µ^y (β_K = 1); MLB moves horizontally across the levels 1, 2, 3, …, N_L − 2, N_L − 1, N_L.)
However...

Figure: Bridging between two distributions µ and µ̃ with a high discrepancy.
Outline
- Motivation
- Sequential Monte Carlo Samplers
- Multilevel Sequential² Monte Carlo
  - Basics
  - Adaptivity
- Numerical Experiments
- Conclusions
Multilevel Sequential² Monte Carlo

Basic idea: consider two different update mechanisms at the same time:
- update the inverse temperature (Tempering), or
- update the level (Bridging).

Reference: see (L., Papaioannou, Ullmann 2017; submitted to JCP) [5].
(Diagram: inverse temperature β_k vs. discretisation level l ∈ {1, …, L}. SMC tempers vertically at a fixed level from the prior µ_0 (β_0 = 0) to the target µ^y (β_K = 1); MLB bridges horizontally across the levels; MLS²MC takes a diagonal path, combining temperature and level updates.)
Outline
- Motivation
- Sequential Monte Carlo Samplers
- Multilevel Sequential² Monte Carlo
  - Basics
  - Adaptivity
- Numerical Experiments
- Conclusions
Adaptive Bridging and Tempering

Reminder: we construct the SMC sequences by either

  dµ_k/dµ_0 (θ) ∝ L(y|θ)^{β_k}   (Tempering)

or

  dµ_k/dµ_0 (θ) ∝ L_l(y|θ)^{1−ζ_k} L_{l+1}(y|θ)^{ζ_k}.   (Multilevel Bridging)

How do we choose β_1, …, β_K, resp. ζ_1, …, ζ_K?
Adaptive Bridging and Tempering

When applying importance sampling, we lose stochastic accuracy:

  unweighted particles (Monte Carlo accuracy) vs. weighted particles (loss due to degeneracy of particles),

or, in terms of sample sizes,

  unweighted particles (actual sample size) vs. weighted particles (effective sample size).

The effective sample size (ESS) depends on β_k and can be computed cheaply.
Adaptive Bridging and Tempering

Choose

  β_k = argmin_β (ESS(β) − J/(1 + τ²))²,

for some given τ > 0.

Well, but...
- this introduces a bias into the estimation of the model evidence;
- adaptive bridging/tempering can be shown to converge to the correct posterior measure (Beskos et al. 2016) [1].
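Since ESS(β) decreases monotonically in the step size, the argmin above can be found by bisection. A sketch under that assumption (the particle log-likelihoods are placeholder inputs; the tolerance and the handling of the full jump β = 1 are implementation choices, not prescribed by the slides):

```python
import numpy as np

def ess(logw):
    """Effective sample size of (unnormalised) log-weights."""
    logw = logw - logw.max()
    w = np.exp(logw)
    return w.sum() ** 2 / (w ** 2).sum()

def next_beta(loglik, beta_prev, tau=0.5, tol=1e-6):
    """Choose the next inverse temperature so that the ESS of the increment
    L**(beta - beta_prev) hits the target J / (1 + tau**2); since ESS is
    monotone in beta, the argmin reduces to a bisection on (beta_prev, 1]."""
    loglik = np.asarray(loglik)
    target = len(loglik) / (1 + tau ** 2)
    if ess((1.0 - beta_prev) * loglik) >= target:
        return 1.0  # the full jump to the posterior is already acceptable
    lo, hi = beta_prev, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ess((mid - beta_prev) * loglik) < target:
            hi = mid  # step too aggressive, shrink it
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

With τ = 0.5 the target ESS is 0.8 J, with τ = 1 it is 0.5 J — matching the simulation settings later in the talk.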
Adaptive update strategy: Bridging vs. Tempering

Let β_k ∈ (0, 1) be the current inverse temperature and l be the current discretisation level. What do we do next: bridging (l → l + 1) or tempering (β_k → β_{k+1})?

Strategy: compute ESS(1) of the update l → l + 1 (this measures the similarity of µ^y_l and µ^y_{l+1}).
- If ESS(1) < J/(1 + τ²) (i.e. µ^y_l and µ^y_{l+1} are not very similar): update the level, l → l + 1 (otherwise, increasing β_k would increase the differences between µ^y_l and µ^y_{l+1} even more).
- If ESS(1) ≥ J/(1 + τ²) (i.e. µ^y_l and µ^y_{l+1} are very similar): update the inverse temperature, β_k → β_{k+1} (otherwise, increasing l would increase the cost of the future tempering).
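The decision rule above is a single threshold comparison; a minimal sketch (function and argument names are illustrative, not from the paper):

```python
def next_update(ess_full_bridge, J, tau):
    """Decide the next MLS2MC move from ESS(1) of the full level jump
    mu^y_l -> mu^y_{l+1}, a cheap similarity measure of the two posteriors."""
    target = J / (1 + tau ** 2)
    if ess_full_bridge < target:
        return "bridge"  # levels disagree too much: update l -> l + 1 first
    return "temper"      # levels agree: push the inverse temperature instead
```

Note that the same target J/(1 + τ²) governs both the step-size adaptation and this bridging-vs-tempering decision, so no additional tuning parameter is introduced.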
Outline
- Motivation
- Sequential Monte Carlo Samplers
- Multilevel Sequential² Monte Carlo for Inverse Problems
- Numerical Experiments
  - Model
  - Estimating a random field
  - Estimating the model evidence
  - Computational Cost
- Conclusions
Model

Consider again

  −∇·(exp(θ)∇p) = f  (on D),
  p = 0  (on ∂D),   (PDE)

where D = (0, 1)² and f contains nine sources. θ ∈ C¹(D; R) shall be estimated based on N_obs = 25 noisy observations.
Model

Figure: Measurement locations and actual pressure based on the true underlying parameter θ_true.
Prior Random Field

µ_0 = N(0, C_0), where
- C_0 is a Matérn-type covariance operator with smoothness ν = 1.5 and correlation length λ = 0.65;
- the random field is discretised with a truncated Karhunen–Loève expansion with N_sto = 10 KL terms.

Figure: Samples of the prior random field µ_0.
Likelihood

The data y was generated using noise η ~ N(0, 0.01² Id). The noise assumption for the likelihood was chosen more conservatively:

  L(y|θ) ∝ exp(−½ ‖0.07^{−1}(y − G(θ))‖²)   (Ex. 1)
  L(y|θ) ∝ exp(−½ ‖0.035^{−1}(y − G(θ))‖²)   (Ex. 2)
Simulation settings

- We applied MLS²MC, Tempering and MLB using J ∈ {156, 312, 625, 1250, 2500} particles and τ ∈ {0.5, 1} (i.e. target ESS ∈ {0.8J, 0.5J} in each update).
- The PDE is evaluated with a mesh size of h_l = 2^{−(l+2)} on level l ∈ {1, …, 5}.
- MLB was not possible given the noise standard deviation 0.035 (µ^y_1 and µ^y_2 were numerically singular).
Outline
- Motivation
- Sequential Monte Carlo Samplers
- Multilevel Sequential² Monte Carlo for Inverse Problems
- Numerical Experiments
  - Model
  - Estimating a random field
  - Estimating the model evidence
  - Computational Cost
- Conclusions
Error Measure: Posterior measure

Let µ, ν be probability measures on (R, B(R)). We define the Kolmogorov–Smirnov (KS) distance between µ and ν by

  d_KS(µ, ν) = sup_{x ∈ R} |µ((−∞, x]) − ν((−∞, x])|.

As an error measure, we consider the d_KS of (SMC, MLS²MC) and (SMC, MLB), as well as (SMC, SMC) (as a reference), for all 50 × 50 pairs of simulation results.
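The KS distance between two empirical measures (e.g. two runs' particle approximations of a posterior marginal) can be computed directly from the sorted samples. A sketch for unweighted samples (the slides' samplers produce weighted or resampled particles, so this is a simplified assumption):

```python
import numpy as np

def ks_distance(xs, ys):
    """Kolmogorov-Smirnov distance between the empirical CDFs of two samples."""
    xs, ys = np.sort(xs), np.sort(ys)
    grid = np.concatenate([xs, ys])
    # Empirical CDFs evaluated at every jump point of either sample; the
    # supremum of |F_x - F_y| is attained at one of these points.
    Fx = np.searchsorted(xs, grid, side="right") / len(xs)
    Fy = np.searchsorted(ys, grid, side="right") / len(ys)
    return float(np.abs(Fx - Fy).max())
```

Evaluating at the union of jump points is what makes the supremum exact rather than a grid approximation.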
Figure: KS distances between the approximated posterior distributions, given Γ = 0.07² Id.
Figure: KS distances between the approximated posterior distributions, given Γ = 0.035² Id.
Outline
- Motivation
- Sequential Monte Carlo Samplers
- Multilevel Sequential² Monte Carlo for Inverse Problems
- Numerical Experiments
  - Model
  - Estimating a random field
  - Estimating the model evidence
  - Computational Cost
- Conclusions
Model Evidence

The model evidence is the normalising constant of the posterior with respect to the prior:

  Z_y = ∫ L(y|θ) dµ_0(θ).

It
- is in general considered difficult to estimate (with importance sampling, MCMC);
- can be determined accurately in any SMC method;
- gives a biased estimator if (β_k : k = 1, …, K) is picked adaptively;
- is used in Bayesian model selection.
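In SMC the evidence falls out as the product, over all updates, of the mean unnormalised incremental weight. A sketch of this bookkeeping in log space (the list-of-log-weight-arrays interface is an assumption for illustration):

```python
import numpy as np

def evidence_estimate(logw_per_update):
    """Z_y ~= prod_k (1/J) sum_j w_{k,j}, where w_{k,j} are the unnormalised
    incremental weights of update k; accumulated in log space for stability."""
    log_z = 0.0
    for logw in logw_per_update:
        logw = np.asarray(logw)
        m = logw.max()
        log_z += m + np.log(np.mean(np.exp(logw - m)))
    return float(np.exp(log_z))
```

Subtracting the per-update maximum before exponentiating is the standard log-sum-exp trick; without it the tiny likelihood values of the small-noise experiments would underflow.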
Estimation results: Model Evidence

Figure: Empirical CDFs of the estimated model evidence (MLS²MC, SMC and MLB; τ = 0.5 and τ = 1) after 50 runs. J = 2500, noise covariance Γ = 0.07² Id.
Estimation results: Model Evidence

Figure: Empirical CDFs of the estimated model evidence (MLS²MC and SMC; τ = 0.5 and τ = 1) after 50 runs. J = 2500, noise covariance Γ = 0.035² Id.
Outline
- Motivation
- Sequential Monte Carlo Samplers
- Multilevel Sequential² Monte Carlo for Inverse Problems
- Numerical Experiments
  - Model
  - Estimating a random field
  - Estimating the model evidence
  - Computational Cost
- Conclusions
Computational Cost

We measure computational cost in terms of the theoretical cost of PDE evaluations on the different levels.

Assumption: one evaluation of the model G_l requires a computational cost of C_l = 4^{−(5−l)}. Hence, C_{N_L} = 1.
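Under this assumption the relative level costs can be tabulated directly; a minimal sketch (the motivation via h_l^{−2} scaling in 2-D is my reading of the mesh sizes given earlier, not stated explicitly on this slide):

```python
def level_cost(l, n_levels=5):
    """Relative cost of one forward solve on level l, normalised so that the
    finest level costs 1: C_l = 4**-(n_levels - l).  With h_l = 2**-(l+2)
    in 2-D, the number of unknowns grows like h_l**-2 = 4**(l+2), which is
    where the factor 4 per level comes from.
    """
    return 4.0 ** -(n_levels - l)
```

For example, a solve on the coarsest level l = 1 costs 4⁻⁴ = 1/256 of a solve on the finest level, which is why pushing work to coarse levels pays off.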
Computational Cost

Figure: Computational cost of MLS²MC, SMC and MLB for J ∈ {156, 312, 625, 1250, 2500}, given noise covariance Γ = 0.07² Id.
Computational Cost

Figure: Computational cost of MLS²MC and SMC for J ∈ {156, 312, 625, 1250, 2500}, given noise covariance Γ = 0.035² Id.
Computational Cost vs. Accuracy

Figure: Computational cost vs. accuracy for MLS²MC, MLB and SMC, given noise covariance Γ = 0.07² Id.
Computational Cost vs. Accuracy

Figure: Computational cost vs. accuracy for MLS²MC and SMC, given noise covariance Γ = 0.035² Id.
Outline
- Motivation
- Sequential Monte Carlo Samplers
- Multilevel Sequential² Monte Carlo for Inverse Problems
- Numerical Experiments
- Conclusions
Conclusions

The presented method
- is more efficient than single-level SMC,
- can be used consistently with black-box models,
- contains an adaptive method to decide whether to update the level or the inverse temperature,
- can estimate the model evidence.

Moreover, the presented method
- works well in high dimensions (tested up to 320 KL terms),
- can decide adaptively when to stop updating the discretisation level,
- does not require any parameter tuning.
References

[1] BESKOS, A., JASRA, A., KANTAS, N., AND THIERY, A. On the convergence of adaptive sequential Monte Carlo methods. Ann. Appl. Probab. 26, 2 (2016), 1111–1146.
[2] BESKOS, A., JASRA, A., MUZAFFER, E. A., AND STUART, A. M. Sequential Monte Carlo methods for Bayesian elliptic inverse problems. Stat. Comput. 25, 4 (2015), 727–737.
[3] DEL MORAL, P., DOUCET, A., AND JASRA, A. Sequential Monte Carlo samplers. J. R. Statist. Soc. B 68, 3 (2006), 411–436.
[4] KOUTSOURELAKIS, P. S. Accurate Uncertainty Quantification using inaccurate Computational Models. SIAM J. Sci. Comput. 31, 5 (2009), 3274–3300.
[5] LATZ, J., PAPAIOANNOU, I., AND ULLMANN, E. The Multilevel Sequential² Monte Carlo Sampler for Bayesian Inverse Problems. ArXiv e-prints 1709.09763 (2017).
[6] NEAL, R. M. Annealed importance sampling. Stat. Comp. 11, 2 (2001), 125–139.
Input/Output: www.latz.io