Bayesian Inverse problem, Data assimilation and Localization


Bayesian Inverse problem, Data assimilation and Localization
Xin T. Tong, National University of Singapore
ICIP, Singapore, 2018

Content
What is a Bayesian inverse problem? What is data assimilation? What is the curse of dimensionality? Localization for algorithms.

Bayesian formulation
Deterministic inverse problem: given $h$ and data $y$, find the state $x$ so that $y = h(x)$. Tikhonov regularization solves instead
$$\min_x \|y - h(x)\|_R^2 + \|x\|_C^2,$$
where $R, C$ are PSD matrices and $\|x\|_C^2 = x^T C^{-1} x$.
Bayesian inverse problem: prior $x \sim p_0 = \mathcal{N}(0, C)$. Observe $y = h(x) + v$, $v \sim \mathcal{N}(0, R)$, so $y \sim p_l(y \mid x)$. Bayes' formula for the posterior distribution:
$$p(x \mid y) = \frac{p_0(x)\, p_l(y \mid x)}{\int p_0(x)\, p_l(y \mid x)\, dx} \propto p_0(x)\, p_l(y \mid x).$$
Try to obtain $p(x \mid y)$.
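A minimal sketch of this setup in Python/NumPy, evaluating the unnormalized log-posterior. The dimensions, the linear choice of $h$, and the prior/noise covariances below are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Illustrative setup: prior x ~ N(0, C), observation y = h(x) + v with v ~ N(0, R).
d, m = 10, 5
C = np.eye(d)                 # prior covariance (assumed)
R = 0.1 * np.eye(m)           # observation noise covariance (assumed)
H = np.random.randn(m, d)     # a linear h(x) = H x, purely for illustration

def h(x):
    return H @ x

def log_posterior(x, y):
    """Unnormalized log p(x|y) = -0.5*||y - h(x)||_R^2 - 0.5*||x||_C^2."""
    r = y - h(x)
    return -0.5 * r @ np.linalg.solve(R, r) - 0.5 * x @ np.linalg.solve(C, x)

x_true = np.random.randn(d)
y = h(x_true) + np.random.multivariate_normal(np.zeros(m), R)
print(log_posterior(x_true, y))
```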

Connection to deterministic IP
Posterior density: $p(x \mid y) \propto p_0(x)\, p_l(y \mid x) \propto \exp(-\tfrac{1}{2}\|x\|_C^2 - \tfrac{1}{2}\|y - h(x)\|_R^2)$.
Maximum a posteriori (MAP) estimator: $\max_x p(x \mid y) = \min_x \|y - h(x)\|_R^2 + \|x\|_C^2$.
The deterministic IP finds the MAP, but $p(x \mid y)$ can contain much richer information: it quantifies the uncertainty in $x$ given the data $y$.

Uncertainty quantification (UQ)
Estimate the accuracy and robustness of the solution. Find all the local minima, i.e., the likely states. Risk: what is the chance a hurricane hits a city? Streaming data and estimation on the fly. Cost: more complicated computation.

Monte Carlo
Direct method: can we simply compute $p(x \mid y)$? It involves quadrature (integration) and PDEs, and is not feasible if $\dim(x) > 5$; in practice $\dim(x)$ ranges from $10$ to $10^8$. One exception: if $p_0$ is Gaussian and $h$ is linear, the Kalman filter gives $p(x \mid y)$ as an explicit Gaussian.
Monte Carlo: generate samples $x^k \sim p(x \mid y)$, $k = 1, \dots, N$, and use sample statistics as an approximation. Local minima: clusters of samples. Sample variance: how uncertain are the estimators? Risk: what percentage of samples lead to bad outcomes?

Markov chain Monte Carlo (MCMC)
Given a target distribution $p(x)$, generate a sequence of samples $x^1, x^2, \dots, x^N$.
Standard MCMC steps: generate a proposal $x^* \sim q(x^k, \cdot)$ and accept it with probability
$$\alpha(x^k, x^*) = \min\{1,\; p(x^*)\, q(x^*, x^k) / [p(x^k)\, q(x^k, x^*)]\}.$$
Popular choices of proposals, with $\xi^k \sim \mathcal{N}(0, I_n)$:
RWM: $x^* = x^k + \sigma \xi^k$.
MALA: $x^* = x^k + \frac{\sigma^2}{2} \nabla \log p(x^k) + \sigma \xi^k$.
pCN: $x^* = \sqrt{1 - \beta^2}\, x^k + \beta \xi^k$.
Also emcee and Hamiltonian MCMC. Here $\sigma, \beta$ are tuning parameters.
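A minimal random-walk Metropolis sketch in Python/NumPy. The standard-Gaussian target (used on the next slide as a test case), the step size, and the chain length are illustrative assumptions.

```python
import numpy as np

def log_p(x):
    # Target: standard Gaussian N(0, I_d).
    return -0.5 * np.dot(x, x)

def rwm(log_p, x0, sigma, n_steps, rng=np.random.default_rng(0)):
    """Random-walk Metropolis: propose x* = x + sigma*xi, accept with prob min(1, p(x*)/p(x))."""
    x = np.array(x0, dtype=float)
    chain, n_accept = [x.copy()], 0
    for _ in range(n_steps):
        x_prop = x + sigma * rng.standard_normal(x.size)
        # Symmetric proposal, so the q terms cancel in the acceptance ratio.
        if np.log(rng.uniform()) < log_p(x_prop) - log_p(x):
            x, n_accept = x_prop, n_accept + 1
        chain.append(x.copy())
    return np.array(chain), n_accept / n_steps

chain, acc = rwm(log_p, x0=np.zeros(100), sigma=0.2, n_steps=5000)
print("acceptance rate:", acc)
```

Increasing the dimension while keeping $\sigma$ fixed drives the acceptance rate toward zero, which is exactly the point made two slides below.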

How does MCMC work in high dimensions?
Sample the isotropic Gaussian $p = \mathcal{N}(0, I_n)$. Measure of efficiency: integrated autocorrelation time (IACT), which measures how many iterations are needed to get an uncorrelated sample.
[Figure: IACT versus dimension for RWM, emcee, Hamiltonian MCMC and MALA; the IACT increases with dimension for all of them.]

Where is the problem?
Let's look at RWM and assume $x^k = 0$. Propose $x^* = \sigma \xi^k$; accept with probability $\exp(-\tfrac{1}{2}\sigma^2 \|\xi^k\|^2) \approx \exp(-\tfrac{1}{2}\sigma^2 d)$. If we keep $\sigma = 1$, we essentially never accept once $d > 20$. If we want acceptance at a constant rate, we need $\sigma = d^{-1/2}$, but then $x^* = x^k + \sigma \xi^k$ is highly correlated with $x^k$. Similarly for MALA, $\sigma \sim d^{-1/3}$, and for Hamiltonian MCMC, $\sigma \sim d^{-1/4}$. Is it possible to break this curse of dimensionality?
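A quick numeric check of this scaling (a Python/NumPy sketch, assuming the approximation $\|\xi^k\|^2 \approx d$):

```python
import numpy as np

# For RWM from x^k = 0 on N(0, I_d), the acceptance probability is
# roughly exp(-0.5 * sigma^2 * d).
for d in [10, 20, 100, 1000]:
    sigma_fixed = 1.0
    sigma_scaled = d ** -0.5          # sigma = d^{-1/2} keeps acceptance O(1)
    print(d,
          np.exp(-0.5 * sigma_fixed ** 2 * d),   # vanishes quickly in d
          np.exp(-0.5 * sigma_scaled ** 2 * d))  # stays at exp(-1/2)
```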

How to beat the curse of dimensionality?
Gaussian structure: if the target is approximately $\mathcal{N}(m, \Sigma)$, sample $X = m + \Sigma^{1/2} Z$; approximate the region near the MAP with a Gaussian, i.e. a high-dimensional Gaussian plus a low-dimensional non-Gaussian part.
Low effective dimension: components have diminishing uncertainty/importance. Example: the Fourier modes of a PDE solution; function-space MCMC (Stuart 2010).
[Video: Lorenz system propagation, MIT AeroAstro.]

Data assimilation
Sequential inverse problem: $x_0 \sim p_0$,
Hidden: $x_{n+1} = \Psi(x_n) + \xi_{n+1}$, Data: $y_n = h_n(x_n) + v_n$.
Estimate $x_n$ based on $y_1, \dots, y_n$. Bayesian approach: find $p(x_n \mid y_1, \dots, y_n)$, with an online update from $p(x_n \mid y_1, \dots, y_n)$ to $p(x_{n+1} \mid y_1, \dots, y_{n+1})$ without storing $y_1, \dots, y_n$.

Weather forecast
Hidden: $x_{n+1} = \Psi(x_n) + \xi_{n+1}$, Data: $y_n = h(x_n) + v_n$.
$x_n$: atmosphere and ocean state on day $n$. $\Psi$: a PDE solver that maps yesterday's state to today's. $h$: partial observations made by satellites. Main challenge: high dimension, $d \approx 10^6$ to $10^8$.

Bayesian IP application
Inverse problem with streaming data: $x_0 \sim p_0$,
Hidden: $x_{n+1} = x_n \equiv x_0$ (the state is static), Data: $y_n = h_n(x_0) + v_n$.
Estimate $x_n$ based on $y_1, \dots, y_n$. Application: subsurface detection; $x_0$ is the subsurface structure and $y_n$ is the data gathered in the $n$-th experiment.

Kalman filter
Linear setting. Hidden: $x_{n+1} = A_n x_n + B_n + \xi_{n+1}$, Data: $y_n = H x_n + v_n$.
Use a Gaussian: $x_n \mid y_{1:n} \sim \mathcal{N}(m_n, R_n)$.
Forecast step: $\hat{m}_{n+1} = A_n m_n + B_n$, $\hat{R}_{n+1} = A_n R_n A_n^T + \Sigma_n$.
Assimilation step: apply the Kalman update rule
$$m_{n+1} = \hat{m}_{n+1} + G(\hat{R}_{n+1})(y_{n+1} - H\hat{m}_{n+1}), \qquad R_{n+1} = K(\hat{R}_{n+1}),$$
where $G(C) = C H^T (R + H C H^T)^{-1}$ and $K(C) = C - G(C) H C$.
Posterior at $t = n$ → forecast → prior + observation at $t = n+1$ → assimilate → posterior at $t = n+1$.
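A minimal sketch of one forecast/assimilation cycle in Python/NumPy; the variable names and the 2-D example at the end are illustrative assumptions.

```python
import numpy as np

def kalman_step(m, P, y, A, B, Sigma, H, R):
    """One Kalman filter cycle: forecast with x_{n+1} = A x_n + B + xi,
    then assimilate the observation y_{n+1} = H x_{n+1} + v."""
    # Forecast step.
    m_hat = A @ m + B
    P_hat = A @ P @ A.T + Sigma
    # Assimilation step (Kalman update rule).
    S = R + H @ P_hat @ H.T
    G = P_hat @ H.T @ np.linalg.inv(S)    # gain G(C) = C H^T (R + H C H^T)^{-1}
    m_new = m_hat + G @ (y - H @ m_hat)
    P_new = P_hat - G @ H @ P_hat         # K(C) = C - G(C) H C
    return m_new, P_new

# Example with a 2-D state and a scalar observation (illustrative numbers).
A, B, Sigma = np.array([[1.0, 0.1], [0.0, 1.0]]), np.zeros(2), 0.01 * np.eye(2)
H, R = np.array([[1.0, 0.0]]), np.array([[0.1]])
m, P = kalman_step(np.zeros(2), np.eye(2), np.array([0.5]), A, B, Sigma, H, R)
```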

Nonlinear filtering
Main drawbacks of the Kalman filter: it is difficult to deal with nonlinear dynamics/observations, and the computation is expensive, $O(d^3)$.
Particle filter: can deal with nonlinearity; tries to generate samples $x_n^k \sim p(x_n \mid y_1, \dots, y_n)$; prohibitively expensive, $O(e^d)$.
Ensemble Kalman filter: a combination of two ideas, Gaussian distributions and samples. An ensemble $\{x_n^k\}_{k=1}^K$ represents $\mathcal{N}(\bar{x}_n, C_n)$ with
$$\bar{x}_n = \frac{1}{K}\sum_k x_n^k, \qquad C_n = \frac{1}{K-1}\sum_k (x_n^k - \bar{x}_n)(x_n^k - \bar{x}_n)^T.$$

Ensemble Kalman filter
Forecast step: $\hat{x}_{n+1}^k = \Psi(x_n^k) + \xi_{n+1}^k$, $\hat{y}_{n+1}^k = h_{n+1}(\hat{x}_{n+1}^k) + \zeta_{n+1}^k$.
Assimilation step: $x_{n+1}^k = \hat{x}_{n+1}^k + G_{n+1}(y_{n+1} - \hat{y}_{n+1}^k)$.
Gain matrix: $G_{n+1} = S_X S_Y^T (R + S_Y S_Y^T)^{-1}$, with $S_X = \frac{1}{\sqrt{K-1}}[\hat{x}_{n+1}^{(1)} - \bar{x}_{n+1}, \dots, \hat{x}_{n+1}^{(K)} - \bar{x}_{n+1}]$ and $S_Y$ defined analogously from the $\hat{y}_{n+1}^k$. Complexity: $O(K^2 d)$.
Posterior at $t = n$ → forecast (Monte Carlo) → prior + observation at $t = n+1$ → assimilate → posterior at $t = n+1$.
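A sketch of one EnKF cycle in Python/NumPy. The perturbed-observation variant and the $1/\sqrt{K-1}$ normalization of the spread matrices are standard choices assumed here; `Psi` and `h` are user-supplied dynamics and observation maps.

```python
import numpy as np

def enkf_step(X, y, Psi, h, Sigma, R, rng=np.random.default_rng(0)):
    """One EnKF cycle with perturbed observations.
    X: (d, K) ensemble at time n; y: observation at time n+1."""
    d, K = X.shape
    # Forecast: push each member through the dynamics and add model noise.
    Xf = np.column_stack([Psi(X[:, k]) for k in range(K)])
    Xf = Xf + rng.multivariate_normal(np.zeros(d), Sigma, size=K).T
    # Perturbed predicted observations.
    Yf = np.column_stack([h(Xf[:, k]) for k in range(K)])
    Yf = Yf + rng.multivariate_normal(np.zeros(R.shape[0]), R, size=K).T
    # Spread matrices S_X, S_Y and gain G = S_X S_Y^T (R + S_Y S_Y^T)^{-1}.
    SX = (Xf - Xf.mean(axis=1, keepdims=True)) / np.sqrt(K - 1)
    SY = (Yf - Yf.mean(axis=1, keepdims=True)) / np.sqrt(K - 1)
    G = SX @ SY.T @ np.linalg.inv(R + SY @ SY.T)
    # Assimilation: shift each member toward the observation.
    return Xf + G @ (y[:, None] - Yf)
```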

Success of EnKF
Successful applications to weather forecasting: system dimension $d \approx 10^6$, ensemble size $K \approx 10^2$. Complexity of the Kalman filter: $O(d^3) = O(10^{18})$. Complexity of the EnKF: $O(K^2 d) = O(10^{10})$. No gradient computation is needed. It also finds applications in oil reservoir management, Bayesian IP, and deep learning.

EnKF in high dimensions
The theoretical curse of dimensionality comes from another angle: the covariance of $x$ is estimated using the sample covariance
$$\bar{x} = \frac{1}{K}\sum_k x^k, \qquad \hat{C} = \frac{1}{K-1}\sum_k (x^k - \bar{x})(x^k - \bar{x})^T.$$
If $x^k \sim \mathcal{N}(0, \Sigma)$, then $\|\hat{C} - \Sigma\| = O(\sqrt{d/K})$ (Bai-Yin's law). This is a problem when the sample size $K$ is much less than the dimension $d$.
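A small NumPy check of this scaling, assuming $\Sigma = I_d$ and a fixed ensemble-sized sample $K = 50$ (illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 50
for d in [10, 100, 1000]:
    X = rng.standard_normal((K, d))
    C_hat = np.cov(X, rowvar=False)              # sample covariance, divisor K-1
    err = np.linalg.norm(C_hat - np.eye(d), 2)   # spectral-norm error
    print(d, round(err, 2), round(np.sqrt(d / K), 2))
```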

Local interaction
High dimension often comes from dense grids, and the interaction is often local, e.g. a PDE discretization. Example: the Lorenz 96 model
$$\dot{x}_i(t) = (x_{i+1} - x_{i-2})\, x_{i-1} - x_i + F, \qquad i = 1, \dots, d.$$
Information travels along the interactions and is dissipated.
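A short sketch of the Lorenz 96 right-hand side in Python/NumPy (periodic indexing); the forward-Euler integration and its step size are illustrative assumptions only.

```python
import numpy as np

def lorenz96_rhs(x, F=8.0):
    """Lorenz 96: dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F, periodic in i.
    Each component interacts only with nearby components."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

# Simple forward-Euler integration for illustration.
d, dt = 40, 0.01
x = 8.0 + 0.01 * np.random.default_rng(0).standard_normal(d)
for _ in range(1000):
    x = x + dt * lorenz96_rhs(x)
```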

Sparsity: local covariance
Correlation depends on information propagation and decays quickly with distance. The covariance is localized with a structure $\Phi$, e.g. $\Phi(x) = \rho^x$:
$$[\hat{C}]_{i,j} \approx \Phi(|i - j|),$$
where $\Phi(x) \in [0, 1]$ is decreasing; the notion of distance can be general.
[Figure: correlation matrix of the Lorenz 96 model (40 x 40), with values decaying away from the diagonal.]

Covariance localization
Implementation: Schur (entrywise) product with a mask, $[\hat{C} \circ D^L]_{i,j} = [\hat{C}]_{i,j} [D^L]_{i,j}$; use $\hat{C} \circ D^L$ as the covariance instead. Here $[D^L]_{i,j} = \phi(|i - j|)$ with a radius $L$. Gaspari-Cohn matrix: $\phi(x) = \exp(-4x^2/L^2)\, 1_{|i-j| \le L}$. Cutoff matrix: $\phi(x) = 1_{|i-j| \le L}$. Application in the EnKF: use $\hat{C}_n \circ D^L$ instead of $\hat{C}_n$. Essentially: assimilate information only from near neighbors.
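A sketch of the localization mask and Schur product in Python/NumPy. The Gaussian-shaped taper with cutoff follows the formula on this slide; the 40 x 40 example covariance and the radius $L = 5$ are illustrative assumptions.

```python
import numpy as np

def localization_mask(d, L, kind="gaspari_cohn"):
    """Mask D^L with [D^L]_{ij} = phi(|i - j|): tapered Gaussian with cutoff, or plain cutoff."""
    dist = np.abs(np.subtract.outer(np.arange(d), np.arange(d)))
    if kind == "gaspari_cohn":
        return np.exp(-4.0 * dist**2 / L**2) * (dist <= L)
    return (dist <= L).astype(float)

def localize(C_hat, L, kind="gaspari_cohn"):
    """Schur (entrywise) product of the sample covariance with the mask."""
    return C_hat * localization_mask(C_hat.shape[0], L, kind)

# Example: localize a noisy 40 x 40 sample covariance with radius L = 5.
rng = np.random.default_rng(0)
C_hat = np.cov(rng.standard_normal((30, 40)), rowvar=False)
C_loc = localize(C_hat, L=5)
```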

Advantage with localization
Theorem (Bickel, Levina 2008). Let $x^1, \dots, x^K \sim \mathcal{N}(0, \Sigma)$ with sample covariance matrix $\hat{C} = \frac{1}{K}\sum_{k=1}^K x^k (x^k)^T$. There is a constant $c$ such that for any $t > 0$,
$$P(\|\hat{C} \circ D^L - \Sigma \circ D^L\| > \|D^L\|_1 t) \le 8 \exp(2 \log d - cK \min\{t, t^2\}).$$
This indicates that $K \approx \|D^L\|_1^2 \log d$ is the necessary sample size. Here $\|D^L\|_1 = \max_i \sum_j |D^L_{i,j}|$ is independent of $d$, e.g. for the cutoff/banding matrix $[D^L_{\text{cut}}]_{i,j} = 1_{|i-j| \le L}$ we have $\|D^L_{\text{cut}}\|_1 \le 2L$.
Theorem (T. 2018). If the EnKF has a stable localized covariance structure, the required sample size is of order $\log d$.

From EnKF to MCMC?
Localization for the EnKF: update only in small local blocks; works when the covariance has local structure. How can we apply localization to MCMC, i.e. update $x^i$ component by component? Gibbs sampling implements this idea exactly. Write $x^i = [x^i_1, x^i_2, \dots, x^i_m]$, where each $x^i_k$ can be of dimension $q$, so $d = qm$. Then:
Generate $x^{i+1}_1 \sim p(x_1 \mid x^i_2, x^i_3, \dots, x^i_m)$.
Generate $x^{i+1}_2 \sim p(x_2 \mid x^{i+1}_1, x^i_3, \dots, x^i_m)$.
...
Generate $x^{i+1}_m \sim p(x_m \mid x^{i+1}_1, x^{i+1}_2, \dots, x^{i+1}_{m-1})$.

How efficient is Gibbs?
First, just test with $p = \mathcal{N}(0, I_d)$:
Generate $x^{i+1}_1 \sim p(x_1 \mid x^i_2, \dots, x^i_m) = \mathcal{N}(0, I_q)$.
Generate $x^{i+1}_2 \sim p(x_2 \mid x^{i+1}_1, x^i_3, \dots, x^i_m) = \mathcal{N}(0, I_q)$.
...
Generate $x^{i+1}_m \sim p(x_m \mid x^{i+1}_1, \dots, x^{i+1}_{m-1}) = \mathcal{N}(0, I_q)$.
Gibbs naturally exploits the component independence and works efficiently regardless of the dimension. What about components with sparse/local dependence?

Block tridiagonal matrix
Local covariance matrix $C$: $[C]_{i,j}$ decays to zero quickly as $|i - j|$ becomes large. Localized covariance matrix $C$: $[C]_{i,j} = 0$ when $|i - j| > L$, so $C$ has bandwidth $2L$. We will see that "local" is a perturbation of "localized". If we choose the block size $q = L$ in $x^i = [x^i_1, x^i_2, \dots, x^i_m]$, then $C$ is block tridiagonal.

Localized covariance
Theorem (Morzfeld, T., Marzouk). Apply the Gibbs sampler with block size $q$ to $p = \mathcal{N}(m, C)$, and suppose $C$ is $q$-block-tridiagonal. Then the distribution of $x^k$ converges to $p$ geometrically fast in all coordinates, and we can couple $x^k$ with a sample $z \sim \mathcal{N}(m, C)$ such that
$$\mathbb{E}\|C^{-1/2}(x^k - z)\|^2 \le \beta^k d\,(1 + \|C^{-1/2}(x^0 - m)\|^2), \qquad \beta \le \frac{2(1 - \mathcal{C}^{-1})^2 \mathcal{C}^4}{1 + 2(1 - \mathcal{C}^{-1})^2 \mathcal{C}^4},$$
with $\mathcal{C}$ the condition number of $C$. Localized covariance + a mild condition gives dimension-free convergence.

Localized precision
Theorem (Morzfeld, T., Marzouk). Apply the Gibbs sampler with block size $q$ to $p = \mathcal{N}(m, C)$, and suppose $\Sigma = C^{-1}$ is $q$-block-tridiagonal. Then the distribution of $x^k$ converges to $p$ geometrically fast in all coordinates, and we can couple $x^k$ with a sample $z \sim \mathcal{N}(m, C)$ such that
$$\mathbb{E}\|C^{-1/2}(x^k - z)\|^2 \le \beta^k d\,(1 + \|C^{-1/2}(x^0 - m)\|^2), \qquad \beta \le \frac{\mathcal{C}(1 - \mathcal{C}^{-1})^2}{1 + \mathcal{C}(1 - \mathcal{C}^{-1})^2},$$
with $\mathcal{C}$ the condition number of $C$. Localized precision + a mild condition gives dimension-free convergence.

Covariance vs. precision
Why consider both localized covariance and localized precision? By a lemma in Bickel & Lindner 2012: localized covariance + a mild condition implies local precision, and localized precision + a mild condition implies local covariance. We will see that "local" is a perturbation of "localized". For the computation of the Gibbs sampler, a localized precision matrix $\Omega$ is superior:
$$x^{k+1}_j \sim \mathcal{N}\Big(m_j - \Omega_{j,j}^{-1}\sum_{i<j}\Omega_{j,i}(x^{k+1}_i - m_i) - \Omega_{j,j}^{-1}\sum_{i>j}\Omega_{j,i}(x^k_i - m_i),\; \Omega_{j,j}^{-1}\Big).$$
When $\Omega$ is sparse, most terms vanish, meaning fast computation.
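A minimal sketch of this precision-based Gibbs sweep in Python/NumPy, using scalar blocks ($q = 1$); the tridiagonal $\Omega$ in the example is an illustrative localized structure, and the dense dot product stands in for the sparse computation.

```python
import numpy as np

def gibbs_sweep_precision(x, m, Omega, rng):
    """One Gibbs sweep for N(m, Omega^{-1}), updating scalar components in order."""
    d = x.size
    for j in range(d):
        # Conditional of x_j given the rest:
        # mean m_j - Omega_jj^{-1} * sum_{i != j} Omega_ji (x_i - m_i), variance Omega_jj^{-1}.
        # For a banded Omega only nearby terms contribute (computed densely here for simplicity).
        s = Omega[j] @ (x - m) - Omega[j, j] * (x[j] - m[j])
        mean = m[j] - s / Omega[j, j]
        x[j] = mean + rng.standard_normal() / np.sqrt(Omega[j, j])
    return x

# Example: tridiagonal (localized) precision, d = 200.
d = 200
Omega = (np.diag(2.0 * np.ones(d))
         + np.diag(-0.9 * np.ones(d - 1), 1)
         + np.diag(-0.9 * np.ones(d - 1), -1))
rng = np.random.default_rng(0)
x, m = np.zeros(d), np.zeros(d)
for _ in range(100):
    x = gibbs_sweep_precision(x, m, Omega, rng)
```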

Generalization
Gibbs works for Gaussian sampling with localized covariance or precision. What about the Bayesian inverse problem $y = h(x) + v$, $v \sim \mathcal{N}(0, R)$, $x \sim p_0 = \mathcal{N}(m, C)$? If $h$ is linear, the posterior is also Gaussian and Gibbs is directly applicable. What do we do when $C$ etc. are not localized but only local?

Metropolis within Gibbs
Add Metropolis steps to incorporate the likelihood information:
Generate $x^*_1 \sim p_0(x_1 \mid x^i_2, x^i_3, \dots, x^i_m)$ and accept it as $x^{i+1}_1$ with probability
$$\alpha_1(x^i_1, x^*_1, x^i_{2:m}) = \min\left\{1, \frac{\exp(-\tfrac{1}{2}\|y - h(x^*)\|_R^2)}{\exp(-\tfrac{1}{2}\|y - h(x^i)\|_R^2)}\right\}, \qquad x^* = (x^*_1, x^i_{2:m}).$$
Repeat for blocks $2, \dots, m$. When $h$ has a dimension-free Lipschitz constant, $\|y - h(x^*)\|_R^2 - \|y - h(x^i)\|_R^2$ is independent of $n$, so the acceptance rate is dimension independent. This should give fast convergence, though a proof is unclear.
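A sketch of one block update in Python/NumPy. Here `prior_cond_sampler` and `blocks` are hypothetical user-supplied ingredients: the former draws block $j$ from the prior conditional $p_0(x_j \mid x_{-j})$ (e.g. via the Gaussian formula above), the latter lists the indices of each block.

```python
import numpy as np

def mwg_block_update(x, j, blocks, y, h, R, prior_cond_sampler, rng):
    """Metropolis-within-Gibbs update of block j: propose the block from the
    prior conditional, accept with the likelihood ratio."""
    idx = blocks[j]
    x_prop = x.copy()
    x_prop[idx] = prior_cond_sampler(j, x)      # draw block j from p_0(x_j | x_{-j})

    def neg_log_lik(z):
        r = y - h(z)
        return 0.5 * r @ np.linalg.solve(R, r)  # 0.5 * ||y - h(z)||_R^2

    log_alpha = neg_log_lik(x) - neg_log_lik(x_prop)
    if np.log(rng.uniform()) < log_alpha:
        return x_prop
    return x
```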

Localization
Often $\Omega$ and $h$ are local: $[\Omega]_{i,j}$ decays to zero quickly as $|i - j|$ increases, and $[h(x)]_j$ depends significantly on only a few $x_i$. Fast sparse computation is possible with localized parameters. Localization: truncate the near-zero terms, $\Omega \to \Omega^L$, $h \to h^L$. We call MwG with localization l-MwG.
Theorem (Morzfeld, T., Marzouk 2018). The perturbation to the inverse problem is of order
$$\max\left\{\|\Omega - \Omega^L\|_1,\; \|(H - H^L)(H - H^L)^T\|_1\right\}, \qquad \|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^n |A_{i,j}|.$$

Comparison with function-space MCMC
Function-space MCMC: the discretization refines while the domain stays constant; the number of observations is constant; the effective dimension is constant; low-rank priors; low-rank prior-to-posterior update; solved by dimension reduction.
MCMC for local problems: the domain size increases while the discretization stays constant; the number of observations increases; the effective dimension increases; high-rank, sparse priors; high-rank prior-to-posterior update; solved by localization.

Example I: image deblurring
Truth: $x \sim \mathcal{N}(0, \delta^{-1} L^{-2})$, where $L$ is the Laplacian. Defocused observation: $y = Ax + \eta$, $\eta \sim \mathcal{N}(0, \lambda^{-1} I)$. The dimension is large, $O(10^4)$.

Example I: image deblurring (continued)
The precision matrix is sparse. The effective dimension is large.

Example II: Lorenz 96 inverse problem
Truth: $x_0 \sim p_0$, where $p_0$ is the Gaussian climatology. $\Psi_t: x_0 \mapsto x_t$, with $\dot{x}_i = (x_{i+1} - x_{i-2})\, x_{i-1} - x_i + 8$. Observe every other component of $x_t$: $y = H(\Psi_t(x_0)) + \xi$.

Summary
Bayesian IP provides more information for UQ and is often solved by MCMC algorithms. Data assimilation (DA): Bayesian IP with dynamical systems. EnKF: an efficient solver for high-dimensional DA. Localization: a useful technique in high-dimensional settings.

References
Inverse problems: a Bayesian perspective. A. M. Stuart, Acta Numerica, 2010.
Localization for MCMC: sampling high-dimensional posterior distributions with local structure. arXiv:1710.07747.
Performance analysis of local ensemble Kalman filter. To appear in J. Nonlinear Science.
Thank you!