Ergodicity in data assimilation methods

David Kelly, Andy Majda, Xin Tong
Courant Institute, New York University, New York, NY
www.dtbkelly.com
April 15, 2016, ETH Zurich

Part I: What is data assimilation?

What is data assimilation?

Suppose $u$ satisfies an evolution equation
$$\frac{du}{dt} = F(u)$$
with some unknown initial condition $u_0 \sim \mu_0$. There is a true trajectory of $u$ producing partial, noisy observations at times $t = h, 2h, \dots, nh$:
$$y_n = H u_n + \xi_n,$$
where $H$ is a linear operator (think low-rank projection), $u_n = u(nh)$, and $\xi_n \sim N(0, \Gamma)$ iid.

The aim of data assimilation is to characterize the conditional distribution of $u_n$ given the observations $\{y_1, \dots, y_n\}$.
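
To make the setup concrete, here is a minimal sketch of the observation model $y_n = H u_n + \xi_n$, assuming a hypothetical 3-dimensional state, a toy stand-in for the time-$h$ flow map, and an observation operator that sees only the first coordinate; none of these particular choices come from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
h, steps = 0.05, 100
H = np.array([[1.0, 0.0, 0.0]])     # low-rank projection: observe x only
Gamma = np.array([[0.1]])           # observational noise covariance

def flow(u, h):
    # Placeholder for the time-h flow map of du/dt = F(u);
    # a toy contraction standing in for the true dynamics.
    return u * np.exp(-h)

u = rng.standard_normal(3)          # unknown initial condition u_0 ~ mu_0
ys = []
for n in range(steps):
    u = flow(u, h)                  # advance the truth by one window
    xi = rng.multivariate_normal(np.zeros(1), Gamma)
    ys.append(H @ u + xi)           # y_n = H u_n + xi_n
```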

The conditional distribution is updated via the filtering cycle.

Illustration (Initialization)

Figure: The blue circle represents our guess of $u_0$. Due to the uncertainty in $u_0$, this is a probability measure.

Illustration (Forecast step)

Figure: Apply the time-$h$ flow map $\Psi$. This produces a new probability measure, which is our forecast estimate of $u_1$. This is called the forecast step.

Illustration (Make an observation)

Figure: We make an observation $y_1 = H u_1 + \xi_1$. In the picture, we only observe the $x$ variable.

Illustration (Analysis step)

Figure: Using Bayes' formula we compute the conditional distribution of $u_1 \mid y_1$. This new measure (called the posterior) is the new estimate of $u_1$. The uncertainty of the estimate is reduced by incorporating the observation. The forecast distribution steers the update from the observation.

Bayes' formula filtering update

Let $Y_n = \{y_0, y_1, \dots, y_n\}$. We want to compute the conditional density $P(u_{n+1} \mid Y_{n+1})$, using $P(u_n \mid Y_n)$ and $y_{n+1}$. By Bayes' formula, we have
$$P(u_{n+1} \mid Y_{n+1}) = P(u_{n+1} \mid Y_n, y_{n+1}) \propto P(y_{n+1} \mid u_{n+1}) \, P(u_{n+1} \mid Y_n).$$
But we need to compute the integral
$$P(u_{n+1} \mid Y_n) = \int P(u_{n+1} \mid Y_n, u_n) \, P(u_n \mid Y_n) \, du_n.$$
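
In one dimension the recursion can be carried out numerically on a grid, which makes the role of the forecast integral concrete. The sketch below assumes a made-up scalar model map and noise levels; it is exactly this quadrature (and the stored density) that becomes hopeless in high dimensions, as the next slide notes.

```python
import numpy as np

grid = np.linspace(-5, 5, 401)            # discretized scalar state space
du = grid[1] - grid[0]
psi = lambda u: 0.9 * u                   # hypothetical model map
sigma2, gamma2 = 0.1, 0.2                 # model / observation noise variances

def forecast(density):
    # P(u_{n+1} | Y_n) = integral of P(u_{n+1} | u_n) P(u_n | Y_n) du_n
    trans = np.exp(-(grid[:, None] - psi(grid[None, :]))**2 / (2 * sigma2))
    fc = trans @ density * du
    return fc / (fc.sum() * du)

def analysis(fc_density, y):
    # Bayes: posterior proportional to likelihood times forecast density
    lik = np.exp(-(y - grid)**2 / (2 * gamma2))   # H = identity here
    post = lik * fc_density
    return post / (post.sum() * du)

density = np.exp(-grid**2 / 2)            # prior mu_0, normalized below
density /= density.sum() * du
for y in [0.3, -0.1, 0.2]:                # made-up observations
    density = analysis(forecast(density), y)
```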

The optimal filtering framework is impractical for high-dimensional models, as the integrals are impossible to compute and the densities impossible to store. In numerical weather prediction, we have $O(10^9)$ variables for ocean-atmosphere models (discretized PDEs).

The ensemble Kalman filter (EnKF) is a low-dimensional, empirical approximation of the posterior measure $P(u_n \mid Y_n)$. EnKF builds on the idea of the Kalman filter.

The Kalman filter

For a linear model $u_{n+1} = M u_n + \eta_{n+1}$, the Bayesian integral is Gaussian and can be computed explicitly. The conditional density is proportional to
$$\exp\left( -\tfrac{1}{2} \big|\Gamma^{-1/2}(y_{n+1} - Hu)\big|^2 - \tfrac{1}{2} \big|\hat{C}_{n+1}^{-1/2}(u - \hat{m}_{n+1})\big|^2 \right),$$
where $(\hat{m}_{n+1}, \hat{C}_{n+1})$ are the forecast mean and covariance. We obtain
$$m_{n+1} = (I - K_{n+1} H)\,\hat{m}_{n+1} + K_{n+1} y_{n+1}, \qquad C_{n+1} = (I - K_{n+1} H)\,\hat{C}_{n+1},$$
where $K_{n+1} = \hat{C}_{n+1} H^T (\Gamma + H \hat{C}_{n+1} H^T)^{-1}$ is the Kalman gain. The procedure of updating $(m_n, C_n) \mapsto (m_{n+1}, C_{n+1})$ is known as the Kalman filter.
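
As a sketch, one full forecast-analysis cycle of the Kalman filter fits in a few lines; the model, noise covariances, and observation operator passed in are assumptions of the caller, not quantities fixed by the talk.

```python
import numpy as np

def kalman_step(m, C, y, M, Sigma, H, Gamma):
    # Forecast: push mean and covariance through the linear model
    # u_{n+1} = M u_n + eta_{n+1}, eta ~ N(0, Sigma).
    m_hat = M @ m
    C_hat = M @ C @ M.T + Sigma
    # Analysis: Kalman gain and the update formulas from the slide.
    K = C_hat @ H.T @ np.linalg.inv(Gamma + H @ C_hat @ H.T)
    m_new = m_hat + K @ (y - H @ m_hat)      # = (I - KH) m_hat + K y
    C_new = (np.eye(len(m)) - K @ H) @ C_hat
    return m_new, C_new
```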

Ensemble Kalman filter (Evensen 94)

Figure: Start with $K$ ensemble members drawn from some distribution, an empirical representation of $u_0$. The ensemble members are denoted $u_0^{(k)}$. Only $KN$ numbers are stored, which is better than the Kalman filter when $K < N$.

Ensemble Kalman filter (Forecast step)

Figure: Apply the dynamics $\Psi$ to each ensemble member.

Ensemble Kalman filter (Make obs)

Figure: Make an observation.

Ensemble Kalman filter (Analysis step)

Figure: Approximate the forecast distribution with a Gaussian. Fit the Gaussian using the empirical statistics of the ensemble.

How to implement the Gaussian approximation

The posterior is approximately sampled using the randomized maximum likelihood (RML) method: draw the sample $u_{n+1}^{(k)}$ by minimizing the functional
$$\tfrac{1}{2} \big|\Gamma^{-1/2}(y_{n+1}^{(k)} - Hu)\big|^2 + \tfrac{1}{2} \big|\hat{C}^{-1/2}(u - \Psi^{(k)}(u_n^{(k)}))\big|^2,$$
where $y_{n+1}^{(k)} = y_{n+1} + \xi_{n+1}^{(k)}$ is a perturbed observation. In the linear case $\Psi(u_n) = M u_n + \eta_n$, this produces iid Gaussian samples with mean and covariance satisfying the Kalman update equations, with $\hat{C}$ in place of the true forecast covariance. We end up with
$$u_{n+1}^{(k)} = (I - K_{n+1} H)\,\Psi^{(k)}(u_n^{(k)}) + K_{n+1} y_{n+1}^{(k)}.$$

Ensemble Kalman filter (Perturb obs)

Figure: Turn the observation into $K$ artificial observations by perturbing with independent draws from the same observational noise distribution: $y_1^{(k)} = y_1 + \xi_1^{(k)}$.

Ensemble Kalman filter (Analysis step)

Figure: Update each member using the Kalman update formula, where the Kalman gain $K_1$ is computed using the ensemble covariance:
$$u_1^{(k)} = (I - K_1 H)\,\Psi(u_0^{(k)}) + K_1 y_1^{(k)}, \qquad K_1 = \hat{C}_1 H^T (\Gamma + H \hat{C}_1 H^T)^{-1},$$
$$\hat{C}_1 = \frac{1}{K} \sum_{k=1}^{K} \big(\Psi(u_0^{(k)}) - \hat{m}_1\big)\big(\Psi(u_0^{(k)}) - \hat{m}_1\big)^T,$$
with $\hat{m}_1$ the forecast ensemble mean.
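
Putting the pieces together, here is a minimal sketch of the perturbed-observation EnKF analysis step, assuming the forecast ensemble has already been propagated through $\Psi$; the array shapes and the normalization by $K$ follow the formulas above.

```python
import numpy as np

def enkf_analysis(fc, y, H, Gamma, rng=np.random.default_rng()):
    # fc: forecast ensemble Psi(u^(k)), shape (N, K); y: observation, shape (d,)
    N, K = fc.shape
    m_hat = fc.mean(axis=1, keepdims=True)        # forecast ensemble mean
    A = fc - m_hat                                # ensemble anomalies
    C_hat = (A @ A.T) / K                         # ensemble covariance
    Kgain = C_hat @ H.T @ np.linalg.inv(Gamma + H @ C_hat @ H.T)
    # Perturbed observations: y^(k) = y + xi^(k), xi^(k) ~ N(0, Gamma).
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), Gamma, size=K).T
    return fc + Kgain @ (Y - H @ fc)              # = (I - KH) fc + Kgain Y
```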

What are we interested in understanding?

1. Stability with respect to initializations (ergodicity): statistics of $(u_n, u_n^{(1)}, \dots, u_n^{(K)})$ converge to invariant statistics as $n \to \infty$.

2. Accuracy of the approximation: the law of $(u_n^{(1)}, \dots, u_n^{(K)}) \mid Y_n$ approximates the posterior as $n \to \infty$.

Rmk. All results are in the fixed-$K$ regime.

Today we focus on geometric ergodicity for the signal-filter process $(u_n, u_n^{(1)}, \dots, u_n^{(K)})$.

The theoretical framework

A Markov chain $\{X_n\}_{n \in \mathbb{N}}$ on a state space $\mathcal{X}$ is called geometrically ergodic if it has a unique invariant measure $\pi$ and for any initialization $x_0$ we have
$$\sup_{|f| \le 1} \left| \mathbb{E}_{x_0} f(X_n) - \int f(x)\,\pi(dx) \right| \le C(x_0)\, r^n$$
for some $r \in (0, 1)$ and any measurable bounded $f$.

We use the coupling approach: let $X_n$ and $\tilde{X}_n$ be two copies of the chain, with $X_0 = x_0$ and $\tilde{X}_0 \sim \pi$, coupled in such a way that $X_n = \tilde{X}_n$ for $n \ge T$, where $T$ is the first hitting time $X_T = \tilde{X}_T$. Then we have
$$\|P^n \delta_{x_0} - \pi\|_{TV} \le 2\,P(T > n).$$

The Doeblin / Meyn-Tweedie approach is to verify two assumptions that guarantee the coupling can be constructed with $P(T > n) \lesssim r^n$:

1. Lyapunov function / energy dissipation: $\mathbb{E}_n |X_{n+1}|^2 \le \alpha |X_n|^2 + \beta$ with $\alpha \in (0, 1)$, $\beta > 0$.

2. Minorization: find a compact set $C \subset \mathcal{X}$, a measure $\nu$ supported on $C$, and $\kappa > 0$ such that $P(x, A) \ge \kappa\, \nu(A)$ for all $x \in C$, $A \subset \mathcal{X}$.

Minorization is inherited from the model

Smooth densities on $C$ imply minorization. For the signal-filter process $(u_n, u_n^{(1)}, \dots, u_n^{(K)})$ we can write the Markov chain as a composition of two maps:
$$(u_n, u_n^{(1)}, \dots, u_n^{(K)}) \mapsto \big(u_{n+1}, \Psi^{(1)}(u_n^{(1)}), \dots, \Psi^{(K)}(u_n^{(K)}), y_{n+1}^{(1)}, \dots, y_{n+1}^{(K)}\big) \mapsto (u_{n+1}, u_{n+1}^{(1)}, \dots, u_{n+1}^{(K)}).$$
The first step is a nice Markov kernel with smooth densities (by assumptions on the model and observational noise). The second is a deterministic map. We only need to check that the deterministic map preserves the densities (just near one chosen point $y^*$).

Lyapunov function: inheriting an energy principle

Suppose we know the model satisfies an energy principle
$$\mathbb{E}_n |\Psi(u)|^2 \le \alpha |u|^2 + \beta$$
for $\alpha \in (0, 1)$, $\beta > 0$. Does the filter inherit the energy principle,
$$\mathbb{E}_n |u_{n+1}^{(k)}|^2 \le \alpha |u_n^{(k)}|^2 + \beta\,?$$

Only observable energy is inherited (Tong, Majda, K. 16)

We can show that if the model satisfies an observable energy criterion,
$$\mathbb{E}_n |H u_{n+1}|^2 \le \alpha |H u_n|^2 + \beta,$$
then the filter will too:
$$\mathbb{E}_n |H u_{n+1}^{(k)}|^2 \le \alpha |H u_n^{(k)}|^2 + \beta.$$
The observable energy criterion holds when there is no interaction between observed and unobserved variables at high energies. But when $H$ is not full rank this does not give a Lyapunov function, so we get no ergodicity, only boundedness of the observed components.

Catastrophic filter divergence

Lorenz-96:
$$\dot{u}_j = (u_{j+1} - u_{j-2})\, u_{j-1} - u_j + F$$
with $j = 1, \dots, 40$ and periodic boundary conditions. Observe every fifth node. (Harlim-Majda 10, Gottwald-Majda 12)

The true solution lies in a bounded set, but the filter blows up to machine infinity in finite time!
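
For reference, a minimal sketch of the Lorenz-96 right-hand side and a coarse explicit Euler step; the forcing value F = 8 is the standard choice, assumed here rather than stated on the slide. Coarse explicit steps of this kind are the cheap integrators revisited at the end of the talk.

```python
import numpy as np

def lorenz96_rhs(u, F=8.0):
    # du_j/dt = (u_{j+1} - u_{j-2}) u_{j-1} - u_j + F, periodic in j.
    return (np.roll(u, -1) - np.roll(u, 2)) * np.roll(u, 1) - u + F

def euler_step(u, dt, F=8.0):
    # Coarse forward Euler step of the Lorenz-96 dynamics.
    return u + dt * lorenz96_rhs(u, F)
```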

For complicated models, only heuristic arguments have been offered as explanation. Can we prove it for a simpler, constructive model?

The rotate-and-lock map (K., Majda, Tong. PNAS 15)

The model $\Psi : \mathbb{R}^2 \to \mathbb{R}^2$ is a composition of two maps, $\Psi(x, y) = \Psi_{\mathrm{lock}}(\Psi_{\mathrm{rot}}(x, y))$, where
$$\Psi_{\mathrm{rot}}(x, y) = \begin{pmatrix} \rho \cos\theta & -\rho \sin\theta \\ \rho \sin\theta & \rho \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$
and $\Psi_{\mathrm{lock}}$ rounds the input to the nearest point in the grid $\mathcal{G} = \{(m, (2n+1)\varepsilon) \in \mathbb{R}^2 : m, n \in \mathbb{Z}\}$.

It is easy to show that this model has an energy dissipation principle:
$$|\Psi(x, y)|^2 \le \alpha |(x, y)|^2 + \beta$$
for $\alpha \in (0, 1)$ and $\beta > 0$.
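
A direct transcription of the map as a sketch, with illustrative parameter values; the theorem below chooses $\rho$, $\theta$, $\varepsilon$ carefully, and the numbers here are hypothetical.

```python
import numpy as np

rho, theta, eps = 0.9, 0.7, 0.01   # hypothetical parameters; rho < 1 dissipates

def psi_rot(v):
    # Contracting rotation: scale by rho, rotate by theta.
    R = rho * np.array([[np.cos(theta), -np.sin(theta)],
                        [np.sin(theta),  np.cos(theta)]])
    return R @ v

def psi_lock(v):
    # Round to the nearest point of G = {(m, (2n+1) eps) : m, n integers}.
    x = np.round(v[0])                        # nearest integer in x
    n = np.round((v[1] / eps - 1.0) / 2.0)    # nearest odd multiple of eps in y
    return np.array([x, (2.0 * n + 1.0) * eps])

def psi(v):
    return psi_lock(psi_rot(v))
```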

(a) Figure: The red square is the trajectory $u_n = 0$. The blue dots are the positions of the forecast ensemble $\Psi(u_0^+), \Psi(u_0^-)$. Given the locking mechanism in $\Psi$, this is a natural configuration.

(b) Figure: We make an observation (with $H$ shown below) and perform the analysis step. The green dots are $u_1^+, u_1^-$. Observation matrix:
$$H = \begin{bmatrix} 1 & 0 \\ \varepsilon^{-2} & 1 \end{bmatrix}$$
Truth: $u_n = (0, 0)$. The filter is certain that the $x$-coordinate is $\hat{x}$ (the dashed line). The filter thinks the observation must be $(\hat{x},\ \varepsilon^{-2}\hat{x} + u_{1,y})$, but it is actually $(0, 0)$ + noise. The filter concludes that $u_{1,y} \approx -\varepsilon^{-2}\hat{x}$.

(c) Figure: Beginning the next assimilation step. Apply $\Psi_{\mathrm{rot}}$ to the ensemble (blue dots).

(d) Figure: Apply $\Psi_{\mathrm{lock}}$. The blue dots are the forecast ensemble $\Psi(u_1^+), \Psi(u_1^-)$. This is the same configuration as frame (a), but on a higher-energy orbit. The cycle repeats, leading to exponential growth.

Theorem (K., Majda, Tong 15 PNAS)

For any $N > 0$ and any $p \in (0, 1)$ there exists a choice of parameters such that
$$P\left( |u_n^{(k)}| \ge M_n \text{ for all } n \le N \right) \ge 1 - p,$$
where $M_n$ is an exponentially growing sequence. I.e., the filter can be made to grow exponentially for an arbitrarily long time with an arbitrarily high probability.

Ensemble alignment can cause EnKF to gain energy, even when the model dissipates energy. Can we get around the problem by tweaking the algorithm?

Adaptive covariance inflation (Tong, Majda, K. 15)

We modify the algorithm by introducing a covariance inflation
$$\hat{C}_{n+1} \mapsto \hat{C}_{n+1} + \lambda_{n+1} I,$$
where
$$\lambda_{n+1} = \Theta_{n+1}\, \mathbb{1}(\Theta_{n+1} > \Lambda), \qquad \Theta_{n+1} = \frac{1}{K} \sum_{k=1}^{K} \big| y_{n+1}^{(k)} - H\Psi(u_n^{(k)}) \big|^2,$$
and $\Lambda$ is some constant threshold. If the predictions are near the observations, then there is no inflation.

Thm. The modified EnKF inherits an energy principle from the model:
$$\mathbb{E}_x |\Psi(x)|^2 \le \alpha |x|^2 + \beta \quad \Longrightarrow \quad \mathbb{E}_n |u_{n+1}^{(k)}|^2 \le \alpha |u_n^{(k)}|^2 + \beta.$$
Consequently, the modified EnKF is signal-filter ergodic.
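
The inflation trigger is simple to state in code. A minimal sketch, assuming the perturbed observations Y and forecast ensemble fc in the shapes used in the EnKF sketch above, with Lambda a user-chosen threshold:

```python
import numpy as np

def adaptive_inflation(C_hat, Y, fc, H, Lambda):
    # Theta = (1/K) sum_k |y^(k) - H Psi(u^(k))|^2  (ensemble-observation misfit)
    K = fc.shape[1]
    Theta = np.sum((Y - H @ fc)**2) / K
    lam = Theta if Theta > Lambda else 0.0    # inflate only on large misfit
    return C_hat + lam * np.eye(C_hat.shape[0])
```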

Adaptive inflation schemes allow us to use cheap integration schemes in the forecast dynamics that would usually lead to numerical blow-up.

Figure: RMS error for EnKF on 5d Lorenz-96 with sparse observations (1 node), strong turbulence regime, using the Euler method with coarse step size. Upper panel: EnKF and EnKF-AI (adaptive inflation); lower panel: EnKF-CI and EnKF-CAI, with additional constant inflation, which helps accuracy.

Applicable to more sophisticated geophysical models, such as two-layer QG with coarse graining (Lee, Majda 16).

References

1. D. Kelly, K. Law & A. Stuart. Well-posedness and accuracy of the ensemble Kalman filter in discrete and continuous time. Nonlinearity (2014).

2. D. Kelly, A. Majda & X. Tong. Concrete ensemble Kalman filters with rigorous catastrophic filter divergence. Proc. Nat. Acad. Sci. (2015).

3. X. Tong, A. Majda & D. Kelly. Nonlinear stability and ergodicity of ensemble based Kalman filters. Nonlinearity (2016).

4. X. Tong, A. Majda & D. Kelly. Nonlinear stability of the ensemble Kalman filter with adaptive covariance inflation. To appear in Comm. Math. Sci. (2016).

All my slides are on my website (www.dtbkelly.com). Thank you!