Convergence of the Ensemble Kalman Filter in Hilbert Space

Convergence of the Ensemble Kalman Filter in Hilbert Space. Jan Mandel, Center for Computational Mathematics, Department of Mathematical and Statistical Sciences, University of Colorado Denver. Parts based on joint work with Loren Cobb and Jonathan Beezley. http://math.ucdenver.edu/~jmandel/papers/enkf-theory.pdf, http://arxiv.org/abs/0901.2951. Supported by NSF grant EGS-0835579 and Fondation STAE project ADTAO. Toulouse, June 14, 2012.

Data assimilation - update cycle. Goal: inject data into a running model. The cycle alternates two steps: analysis (update by data) and forecast (advance in time): ... → analysis → forecast → analysis → forecast → ..., with data entering at each analysis step. The Kalman filter was used already in Apollo 11 navigation, and is now in GPS.

Bayesian approach: discrete time filtering = sequential statistical estimation. The model state is a random variable U with probability density p(u). Data + measurement error enter as the data likelihood p(d|u). Probability densities are updated by the Bayes theorem,

    p(u) ∝ p(d|u) p^f(u),

where ∝ means proportional. Bayesian update cycle: time k-1 analysis U^{(k-1)} ~ p^{(k-1)}(u) → advance the model → time k forecast U^{(k),f} ~ p^{(k),f}(u) → Bayesian update with the data likelihood p(d|u) → time k analysis U^{(k)} ~ p^{(k)}(u).
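
A minimal numerical sketch of this update, assuming a 1D state observed directly, with made-up Gaussian forecast and data error; it only illustrates p(u) ∝ p(d|u) p^f(u) on a grid:

    import numpy as np

    # Grid for the state variable u and an (assumed) N(0,1) forecast density
    u = np.linspace(-5.0, 5.0, 1001)
    du = u[1] - u[0]
    p_f = np.exp(-0.5 * u**2)                    # forecast density p^f(u), unnormalized
    d, r = 1.0, 0.5                              # datum and data error variance (made up)
    likelihood = np.exp(-0.5 * (u - d)**2 / r)   # data likelihood p(d|u), observing u directly
    p_a = p_f * likelihood                       # Bayes: p(u) proportional to p(d|u) p^f(u)
    p_a /= p_a.sum() * du                        # normalize to a probability density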

Kalman filter. The model state U is represented by its mean and covariance. Assume that the forecast is Gaussian, U^f ~ N(m^f, Q^f), and the data likelihood is also Gaussian, p(d|u) ∝ exp(-(Hu - d)^T R^{-1} (Hu - d)/2). Then the analysis is also Gaussian, U ~ N(m, Q), given by

    m = m^f + K(d - Hm^f),  Q = (I - KH)Q^f,

where K = Q^f H^T (H Q^f H^T + R)^{-1} is the gain matrix.
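
A minimal sketch of this analysis step in Python/NumPy (the toy numbers at the end are made up for illustration):

    import numpy as np

    def kalman_update(m_f, Q_f, H, R, d):
        """One Kalman analysis step; returns the analysis mean m and covariance Q."""
        S = H @ Q_f @ H.T + R                  # innovation covariance H Q^f H^T + R
        K = np.linalg.solve(S, H @ Q_f).T      # gain K = Q^f H^T S^{-1} (S, Q^f symmetric)
        m = m_f + K @ (d - H @ m_f)            # analysis mean
        Q = (np.eye(len(m_f)) - K @ H) @ Q_f   # analysis covariance (I - KH) Q^f
        return m, Q

    # 2-dimensional state, one observed component
    m, Q = kalman_update(np.array([1.0, 0.0]),
                         np.array([[1.0, 0.5], [0.5, 2.0]]),
                         np.array([[1.0, 0.0]]),
                         np.array([[0.25]]),
                         np.array([1.4]))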

Kalman filter drawbacks: it is hard to advance the covariance matrix when the model is nonlinear; modifications of the code are needed (tangent linear and adjoint models). It needs to maintain the covariance matrix, which cannot be done for large problems. But the dimension of the state space is large - discretizations of PDEs.

Ensemble Kalman filter (EnKF): represent the model state by an ensemble, and replace the forecast covariance by the sample covariance computed from the forecast ensemble,

    Q_N = (1/N) Σ_{i=1}^N (X_i^f - X̄^f)(X_i^f - X̄^f)^T,  X̄^f = (1/N) Σ_{i=1}^N X_i^f.

First idea (Evensen 1994): apply the Kalman formula to the ensemble members,

    X_i = X_i^f + K_N(d - H X_i^f),  K_N = Q_N H^T (H Q_N H^T + R)^{-1}.

But the analysis covariance came out wrong (too small), even if Q_N = Q^f were exact. Fix: perturb the data by sampling from the data error distribution, i.e., replace d by D_i = d + E_i, E_i i.i.d. ~ N(0, R) (Burgers, Evensen, van Leeuwen, 1998).
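
A sketch of the perturbed-data analysis step above, with ensemble members stored as columns; the 1/N normalization of the sample covariance follows the slide (many implementations use 1/(N-1)):

    import numpy as np

    def enkf_analysis(Xf, H, R, d, rng):
        """Perturbed-observation EnKF analysis; columns of Xf are ensemble members."""
        n, N = Xf.shape
        A = Xf - Xf.mean(axis=1, keepdims=True)   # forecast anomalies
        Q = (A @ A.T) / N                         # sample covariance Q_N
        S = H @ Q @ H.T + R
        K = np.linalg.solve(S, H @ Q).T           # K_N = Q_N H^T (H Q_N H^T + R)^{-1}
        E = rng.multivariate_normal(np.zeros(len(d)), R, size=N).T
        D = d[:, None] + E                        # perturbed data D_i = d + E_i
        return Xf + K @ (D - H @ Xf)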

The EnKF with data perturbation is correct if the covariance is exact. Lemma. Let U^f ~ N(m^f, Q^f), D = d + E, E ~ N(0, R), and U = U^f + K(D - HU^f), K = Q^f H^T (H Q^f H^T + R)^{-1}. Then U ~ N(m, Q), i.e., U has the correct analysis distribution from the Bayes theorem and the Kalman filter. Proof: computation in Burgers, Evensen, van Leeuwen (1998). Corollary. If the forecast ensemble U_i^f is a sample from the forecast distribution, and the exact covariance is used, then the analysis ensemble U_i is a sample from the analysis distribution.
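
The lemma can be checked by a quick Monte Carlo computation with made-up toy values: using the exact gain K, the empirical covariance of the perturbed-data update approaches (I - KH)Q^f:

    import numpy as np

    rng = np.random.default_rng(1)
    n, N = 2, 200000                                   # large sample (arbitrary choice)
    Q_f = np.array([[1.0, 0.3], [0.3, 1.0]])
    H, R = np.eye(n), 0.5 * np.eye(n)
    d = np.array([0.2, -0.1])

    K = Q_f @ H.T @ np.linalg.inv(H @ Q_f @ H.T + R)   # exact gain
    U_f = rng.multivariate_normal(np.zeros(n), Q_f, size=N).T
    D = d[:, None] + rng.multivariate_normal(np.zeros(n), R, size=N).T
    U = U_f + K @ (D - H @ U_f)                        # update with perturbed data

    print(np.cov(U))                                   # empirical analysis covariance ...
    print((np.eye(n) - K @ H) @ Q_f)                   # ... approaches (I - KH) Q^f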

EnKF properties. The model is needed only as a black box. The EnKF was derived for Gaussian distributions, but the algorithm does not depend on this, so it is often used for nonlinear models and non-Gaussian distributions anyway. The EnKF was designed so that the ensemble is a sample from the correct analysis distribution if
- the forecast ensemble is i.i.d. and Gaussian
- the covariance matrix is exact
But
- the sample covariance matrix is a random element
- the ensemble is not independent: the sample covariance ties it together
- the ensemble is not Gaussian: the update is a nonlinear mapping

Convergence of EnKF in the large ensemble limit. A law of large numbers guarantees that the EnKF gives correct results for large ensembles in the Gaussian case: Le Gland et al. (2011) and Mandel et al. (2011), by similar arguments. Here we simplify the approach from Mandel et al. (2011) and show that it carries over to an infinite-dimensional Hilbert space. The infinite-dimensional case is important as the limit case of a high-dimensional state space.

Convergence analysis overview. For a large ensemble,
- the sample covariance converges to the exact covariance
- the ensemble is asymptotically i.i.d.
- the ensemble is asymptotically Gaussian
- the ensemble members converge to the correct distribution
How?
- the ensemble is exchangeable: this plays the role of independence
- the ensemble is a priori bounded in L^p: this plays the role of Gaussianity
- use the weak law of large numbers for the sample covariance
- the ensemble converges to a Gaussian i.i.d. ensemble obtained with the exact covariance matrix instead of the sample covariance
Note: so far, we are still in a fixed, finite dimension. We will see later what survives in the infinite-dimensional case.

Tools. Notation: |X| is the standard Euclidean norm on R^n. If X : Ω → R^n is a random element, ‖X‖_p = E(|X|^p)^{1/p} is its stochastic L^p norm, 1 ≤ p < ∞. X_j → X in probability (X_j →_P X) if for every ε > 0, P(|X_j - X| ≥ ε) → 0. Continuous mapping theorem: if X_j →_P X and g is continuous, then g(X_j) →_P g(X). L^p convergence implies convergence in probability.

Tools. Uniform integrability: if X_j →_P X and {X_j} is bounded in L^p, then X_j → X in L^q for all 1 ≤ q < p. Weak law of large numbers: if X_i ∈ L^2 are i.i.d., then (1/N) Σ_{i=1}^N X_i → E(X_1) in L^2 as N → ∞. A set of random elements {X_1, ..., X_N} is exchangeable if their joint distribution is invariant under permutations of their indices. An i.i.d. set is exchangeable. Conversely, exchangeable random elements are identically distributed, but not necessarily independent.

Convergence of the EnKF to the Kalman filter. Generate the initial ensemble X_N^{(0)} = [X_1^{(0)}, ..., X_N^{(0)}] i.i.d. and Gaussian. Run the EnKF, get ensembles [X_1^{(k)}, ..., X_N^{(k)}], and advance in time by linear models, X_i^{(k+1),f} = M_k X_i^{(k)} + f_k. For theoretical purposes, define the reference ensembles [U_1^{(k)}, ..., U_N^{(k)}] by U_N^{(0)} = X_N^{(0)} and U_N^{(k)} by the EnKF, except with the exact covariance of the filtering distribution instead of the sample covariance. We already know that U_N^{(k)} is a sample (i.i.d.) from the Gaussian filtering distribution (as it would be delivered by the Kalman filter). We will show by induction that X_i^{(k)} → U_i^{(k)} in L^p for all 1 ≤ p < ∞ as N → ∞, for all analysis cycles k. In which sense? We need this for all i = 1, ..., N, but the U_i^{(k)} change and N changes too.
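
The claimed limit can be observed numerically on a toy 2D linear-Gaussian problem (all numbers below are made up): over one forecast/analysis cycle, the EnKF analysis mean approaches the exact Kalman analysis mean as N grows:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 2
    M, f = np.array([[1.0, 0.1], [0.0, 1.0]]), np.zeros(n)   # toy linear model M_k, f_k
    H, R = np.eye(n), 0.5 * np.eye(n)
    m0, Q0 = np.zeros(n), np.eye(n)
    d = np.array([0.5, -0.2])

    # Exact Kalman filter reference
    m_f, Q_f = M @ m0 + f, M @ Q0 @ M.T
    K = Q_f @ H.T @ np.linalg.inv(H @ Q_f @ H.T + R)
    m_a = m_f + K @ (d - H @ m_f)

    for N in (10, 100, 1000, 10000):
        X = M @ rng.multivariate_normal(m0, Q0, size=N).T + f[:, None]
        Q = np.cov(X, bias=True)                              # sample covariance, 1/N
        KN = Q @ H.T @ np.linalg.inv(H @ Q @ H.T + R)
        D = d[:, None] + rng.multivariate_normal(np.zeros(n), R, size=N).T
        Xa = X + KN @ (D - H @ X)
        print(N, np.linalg.norm(Xa.mean(axis=1) - m_a))       # error decays with N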

Exchangeability of EnKF ensembles. Lemma. All ensembles [X_1^{(k)}, ..., X_N^{(k)}] generated by the EnKF are exchangeable. Proof: the initial ensemble [X_1^{(0)}, ..., X_N^{(0)}] = [U_1^{(0)}, ..., U_N^{(0)}] is i.i.d., thus exchangeable, and the analysis step is permutation invariant. Corollary. The random elements X_1^{(k)} - U_1^{(k)}, ..., X_N^{(k)} - U_N^{(k)} are also exchangeable. (Recall that the U_i are obtained using the exact covariance from the Kalman filter rather than the sample covariance.) Corollary. [X_1^{(k)} - U_1^{(k)}, ..., X_N^{(k)} - U_N^{(k)}] are identically distributed, so we only need to consider the convergence ‖X_1^{(k)} - U_1^{(k)}‖_p → 0.

Estimates for the covariance. Write the N-th sample covariance of [X_1, X_2, ...] as

    C_N(X) = (1/N) Σ_{i=1}^N X_i X_i^T - ((1/N) Σ_{i=1}^N X_i)((1/N) Σ_{i=1}^N X_i)^T.

Bound: if the X_i ∈ L^{2p} are identically distributed, then

    ‖C_N(X)‖_p ≤ ‖X_1 X_1^T‖_p + ‖X_1‖_p^2 ≤ ‖X_1‖_{2p}^2 + ‖X_1‖_p^2.

Continuity of the sample covariance: if the [X_i, U_i] ∈ L^4 are identically distributed, then

    ‖C_N(X) - C_N(U)‖_2 ≤ 8 ‖X_1 - U_1‖_4 (‖X_1‖_4 + ‖U_1‖_4).
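
The estimator C_N in code, following the slide's 1/N convention:

    import numpy as np

    def sample_cov(X):
        """C_N(X) for columns X_i of X: (1/N) sum_i X_i X_i^T - Xbar Xbar^T."""
        N = X.shape[1]
        Xbar = X.mean(axis=1, keepdims=True)
        return (X @ X.T) / N - Xbar @ Xbar.T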

Estimates for the covariance. L^2 law of large numbers for the sample covariance: if the U_i ∈ L^4, i = 1, 2, ..., are i.i.d., then ‖C_N(U) - Cov(U_1)‖_2 → 0 as N → ∞. Proof: the U_i U_i^T ∈ L^2 are i.i.d. and the weak law of large numbers applies. Corollary. If the [X_i, U_i] ∈ L^4 are identically distributed, the U_i are independent, and ‖X_1 - U_1‖_4 → 0 as N → ∞, then C_N(X) → Cov(U_1) in L^2, thus also C_N(X) →_P Cov(U_1).

A priori L^p bounds on EnKF ensembles. Lemma. All EnKF ensembles [X_1^{(k)}, ..., X_N^{(k)}] are bounded in L^p, ‖X_1^{(k)}‖_p ≤ C_{k,p}, with constants independent of the ensemble size N, for all 1 ≤ p < ∞. Proof: by induction over the step index k. Recall that

    X_N^{(k)} = X_N^{(k),f} + K_N^{(k)} (D_N^{(k)} - H^{(k)} X_N^{(k),f}),
    K_N^{(k)} = Q_N^{(k),f} H^{(k)T} (H^{(k)} Q_N^{(k),f} H^{(k)T} + R^{(k)})^{-1}.

Bound the sample covariance ‖Q_N^{(k),f}‖_p from the L^{2p} bound on X_N^{(k),f}, and bound the matrix norm ‖(H^{(k)} Q_N^{(k),f} H^{(k)T} + R^{(k)})^{-1}‖ ≤ const from R^{(k)} > 0 (positive definite).

Convergence of the EnKF is now simple. Recall: X_i^{(k)} = EnKF ensemble, U_i^{(k)} = reference i.i.d. ensemble from the exact covariances. By induction over the step index k:
- X_1^{(k),f} → U_1^{(k),f} in L^p as N → ∞, 1 ≤ p < ∞
- sample covariance: Q_N^{(k),f} = C_N(X^{(k),f}) →_P Q^{(k),f} = Cov(U_1^{(k),f})
- continuous mapping theorem ⇒ Kalman gain K_N^{(k)} →_P K^{(k)} and X_1^{(k)} →_P U_1^{(k)}
- advancing linearly in time: X_1^{(k+1),f} →_P U_1^{(k+1),f}
- a priori L^p bound and uniform integrability ⇒ X_1^{(k+1),f} → U_1^{(k+1),f} in L^p, 1 ≤ p < ∞

Towards high-dimension asymptotics. We have proved convergence in the large sample limit for an arbitrary but fixed state space dimension. This says nothing about the speed of convergence as the state dimension increases. Curse of dimensionality: slow convergence in high dimension. Why and when does it happen? With arbitrary probability distributions, sure. But probability distributions in practice are not arbitrary: the state is a discretization of a random function. Random functions are random elements with values in infinite-dimensional spaces; the state dimension comes just from the mesh size. The smoother the functions, the faster the EnKF convergence. As a step towards convergence uniform in the state space dimension, look at the infinite-dimensional case first - just like in numerical analysis, where we study the PDE first and only then its numerical approximation.

Bayes theorem in an infinite-dimensional Hilbert space. The model state is a probability measure µ on a separable Hilbert space V with inner product ⟨·,·⟩. Avoid densities: the Bayes theorem p(u) ∝ p(d|u) p^f(u) becomes

    µ(A) = ∫_A p(d|u) dµ^f(u) / ∫_V p(d|u) dµ^f(u)  for every µ^f-measurable A.

Need ∫_V p(d|u) dµ^f(u) > 0 for the Bayes theorem to work; p(d|u) > 0 for all u from some open set in V is enough. Gaussian data likelihood:

    p(d|u) = exp(-‖R^{-1/2}(Hu - d)‖^2/2) if Hu - d ∈ Im(R^{1/2}), 0 otherwise.

In particular, R = I works just fine. But R cannot be the covariance of a probability measure if the data space is infinite-dimensional.

Probability on an infinite-dimensional Hilbert space. The mean of a random element X is the weak integral E(X) ∈ V, ⟨v, E(X)⟩ = E(⟨v, X⟩) for all v ∈ V. The tensor product of vectors x, y ∈ V was x y^T; now it is the bounded linear operator x ⊗ y ∈ [V], (x ⊗ y)v = x⟨y, v⟩ for all v ∈ V. The covariance of a random element X becomes

    Cov(X) = E((X - E(X)) ⊗ (X - E(X))) = E(X ⊗ X) - E(X) ⊗ E(X).

The covariance of a random element on V must be a compact operator of trace class: its eigenvalues λ_n must satisfy Σ_{n=1}^∞ λ_n < ∞.

Curse of dimensionality? Not for probability measures! [Figure: log-log plots of filter error vs. model size for SIR (left) and EnKF (right).] Constant covariance eigenvalues λ_n = 1 and the inverse law λ_n = 1/n do not give probability measures in the limit, because Σ_{n=1}^∞ λ_n = ∞. The inverse square law λ_n = 1/n^2 gives a probability measure, because Σ_{n=1}^∞ λ_n < ∞. Here m = 25 uniformly sampled data points from a 1D state, N = 10 ensemble members. From Beezley (2009), Fig. 4.7.
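
Such measures can be sampled by a truncated Karhunen-Loève expansion; the sketch below assumes a sine basis on [0,1] (an illustration, not Beezley's setup), with the decay exponent of λ_n deciding whether the limit is a Gaussian probability measure:

    import numpy as np

    rng = np.random.default_rng(2)

    def sample_gaussian_field(n_mesh, n_modes, decay):
        """Draw u(x) = sum_n sqrt(lambda_n) xi_n e_n(x) with e_n(x) = sqrt(2) sin(n pi x)."""
        x = np.linspace(0.0, 1.0, n_mesh)
        n = np.arange(1, n_modes + 1)
        lam = 1.0 / n**decay                    # eigenvalues lambda_n = n^{-decay}
        xi = rng.standard_normal(n_modes)       # i.i.d. N(0,1) coefficients
        basis = np.sqrt(2.0) * np.sin(np.pi * np.outer(n, x))
        return (np.sqrt(lam) * xi) @ basis

    u = sample_gaussian_field(1000, 500, decay=2)   # trace class: a probability measure
    # decay = 0 or 1 would give sum(lambda_n) = infinity: no Gaussian measure on L^2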

The L^2 and the weak laws of large numbers. The definitions of L^p(Ω, V) = {X : E(‖X‖^p) < ∞} and ‖X‖_p carry over. The L^2 law of large numbers for the sample mean E_N = (1/N) Σ_{i=1}^N X_i carries over: let the X_k ∈ L^2(Ω, V) be i.i.d.; then

    ‖E_N - E(X_1)‖_2 ≤ N^{-1/2} ‖X_1 - E(X_1)‖_2 ≤ 2 N^{-1/2} ‖X_1‖_2,

and consequently E_N →_P E(X_1). But L^p laws of large numbers do not generally hold on Banach spaces (in fact, they define Rademacher type p spaces), and the space [V] of all bounded linear operators on a Hilbert space is not a Hilbert space but a quite general Banach space.

Sample covariance. The definition carries over from finite dimension:

    C_N([X_1, ..., X_N]) = (1/N) Σ_{i=1}^N X_i ⊗ X_i - ((1/N) Σ_{i=1}^N X_i) ⊗ ((1/N) Σ_{i=1}^N X_i).

But we do not have the L^2(Ω, [V]) law of large numbers for the sample covariance. Instead, use L^2(Ω, V_HS), with V_HS the space of all Hilbert-Schmidt operators on V and the norm ‖A‖_HS^2 = Σ_{n=1}^∞ ⟨A e_n, A e_n⟩, where {e_n} is any complete orthonormal sequence in V. Now the L^2 law of large numbers applies in the Hilbert space V_HS, and ‖A‖_[V] ≤ ‖A‖_HS, so we recover C_N([X_1, ..., X_N]) → Cov(X_1) in L^2(Ω, [V]).
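
On a finite-dimensional section, the Hilbert-Schmidt norm is the Frobenius norm, and ‖A‖_[V] ≤ ‖A‖_HS is the familiar spectral vs. Frobenius bound; a small sketch with made-up dimensions and covariance:

    import numpy as np

    rng = np.random.default_rng(4)

    def hs_norm(A):
        """Hilbert-Schmidt norm of a (finite section of an) operator = Frobenius norm."""
        return np.sqrt((A * A).sum())

    A = rng.standard_normal((50, 50))
    assert np.linalg.norm(A, 2) <= hs_norm(A) + 1e-12   # operator norm <= HS norm

    # Sample covariance of i.i.d. draws converges to the true covariance in HS norm
    n = 20
    true_cov = np.diag(1.0 / np.arange(1, n + 1) ** 2)
    for N in (100, 10000):
        X = rng.multivariate_normal(np.zeros(n), true_cov, size=N).T
        print(N, hs_norm(np.cov(X, bias=True) - true_cov))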

Convergence of EnKF in a Hilbert space. A priori L^p bounds on EnKF ensembles: we need only ‖(H^{(k)} Q_N^{(k),f} H^{(k)*} + R^{(k)})^{-1}‖_[V] ≤ const; then the bound carries over. This is true when the data error covariance has spectrum bounded below, R^{(k)} ≥ cI > 0, such as R^{(k)} = I. This was good for the Bayes theorem already. With these few changes, the whole proof carries over, avoiding entry-by-entry arguments. But not all is good: the formulation of the EnKF algorithm now involves data perturbation by white noise, and white noise is not a random element with values in the Hilbert space, at least not in the usual sense.

EnKF in a Hilbert space with white noise data perturbation. A weak random element (Balakrishnan 1976) has a finitely additive, not necessarily σ-additive, distribution, which allows identity covariance (white noise). If the covariance of a Gaussian weak random element is of trace class, the distribution is in fact σ-additive and the weak random element is a random element in the usual sense. So let D = d + E, with E a weak random element ~ N(0, R), U^f ~ N(m^f, Q^f), and U = U^f + K(D - HU^f), with K = Q^f H^* (H Q^f H^* + R)^{-1}
⇒ the analysis U is also a weak random element with the analysis distribution N(m, Q), Q = (I - KH)Q^f of trace class
⇒ the distribution of the analysis U is a Gaussian probability measure.

Convergence of the EnKF is now simple. Recall: X_i^{(k)} = EnKF ensemble, U_i^{(k)} = reference i.i.d. ensemble from the exact covariances. By induction over the step index k:
- X_1^{(k),f} → U_1^{(k),f} in L^p as N → ∞, 1 ≤ p < ∞
- sample covariance: Q_N^{(k),f} = C_N(X^{(k),f}) →_P Q^{(k),f} = Cov(U_1^{(k),f})
- continuous mapping theorem ⇒ Kalman gain K_N^{(k)} →_P K^{(k)} and X_1^{(k)} →_P U_1^{(k)}
- advancing linearly in time: X_1^{(k+1),f} →_P U_1^{(k+1),f}
- a priori L^p bound and uniform integrability ⇒ X_1^{(k+1),f} → U_1^{(k+1),f} in L^p, 1 ≤ p < ∞

Conclusion. With a bit of care, a simplified proof in finite dimension carries over to an infinite-dimensional separable Hilbert space. White noise data error is good; a white noise state space is bad. Computational experiments confirm that the EnKF converges uniformly for high-dimensional distributions that approximate a Gaussian measure on a Hilbert space (J. Beezley, Ph.D. thesis, 2009). The EnKF for distributions with slowly decaying covariance eigenvalues converges very slowly and requires huge ensembles. The same is seen for particle filters, in spite of the fact that particle filters for problems without a structure fail in high dimension. The square root EnKF has no data perturbation, but the ensemble may not be exchangeable. Next: uniform convergence for high-dimensional problems that approximate an infinite-dimensional one, just like in numerical PDEs?

References
[1] A. V. Balakrishnan. Applied Functional Analysis. Springer-Verlag, New York, 1976. Applications of Mathematics, No. 3.
[2] Jonathan D. Beezley. High-Dimensional Data Assimilation and Morphing Ensemble Kalman Filters with Applications in Wildfire Modeling. PhD thesis, University of Colorado Denver, 2009.
[3] Gerrit Burgers, Peter Jan van Leeuwen, and Geir Evensen. Analysis scheme in the ensemble Kalman filter. Monthly Weather Review, 126:1719-1724, 1998.
[4] Giuseppe Da Prato. An Introduction to Infinite-Dimensional Analysis. Springer-Verlag, Berlin, 2006.
[5] Geir Evensen. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. Journal of Geophysical Research, 99(C5):10143-10162, 1994.
[6] François Le Gland, Valérie Monbet, and Vu-Duc Tran. Large sample asymptotics for the ensemble Kalman filter. In Dan Crisan and Boris Rozovskii, editors, The Oxford Handbook of Nonlinear Filtering. Oxford University Press, 2011. INRIA Report 7014, August 2009.
[7] Michel Ledoux and Michel Talagrand. Probability in Banach Spaces. Ergebnisse der Mathematik und ihrer Grenzgebiete (3), Vol. 23. Springer-Verlag, Berlin, 1991.
[8] Jan Mandel, Loren Cobb, and Jonathan D. Beezley. On the convergence of the ensemble Kalman filter. Applications of Mathematics, 56:533-541, 2011. arXiv:0901.2951, January 2009.