Data assimilation Schrödinger s perspective Sebastian Reich (www.sfb1294.de) Universität Potsdam/ University of Reading IMS NUS, August 3, 218 Universität Potsdam/ University of Reading 1
Core components of DA Mean Field Equations Interacting Particle Systems Coupling of Measures Universität Potsdam/ University of Reading 2
Entropic couplings Universität Potsdam/ University of Reading 3
Ensemble prediction I Ensemble prediction system with M members: d dt Zi t = f (Zi t ), Zi π, i = 1,..., M. Source: ECMWF Universität Potsdam/ University of Reading 4
Ensemble prediction II Source: The quiet revolution of numerical weather prediction, Nature, 215 Universität Potsdam/ University of Reading 5
Ensemble based data assimilation Continuous in time assimilation of precipitation data y t : d dt Zi t = f (Zi t ) + α 1Q t (Z i t Z t) + α 2 K t (y t h(z i t )) Additional terms: Inflation: α 1 >, Q t R N z N z spd, Z t = 1 M i Z i t Nudging: α 2 >, gain matrix K t R N z N y, forward operator h. Universität Potsdam/ University of Reading 6
SDEs & inflation I Forward SDE X + π, t [, T], W + t dz + t = f t (Z + t )dt + γ1/2 dw + t, standard Brownian motion forward in time. Generates probability measure P [,T] over C([, T], R N z) with marginal densities π t, i.e. Z t π t. The same measure is generated by backward SDE dz t = b t (Z t )dt + γ1/2 dw t, W t Brownian motion backward in time, X T π T. It holds that b t (z) = f t (z) γ z log π t (z). Universität Potsdam/ University of Reading 7
SDEs & inflation II Fokker-Planck equation for marginals: t π t = z (π t f t ) + γ 2 zπ t = z (π t b t ) γ 2 zπ t = z (π t u t ) with u t (z) = 1 2 (f t(z) + b t (z)) = f t (z) γ 2 z log π t (z). Replace forward SDE by mean field equation d dt Z t = f t (Z t ) γ 2 z log π t (Z t ), Z π. Remark. Generates path measure Q [,T] which is different from SDE measure P [,T] ; only marginals π t agree! Universität Potsdam/ University of Reading 8
SDEs & inflation III Lagrangian interacting particles (Gaussian approximation to π t ): d dt Zi t = f t(z i t ) + γ 2 (P t) 1 (Z i t Z t), Z i π, i = 1,..., M, empirical covariance matrix Connection to inflation: P t = 1 M 1 (Z i t Z t)(z i t Z t) T. i Q t = (P t ) 1, α 1 = γ/2. Universität Potsdam/ University of Reading 9
References Daley, R., Atmospheric Data Analysis, Cambridge University Press, 1991. Nelson, E., Dynamical Theories of Brownian Motion, Princeton University Press, 1967. del Moral, P., Mean field simulation for Monte Carlo integration, CRC Press, 213. SR, Cotter, J., Probabilistic Forecasting and Bayesian Data Assimilation, Cambridge University Press, 215. Law, K. et al., Data Assimilation A Mathematical Introduction, Springer, 215. Qiang Liu, Stein Variational Gradient Descent as Gradient Flow, arxiv:174.752v2, 217. Universität Potsdam/ University of Reading 1
Recap: Assimilation of precipitation data Continuous in time assimilation of precipitation data y t : d dt Zi t = f (Zi t ) + α 1Q t (Z i t Z t) + α 2 K t (y t h(z i t )) Inflation: α 1 >, Q t R N z N z spd, Z t = 1 M i Z i t Nudging: α 2 >, gain matrix K t R N z N y, forward operator h. Universität Potsdam/ University of Reading 11
SDEs & nudging I Given a likelihood function T L(z [,T] ) := exp V t (z t )dt. For example V t (z) = β 2 h(z) y t 2. Bayes theorem (Radon Nikodym): d P [,T] dp [,T] (z [,T] ) := L(z [,T]) P [,T] [L]. The measure P [,T] solves the filtering/smoothing problem of SDE inference. Universität Potsdam/ University of Reading 12
SDEs & nudging II Mean field formulation: d Z t = f t ( Z t ) + P t z ψ t ( Z t ) dt + γdw t with the potential ψ t satisfying the elliptic PDE z ( π t P t z ψ t ) = π t (V t V t ) Z π = π, V t = π t [V t ], P t = cov ( Z t ). Universität Potsdam/ University of Reading 13
SDEs & nudging III If π t Gaussian and h(z) = Hz in V t, then P t z ψ t (z) = βp t H T y t Hz + HZ t. 2 Compare to nudging scheme: α 2 K t (y t Hz), i.e., K t = P t H T, β = α 2, but innovation different. Universität Potsdam/ University of Reading 14
Combining nudging and inflation Ensemble Kalman Bucy filter (Gaussian approximation to π t ): d dt Zi t = f t(z i t ) + γ 2 (P t) 1 (Z i t Z t) + βk t y t h(zi t ) + h t 2 Z i π, i = 1,..., M, empirical covariance matrices P t = 1 M 1 K t = 1 M 1 (Z i t Z t)(z i t Z t) T, i (Z i t Z t)(h(z i t ) h t) T. Remark. Feedback particle filter for likelihood with i V t (z)dt 1 2 h(z) 2 dt h(z) T dy t. Universität Potsdam/ University of Reading 15
References Moser, J., On the volume elements on a manifold, Trans. Amer. Math. Soc., 1965. SR, A dynamical systems framework for intermittent data assimilation, BIT, 21. Daum, F., Huang, J., Particle filter for nonlinear filters, ICASSP, IEEE, 211. Bergemann, K, SR, An ensemble Kalman Bucy filter for continuous time data assimilation, Meteorolog. Zeitschrift, 212. Yang, T. et al., Feedback particle filter, IEEE Trans. Autom. Control, 213. Taghvaei, A. et al., Kalman filter and its modern extensions for the continuous time nonlinear filtering problem, J. Dyn. Sys., Meas., and Control, 218. de Wiljes, et al., Long time stability and accuracy of the ensemble Kalman Bucy filter for fully observed processes and small measurement noise, SIAM J. Appl. Dyn. Sys., 218. Universität Potsdam/ University of Reading 16
Discrete time observations I State Model State Model State Assimilation Assimilation Assimilation Data Data Data time Universität Potsdam/ University of Reading 17
Discrete time observations II Discrete time observations: y tn = h(z tn ) + R 1/2 Ξ tn, n = 1,..., N. Likelihood function: L(z [,T] ) := exp 1 (y tn h(z tn )) T R 1 (y tn h(z tn )). 2 n Bayes: dp + [,T] (Z + L(Z [,T] dp ) := [,T] ) [,T] P [,T] [L]. Universität Potsdam/ University of Reading 18
Discrete time observations III For simplicity: single observation, i.e. N = 1, R = I, t 1 = T, L(z) = 1 2 y T h(z) 2. But keep recursive nature of sequential DA in mind! Four main players: last analysis: π forecast based on last analysis: π T new analysis at time t = T (Bayes, filtering distribution): π T smoothing distribution at t = : π Universität Potsdam/ University of Reading 19
Discrete time observations IV Prediction 1 Smoothing Schrödinger Filtering b Optimal Control b 1 Data y 1 t = t =1 Time Universität Potsdam/ University of Reading 2
Example: Gaussian mixture I π 1 is a Gaussian mixture, Gaussian likelihood, π 1 weighted Gaussian mixture..1 prior.5 prediction.8.4.6.3.4.2.2.1.2-1 -.5.5 1 space smoothing -4-2 2 4 space filtering 1.5.15.1.5 1.5-1 -.5.5 1 space -4-2 2 4 space Universität Potsdam/ University of Reading 21
Example: Gaussian mixture II 2.5 optimal control proposals 2.5 Schroedinger proposal 2 2 1.5 1.5 1 1.5.5-2 -1.5-1 -.5.5 1 1.5 space -2-1.5-1 -.5.5 1 1.5 space Applied to smoothing PDF π. Applied to π directly! Universität Potsdam/ University of Reading 22
Example: Brownian dynamics Scalar Brownian dynamics under a double well potential (γ =.5): 2 1 4 last analysis 4 1 4 forecast 1 initial mean observation 1.5 1 3 2 8.5 1 potential 6 4-5 5 4 1 4 smoother -5 5 6 1 4 new analysis 2 3 2 4 1 2-2 -1.5-1 -.5.5 1 1.5 2 variable z -5 5-5 5 The forecast and the new analysis at T =.5 are nearly singular with respect to each other. The relation between the last analysis (π ) and the smoother ( π ) is somewhat better. Universität Potsdam/ University of Reading 23
Discrete time observations III Forward-backward smoother iteration: Forward: Z + π. Yields π t. Backward: dz + t = f (Z + t )dt + γw + t, d Z t with Z T π T and Yields π t. = f ( Z t )dt γ z log π t ( Z t )dt + γw t, π T (z) L(z)π T (z). Smoother: d Z + t Z + π, Z + T π T. = f ( Z + t )dt + γ z log π t π t ( Z + t )dt + γw + t Universität Potsdam/ University of Reading 24
Example: Brownian dynamics II Scalar Brownian dynamics under a double well potential (γ =.5): 2 1 4 last analysis 4 1 4 forecast 1 initial mean observation 1.5 1 3 2 8.5 1 potential 6 4-5 5 4 1 4 smoother -5 5 6 1 4 new analysis 2 3 2 4 1 2-2 -1.5-1 -.5.5 1 1.5 2 variable z -5 5-5 5 The forward smoother SDE links the smoother measure π with π T. Still requires transforming π into π (but now at t = ). Universität Potsdam/ University of Reading 25
Schrödinger problem of DA I A different perspective on sequential DA: Schrödinger problem. Find the measure P [,T] which minimises the Kullback-Leibler divergence 2 1.5 1 1 4 last analysis 4 3 2 1 4 forecast P [,T] = arg inf Q P KL(Q [,T] P [,T] ) subject to the constraints π = q = π, π T = q T = π T..5-5 5 4 3 2 1 1 4 smoother -5 5 1-5 5 6 4 2 1 4 new analysis -5 5 The measure P [,T] is generated by a controlled SDE d Z + t = f ( Z + t )dt + u t( Z + t )dt + γdw + t. Universität Potsdam/ University of Reading 26
Schrödinger problem of DA II Find an initial distribution ϕ + and its evolution ϕ+ t under the forward SDE dz + t = f (Z + t ) dt + γdw + t such that the associated backward SDE dz = f (Z t t ) γ z log ϕ + t (Z t ) dt + γdw t with final condition ϕ T := π T leads to marginals ϕ t such that Then the control solves the Schrödinger problem. ϕ = π. u t = γ z log ϕ t ϕ + t Universität Potsdam/ University of Reading 27
Example I: Lorenz 63 Starting point: (i) Euler-Maruyama discretization Z n+1 = Z n + f (Z n ) t + (γ t) 1/2 Ξ n giving rise to transition kernel q + (z n+1 z n ). (ii) Given M samples z j n from Z n and M samples z i n+1 from Z n+1 define the M M matrix Q by q ij = q + (z i n+1 zj n ). (iii) Observations at time t n+1 lead to importance weights w i n+1. Universität Potsdam/ University of Reading 28
Example II: Lorenz 63 Schrödinger problem: subject to M P = arg min KL(P Q), KL(P Q) := p ij log p ij, P Π q i,j=1 ij Π = Application to DA P R M M : p ij, M p ij = 1/M, i=1 M p ij = w i j=1 a) Schrödinger resample: Use P to sample from π n+1 P[ Z j n+1 ] = zi n+1 ] = Mp ij. b) Schrödinger transform: Set M z j n+1 = M z i n+1 p ij i=1 + (γ t)ξ j. Universität Potsdam/ University of Reading 29
Example III: Lorenz 63 1-2 RMS errors as a function M RMSE 1-3 particle filter Schrodinger transform Schrodinger resample EnKBF.5.1.15.2 1/M Stochastic Lorenz 63, observations at every time step, observation error scaled with inverse time step. Universität Potsdam/ University of Reading 3
References Schrödinger, E., Über die Umkehrung der Naturgesetze, Sitzungsberichte der Preuss. Phys. Math. Klasse, 1932. Yongxin Chen et al., On the relation between optimal transport and Schrödinger bridges: A stochastic control viewpoint, J. Optim. Theory and Appl., 216. Fearnhead, P., Künsch, H.R., Particle filters and data assimilation, Annual Review of Statistics and Its Application, 218. Guarniero et al., The iterated auxiliary particle filter, J. American Statist. Assoc., 217. van Leeuwen, P.-J., Nonlinear data assimilation, Frontiers in Applied Dynamical Systems, Vol. 2, 215. Peyre, G., Cuturi, M., Computational Optimal Transport, arxiv:183.567, 218. SR, Data assimilation, Acta Numerica, 219, preliminary version on arxiv 187.8351. Universität Potsdam/ University of Reading 31
OT meets IS I Importance sampling (IS): Available realisations Z i T π T with importance weights w i π T π T (Z i T ). Optimal transport (OT): Instead of resampling, find coupling/transformation Z T = z ψ(z T ), Z T π T and Z π T. More abstractly, Z T (a) = Z T (a )δ(a a ψ(a))da, where A is some random reference variable. For example, A = Z T. Universität Potsdam/ University of Reading 32
OT meets IS II Replace the integral by a sum and formally write M Z j T = Z i T d ij i=1 Requires M 1 d ij = 1, d ij = w i. M i=1 j Select an "optimal" transformation through maximising correlation V(D) = 1 d ij Z i M T Zj T = 1 Z j T M Zj T. ij j In addition, either d ij (Ensemble Transform Particle Filter) or 1 M ( Z i M 1 T z T )( Z i T z M T ) T = w i (Z i T z T )(Z i T z T ) T i=1 i=1 (Nonlinear Ensemble Transform Filter). Universität Potsdam/ University of Reading 33
Numerical example I Lorenz-63 model, first component observed infrequently ( t =.12) and with large measurement noise (R = 8): Figure: RMSEs for various second-order accurate LETFs compared to the ETPF, the ESRF, and the SIR PF as a function of the sample size, M. Universität Potsdam/ University of Reading 34
Numerical example II Hybrid filter: P := P ESRF (α) P ETPF (1 α). Figure: RMSEs for hybrid ESRF (α = ) and 2nd-order corrected NETF/ETPF (α = 1) as a function of the sample size, M. Universität Potsdam/ University of Reading 35
References Evensen, G., Data assimilation the ensemble Kalman filter, Springer, 29. SR, A nonparametric ensemble transform method for Bayesian inference, SIAM J. Sci. Comput., 213. SR, Cotter, J., Probabilistic Forecasting and Bayesian Data Assimilation, Cambridge University Press, 215. Tödter J., Ahrens, B., A second order exact ensemble square root filter for nonlinear data assimilation, Month. Weather Rev., 215. Chustagulprom, N. et al., A hybrid ensemble transform particle filter for nonlinear and spatially extended systems, SIAM/ASA J. UQ, 216. Acevedo, W., et al., Second order accurate ensemble transform particle filters, SIAM J. Sci. Comput., 217. Fearnhead, P., Künsch, H.R., Particle filters and data assimilation, Annual Review of Statistics and Its Application, 218. Universität Potsdam/ University of Reading 36
Summary Continuous in time DA naturally leads to various interacting particle systems. Schrödinger problem provides an "optimal" mathematical framework for sequential DA with discrete in time observations. Numerical implementation nontrivial; good drift corrections can be derived using Gaussian approximations or kernel methods. Coupling arguments are central to derivation of interacting particle systems. Relevant to rare event simulations, optimal control problems and derivative free optimization. Universität Potsdam/ University of Reading 37
The End (source: J.B. Handelsman, New Yorker, 5/31/1969) Universität Potsdam/ University of Reading 38
Collaborators Walter Acevedo Kay Bergemann Yuan Cheng Nawinda Chustagulprom Colin Cotter Jana de Wiljes Prashant Mehta Wilhelm Stannat Amari Taghvaei Universität Potsdam/ University of Reading 39