Particle Filters in High Dimensions


Particle Filters in High Dimensions
Dan Crisan, Imperial College London
Bayesian Computation for High-Dimensional Statistical Models, Opening Workshop, 27-31 August 2018

Talk Synopsis
- Stochastic filtering
- Particle filters/sequential Monte Carlo methods
- Tempering/model reduction
- Case study
  - Motivation: particle filters for numerical weather prediction
  - A stochastic transport model and the associated filtering problem
  - Particle filter (initial condition, add-on procedures, parameter choice)
- Final remarks

Research partially supported by EPSRC grant EP/N023781/1. Numerical work done by Wei Pan (Imperial College London).

C.J. Cotter, DC, D.D. Holm, W. Pan, I. Shevchenko, Numerically Modelling Stochastic Lie Transport in Fluid Dynamics, https://arxiv.org/abs/1801.09729
C.J. Cotter, DC, D.D. Holm, W. Pan, I. Shevchenko, Sequential Monte Carlo for Stochastic Advection by Lie Transport (SALT): A case study for the damped and forced incompressible 2D stochastic Euler equation, in preparation.
A. Beskos, DC, A. Jasra, K. Kamatani, Y. Zhou, A stable particle filter for a class of high-dimensional state-space models, Adv. in Appl. Probab. 49 (2017).
A. Beskos, DC, A. Jasra, On the stability of sequential Monte Carlo methods in high dimensions, Ann. Appl. Probab. 24 (2014).

What is stochastic filtering?
Stochastic filtering: the process of using partial observations and a stochastic model to make inferences about an evolving dynamical system.
- X, the signal process: the hidden component
- Y, the observation process: the data

The filtering problem: find the conditional distribution of the signal $X_t$ given $\mathcal{Y}_t = \sigma(Y_s, s \in [0,t])$, i.e.,
$$\pi_t(A) = P(X_t \in A \mid \mathcal{Y}_t), \quad t \ge 0, \ A \in \mathcal{B}(\mathbb{R}^d).$$
Discrete framework: $\{X_t, Y_t\}_{t \ge 0}$ is a Markov process.
The signal process $\{X_t\}_{t \ge 0}$ is a Markov chain, $X_0 \sim \pi_0(dx_0)$,
$$P(X_t \in dx_t \mid X_{t-1} = x_{t-1}) = K_t(x_{t-1}, dx_t) = f_t(x_t \mid x_{t-1})\,dx_t.$$
Example: $X_t = b(X_{t-1}) + \sigma(X_{t-1})\, B_t$, with $B_t \sim N(0,1)$ i.i.d.
The observation process satisfies
$$P(Y_t \in dy_t \mid X_{[0,t]} = x_{[0,t]}, Y_{[0,t-1]} = y_{[0,t-1]}) = P(Y_t \in dy_t \mid X_t = x_t) = g_t(y_t \mid x_t)\,dy_t,$$
where $X_{[0,t]} := (X_0, \dots, X_t)$ and $x_{[0,t]} := (x_0, \dots, x_t)$.
Example: $Y_t = h(X_t) + V_t$, with $V_t \sim N(0,1)$ i.i.d.
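A minimal sketch of simulating the two example equations above. The drift b, diffusion sigma and observation map h are illustrative choices, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def b(x):      return 0.9 * x   # hypothetical drift
def sigma(x):  return 1.0       # hypothetical (constant) diffusion
def h(x):      return x         # hypothetical observation map

T = 100
X = np.zeros(T + 1)
Y = np.zeros(T + 1)
X[0] = rng.normal()             # X_0 ~ pi_0 (standard normal here)
for t in range(1, T + 1):
    # X_t = b(X_{t-1}) + sigma(X_{t-1}) B_t,  B_t ~ N(0,1) i.i.d.
    X[t] = b(X[t - 1]) + sigma(X[t - 1]) * rng.normal()
    # Y_t = h(X_t) + V_t,  V_t ~ N(0,1) i.i.d.
    Y[t] = h(X[t]) + rng.normal()
```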

Notation:
- posterior measure: the conditional distribution of the signal $X_t$ given $\mathcal{Y}_t$,
$$\pi_t(A) = P(X_t \in A \mid \mathcal{Y}_t), \quad t \ge 0, \ A \in \mathcal{B}(\mathbb{R}^d);$$
- predictive measure: the conditional distribution of the signal $X_t$ given $\mathcal{Y}_{t-1}$,
$$p_t(A) = P(X_t \in A \mid \mathcal{Y}_{t-1}), \quad t \ge 0, \ A \in \mathcal{B}(\mathbb{R}^d);$$
- if $\mu$ is a measure and $k$ is a kernel, then $\mu k(A) := \int \mu(dx)\, k(x, A)$.

Bayes recursion:
Prediction: $p_t = \pi_{t-1} K_t$.
Updating: $\pi_t = g_t \, p_t$. In other words, $\frac{d\pi_t}{dp_t} = C_t^{-1} g_t$, where $C_t := \int_{\mathbb{R}^d} g_t(y_t, x_t)\, p_t(dx_t)$.

Particle filters
1. The class of approximations: particle filters/sequential Monte Carlo methods (SMC),
$$\pi_t \approx \pi_t^N = \sum_{j=1}^N a_j(t)\, \delta_{v_j(t)}, \qquad \Big(\underbrace{a_j(t)}_{\text{weight}}, \underbrace{v_j^1(t), \dots, v_j^d(t)}_{\text{position}}\Big)_{j=1}^N.$$
2. The law of evolution of the approximation:
$$\pi_{t-1}^N \xrightarrow{\ \text{mutation}\ } p_t^N \xrightarrow{\ \text{selection}\ } \pi_t^N.$$
3. The measure of the approximation error (numerical example: $\hat\pi_t \approx \hat\pi_t^N$):
$$\sup_{\varphi \in B(\mathbb{R}^d)} E\big[\,|\pi_t^N(\varphi) - \pi_t(\varphi)|\,\big].$$
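A minimal bootstrap particle filter sketch implementing the mutation/selection recursion above, under the generic state-space model of the previous slides; the model-specific functions are passed in as arguments and are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_filter(y, n_particles, sample_prior, sample_kernel, loglik):
    """One possible bootstrap PF: mutation by the prior kernel K_t,
    selection by likelihood reweighting and multinomial resampling.
    Returns the weighted particle approximation of pi_t at every t."""
    x = sample_prior(n_particles)               # v_j(0) ~ pi_0
    approx = []
    for t in range(1, len(y)):
        x = sample_kernel(x)                    # mutation: p_t^N = pi_{t-1}^N K_t
        logw = loglik(y[t], x)                  # weight by g_t(y_t | x)
        w = np.exp(logw - logw.max())
        w /= w.sum()                            # a_j(t), normalised
        approx.append((x.copy(), w.copy()))     # pi_t^N = sum_j a_j(t) delta_{v_j(t)}
        idx = rng.choice(n_particles, n_particles, p=w)
        x = x[idx]                              # selection (resampling)
    return approx

# Hypothetical usage with the linear-Gaussian example from the earlier sketch:
#   approx = bootstrap_filter(Y, 500,
#       sample_prior=lambda n: rng.normal(size=n),
#       sample_kernel=lambda x: 0.9 * x + rng.normal(size=x.shape),
#       loglik=lambda y_t, x: -0.5 * (y_t - x) ** 2)
```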

Remarks:
- Particle filters are recursive algorithms: the approximation of $\pi_t$ and the new observation $Y_{t+1}$ are the only information used to obtain the approximation of $\pi_{t+1}$. In other words, the information gained from $Y_1, \dots, Y_t$ is embedded in the current approximation.
- The standard particle filter involves sampling from the prior distribution of the signal and then using a weighted bootstrap technique (or equivalent) with weights defined by the likelihood of the most recent observation.
- If $d$ is small to moderate, then the standard particle filter can perform very well in the time parameter $t$.
- The function $x_t \mapsto g_t(y_t \mid x_t)$ can convey a lot of information about the signal, especially so in high dimensions. If this is the case, using the prior transition kernel $f_t(x_t \mid x_{t-1})$ as proposal is ineffective, and the standard particle filter typically performs poorly, often requiring $N = O(\kappa^d)$ particles.
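The weight degeneracy behind the $N = O(\kappa^d)$ remark can be reproduced in a few lines. This toy experiment (i.i.d. standard normal prior, identity observation operator, unit observation noise; all hypothetical choices) shows the effective sample size collapsing as the dimension grows.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1000
for d in (1, 5, 10, 20, 40):
    x = rng.normal(size=(N, d))                  # particles from the prior
    y = rng.normal(size=d)                       # one observation, Y = X + V
    logw = -0.5 * np.sum((y - x) ** 2, axis=1)   # log g(y|x), unit obs noise
    w = np.exp(logw - logw.max())
    w /= w.sum()
    ess = 1.0 / np.sum(w ** 2)                   # effective sample size
    print(f"d = {d:3d}   ESS = {ess:8.1f} out of {N}")
```

Keeping the ESS stable as $d$ grows requires exponentially many particles, which is what the add-on procedures below are designed to avoid.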

[Figure: computational cost (wall-clock time per time step, seconds) to achieve a predetermined RMSE versus model dimension, for the standard particle filter (PF) and the space-time particle filter (STPF).]

Add-on procedures are required to resolve high-dimensional filtering problems:
- Tempering *
- OT (prior → posterior)
- Sequential DA in space
- Jittering *
- Model reduction (high → low resolution) *
- Nudging
- Hybrid models
- Hamiltonian Monte Carlo
- Informed priors
- Localization

Tempering
A tempering procedure. Framework: $\{X_t, Y_t\}_{t \ge 0}$ with $\{X_t\}_{t \ge 0}$ a Markov chain,
$$P(X_t \in dx_t \mid X_{t-1} = x_{t-1}) = f_t(x_t \mid x_{t-1})\,dx_t, \qquad P(Y_t \in dy_t \mid X_t = x_t) = g_t(y_t \mid x_t)\,dy_t.$$
For k = 1 to d:
- reweight the particles using $g_t^{1/d}$ and (possibly) resample;
- move the particles using an MCMC step that leaves $g_t^{k/d} f_t \pi_{[0,t-1]}$ invariant.

Beskos, DC, Jasra, On the stability of SMC methods in high dimensions, 2014.
Kantas, Beskos, Jasra, Sequential Monte Carlo for inverse problems, 2014.
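A schematic sketch of this loop, with the bookkeeping simplified (it always resamples after each fractional reweighting). The `mcmc_move` argument stands for a kernel leaving the partial posterior $g_t^{k/d} f_t \pi_{[0,t-1]}$ invariant, e.g. the jittering kernel described later in the talk; it is assumed given.

```python
import numpy as np

def tempered_reweight(x, loglik_y, d_steps, mcmc_move, rng):
    """Sketch: for k = 1..d, reweight by one factor g^(1/d),
    resample, then MCMC-move so the ensemble targets g^(k/d) f pi."""
    N = len(x)
    for k in range(1, d_steps + 1):
        logw = loglik_y(x) / d_steps          # one factor of g_t^{1/d}
        w = np.exp(logw - logw.max())
        w /= w.sum()
        x = x[rng.choice(N, N, p=w)]          # resample
        x = mcmc_move(x, k / d_steps)         # rejuvenate duplicated particles
    return x
```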

Model Reduction (high → low resolution)
- Model reduction is a valuable methodology that can lead to substantial computational savings. This is the case when the order reduction is performed through a coarsening of the grid used by the numerical algorithm that approximates the dynamical system.
- For the following example we perform a state-space order reduction from $2 \times 513^2 \approx 0.5 \times 10^6$ to $2 \times 65^2 = 8450$.
- If applied erroneously, order reduction can introduce large errors.
- Model reduction can be theoretically justified through the continuity of the conditional distribution of the signal in $(\pi_0, g_1, \dots, g_t, K_1, \dots, K_t)$.

Recall the recursion formula for the conditional distribution of the signal:
$$p_t = \pi_{t-1} K_t, \qquad \pi_t = g_t \, p_t, \qquad \text{where } \frac{d\pi_t}{dp_t} = C_t^{-1} g_t, \quad C_t := \int_{\mathbb{R}^d} g_t(y_t, x_t)\, p_t(dx_t).$$
Remark. The conditional distribution of the signal is a continuous function of $(\pi_0, g_1, \dots, g_t, K_1, \dots, K_t)$. In other words, if
$$\lim_{\varepsilon \to 0} (\pi_0^\varepsilon, g_1^\varepsilon, \dots, g_t^\varepsilon, K_1^\varepsilon, \dots, K_t^\varepsilon) = (\pi_0, g_1, \dots, g_t, K_1, \dots, K_t)$$
in a suitably chosen topology, and we define
$$p_t^\varepsilon \stackrel{\Delta}{=} \pi_{t-1}^\varepsilon K_t^\varepsilon, \qquad \pi_t^\varepsilon \stackrel{\Delta}{=} g_t^\varepsilon \, p_t^\varepsilon, \tag{1}$$
then $\lim_{\varepsilon \to 0} \pi_t^\varepsilon = \pi_t$ and $\lim_{\varepsilon \to 0} p_t^\varepsilon = p_t$ (again, in a suitably chosen topology).
NB. Note that $\pi_t^\varepsilon$ is no longer the solution of a filtering problem, but simply the solution of the iteration (1).

Application to high-dimensional problems
A Case Study

Motivation: state estimation in Numerical Weather Prediction
Data assimilation (DA) for NWP:
- a set of methodologies that combines past knowledge of a system, in the form of a numerical model, with new information about that system, in the form of observations;
- designed to improve forecasting, reduce model uncertainties and adjust model parameters;
- a term used mainly in the computational geoscience community;
- a major component of Numerical Weather Prediction.
Variational DA: combines the model and the data through the optimisation of a given criterion (minimisation of a so-called cost function).
Ensemble-based DA: uses a set of model trajectories/possible scenarios that are intermittently updated according to the data and are used to infer the past, current or future position of a system.
[Figure: Hurricane Irma forecast: a. ECMWF, b. USA Global Forecast.]

Numerical weather prediction requires massive computing capabilities. The Cray XC40 supercomputer (source: Met Office website):
- 14,000 trillion calculations per second
- 2 petabytes of memory
- 460,000 compute cores
- 24 petabytes of storage for saving data
- dimension of the state space: $O(10^9)$
- dimension of the observation space: $O(10^7)$
Our computing capability, "The Beast" (source: Wei's office):
- 2x Intel Xeon CPU @ 2.10GHz, 32 logical cores
- 2x Nvidia Quadro M4000
- 64GB memory
- dimension of the state space: $513 \times 513 \times 2$
- dimension of the observation space: $9 \times 9 \times 2$

A stochastic transport model. Example: fluid transport dynamics.
Consider a two-dimensional incompressible fluid flow $u$ defined on the 2D torus $\Omega = [0, L_x] \times [0, L_y]$, modelled by the two-dimensional Euler equations with forcing and damping. Let $q = \hat{z} \cdot \operatorname{curl} u$ denote the vorticity of $u$, where $\hat{z}$ denotes the z-axis. For a scalar field $g : \Omega \to \mathbb{R}$, we write $\nabla^\perp g = (-\partial_y g, \partial_x g)^T$. Let $\psi : \Omega \times [0, \infty) \to \mathbb{R}$ denote the stream function. Then
$$\partial_t q + (u \cdot \nabla)\, q = Q - r q, \qquad u = \nabla^\perp \psi, \qquad \Delta \psi = q.$$
- $Q$ is the forcing term, given by $Q = 0.1 \sin(8\pi x)$.
- $r$ is a positive constant: the large-scale dissipation time scale.
- We consider a slip-flow boundary condition $\psi|_{\partial\Omega} = 0$.
- Evolution of Lagrangian fluid parcels: $\dfrac{dx_t}{dt} = u(x_t, t)$.
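To illustrate the elliptic relations $\Delta\psi = q$, $u = \nabla^\perp\psi$, here is a minimal spectral solve on a doubly periodic square grid. The talk's actual solver is a finite-element scheme with the slip boundary condition above (see the next slide); the FFT route here is only an assumption made for brevity.

```python
import numpy as np

def velocity_from_vorticity(q, L=1.0):
    """Illustrative spectral solve of Delta psi = q and u = grad^perp psi
    on a doubly periodic [0, L]^2 grid (q assumed square, shape (n, n))."""
    n = q.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0                          # avoid division by zero (mean mode)
    psi_hat = np.fft.fft2(q) / (-k2)        # Delta psi = q  =>  -|k|^2 psi_hat = q_hat
    psi_hat[0, 0] = 0.0                     # fix the free constant in psi
    psi = np.real(np.fft.ifft2(psi_hat))
    # u = grad^perp psi = (-d psi/dy, d psi/dx)
    ux = -np.real(np.fft.ifft2(1j * ky * psi_hat))
    uy = np.real(np.fft.ifft2(1j * kx * psi_hat))
    return psi, ux, uy
```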

The filtering problem

Signal: a discrete space/time approximation of the PDE/SPDE on the domain $[0,1]^2$.
PDE system:
$$\partial_t \omega + u \cdot \nabla \omega = Q - r\omega, \qquad u = \nabla^\perp \psi, \qquad \Delta \psi = \omega.$$
SPDE system:
$$dq + \bar{u} \cdot \nabla q \, dt + \sum_i \xi_i \cdot \nabla q \circ dW_t^i = (Q - rq)\, dt, \qquad \bar{u} = \nabla^\perp \bar\psi, \qquad \Delta \bar\psi = q,$$
with $Q = 0.1 \sin(8\pi x)$, $r = 0.01$, and boundary condition $\psi|_{\partial\Omega} = 0$.
The space/time approximation is generated using a mixed continuous and discontinuous Galerkin finite element scheme plus an optimal third-order strong-stability-preserving Runge-Kutta method [Bernsen et al 2006, Gottlieb 2005].

                   PDE        SPDE
Grid resolution    512x512    64x64
Numerical Δt       0.0025     0.01
Spin-up            40 ett

ett: eddy turnover time, $L/u_L \approx 2.5$ time units.

Initial configuration for the vorticity:
$$\omega_{\text{spin}} = \sin(8\pi x)\sin(8\pi y) + 0.4\cos(6\pi x)\cos(6\pi y) + 0.3\cos(10\pi x)\cos(4\pi y) + 0.02\sin(2\pi y) + 0.02\sin(2\pi x), \tag{2}$$
from which we spin up the system until an energy equilibrium state seems to have been reached. This equilibrium state, denoted by $\omega_{\text{initial}}$, is then chosen as the initial condition.
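Evaluating (2) on a grid is straightforward; this sketch assumes the $[0,1]^2$ domain and the 512x512 fine-grid resolution quoted above.

```python
import numpy as np

n = 512
x, y = np.meshgrid(np.linspace(0, 1, n, endpoint=False),
                   np.linspace(0, 1, n, endpoint=False), indexing="ij")
# omega_spin, equation (2):
omega_spin = (np.sin(8 * np.pi * x) * np.sin(8 * np.pi * y)
              + 0.4 * np.cos(6 * np.pi * x) * np.cos(6 * np.pi * y)
              + 0.3 * np.cos(10 * np.pi * x) * np.cos(4 * np.pi * y)
              + 0.02 * np.sin(2 * np.pi * y) + 0.02 * np.sin(2 * np.pi * x))
```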

[Figure: the numerical PDE solution at the initial time $t_{\text{initial}}$ and its coarse-grained version, obtained via spatial averaging and projection of the fine-grid stream function onto the coarse grid.]

[Figure: the numerical PDE solution at the final time $t = t_{\text{initial}} + 146$ large eddy turnover times (ett). The coarse-graining is done via spatial averaging and projection of the fine-grid stream function onto the coarse grid.]

Observations: $u$ is observed on a subgrid of the signal grid ($9 \times 9$ points),
$$Y_t(x) = \begin{cases} u_t^{\text{SPDE}}(x) + \alpha z_x, & z_x \sim N(0,1), \quad \text{Experiment 1} \\ u_t^{\text{PDE}}(x) + \alpha z_x, & z_x \sim N(0,1), \quad \text{Experiment 2} \end{cases}$$
$\alpha$ is calibrated to the standard deviation of the true solution over a coarse grid cell.
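A sketch of this observation operator: read a velocity component on a coarse subgrid and add Gaussian noise. The default `alpha` is a placeholder; in the talk it is calibrated to the standard deviation of the true solution over a coarse grid cell.

```python
import numpy as np

rng = np.random.default_rng(4)

def observe(u, n_obs=9, alpha=0.1):
    """Subsample one velocity component on an n_obs x n_obs subgrid
    and perturb it with N(0, alpha^2) observation noise."""
    n = u.shape[0]
    stride = n // n_obs
    u_sub = u[::stride, ::stride][:n_obs, :n_obs]
    return u_sub + alpha * rng.normal(size=u_sub.shape)
```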

The standard particle filter (SIR) fails!
[Figure: histogram of the particle log-likelihoods; DA period 1 ett, 100 particles.]

A (non-standard) particle filter

Initial Condition
A good choice of the initial condition is essential for the successful implementation of the filter. In practice it reflects the level of uncertainty in the estimate of the initial position of the dynamical system. We use the initial condition to obtain an ensemble that contains particles reasonably close to the truth.
Choice for the running example: deformation, physically consistent with the system (Casimirs preserved). We take a nominal value $\omega_{t_0}$ and deform it using the following modified Euler equation:
$$\partial_t \omega + \sum_i \beta_i\, u(\tau_i) \cdot \nabla \omega = 0, \tag{3}$$
where $\beta_i \sim N(0, \epsilon)$, $i = 1, \dots, N_p$, are centred Gaussian weights with an a priori variance parameter $\epsilon$, and $\tau_i \sim U(t_{\text{initial}}, t_0)$, $i = 1, \dots, N_p$, are uniform random numbers. Thus each $u(\tau_i)$ corresponds to a PDE solution in the time period $[t_{\text{initial}}, t_0)$.

Alternative choices:
- Perturb $q$ by $q + \zeta$, where $\zeta$ is a Gaussian random field. Doable but not physical; it only works for $q$ because it is the least smooth of the three fields of interest (the other fields are spatially smooth). It also breaks the SPDE well-posedness theorem (function-space regularity). [Figure: (ux, uy)]
- Directly perturb $\psi$ by $\psi + \bar\psi$, where $\bar\psi = (I - \kappa\Delta)^{-1}\zeta$: invert the elliptic operator with boundary condition $\bar\psi|_{\partial\Omega} = 0$. [Figure: (ux, uy)]

Add-on techniques: Model Reduction (high → low resolution)
For the current numerical example we perform a state-space order reduction from $2 \times 513^2 \approx 0.5 \times 10^6$ to $2 \times 65^2 = 8450$.
How do we reduce the model? We use a stochastic PDE defined on a coarser grid:
$$dq + (u \cdot \nabla)\, q\, dt + \sum_k (\xi_k \cdot \nabla)\, q \circ dB_t^k = (Q - rq)\, dt, \qquad u = \nabla^\perp \psi, \qquad \Delta \psi = q.$$
- The $\xi_k$ are given divergence-free vector fields, computed from the true solution by using an empirical orthogonal functions (EOFs) procedure.
- The $B_t^k$ are independent scalar Brownian motions.
- Lagrangian trajectories: $dx_t = u(x_t, t)\, dt + \sum_i \xi_i(x_t) \circ dW_t^i$.

Methodology to calibrate the additional stochasticity
The reason for this stochastic parametrization is grounded in solid physical considerations; see D.D. Holm, Variational principles for stochastic fluid dynamics, Proc. Roy. Soc. A, 2015.
$$dx_t = u_t^f(x_t)\, dt \quad \text{(fine)}, \qquad dx_t = u_t^c(x_t)\, dt + \sum_i \xi_i(x_t) \circ dW_t^i \quad \text{(coarse)}.$$
For each $m = 0, 1, \dots, M-1$:
1. Solve $dx_{ij}^f(t)/dt = u_t^f(x_{ij}^f(t))$ with initial condition $x_{ij}^f(m\Delta T) = x_{ij}$.
2. Compute $u_t^c$ by low-pass filtering $u_t^f$ along the trajectory.
3. Compute $x_{ij}^c(t)$ by solving $dx_{ij}^c(t)/dt = u_t^c(x_{ij}^c(t))$ with the same initial condition.
4. Compute the difference $\Delta x_{ij}^m = x_{ij}^f((m+1)\Delta T) - x_{ij}^c((m+1)\Delta T)$, which measures the error between the fine and coarse trajectories.

Having obtained the $\Delta x_{ij}^m$, we would like to extract a basis for the noise. This amounts to a Gaussian model of the form
$$\Delta x_{ij}^m = \overline{\Delta x}_{ij} + \sqrt{\Delta t} \sum_{k=1}^N \xi_{ij}^k \, \Delta W_m^k,$$
where the $\Delta W_m^k$ are i.i.d. normal random variables with mean zero and variance 1. We estimate $\xi$ by minimising
$$\sum_{ijm} E\,\Big\| \Delta x_{ij}^m - \overline{\Delta x}_{ij} - \sqrt{\Delta t} \sum_{k=1}^N \xi_{ij}^k \, \Delta W_m^k \Big\|^2,$$
where the choice of $N$ can be informed by using empirical orthogonal functions (EOFs).
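A possible numerical route to this minimisation is principal component analysis of the sample of differences: remove the mean and take leading singular vectors (EOFs) until a chosen variance fraction is captured. The layout of `dx` and the variance threshold are assumptions of this sketch.

```python
import numpy as np

def estimate_xi(dx, var_frac=0.9):
    """EOF sketch: dx has shape (M, P), rows = the M lagged differences
    Delta x^m flattened over the grid. Returns the mean, the leading
    scaled modes, and their count. The returned modes approximate
    sqrt(dt) * xi^k; divide by sqrt(dt) of the coarse step to get xi."""
    mean = dx.mean(axis=0)
    anom = dx - mean                               # Delta x - mean(Delta x)
    _, s, vt = np.linalg.svd(anom, full_matrices=False)
    var = s**2 / np.sum(s**2)
    n_eof = int(np.searchsorted(np.cumsum(var), var_frac)) + 1
    # scale so that modes * N(0,1) reproduces the sample covariance
    modes = vt[:n_eof] * (s[:n_eof, None] / np.sqrt(dx.shape[0]))
    return mean, modes, n_eof
```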

Number of sources of noise (EOFs: empirical orthogonal functions):
- decide on a case-by-case basis;
- too many will slow down the algorithm.
[Figures. Left: number of EOFs capturing 90% of the variance vs 50% (no change). Right: normalised spectrum of the $\Delta x$ covariance operator, showing the number of EOFs required to capture 50%, 70% and 90% of the total variance.]

Model reduction UQ pictures: systems at 512x512 vs 128x128 vs 64x64. [Figure: (a) ux, (b) uy]

[Figure: (a) psi, (b) q]

- The aim of the calibration is to capture the statistical properties of the fast fluctuations, rather than the trajectory of the flow.
- Validation of the stochastic parameterisation is in terms of uncertainty quantification for the SPDE.
- The performance of the DA algorithm relies on the correct modelling of the unresolved scales.
- Evaluating the uncertainty arising from the choice of EOFs is part of the particle filter implementation.

The tempering procedure
Initialisation ($t = 0$): for $n = 1, \dots, N$, sample $X_n(0)$ from $\pi_0$.
Iteration $(t_{i-1}, t_i]$: given the ensemble $\{X_n(t_{i-1})\}_{n=1,\dots,N}$,
1. Evolve $X_n(t_{i-1})$ using the SPDE to obtain $X_n(t_i)$.
2. Given $X := \{X_n(t_i)\}_{n=1,\dots,N}$, define the normalised tempered weights
$$\bar\lambda_{n,i}(\phi, X) := \frac{\exp(-\phi \Lambda_{n,i})}{\sum_m \exp(-\phi \Lambda_{m,i})},$$
where the dependence on $X$ means the $\Lambda_{n,i}$ are computed using $X$. Define the effective sample size $\mathrm{ESS}_i(\phi, X) := \|\bar\lambda_i(\phi, X)\|_{\ell_2}^{-2}$. Set $\phi = 1$.
3. While $\mathrm{ESS}_i(\phi, X) < N_{\text{threshold}}$ do:
(a) Find $1 - \phi < \phi' < 1$ such that $\mathrm{ESS}_i(\phi' - (1 - \phi), X) \ge N_{\text{threshold}}$. Resample according to $\bar\lambda_{n,i}(\phi' - (1 - \phi), X)$ and apply MCMC if required (i.e. when there are duplicated particles) to obtain a new set of particles $X(\phi')$. Set $\phi = 1 - \phi'$ and $X = X(\phi')$.
(b) If $\mathrm{ESS}_i(\phi, X) \ge N_{\text{threshold}}$, then STOP and go to the next filtering step with $\{(X_n(t_i), \bar\lambda_{n,i})\}_{n=1,\dots,N}$.
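Step 3(a) is typically done by a line search on the ESS; here is a bisection sketch. In this sketch `phi_done` is the temperature already applied (the slide's $1 - \phi$), `log_lambda` holds $-\Lambda_{n,i}$ for the current particle set, and the returned value plays the role of the increment $\phi' - (1 - \phi)$.

```python
import numpy as np

def ess(logw):
    """Effective sample size of unnormalised log-weights."""
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return 1.0 / np.sum(w**2)

def next_increment(log_lambda, phi_done, n_thresh, tol=1e-4):
    """Bisection sketch: largest exponent increment delta in
    (0, 1 - phi_done] such that ESS(delta) >= n_thresh."""
    lo, hi = 0.0, 1.0 - phi_done
    if ess(hi * log_lambda) >= n_thresh:
        return hi                          # can reach temperature 1 directly
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ess(mid * log_lambda) >= n_thresh:
            lo = mid
        else:
            hi = mid
    return lo
```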

MCMC Mutation (Jittering) Algorithm
Given the ensemble $\{X_{n,k}(t_i)\}_{n=1,\dots,N}$ corresponding to the $k$-th tempering step with temperature $\phi_k$, and a proposal step size $\rho \in [0,1]$, repeat the following steps.
Propose
$$\tilde X_n(t_i) = G\big(X_n(t_{i-1}),\ \rho\, W(t_{i-1}:t_i; \omega) + \sqrt{1 - \rho^2}\, Z(t_{i-1}:t_i; \omega)\big),$$
where $X_n(t_i) = G(X_n(t_{i-1}), W(t_{i-1}:t_i; \omega))$ and $W \perp Z$.
Accept $\tilde X_n(t_i)$ with probability
$$1 \wedge \frac{\lambda(\phi_k, \tilde X_n(t_i))}{\lambda(\phi_k, X_n(t_i))},$$
where $\lambda(\phi, x) = \exp(-\phi \Lambda(x))$ is the unnormalised weight function.
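A sketch of the proposal step: re-drive the one-step solution map $G$ with a pCN-style mixture of the original Brownian increments and fresh ones, so the prior on the driving noise is preserved. `G` (the SPDE one-step map) is assumed given.

```python
import numpy as np

rng = np.random.default_rng(5)

def jitter_proposal(G, x_prev, W, rho):
    """Propose by mixing the original driving increments W with fresh
    increments Z: W_new = rho * W + sqrt(1 - rho^2) * Z, then re-solve."""
    Z = rng.normal(size=W.shape)
    W_new = rho * W + np.sqrt(1.0 - rho**2) * Z
    return G(x_prev, W_new), W_new

# Accept with probability min(1, exp(-phi_k * (Lambda(x_new) - Lambda(x_old)))),
# keeping (x_old, W) otherwise.
```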

Implementation parameter: Number of Particles
- decide on a case-by-case basis;
- too few will not give a reasonable solution;
- too many will slow down the algorithm.
[Figures: (a) psi, 225 particles vs 500 (no change); (b) psi, 225 particles vs 25 (less good).] Even 25 particles seem OK, but we want as many as computationally feasible when tuning the algorithm.

Resampling Intervals
- small resampling intervals lead to an unreasonable increase in the computational effort;
- large resampling intervals make the algorithm fail;
- the ESS can be used as a criterion for choosing the resampling time;
- adaptive resampling times can be used.
[Figures: ESS evolution in time/observation noise; DA solution for DA periods of 1 ett and 0.2 ett. Panels: (a) ux, (b) uy, (c) psi, (d) q. DA with observations from the SPDE, period 0.2 ett.]
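A sketch of the ESS-triggered adaptive rule: resample (here, systematically) only when the effective sample size drops below a threshold, otherwise carry the weighted ensemble to the next DA time. Threshold and resampling scheme are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(6)

def maybe_resample(x, w, n_thresh):
    """Resample only when ESS = 1 / sum(w^2) < n_thresh;
    otherwise keep the particles and their weights unchanged."""
    N = len(w)
    if 1.0 / np.sum(w**2) >= n_thresh:
        return x, w
    u = (rng.random() + np.arange(N)) / N        # systematic resampling
    idx = np.searchsorted(np.cumsum(w), u)
    return x[idx], np.full(N, 1.0 / N)
```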

Results
[Figure: (a) ux, (b) uy, (c) psi, (d) q. DA with observations from the PDE, period 0.2 ett.]
[Figure: (a) ux, (b) uy, (c) psi, (d) q. DA with observations from the PDE, period 1 ett.]

[Figure: number of tempering steps / average number of MCMC steps.]

Final remarks
- Particle filters/sequential Monte Carlo methods are theoretically justified algorithms for approximating the state of partially (and noisily) observed dynamical systems.
- Standard particle filters do NOT work in high dimensions.
- Properly calibrated and modified, particle filters can be used to solve high-dimensional problems (see also the work of Peter Jan van Leeuwen, Roland Potthast, Hans Künsch).
- Important parameters: initial condition, number of particles, number of observations, correction times, observation error, etc.
- The methodology can also be used to assess the reliability of an ensemble forecasting system.
Add-on techniques:
- Tempering *
- OT (prior → posterior)
- Sequential DA in space *
- Jittering *
- Model reduction (high → low resolution) *
- Nudging
- Hybrid models
- Hamiltonian Monte Carlo
- Informed priors
- Localization