Lecture 6: Bayesian Inference in SDE Models

Bayesian Filtering and Smoothing Point of View
Simo Särkkä (Aalto University)

Contents

1. SDEs as Dynamic Models in Filtering and Smoothing Problems
2. Discrete-Time Bayesian Filtering
3. Discrete-Time Bayesian Smoothing
4. Continuous/Discrete-Time Bayesian Filtering and Smoothing
5. Continuous-Time Bayesian Filtering and Smoothing
6. Related Topics and Summary

The Basic Ideas

Use SDEs as prior models for the system dynamics. We measure the state of the SDE through a measurement model. We aim to determine the conditional distribution of the trajectory taken by the SDE given the measurements. Because the trajectory is an infinite-dimensional random variable, we do not want to form the full posterior distribution. Instead, we target the time-marginals of the posterior; this is the idea of stochastic filtering theory.

[Figure: signal, signal derivative, and noisy measurements plotted over time.]

Types of state-estimation problems

- Continuous-time: the dynamics are modeled as continuous-time processes (SDEs), and the measurements are modeled as continuous-time processes (SDEs).
- Continuous/discrete-time: the dynamics are modeled as continuous-time processes (SDEs), and the measurements are modeled as discrete-time processes.
- Discrete-time: the dynamics and the measurements are both modeled as discrete-time processes.

[Figures: example signals and measurements for the three problem types.]

Example: State Space Model for a Car [1/2]

The dynamics of the car in 2d $(x_1, x_2)$ are given by Newton's law:
$$\mathbf{F}(t) = m\,\mathbf{a}(t),$$
where $\mathbf{a}(t)$ is the acceleration, $m$ is the mass of the car, and $\mathbf{F}(t)$ is a vector of (unknown) forces. Let's model $\mathbf{F}(t)/m$ as a 2-dimensional white noise process:
$$\frac{d^2 x_1}{dt^2} = w_1(t), \qquad \frac{d^2 x_2}{dt^2} = w_2(t).$$

Example: State Space Model for a Car [2/2]

If we define $x_3(t) = dx_1/dt$ and $x_4(t) = dx_2/dt$, then the model can be written as a first order system of differential equations:
$$\frac{d}{dt}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} =
\underbrace{\begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}}_{F}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} +
\underbrace{\begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}}_{L}
\begin{pmatrix} w_1 \\ w_2 \end{pmatrix}.$$
In shorter matrix form: $dx/dt = F\,x + L\,w$. More rigorously: $dx = F\,x\,dt + L\,d\beta$.
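As a concrete illustration, here is a minimal NumPy sketch of the car model matrices (only $F$ and $L$ from the slide above; the variable names are chosen for illustration):

```python
import numpy as np

# State x = (x1, x2, x3, x4): positions (x1, x2) and velocities (x3, x4).
F = np.array([[0., 0., 1., 0.],
              [0., 0., 0., 1.],
              [0., 0., 0., 0.],
              [0., 0., 0., 0.]])   # drift matrix of dx = F x dt + L dbeta

L = np.array([[0., 0.],
              [0., 0.],
              [1., 0.],
              [0., 1.]])           # white noise drives only the velocity components
```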

Continuous-Time Measurement Model for a Car

Assume that the position of the car $(x_1, x_2)$ is measured and the measurements $(y_1, y_2)$ are corrupted by white noise $e_1(t), e_2(t)$:
$$y_1(t) = x_1(t) + e_1(t), \qquad y_2(t) = x_2(t) + e_2(t).$$
The measurement model can now be written as
$$y(t) = H\,x(t) + e(t), \qquad H = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix},$$
or more rigorously as an SDE:
$$dz = H\,x\,dt + d\eta.$$

General Continuous-Time State-Space Models [1/2]

The resulting model is of the form
$$dx = F\,x\,dt + L\,d\beta, \qquad dz = H\,x\,dt + d\eta.$$
This is a special case of a continuous-time model:
$$dx = f(x, t)\,dt + L(x, t)\,d\beta, \qquad dz = h(x, t)\,dt + d\eta.$$
The first equation defines the dynamics of the system state and the second relates the measurements to the state. Given that we have observed $z(t)$ (or $y(t)$), what can we say (in a statistical sense) about the hidden process $x(t)$?

General Continuous-Time State-Space Models [2/2]

Bayesian way: what is the posterior distribution of $x(t)$ given the noisy measurements $y(\tau)$ on $\tau \in [0, T]$?
This Bayesian solution is surprisingly old, as it dates back to the work of Stratonovich around the 1950s.
The aim is usually to compute the filtering (posterior) distribution
$$p(x(t) \mid \{y(\tau) : 0 \le \tau \le t\}).$$
We are also often interested in the smoothing distributions
$$p(x(t') \mid \{y(\tau) : 0 \le \tau \le T\}), \quad t' \in [0, T].$$
Note that we could also attempt to compute the full posterior
$$p(\{x(t') : 0 \le t' \le T\} \mid \{y(\tau) : 0 \le \tau \le T\}).$$
The full posterior is usually neither feasible nor sensible; we will return to this later.

Discrete-Time Measurement Model for a Car

Assume that the position of the car $(x_1, x_2)$ is measured at discrete time instants $t_1, t_2, \ldots, t_n$:
$$y_{1,k} = x_1(t_k) + e_{1,k}, \qquad y_{2,k} = x_2(t_k) + e_{2,k},$$
where $(e_{1,k}, e_{2,k}) \sim N(0, R)$ are Gaussian. The measurement model can now be written as
$$y_k = H\,x(t_k) + e_k, \qquad H = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix},$$
or in probabilistic notation as
$$p(y_k \mid x(t_k)) = N(y_k \mid H\,x(t_k), R).$$

General Continuous/Discrete-Time State-Space Models

The dynamic and measurement models now have the form
$$dx = F\,x\,dt + L\,d\beta, \qquad y_k = H\,x(t_k) + r_k.$$
This is a special case of continuous/discrete-time models of the form
$$dx = f(x, t)\,dt + L(x, t)\,d\beta, \qquad y_k \sim p(y_k \mid x(t_k)).$$
We are typically interested in the filtering and smoothing (posterior) distributions
$$p(x(t_k) \mid y_1, \ldots, y_k), \qquad p(x(t') \mid y_1, \ldots, y_T), \quad t' \in [0, t_T].$$
In principle, the full posterior can also be considered, but we will concentrate on the above.

General Discrete-Time State-Space Models [1/2]

Recall that the solution to the SDE $dx = F\,x\,dt + L\,d\beta$ is
$$x(t) = \exp(F\,(t - t_0))\,x(t_0) + \int_{t_0}^{t} \exp(F\,(t - \tau))\,L\,d\beta(\tau).$$
If we set $t \leftarrow t_k$ and $t_0 \leftarrow t_{k-1}$ we get
$$x(t_k) = \exp(F\,(t_k - t_{k-1}))\,x(t_{k-1}) + \int_{t_{k-1}}^{t_k} \exp(F\,(t_k - \tau))\,L\,d\beta(\tau).$$
Thus this is of the form
$$x(t_k) = A_{k-1}\,x(t_{k-1}) + q_{k-1},$$
where $A_{k-1} = \exp(F\,(t_k - t_{k-1}))$ is a given (deterministic) matrix and $q_{k-1}$ is a zero-mean Gaussian random variable with covariance
$$Q_{k-1} = \int_{t_{k-1}}^{t_k} \exp(F\,(t_k - \tau))\,L\,Q\,L^{\mathsf{T}}\,\exp(F\,(t_k - \tau))^{\mathsf{T}}\,d\tau.$$
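The matrices $A_{k-1}$ and $Q_{k-1}$ can be evaluated numerically; one standard trick is the matrix fraction (Van Loan style) decomposition, sketched below. The function name and its arguments are illustrative, and a constant diffusion matrix `Qc` is assumed:

```python
import numpy as np
from scipy.linalg import expm

def discretize_lti(F, L, Qc, dt):
    """Discretize dx = F x dt + L dbeta (diffusion matrix Qc) over a step dt.

    Returns A = exp(F dt) and the process noise covariance
    Q = int_0^dt exp(F s) L Qc L^T exp(F s)^T ds  (computed via Van Loan's trick).
    """
    n = F.shape[0]
    # Build the block matrix [[F, L Qc L^T], [0, -F^T]] and exponentiate it.
    Phi = np.zeros((2 * n, 2 * n))
    Phi[:n, :n] = F
    Phi[:n, n:] = L @ Qc @ L.T
    Phi[n:, n:] = -F.T
    M = expm(Phi * dt)
    A = M[:n, :n]          # transition matrix A_{k-1}
    Q = M[:n, n:] @ A.T    # process noise covariance Q_{k-1}
    return A, Q
```

For the car model above, a call such as `discretize_lti(F, L, q * np.eye(2), dt)` (with placeholder spectral density `q` and sampling interval `dt`) gives the familiar constant-velocity discretization.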

General Discrete-Time State-Space Models [2/2]

Thus we can write the linear state-space model (e.g. the car) equivalently in the form
$$x(t_k) = A_{k-1}\,x(t_{k-1}) + q_{k-1}, \qquad y_k = H\,x(t_k) + r_k.$$
This is a special case of discrete-time models of the form
$$x(t_k) \sim p(x(t_k) \mid x(t_{k-1})), \qquad y_k \sim p(y_k \mid x(t_k)).$$
Generally, $p(x(t_k) \mid x(t_{k-1}))$ is the transition density of the SDE (the Green's function of the Fokker-Planck-Kolmogorov equation). We are typically interested in the filtering/smoothing distributions
$$p(x(t_k) \mid y_1, \ldots, y_k), \qquad p(x(t_i) \mid y_1, \ldots, y_T), \quad i = 1, 2, \ldots, T.$$
Sometimes we can also do the full posterior...

Why Not The Full Posterior?

Consider a discrete-time state-space model:
$$x_k \sim p(x_k \mid x_{k-1}), \qquad y_k \sim p(y_k \mid x_k).$$
Due to Markovianity, the joint prior is now given as
$$p(x_{0:T}) = p(x_0) \prod_{k=1}^{T} p(x_k \mid x_{k-1}).$$
Due to conditional independence of the measurements, the joint likelihood is given as
$$p(y_{1:T} \mid x_{0:T}) = \prod_{k=1}^{T} p(y_k \mid x_k).$$

Why Not The Full Posterior? (cont.)

We can now use Bayes' rule to compute the full posterior
$$p(x_{0:T} \mid y_{1:T}) = \frac{p(y_{1:T} \mid x_{0:T})\,p(x_{0:T})}{p(y_{1:T})}
= \frac{\prod_{k=1}^{T} p(y_k \mid x_k)\,p(x_k \mid x_{k-1})\,p(x_0)}{\int \prod_{k=1}^{T} p(y_k \mid x_k)\,p(x_k \mid x_{k-1})\,p(x_0)\,dx_{0:T}}.$$
This is very high dimensional (with SDEs, infinite-dimensional) and hence inefficient to work with; this is why filtering theory was invented. We aim to fully utilize the Markovian structure of the model to efficiently compute the following partial posteriors:
Filtering distributions: $p(x_k \mid y_{1:k})$, $k = 1, \ldots, T$.
Smoothing distributions: $p(x_k \mid y_{1:T})$, $k = 1, \ldots, T$.

Bayesian Optimal Filter: Principle

Assume that we have been given:
1. Prior distribution $p(x_0)$.
2. State space model:
$$x_k \sim p(x_k \mid x_{k-1}), \qquad y_k \sim p(y_k \mid x_k).$$
3. Measurement sequence $y_{1:k} = y_1, \ldots, y_k$.
We usually have $x_k \equiv x(t_k)$ for some times $t_1, t_2, \ldots$.
The Bayesian optimal filter computes the distribution $p(x_k \mid y_{1:k})$. The computation is based on a recursion rule for incorporating the new measurement $y_k$ into the posterior:
$$p(x_{k-1} \mid y_{1:k-1}) \;\longrightarrow\; p(x_k \mid y_{1:k}).$$

Bayesian Optimal Filter: Derivation of Prediction Step

Assume that we know the posterior distribution of the previous time step: $p(x_{k-1} \mid y_{1:k-1})$.
The joint distribution of $x_k, x_{k-1}$ given $y_{1:k-1}$ can be computed as (recall the Markov property):
$$p(x_k, x_{k-1} \mid y_{1:k-1}) = p(x_k \mid x_{k-1}, y_{1:k-1})\,p(x_{k-1} \mid y_{1:k-1}) = p(x_k \mid x_{k-1})\,p(x_{k-1} \mid y_{1:k-1}).$$
Integrating over $x_{k-1}$ gives the Chapman-Kolmogorov equation
$$p(x_k \mid y_{1:k-1}) = \int p(x_k \mid x_{k-1})\,p(x_{k-1} \mid y_{1:k-1})\,dx_{k-1}.$$
This is the prediction step of the optimal filter.

Bayesian Optimal Filter: Derivation of Update Step

Now we have:
1. The prior distribution from the Chapman-Kolmogorov equation: $p(x_k \mid y_{1:k-1})$.
2. The measurement likelihood from the state space model: $p(y_k \mid x_k)$.
The posterior distribution can be computed by Bayes' rule (recall the conditional independence of the measurements):
$$p(x_k \mid y_{1:k}) = \frac{1}{Z_k}\,p(y_k \mid x_k, y_{1:k-1})\,p(x_k \mid y_{1:k-1}) = \frac{1}{Z_k}\,p(y_k \mid x_k)\,p(x_k \mid y_{1:k-1}).$$
This is the update step of the optimal filter.

Bayesian Optimal Filter: Formal Equations

Optimal filter
Initialization: the recursion starts from the prior distribution $p(x_0)$.
Prediction: by the Chapman-Kolmogorov equation,
$$p(x_k \mid y_{1:k-1}) = \int p(x_k \mid x_{k-1})\,p(x_{k-1} \mid y_{1:k-1})\,dx_{k-1}.$$
Update: by Bayes' rule,
$$p(x_k \mid y_{1:k}) = \frac{1}{Z_k}\,p(y_k \mid x_k)\,p(x_k \mid y_{1:k-1}).$$
The normalization constant $Z_k = p(y_k \mid y_{1:k-1})$ is given as
$$Z_k = \int p(y_k \mid x_k)\,p(x_k \mid y_{1:k-1})\,dx_k.$$
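These equations are tractable in closed form only in special cases, but for intuition they can be evaluated numerically on a discretized state grid (a grid-based filter). A minimal sketch for a scalar state follows; the grid, the transition density `trans_pdf`, and the likelihood `lik` are user-supplied placeholders:

```python
import numpy as np

def bayes_filter_step(prior_on_grid, grid, trans_pdf, lik):
    """One prediction + update step of the optimal filter on a 1-d grid.

    prior_on_grid: p(x_{k-1} | y_{1:k-1}) evaluated on `grid`
    trans_pdf(x_new, x_old): transition density p(x_k | x_{k-1}), vectorized
    lik(x): likelihood p(y_k | x_k) of the current measurement
    """
    dx = grid[1] - grid[0]
    # Prediction: Chapman-Kolmogorov integral as a Riemann sum over x_{k-1}.
    T = trans_pdf(grid[:, None], grid[None, :])   # T[i, j] = p(grid_i | grid_j)
    predicted = T @ prior_on_grid * dx            # p(x_k | y_{1:k-1}) on the grid
    # Update: Bayes' rule with normalization constant Z_k.
    unnormalized = lik(grid) * predicted
    Zk = np.sum(unnormalized) * dx
    return unnormalized / Zk
```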

Bayesian Optimal Filter: Graphical Explanation

On the prediction step, the distribution of the previous step is propagated through the dynamics. The prediction provides the prior distribution, and the measurement provides the likelihood. The posterior distribution is obtained by combining the prior and the likelihood with Bayes' rule.

Filtering Algorithms

- The Kalman filter is the classical optimal filter for linear-Gaussian models.
- The extended Kalman filter (EKF) is a linearization-based extension of the Kalman filter to non-linear models.
- The unscented Kalman filter (UKF) is a sigma-point-transformation-based extension of the Kalman filter.
- The Gauss-Hermite and cubature Kalman filters (GHKF/CKF) are numerical-integration-based extensions of the Kalman filter.
- The particle filter forms a Monte Carlo representation (particle set) of the distribution of the state estimate.
- Grid-based filters approximate the probability distributions on a finite grid.
- Gaussian mixture approximations are used, for example, in multiple model Kalman filters and Rao-Blackwellized particle filters.

Kalman Filter: Model

Gaussian-driven linear model, i.e., Gauss-Markov model:
$$x_k = A_{k-1}\,x_{k-1} + q_{k-1}, \qquad y_k = H_k\,x_k + r_k,$$
where $q_{k-1} \sim N(0, Q_{k-1})$ is white process noise, $r_k \sim N(0, R_k)$ is white measurement noise, $A_{k-1}$ is the transition matrix of the dynamic model, and $H_k$ is the measurement model matrix.
In probabilistic terms the model is
$$p(x_k \mid x_{k-1}) = N(x_k \mid A_{k-1}\,x_{k-1}, Q_{k-1}), \qquad p(y_k \mid x_k) = N(y_k \mid H_k\,x_k, R_k).$$

Kalman Filter: Equations

Kalman Filter
Initialization: $x_0 \sim N(m_0, P_0)$.
Prediction step:
$$m_k^- = A_{k-1}\,m_{k-1}, \qquad P_k^- = A_{k-1}\,P_{k-1}\,A_{k-1}^{\mathsf{T}} + Q_{k-1}.$$
Update step:
$$\begin{aligned}
v_k &= y_k - H_k\,m_k^- \\
S_k &= H_k\,P_k^-\,H_k^{\mathsf{T}} + R_k \\
K_k &= P_k^-\,H_k^{\mathsf{T}}\,S_k^{-1} \\
m_k &= m_k^- + K_k\,v_k \\
P_k &= P_k^- - K_k\,S_k\,K_k^{\mathsf{T}}.
\end{aligned}$$
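These equations translate into code almost line by line. A minimal NumPy sketch (function and variable names mirror the slide notation but are otherwise illustrative):

```python
import numpy as np

def kf_predict(m, P, A, Q):
    """Kalman filter prediction step: returns predicted mean m_k^- and covariance P_k^-."""
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    return m_pred, P_pred

def kf_update(m_pred, P_pred, y, H, R):
    """Kalman filter update step with measurement y."""
    v = y - H @ m_pred                       # innovation v_k
    S = H @ P_pred @ H.T + R                 # innovation covariance S_k
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain K_k = P_k^- H^T S_k^{-1}
    m = m_pred + K @ v
    P = P_pred - K @ S @ K.T
    return m, P
```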

Problem Formulation

Probabilistic state space model:
measurement model: $y_k \sim p(y_k \mid x_k)$
dynamic model: $x_k \sim p(x_k \mid x_{k-1})$
Assume that the filtering distributions $p(x_k \mid y_{1:k})$ have already been computed for all $k = 0, \ldots, T$.
We want recursive equations for computing the smoothing distribution for all $k < T$:
$$p(x_k \mid y_{1:T}).$$
The recursion will go backwards in time, because at the last step the filtering and smoothing distributions coincide: $p(x_T \mid y_{1:T})$.

Derivation of Formal Smoothing Equations [1/2]

The key: due to the Markov properties of the state we have
$$p(x_k \mid x_{k+1}, y_{1:T}) = p(x_k \mid x_{k+1}, y_{1:k}).$$
Thus we get:
$$\begin{aligned}
p(x_k \mid x_{k+1}, y_{1:T}) &= p(x_k \mid x_{k+1}, y_{1:k})
= \frac{p(x_k, x_{k+1} \mid y_{1:k})}{p(x_{k+1} \mid y_{1:k})} \\
&= \frac{p(x_{k+1} \mid x_k, y_{1:k})\,p(x_k \mid y_{1:k})}{p(x_{k+1} \mid y_{1:k})}
= \frac{p(x_{k+1} \mid x_k)\,p(x_k \mid y_{1:k})}{p(x_{k+1} \mid y_{1:k})}.
\end{aligned}$$

Derivation of Formal Smoothing Equations [2/2]

Assuming that the smoothing distribution of the next step $p(x_{k+1} \mid y_{1:T})$ is available, we get
$$\begin{aligned}
p(x_k, x_{k+1} \mid y_{1:T}) &= p(x_k \mid x_{k+1}, y_{1:T})\,p(x_{k+1} \mid y_{1:T}) \\
&= p(x_k \mid x_{k+1}, y_{1:k})\,p(x_{k+1} \mid y_{1:T}) \\
&= \frac{p(x_{k+1} \mid x_k)\,p(x_k \mid y_{1:k})\,p(x_{k+1} \mid y_{1:T})}{p(x_{k+1} \mid y_{1:k})}.
\end{aligned}$$
Integrating over $x_{k+1}$ gives
$$p(x_k \mid y_{1:T}) = p(x_k \mid y_{1:k}) \int \frac{p(x_{k+1} \mid x_k)\,p(x_{k+1} \mid y_{1:T})}{p(x_{k+1} \mid y_{1:k})}\,dx_{k+1}.$$

Bayesian Optimal Smoothing Equations

The Bayesian optimal smoothing equations consist of a prediction step and a backward update step:
$$p(x_{k+1} \mid y_{1:k}) = \int p(x_{k+1} \mid x_k)\,p(x_k \mid y_{1:k})\,dx_k,$$
$$p(x_k \mid y_{1:T}) = p(x_k \mid y_{1:k}) \int \frac{p(x_{k+1} \mid x_k)\,p(x_{k+1} \mid y_{1:T})}{p(x_{k+1} \mid y_{1:k})}\,dx_{k+1}.$$
The recursion is started from the filtering (and smoothing) distribution of the last time step, $p(x_T \mid y_{1:T})$.

Smoothing Algorithms

- The Rauch-Tung-Striebel (RTS) smoother is the closed-form smoother for linear Gaussian models.
- Extended, statistically linearized, and unscented RTS smoothers are the approximate nonlinear smoothers corresponding to the EKF, SLF, and UKF.
- Gaussian RTS smoothers: the cubature RTS smoother, Gauss-Hermite RTS smoothers, and various others.
- Particle smoothing is based on approximating the smoothing solutions via Monte Carlo.
- The Rao-Blackwellized particle smoother is a combination of particle smoothing and RTS smoothing.

Linear-Gaussian Smoothing Problem

Gaussian-driven linear model, i.e., Gauss-Markov model:
$$x_k = A_{k-1}\,x_{k-1} + q_{k-1}, \qquad y_k = H_k\,x_k + r_k.$$
In probabilistic terms the model is
$$p(x_k \mid x_{k-1}) = N(x_k \mid A_{k-1}\,x_{k-1}, Q_{k-1}), \qquad p(y_k \mid x_k) = N(y_k \mid H_k\,x_k, R_k).$$
The Kalman filter can be used for computing all the Gaussian filtering distributions
$$p(x_k \mid y_{1:k}) = N(x_k \mid m_k, P_k).$$
The Rauch-Tung-Striebel smoother then computes the corresponding smoothing distributions
$$p(x_k \mid y_{1:T}) = N(x_k \mid m_k^s, P_k^s).$$

Rauch-Tung-Striebel Smoother

Backward recursion equations for the smoothed means $m_k^s$ and covariances $P_k^s$:
$$\begin{aligned}
m_{k+1}^- &= A_k\,m_k \\
P_{k+1}^- &= A_k\,P_k\,A_k^{\mathsf{T}} + Q_k \\
G_k &= P_k\,A_k^{\mathsf{T}}\,[P_{k+1}^-]^{-1} \\
m_k^s &= m_k + G_k\,[m_{k+1}^s - m_{k+1}^-] \\
P_k^s &= P_k + G_k\,[P_{k+1}^s - P_{k+1}^-]\,G_k^{\mathsf{T}},
\end{aligned}$$
where $m_k$ and $P_k$ are the mean and covariance from the Kalman filter. The recursion is started from the last time step $T$, with $m_T^s = m_T$ and $P_T^s = P_T$.
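A minimal NumPy sketch of this backward pass, assuming for brevity that $A$ and $Q$ are time-invariant and that the filtered means and covariances come from a forward Kalman filter run:

```python
import numpy as np

def rts_smoother(ms, Ps, A, Q):
    """Rauch-Tung-Striebel smoother backward pass.

    ms, Ps: sequences of filtered means m_k and covariances P_k, k = 0..T
    Returns the smoothed means m_k^s and covariances P_k^s.
    """
    T = len(ms) - 1
    ms_s, Ps_s = list(ms), list(Ps)               # start from m_T^s = m_T, P_T^s = P_T
    for k in range(T - 1, -1, -1):
        m_pred = A @ ms[k]                        # m_{k+1}^-
        P_pred = A @ Ps[k] @ A.T + Q              # P_{k+1}^-
        G = Ps[k] @ A.T @ np.linalg.inv(P_pred)   # smoother gain G_k
        ms_s[k] = ms[k] + G @ (ms_s[k + 1] - m_pred)
        Ps_s[k] = Ps[k] + G @ (Ps_s[k + 1] - P_pred) @ G.T
    return ms_s, Ps_s
```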

Continuous/Discrete-Time Bayesian Filtering and Smoothing: Method A

Consider a continuous-discrete state-space model
$$dx = f(x, t)\,dt + L(x, t)\,d\beta, \qquad y_k \sim p(y_k \mid x(t_k)).$$
We can always convert this into an equivalent discrete-time model
$$x(t_k) \sim p(x(t_k) \mid x(t_{k-1})), \qquad y_k \sim p(y_k \mid x(t_k))$$
by solving the transition density $p(x(t_k) \mid x(t_{k-1}))$. Then we can simply use the discrete-time filtering and smoothing algorithms. With linear SDEs we can discretize exactly; with non-linear SDEs we can use e.g. Itô-Taylor expansions, as sketched below.
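The crudest Itô-Taylor approximation is the Euler-Maruyama scheme, which approximates the transition density by a Gaussian over one step. A sketch, where `f`, `Lfun`, and `Qc` are user-supplied placeholders for the drift, the dispersion matrix, and the diffusion matrix of the Brownian motion:

```python
import numpy as np

def em_transition(x_prev, t_prev, dt, f, Lfun, Qc):
    """Euler-Maruyama Gaussian approximation of p(x(t_k) | x(t_{k-1})).

    Returns the mean and covariance of the approximate transition density
    N(x_prev + f(x_prev, t_prev) dt,  L Qc L^T dt).
    """
    mean = x_prev + f(x_prev, t_prev) * dt
    Lm = Lfun(x_prev, t_prev)
    cov = Lm @ Qc @ Lm.T * dt
    return mean, cov
```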

Continuous/Discrete-Time Bayesian Filtering and Smoothing: Method B

Another way is to replace the discrete-time prediction step
$$p(x_k \mid y_{1:k-1}) = \int p(x_k \mid x_{k-1})\,p(x_{k-1} \mid y_{1:k-1})\,dx_{k-1}$$
with its continuous-time counterpart. Generally, we get the Fokker-Planck equation
$$\frac{\partial p(x, t)}{\partial t} = -\sum_i \frac{\partial}{\partial x_i}\left[f_i(x, t)\,p(x, t)\right] + \frac{1}{2}\sum_{ij} \frac{\partial^2}{\partial x_i\,\partial x_j}\left\{[L(x, t)\,Q\,L^{\mathsf{T}}(x, t)]_{ij}\,p(x, t)\right\}$$
with the initial condition $p(x, t_{k-1}) \triangleq p(x_{k-1} \mid y_{1:k-1})$.

Continuous/Discrete-Time Bayesian Filtering and Smoothing: Method B (cont.)

Continuous-Discrete Bayesian Optimal Filter
1. Prediction step: solve the Fokker-Planck-Kolmogorov PDE
$$\frac{\partial p}{\partial t} = -\sum_i \frac{\partial}{\partial x_i}\left(f_i\,p\right) + \frac{1}{2}\sum_{ij} \frac{\partial^2}{\partial x_i\,\partial x_j}\left([L\,Q\,L^{\mathsf{T}}]_{ij}\,p\right).$$
2. Update step: apply Bayes' rule
$$p(x(t_k) \mid y_{1:k}) = \frac{p(y_k \mid x(t_k))\,p(x(t_k) \mid y_{1:k-1})}{\int p(y_k \mid x(t_k))\,p(x(t_k) \mid y_{1:k-1})\,dx(t_k)}.$$
In linear models we can use the mean and covariance equations. Approximate mean/covariance equations are used in the EKF/UKF/... The smoother consists of partial/ordinary differential equations.

Continuous-Time Stochastic Filtering Theory [3/3]

Continuous-time state-space model:
$$dx = f(x, t)\,dt + L(x, t)\,d\beta, \qquad dz = h(x, t)\,dt + d\eta.$$
To ease notation, let's define a linear operator $\mathcal{A}$:
$$\mathcal{A}(\bullet) = -\sum_i \frac{\partial}{\partial x_i}\left[f_i(x, t)\,(\bullet)\right] + \frac{1}{2}\sum_{ij} \frac{\partial^2}{\partial x_i\,\partial x_j}\left\{[L(x, t)\,Q\,L^{\mathsf{T}}(x, t)]_{ij}\,(\bullet)\right\}.$$
The Fokker-Planck-Kolmogorov equation can then be written compactly as
$$\frac{\partial p}{\partial t} = \mathcal{A}\,p.$$

Kushner-Stratonovich Equation

By taking the continuous-time limit of the discrete-time Bayesian filtering equations we get the following.

Kushner-Stratonovich equation: the stochastic partial differential equation for the filtering density $p(x, t \mid \mathcal{Y}_t) \triangleq p(x(t) \mid \mathcal{Y}_t)$ is
$$dp(x, t \mid \mathcal{Y}_t) = \mathcal{A}\,p(x, t \mid \mathcal{Y}_t)\,dt + \left(h(x, t) - E[h(x, t) \mid \mathcal{Y}_t]\right)^{\mathsf{T}} R^{-1} \left(dz - E[h(x, t) \mid \mathcal{Y}_t]\,dt\right) p(x, t \mid \mathcal{Y}_t),$$
where $dp(x, t \mid \mathcal{Y}_t) = p(x, t + dt \mid \mathcal{Y}_{t+dt}) - p(x, t \mid \mathcal{Y}_t)$ and
$$E[h(x, t) \mid \mathcal{Y}_t] = \int h(x, t)\,p(x, t \mid \mathcal{Y}_t)\,dx.$$

Zakai Equation

We can get rid of the non-linearity by using an unnormalized equation.

Zakai equation: let $q(x, t \mid \mathcal{Y}_t) \triangleq q(x(t) \mid \mathcal{Y}_t)$ be the solution to Zakai's stochastic partial differential equation
$$dq(x, t \mid \mathcal{Y}_t) = \mathcal{A}\,q(x, t \mid \mathcal{Y}_t)\,dt + h^{\mathsf{T}}(x, t)\,R^{-1}\,dz\;q(x, t \mid \mathcal{Y}_t),$$
where $dq(x, t \mid \mathcal{Y}_t) = q(x, t + dt \mid \mathcal{Y}_{t+dt}) - q(x, t \mid \mathcal{Y}_t)$. Then we have
$$p(x(t) \mid \mathcal{Y}_t) = \frac{q(x(t) \mid \mathcal{Y}_t)}{\int q(x(t) \mid \mathcal{Y}_t)\,dx(t)}.$$

Kalman-Bucy Filter

The Kalman-Bucy filter is the exact solution to the linear Gaussian filtering problem
$$dx = F(t)\,x\,dt + L(t)\,d\beta, \qquad dz = H(t)\,x\,dt + d\eta.$$

Kalman-Bucy filter: the Bayesian filter, which computes the posterior distribution $p(x(t) \mid \mathcal{Y}_t) = N(x(t) \mid m(t), P(t))$ for the above system, is
$$\begin{aligned}
K(t) &= P(t)\,H^{\mathsf{T}}(t)\,R^{-1} \\
dm(t) &= F(t)\,m(t)\,dt + K(t)\,[dz(t) - H(t)\,m(t)\,dt] \\
\frac{dP(t)}{dt} &= F(t)\,P(t) + P(t)\,F^{\mathsf{T}}(t) + L(t)\,Q\,L^{\mathsf{T}}(t) - K(t)\,R\,K^{\mathsf{T}}(t).
\end{aligned}$$
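A simple Euler discretization of these equations can be used for simulation. A sketch, assuming time-invariant matrices for brevity; `dz` denotes the increment of the observation process over the step `dt`:

```python
import numpy as np

def kalman_bucy_step(m, P, dz, dt, F, Lm, Qc, H, R):
    """One Euler step of the Kalman-Bucy filter for the mean and covariance."""
    K = P @ H.T @ np.linalg.inv(R)                     # gain K(t) = P H^T R^{-1}
    m_new = m + F @ m * dt + K @ (dz - H @ m * dt)     # dm = F m dt + K (dz - H m dt)
    dP = F @ P + P @ F.T + Lm @ Qc @ Lm.T - K @ R @ K.T
    P_new = P + dP * dt                                # dP/dt integrated with step dt
    return m_new, P_new
```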

Related Topics

- We can also estimate parameters $\theta$ in SDE/state-space models: $dx = f(x, t; \theta)\,dt + L(x, t; \theta)\,d\beta$. The filtering theory provides the means to compute the required marginal likelihoods and parameter posteriors.
- It is also possible to estimate $f(x, t)$ non-parametrically, that is, using Gaussian process (GP) regression.
- Model selection, Bayesian model averaging, and other advanced concepts can also be combined with state-space inference.
- Stochastic control theory is related to optimal control design for SDE models.
- GP regression can also sometimes be converted to inference on SDE models.

Summary

- We can use SDEs to model dynamics in Bayesian models.
- Dynamic (state-) estimation problems can be divided into continuous-time, continuous/discrete-time, and discrete-time problems; the continuous models are SDEs.
- The full posterior of the state trajectory is usually intractable; therefore we compute the filtering and smoothing distributions
$$p(x(t_k) \mid y_1, \ldots, y_k), \qquad p(x(t') \mid y_1, \ldots, y_T), \quad t' \in [0, t_T].$$
- The Bayesian filtering and smoothing equations also often need to be approximated. Methods: Kalman filters, extended Kalman filters (EKF/UKF/...), particle filters, and the related smoothers.