Gaussian processes for inference in stochastic differential equations

Gaussian processes for inference in stochastic differential equations
Manfred Opper, AI group, TU Berlin
November 6, 2017

Gaussian Process models in machine learning

Gaussian processes provide prior distributions over latent functions $f(\cdot)$ to be learnt from data.

[Figure: sample functions drawn from a GP prior.]

$f(\cdot) \sim \mathcal{GP}(m, K)$ with $m(x) = \mathbb{E}[f(x)]$ and $K(x, x') = \mathrm{Cov}[f(x), f(x')]$.

Well known examples:
1. Regression: $y_i = f(x_i) + \nu_i$
2. Classification ($y_i \in \{0, 1\}$): $P[y_i = 1 \mid f(x_i)] = \sigma(f(x_i))$
3. ...
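The GP prior is easy to visualize by sampling. A minimal sketch, assuming an RBF kernel and a grid that roughly match the figure (neither is specified in the slide):

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=5.0, variance=1.0):
    """Squared-exponential covariance K(x, x') evaluated between two sets of inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

# Grid of input locations (the x-axis of the slide's figure).
x = np.linspace(0.0, 100.0, 200)
K = rbf_kernel(x, x)

# Draw three sample functions from f ~ GP(0, K); the jitter keeps K positive definite.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(len(x)), K + 1e-8 * np.eye(len(x)), size=3)
print(samples.shape)  # (3, 200): three latent functions evaluated on the grid
```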

Inference

We would like to predict $f(x)$ given observations $y = (y_1, \ldots, y_n)$ at $x_1, \ldots, x_n$. The GP prior is infinite dimensional!

For these simple models, inference reduces to an $n$-dimensional problem. Let $f_i = f(x_i)$:

$p(f(x) \mid y) = \int p(f(x) \mid f_1, \ldots, f_n)\, p(f_1, \ldots, f_n \mid y)\, df_1 \cdots df_n$
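For the regression example with Gaussian noise $\nu_i \sim \mathcal{N}(0, \sigma^2)$, this $n$-dimensional reduction gives the familiar closed-form GP posterior. A minimal sketch; the kernel, lengthscale, and noise level are illustrative assumptions:

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=0.1):
    """Posterior mean and covariance of f(x_test) given noisy observations y_i = f(x_i) + noise."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_star = rbf_kernel(x_test, x_train)
    K_ss = rbf_kernel(x_test, x_test)
    alpha = np.linalg.solve(K, y_train)
    mean = K_star @ alpha
    cov = K_ss - K_star @ np.linalg.solve(K, K_star.T)
    return mean, cov

x_train = np.array([0.5, 1.0, 2.0, 3.5])
y_train = np.sin(x_train) + 0.1 * np.random.default_rng(1).normal(size=4)
mean, cov = gp_posterior(x_train, y_train, np.linspace(0.0, 4.0, 50))
```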

A more complicated likelihood

Poisson process: the likelihood for a set of $n$ points $D = (x_1, x_2, \ldots, x_n)$ in a region $\mathcal{T}$ is given by

$L(D \mid \lambda) = \exp\left\{ -\int_{\mathcal{T}} \lambda(x)\, dx \right\} \prod_{i=1}^{n} \lambda(x_i)$

$\lambda(x)$ is the unknown intensity or rate of the process.

Gaussian Process Modulated Poisson Processes model (Lloyd et al., 2015): $\lambda(x) = f^2(x)$, where $f$ is a Gaussian process.
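The integral over the intensity is what makes this likelihood awkward; for a given latent function on a 1-d domain it can at least be evaluated numerically. A small sketch, where the domain, grid resolution, and latent draw are all illustrative assumptions:

```python
import numpy as np

def poisson_process_loglik(f_grid, grid, f_events):
    """log L(D | lambda) with lambda = f^2: minus the integral of lambda plus sum of log lambda at events.

    f_grid   : latent function f on a dense grid over the domain T (for the integral)
    grid     : the grid points, assumed equally spaced
    f_events : f evaluated at the observed event locations x_1..x_n
    """
    lam_grid = f_grid ** 2
    integral = np.sum(lam_grid) * (grid[1] - grid[0])   # simple Riemann sum over T
    return -integral + np.sum(np.log(f_events ** 2))

grid = np.linspace(0.0, 10.0, 500)
f_grid = 1.0 + 0.5 * np.sin(grid)          # a hypothetical latent function
events = np.array([1.2, 3.4, 3.5, 7.9])    # hypothetical observed event locations
f_events = 1.0 + 0.5 * np.sin(events)
print(poisson_process_loglik(f_grid, grid, f_events))
```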

Outline

Stochastic differential equations
Drift estimation for dense observations
Drift estimation for sparse observations
Sparse GP approximation
Drift estimation from empirical distribution
Outlook

Stochastic differential equations

$\frac{dX}{dt} = f(X) + \text{white noise}$

E.g. $f(x) = -\frac{dV(x)}{dx}$ for a potential $V$.

Prior process: Stochastic differential equations (SDE)

Mathematicians prefer the Ito version for $X_t \in \mathbb{R}^d$:

$dX_t = \underbrace{f(X_t)}_{\text{Drift}}\, dt + \underbrace{D^{1/2}(X_t)}_{\text{Diffusion}}\, \underbrace{dW_t}_{\text{Wiener process}}$

Limit of the discrete time process

$X_{t+\Delta t} - X_t = f(X_t)\, \Delta t + D^{1/2}(X_t)\, \sqrt{\Delta t}\, \epsilon_t$

with $\epsilon_t$ i.i.d. Gaussian.
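The discrete-time limit above is exactly the Euler-Maruyama scheme, which also serves to simulate sample paths. A minimal sketch for a scalar SDE; the double-well drift and constant diffusion are illustrative assumptions, not the talk's exact settings:

```python
import numpy as np

def euler_maruyama(drift, diffusion, x0, dt, n_steps, seed=0):
    """Simulate X_{t+dt} - X_t = f(X_t) dt + sqrt(D(X_t) dt) eps_t for a scalar SDE."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        eps = rng.standard_normal()
        x[k + 1] = x[k] + drift(x[k]) * dt + np.sqrt(diffusion(x[k]) * dt) * eps
    return x

# Hypothetical double-well drift f(x) = -dV/dx with V(x) = x^4/4 - x^2/2, unit diffusion.
path = euler_maruyama(lambda x: x - x**3, lambda x: 1.0, x0=0.0, dt=0.002, n_steps=6000)
```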

[Figure: path $X(t)$ with observations.]

Nonparametric drift estimation

Infer the drift function $f(\cdot)$ under smoothness assumptions from observations of the process $X$.

Idea (see e.g. Papaspiliopoulos, Pokern, Roberts & Stuart (2012)): assume a Gaussian process prior $f(\cdot) \sim \mathcal{GP}(0, K)$ with covariance kernel $K(x, x')$.

Simple for observations dense in time

Euler discretization of the SDE:

$X_{t+\Delta t} - X_t = f(X_t)\, \Delta t + \sqrt{\Delta t}\, \epsilon_t, \quad \text{for } \Delta t \to 0.$

Likelihood (assume a densely observed path $X_{0:T}$) is Gaussian:

$p(X_{0:T} \mid f) \propto \exp\left[ -\frac{1}{2\Delta t} \sum_t \| X_{t+\Delta t} - X_t \|^2 \right] \exp\left[ -\frac{1}{2} \sum_t \| f(X_t) \|^2 \Delta t + \sum_t f(X_t)^\top (X_{t+\Delta t} - X_t) \right].$

Posterior process is also a GP!

Solves the regression problem $f(x) \approx \mathbb{E}\left[ \frac{X_{t+\Delta t} - X_t}{\Delta t} \,\Big|\, X_t = x \right]$. Works well for $\Delta t \to 0$.
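In other words, for dense data the drift can be estimated by ordinary GP regression with inputs $X_t$ and targets $(X_{t+\Delta t} - X_t)/\Delta t$. A sketch under simplifying assumptions (RBF kernel instead of the talk's polynomial kernel, thinning of the data to keep the linear solve small, effective noise variance taken as $D/\Delta t$):

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.5):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def estimate_drift(path, dt, x_test, diffusion=1.0, thin=10):
    """GP regression on increments: targets y_t = (X_{t+dt} - X_t)/dt with noise variance ~ D/dt."""
    x = path[:-1][::thin]
    y = (np.diff(path) / dt)[::thin]        # thinning keeps the n x n solve cheap
    K = rbf_kernel(x, x) + (diffusion / dt) * np.eye(len(x))
    alpha = np.linalg.solve(K, y)
    return rbf_kernel(x_test, x) @ alpha    # posterior mean of the drift f at x_test

# Toy data: Euler-Maruyama path of dX = (X - X^3) dt + dW (illustrative settings, not the talk's).
rng = np.random.default_rng(0)
dt, n = 0.002, 6000
path = np.zeros(n + 1)
for k in range(n):
    path[k + 1] = path[k] + (path[k] - path[k] ** 3) * dt + np.sqrt(dt) * rng.standard_normal()

f_hat = estimate_drift(path, dt, x_test=np.linspace(-1.5, 1.5, 50))
```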

Example: Double well model

$n = 6000$ data points with $\Delta t = 0.002$, GP with polynomial kernel of order 4.

[Figure: left, observed path $X(t)$; right, estimated drift $f(x)$.]

Estimation of diffusion

Euler discretized SDE: $X_{t+\Delta t} - X_t = f(X_t)\, \Delta t + D^{1/2}(X_t)\, \sqrt{\Delta t}\, \epsilon_t$

Diffusion:

$D(x) = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \mathrm{Var}\left( X_{t+\Delta t} - X_t \mid X_t = x \right) = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \mathbb{E}\left[ (X_{t+\Delta t} - X_t)^2 \mid X_t = x \right].$

Independent of the drift! Estimate $D(x)$ with GPs by regression with data $y_t = (X_{t+\Delta t} - X_t)^2$.

[Figure: estimated diffusion $D(x)$.]

Ice core data

[Figure: $\delta^{18}O$ record versus time $t$ [ky] over roughly the last 120 ky.]

GP inference using RBF kernels

[Figure: estimated potential $U(x)$ and diffusion $D(x)$ for the ice core data.]

For larger $\Delta t$ ...

[Figure: sparsely observed path $X(t)$ and the resulting drift estimate $f(x)$.]

Problem: We can compute $p(X_{0:T} \mid f)$ but NOT $P(y \mid f)$!

EM Algorithm

Treat the unobserved path $X_t$ for times $t$ between observations as latent random variables (Batz, Ruttor & Opper, NIPS 2013).

EM algorithm:
1. E-step: compute the expected complete data log likelihood
   $\mathcal{L}(f, f_{\text{old}}) = \mathbb{E}_{p_{\text{old}}}[\ln L(X_{0:T} \mid f)] \qquad (1)$
   where $p_{\text{old}}$ is the posterior $p(X_{0:T} \mid y)$ computed with the previous estimate $f_{\text{old}}$ of the drift.
2. M-step: recompute the drift function as
   $f_{\text{new}} = \arg\min_f \left( -\mathcal{L}(f, f_{\text{old}}) - \ln P_0(f) \right) \qquad (2)$

Likelihood for a complete path

$p(X_{0:T} \mid f) \propto \exp\left[ -\frac{1}{2\Delta t} \sum_t \| X_{t+\Delta t} - X_t \|^2 \right] \underbrace{\exp\left[ -\frac{1}{2} \sum_t \| f(X_t) \|^2 \Delta t + \sum_t f(X_t)^\top (X_{t+\Delta t} - X_t) \right]}_{L(X_{0:T} \mid f)}$

The complete likelihood

$\lim_{\Delta t \to 0} \mathbb{E}_p[\ln L(X_{0:T} \mid f)] = -\frac{1}{2} \sum_t \left\{ \mathbb{E}_p\left[ \| f(X_t) \|^2 \right] \Delta t - 2\, \mathbb{E}_p\left[ \langle f(X_t), X_{t+\Delta t} - X_t \rangle \right] \right\}$
$= -\frac{1}{2} \int_0^T \left\{ \mathbb{E}_p\left[ \| f(X_t) \|^2 \right] - 2\, \mathbb{E}_p\left[ \langle f(X_t), g_t(X_t) \rangle \right] \right\} dt$
$= -\frac{1}{2} \int \| f(x) \|^2 A(x)\, dx + \int \langle f(x), b(x) \rangle\, dx \qquad (3)$

where
$g_t(x) = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \mathbb{E}_p[X_{t+\Delta t} - X_t \mid X_t = x],$
as well as the functions (with $q_t$ the posterior marginal density of $X_t$)
$A(x) = \int_0^T q_t(x)\, dt, \qquad b(x) = \int_0^T g_t(x)\, q_t(x)\, dt.$

Problems

The E-step requires posterior marginal densities $q_t$ for diffusion processes with arbitrary prior drift functions $f(x)$:

$p(X_{0:T} \mid y, f) \propto p(X_{0:T} \mid f) \prod_{k=1}^n \delta(y_k - X_{k\tau}). \qquad (4)$

The GP has to deal with an infinite amount of densely imputed data.

Approximate solutions

Linearize the drift between consecutive observations (Ornstein-Uhlenbeck bridge). Hence, for $t$ between two observations we consider the approximate process

$dX_t = \left[ f(y_k) - \Gamma_k (X_t - y_k) \right] dt + D_k^{1/2}\, dW_t \qquad (5)$

with $\Gamma_k = -\nabla f(y_k)$ and $D_k = D(y_k)$. For this process, the transition density is a multivariate Gaussian!

Work with a sparse GP approximation.
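In the scalar case the Ornstein-Uhlenbeck transition moments are available in closed form, which is what makes the linearized bridge tractable. A small sketch under that scalar assumption (the multivariate case needs matrix exponentials and a Lyapunov integral); the numbers at the bottom are purely hypothetical:

```python
import numpy as np

def ou_transition_moments(x, y_k, f_yk, gamma, D, delta):
    """Mean and variance of X_{t+delta} | X_t = x for dX = [f(y_k) - gamma (X - y_k)] dt + sqrt(D) dW.

    The process relaxes towards the fixed point mu = y_k + f(y_k)/gamma.
    """
    mu = y_k + f_yk / gamma
    mean = mu + np.exp(-gamma * delta) * (x - mu)
    var = D / (2.0 * gamma) * (1.0 - np.exp(-2.0 * gamma * delta))
    return mean, var

# Hypothetical numbers: drift value f(y_k) and linearization gamma taken at an observation y_k.
mean, var = ou_transition_moments(x=0.8, y_k=1.0, f_yk=-0.5, gamma=2.0, D=1.0, delta=0.3)
```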

Variational sparse GP approximation

L. Csató (2002), M. Titsias (2009)

Assume a measure over functions $f$ of the form
$dP(f) = \frac{1}{Z}\, dP_0(f)\, e^{-U(f)}.$

Approximate by
$dQ(f) = \frac{1}{Z_s}\, dP_0(f)\, e^{-U_s(f_s)}.$
$U_s$ depends only on a sparse set $f_s = \{ f(x) \}_{x \in S}$ of dimension $m$.

Minimize the KL divergence
$D(Q \| P) = \int dQ(f)\, \ln \frac{dQ(f)}{dP(f)}$

Integrating over $dQ(f \mid f_s) = dP_0(f \mid f_s)$ yields the optimal
$U_s(f_s) = \mathbb{E}_0[U(f) \mid f_s]$

Can be computed analytically for a Gaussian prior $P_0$ and quadratic log likelihood $U$.

Some details

We can show that
$\mathbb{E}_0[f(x) \mid f_s] = k_s(x)^\top K_s^{-1} f_s \qquad (6)$

Hence, if $U(f) = \frac{1}{2} \int f^2(x) A(x)\, dx - \int f(x) b(x)\, dx$, we get

$U_s(f_s) = \mathbb{E}_0[U(f) \mid f_s] = \frac{1}{2} f_s^\top K_s^{-1} \left\{ \int k_s(x) A(x) k_s(x)^\top dx \right\} K_s^{-1} f_s - f_s^\top K_s^{-1} \int k_s(x)\, b(x)\, dx.$
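Because $U_s$ is quadratic in $f_s$, combining it with the Gaussian prior $f_s \sim \mathcal{N}(0, K_s)$ gives a Gaussian posterior over the inducing values in closed form. A sketch under the assumption that $A(x)$ and $b(x)$ have already been tabulated on a grid (as they would be after an E-step), so the integrals become sums; kernel and grid are illustrative:

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.5):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def sparse_drift_posterior(x_inducing, grid, A_grid, b_grid, dx):
    """Gaussian posterior over f_s for U(f) = 1/2 int f^2 A dx - int f b dx with prior f ~ GP(0, K)."""
    K_s = rbf_kernel(x_inducing, x_inducing) + 1e-8 * np.eye(len(x_inducing))
    k_s = rbf_kernel(x_inducing, grid)                      # m x G matrix of k_s(x) on the grid
    Psi = (k_s * A_grid) @ k_s.T * dx                       # approximates int k_s(x) A(x) k_s(x)^T dx
    beta = k_s @ b_grid * dx                                # approximates int k_s(x) b(x) dx
    K_inv = np.linalg.inv(K_s)
    precision = K_inv + K_inv @ Psi @ K_inv                 # prior precision plus likelihood curvature
    cov = np.linalg.inv(precision)
    mean = cov @ (K_inv @ beta)
    return mean, cov, K_s

# The predictive drift at a new x is then E[f(x)] = k_s(x)^T K_s^{-1} mean, as in Eq. (6).
```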

GP estimation after one iteration of EM

[Figure: estimated drift $f(x)$, two panels.]

[Figure: (color online) comparison of the MSE for different methods (Direct, EM, Gibbs) over different observation time intervals $\Delta t$.]

Example: A simple pendulum

$dX = V\, dt, \qquad dV = -\frac{\gamma V + mgl \sin(X)}{ml^2}\, dt + d^{1/2}\, dW_t$

$N = 4000$ data points $(x, v)$ with $\Delta t_{\text{obs}} = 0.3$ and known diffusion constant $d = 1$. GP with periodic kernel.

[Figure: estimated drift $f(x, v = 0)$ versus $x$.]

Drift estimation for SDE using empirical distribution

For $X \in \mathbb{R}^d$ consider the SDE
$dX_t = f(X_t)\, dt + \sigma(X_t)\, dW_t$

Try to estimate the drift function $f(\cdot)$ given only the empirical distribution of (noise free) observations
$\hat{p}(x) = \frac{1}{n} \sum_{i=1}^n \delta(x - x_i)$

Possible, if the drift has the form
$f(x) = g(x) + A(x)\, \nabla\psi(x)$
where $A$ and $g$ are known functions.

Generalised score functional

Define
$\varepsilon[\psi] = \int \left\{ \frac{1}{2} \| \nabla\psi(x) \|_A^2 + \mathcal{L}_g \psi(x) \right\} p(x)\, dx$
with
$\mathcal{L}_g \psi(x) = g(x) \cdot \nabla\psi(x) + \frac{1}{2} \sum_{ij} D_{ij}(x) \frac{\partial^2 \psi(x)}{\partial x^{(i)} \partial x^{(j)}}$
and $\| u \|_A^2 \doteq u^\top A(x)\, u$ and $D(x) \doteq \sigma(x)\sigma(x)^\top$.

$\frac{\delta \varepsilon[\psi]}{\delta \psi} = 0$ yields the stationary Fokker-Planck equation
$\mathcal{L}_g^\dagger p(x) - \nabla \cdot \left( A(x) \nabla\psi(x)\, p(x) \right) = 0$
where $\mathcal{L}_g^\dagger$ is the adjoint of $\mathcal{L}_g$:
$\mathcal{L}_g^\dagger p(x) = -\nabla \cdot \left( g(x) p(x) \right) + \frac{1}{2} \sum_{ij} \frac{\partial^2}{\partial x^{(i)} \partial x^{(j)}} \left( D_{ij}(x) p(x) \right).$

With $A = \mathbb{1}$ and $g = 0$ this reduces to score matching (Hyvärinen, 2005), (Sriperumbudur et al., 2014) for density estimation.

Consider the pseudo log likelihood
$-\sum_{i=1}^n \left\{ \frac{1}{2} \| \nabla\psi(x_i) \|_A^2 + \mathcal{L}_g \psi(x_i) \right\}$
where $x_i \doteq X(t_i)$, $i = 1, \ldots, n$, is a sample from the stationary density $p(\cdot)$.

Combining this with a GP prior over $\psi$ yields an estimator for the drift of the SDE, if the drift is of the form
$f(x) = g(x) + A(x)\, \nabla\psi(x)$
where $A$ and $g$ are given (Batz, Ruttor, Opper 2016).
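In the simplest scalar case with $g = 0$ and constant $A = D$, the empirical functional is quadratic in the weights of a basis expansion of $\psi$, so the estimator reduces to a linear solve. A sketch under those simplifying assumptions (a finite polynomial basis with a ridge penalty stands in for the GP prior; the OU data at the bottom are only for illustration):

```python
import numpy as np

def estimate_drift_from_samples(x, D=1.0, degree=3, reg=1e-3):
    """Minimize sum_i [ (D/2) psi'(x_i)^2 + (D/2) psi''(x_i) ] + (reg/2)||w||^2
    over psi(x) = sum_j w_j x^j, then return the drift f(x) = D psi'(x) as a callable."""
    powers = np.arange(1, degree + 1)
    dphi = x[:, None] ** (powers - 1) * powers                  # psi' basis, shape (n, degree)
    ddphi = np.where(powers >= 2,
                     x[:, None] ** np.maximum(powers - 2, 0) * powers * (powers - 1),
                     0.0)                                        # psi'' basis
    G = dphi.T @ dphi / len(x)                                   # quadratic term in w
    c = ddphi.mean(axis=0)                                       # linear term from the second derivatives
    w = np.linalg.solve(D * G + reg * np.eye(degree), -(D / 2.0) * c)
    return lambda xq: D * (np.array(xq)[:, None] ** (powers - 1) * powers) @ w

# Illustration: approximate stationary samples of the OU process dX = -X dt + dW (variance 1/2).
rng = np.random.default_rng(0)
x = rng.normal(0.0, np.sqrt(0.5), size=5000)
f_hat = estimate_drift_from_samples(x, D=1.0)
print(f_hat(np.array([-1.0, 0.0, 1.0])))   # should be close to -x, i.e. roughly [1, 0, -1]
```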

Langevin dynamics

Consider the 2nd order SDE
$dX_t = V_t\, dt, \qquad dV_t = \left( F(X_t) - \lambda V_t \right) dt + \sigma(X_t, V_t)\, dW_t.$

Set $A_{xx} = A_{xv} = 0$ and $A_{vv} = \mathbb{1}$, $f_x = v$, $g_v = -\lambda v$ (known).

The condition
$f_v(x, v) = -\lambda v + \nabla_v \psi(x, v) = -\lambda v + F(x)$
is fulfilled with $\psi(x, v) = v^\top F(x)$ and allows for arbitrary $F(x)$. Use a GP prior over $F$.

Example 1: Periodic model

$F(x) = a \sin x$, $D_v = (\sigma \cos(x))^2$, with $n = 2000$ observations, time lag $\tau = 0.25$.

[Figure: estimated drift $f(x)$ versus $x$.]

Example 2: Non conservative drift

$F^{(1)}(x) = x^{(1)} \left( 1 - (x^{(1)})^2 - (x^{(2)})^2 \right) - x^{(2)}$
$F^{(2)}(x) = x^{(2)} \left( 1 - (x^{(1)})^2 - (x^{(2)})^2 \right) + x^{(1)}$

with friction $\lambda$ known, but $D$ unknown, $n = 2000$ observations.

[Figure: estimated two-dimensional force field.]

The case A = D: Likelihood for densely observed path

$\ln L(X_{0:T} \mid f) = -\frac{1}{2} \int \left\{ \| f(X_t) \|_{D^{-1}}^2\, dt - 2 \langle f(X_t), dX_t \rangle \right\} + \text{const}$
with $\langle u, v \rangle \doteq u^\top D^{-1} v$.

Assume $f = g + D\nabla\psi$ and apply the Ito formula (dropping terms independent of $\psi$):

$= -\frac{1}{2} \int_0^T \left\{ \nabla\psi^\top D\, \nabla\psi\, dt + 2\, g^\top \nabla\psi\, dt - 2\, \nabla\psi^\top dX_t \right\}$
$= -\int_0^T \left\{ \frac{1}{2} \| \nabla\psi(X_t) \|_D^2 + \mathcal{L}_g \psi(X_t) \right\} dt + \psi(X_T) - \psi(X_0)$

Outlook

Beyond EM: full variational Bayesian approximation
Estimation of diffusion from sparse data
Quality of the sparse GP approximation?
Langevin dynamics with unobserved velocities?

Many thanks to my collaborators: Andreas Ruttor, Philipp Batz. Funded by an EU STREP project.