Gaussian Process Approximations of Stochastic Differential Equations

Cédric Archambeau, Centre for Computational Statistics and Machine Learning, University College London (c.archambeau@cs.ucl.ac.uk)
CSML 2007 Reading Group on SDEs. Joint work with Manfred Opper (TU Berlin), John Shawe-Taylor (UCL) and Dan Cornford (Aston).

Context: numerical weather prediction (data assimilation)
- The model equations for the atmosphere are ordinary differential equations (deterministic).
- Stochasticity arises from: simplifications of the dynamics, discretisation of the continuous process, and the physics (unknown parameters).
- Observations are satellite radiances: indirect, possibly non-linear, and subject to measurement error.
- Decision making must take the uncertainty into account!

Issues with the current methods
[Figure: the double-well potential example, with results from the ensemble Kalman smoother (Eyink et al., 2002).]

Motivation
Data assimilation: improve current prediction tools.
- Current methods cannot exploit large data sets.
- Current methods fail on (highly) non-linear data.
- Parameters are unknown (model, noise, observation operator).
Machine learning: deal with uncertainty in a principled way.
- Discover the optimal data representation: find a sparse solution, learn the kernel.
- Deal with non-linear data, huge amounts of data, high-dimensional data, and large noise.

Outline
- Diffusion processes
- Gaussian approximation of the prior process
- Approximate posterior process after observing the data
- Smoothing algorithm (E-step)
- Parameter estimation (M-step)
- Conclusion

Continuous-time, continuous-state stochastic processes
Wiener process:
- almost surely continuous;
- almost surely non-differentiable.
Stochastic differential equation for a diffusion process, to be interpreted as an (Itô) stochastic integral:
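The equations themselves were images on the slide and did not survive the transcription; a standard reconstruction, using the notation of the associated Archambeau et al. (2007) paper (drift f, diffusion matrix Σ), would be

\[
  \mathrm{d}X_t = f(X_t, t)\,\mathrm{d}t + \Sigma^{1/2}\,\mathrm{d}W_t ,
\]

shorthand for the (Itô) stochastic integral

\[
  X_t = X_0 + \int_0^t f(X_s, s)\,\mathrm{d}s + \int_0^t \Sigma^{1/2}\,\mathrm{d}W_s .
\]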

Describing diffusion processes
Fokker-Planck equation:
- time evolution of the transition density;
- drift: instantaneous rate of change of the mean;
- diffusion: instantaneous rate of change of the squared fluctuations.
Stochastic differential equation:
- depends on the same drift and diffusion terms;
- alternative representation of the transition density;
- a (non-)linear SDE leads to a (non-)Gaussian transition density.
Kernel representation:
- captures correlations between data;
- a particular SDE induces a specific (two-time) kernel;
- a time-varying approximation of an SDE is equivalent to learning the kernel (hot topic!).
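The Fokker-Planck equation was shown as an image; its standard one-dimensional form, consistent with the drift/diffusion reading above (a reconstruction, not the slide's own rendering), is

\[
  \frac{\partial p(x,t)}{\partial t}
    = -\frac{\partial}{\partial x}\bigl[f(x,t)\,p(x,t)\bigr]
      + \frac{1}{2}\,\frac{\partial^2}{\partial x^2}\bigl[\Sigma\,p(x,t)\bigr] .
\]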

Gaussian approximation of the non-Gaussian process
- Time-varying linear approximation of the drift.
- Gaussian marginal density.
- Time evolution of the means and the covariances (consistency); see, for example, the second CSML reading group on SDEs.
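The formulas are again lost; in the published version of this work (Archambeau et al., 2007) the linearised drift, marginal density, and moment equations take the form below, which is presumably what appeared here:

\[
  f_L(x, t) = -A(t)\,x + b(t),
  \qquad
  q_t(x) = \mathcal{N}\bigl(x \mid m(t), S(t)\bigr),
\]
\[
  \frac{\mathrm{d}m}{\mathrm{d}t} = -A(t)\,m(t) + b(t),
  \qquad
  \frac{\mathrm{d}S}{\mathrm{d}t} = -A(t)\,S(t) - S(t)\,A(t)^{\top} + \Sigma .
\]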

Optimal (Gaussian) approximation of the prior process
- Euler-Maruyama discrete approximation.
- Probability density of a discrete-time sample path: same stochastic noise!
- Optimal prior process: a Kullback-Leibler divergence between path measures (a path integral).
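The slide's formula was lost; when both processes share the diffusion Σ, the KL divergence between the two path measures reduces (via the discrete-time sample-path densities, or equivalently Girsanov's theorem) to \(\tfrac{1}{2}\int_0^T \mathbb{E}_q\bigl[(f - f_L)^{\top}\Sigma^{-1}(f - f_L)\bigr]\mathrm{d}t\). As an illustration of the Euler-Maruyama discretisation on the talk's running double-well example, here is a minimal sketch; the drift \(4x(1-x^2)\) and all parameter values are assumptions for illustration, not taken from the slides:

```python
import numpy as np

def euler_maruyama(f, sigma, x0, dt, n_steps, rng):
    """Simulate one discrete-time sample path of dX = f(X) dt + sigma dW."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt))   # Wiener increment ~ N(0, dt)
        x[k + 1] = x[k] + f(x[k]) * dt + sigma * dw
    return x

# Double-well drift: gradient descent on the potential U(x) = (x^2 - 1)^2.
drift = lambda x: -4.0 * x * (x**2 - 1.0)

rng = np.random.default_rng(0)
path = euler_maruyama(drift, sigma=0.5, x0=-1.0, dt=0.01, n_steps=2000, rng=rng)
print(path[-5:])  # the path lingers near one well and occasionally hops between -1 and +1
```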

Comment on the Kullback-Leibler divergence
Is this measure a good criterion? One subtlety is support: the KL divergence between two path measures is finite only when the approximating process is absolutely continuous with respect to the true one, which is why both must share the same diffusion coefficient ("same stochastic noise").

Including the observations
- Posterior process: Gaussian likelihood with a linear observation operator (for simplicity).
- EM-type training algorithm, where the lower bound is defined below.
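The lower bound itself was an image; in the standard variational-EM form it would read (a reconstruction under the paper's notation, with q the approximating Gaussian process, p the prior process, Y the observations and X the latent state path):

\[
  \log p(Y) \;\ge\; \mathcal{L}(q)
  = \mathbb{E}_q\bigl[\log p(Y \mid X)\bigr] - \mathrm{KL}\bigl(q \,\|\, p\bigr).
\]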

E-step: estimating the optimal (latent) state path
- Ingredients: the SDE (prior) and the likelihood.
- Find the optimal variational functions.
- Constrained optimisation problem: an ODE for the marginal means and an ODE for the marginal covariances.
- Integrating the Lagrangian by parts and differentiating leads to: ODEs for the Lagrange multipliers, and the gradient for the variational functions.
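The Lagrangian was lost with the slide graphics; schematically (my reconstruction, up to sign conventions, with multipliers λ(t) and Ψ(t) enforcing the mean and covariance ODEs), it has the form

\[
  \mathcal{L} = \mathcal{E}(A, b, m, S)
  - \int_0^T \lambda(t)^{\top}\!\left(\frac{\mathrm{d}m}{\mathrm{d}t} + A m - b\right)\mathrm{d}t
  - \int_0^T \mathrm{tr}\!\left[\Psi(t)\!\left(\frac{\mathrm{d}S}{\mathrm{d}t} + A S + S A^{\top} - \Sigma\right)\right]\mathrm{d}t ,
\]

where \(\mathcal{E}\) is the negative lower bound as a functional of the variational quantities. Integrating by parts moves the time derivatives onto λ and Ψ, yielding backward ODEs for the multipliers plus jump conditions at the observation times.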

Smoothing algorithm
Fix the initial conditions, then repeat until convergence:
1. Forward sweep: propagate the means and covariances forward in time.
2. Backward sweep: propagate the Lagrange multipliers backward in time (the adjoint operation), using the jump conditions at the observations.
3. Update the variational functions (a scaled gradient step).
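To make the structure concrete, here is a minimal sketch of the forward sweep for a one-dimensional state: it integrates the moment ODEs dm/dt = -A m + b and dS/dt = -2 A S + σ² for given variational functions A(t), b(t). The backward sweep and gradient step are omitted, and everything here (names, step size, the piecewise-constant representation of A and b) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

def forward_sweep(A, b, m0, S0, sigma2, dt):
    """Propagate the marginal mean m(t) and variance S(t) of the Gaussian
    approximation forward in time with explicit Euler steps, given the
    piecewise-constant variational functions A[k], b[k] on each interval."""
    n = len(A)
    m = np.empty(n + 1)
    S = np.empty(n + 1)
    m[0], S[0] = m0, S0
    for k in range(n):
        # 1-D moment ODEs: dm/dt = -A m + b ;  dS/dt = -2 A S + sigma^2
        m[k + 1] = m[k] + dt * (-A[k] * m[k] + b[k])
        S[k + 1] = S[k] + dt * (-2.0 * A[k] * S[k] + sigma2)
    return m, S

# Example: constant A, b, i.e. an Ornstein-Uhlenbeck prior as a special case.
n_steps, dt = 500, 0.01
A = np.full(n_steps, 2.0)   # time-varying in general; constant here
b = np.zeros(n_steps)
m, S = forward_sweep(A, b, m0=1.0, S0=0.1, sigma2=0.25, dt=dt)
print(m[-1], S[-1])  # relaxes towards the stationary values 0 and sigma2 / (2 A)
```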

Double-well example
[Figure: smoothed state paths from standard Gaussian process regression versus the variational Gaussian approximation.]

Comparison to an MCMC simulation
Hybrid Monte Carlo approach (Y. Shen, Aston), used as the reference solution:
- generates sample paths from the posterior;
- modifies the scheme to increase the acceptance rate (molecular dynamics);
- still requires generating on the order of 100,000 sample paths for good results;
- convergence is hard to check.

M-step: learning the parameters by maximum likelihood
- the linear transformation H;
- the observation noise R;
- the stochastic noise?
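The update equations were not preserved; for a linear-Gaussian likelihood \(y_k = H x_k + \epsilon_k\), \(\epsilon_k \sim \mathcal{N}(0, R)\), the standard EM updates in terms of the E-step's posterior moments (my reconstruction, not the slide's own formulas) are

\[
  H \leftarrow \Bigl(\sum_k y_k\,\mathbb{E}[x_k]^{\top}\Bigr)
               \Bigl(\sum_k \mathbb{E}[x_k x_k^{\top}]\Bigr)^{-1},
  \qquad
  R \leftarrow \frac{1}{N}\sum_k
      \mathbb{E}\bigl[(y_k - H x_k)(y_k - H x_k)^{\top}\bigr].
\]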

When things go wrong

Conclusion
Machine learning for data assimilation:
- Modelling the uncertainty is essential.
- Learning the kernel is a challenging new concern; a bunch of suboptimal/intermediate good solutions?
- Promising results: quality of the solution and potential extensions.
Future work includes:
- high(er)-dimensional data;
- checking the gradient approach (cf. the stochastic noise);
- simplifications when the force (drift) derives from a potential;
- investigating a full variational Bayesian treatment;
- combining the variational approach with MCMC?
Paper available from: www.cs.ucl.ac.uk/staff/c.archambeau
Research project: VISDEM (multi-site, EPSRC funded)