Expectation Propagation in Dynamical Systems


Expectation Propagation in Dynamical Systems
Marc Peter Deisenroth, joint work with Shakir Mohamed (UBC)
August 10, 2012

Motivation
Figure: complex time series (motion capture, GDP, climate).
Time series in economics, robotics, motion capture, etc. have unknown dynamical structure and are high-dimensional and noisy. We therefore need flexible and accurate models: nonlinear (Gaussian process) dynamical systems (GPDS). Accurate inference in (GP)DS is important for better knowledge about the latent structure and for parameter learning.

Outline
1. Inference in Time Series Models: Filtering and Smoothing; Expectation Propagation; Approximating the Partition Function; Relation to Smoothing
2. EP in Gaussian Process Dynamical Systems: Gaussian Processes; Filtering/Smoothing in GPDS; Expectation Propagation in GPDS
3. Results

Time Series Models
Graphical model: latent chain x_{t-1} → x_t → x_{t+1} with observations z_{t-1}, z_t, z_{t+1}.
x_t = f(x_{t-1}) + w,  w ~ N(0, Q)
z_t = g(x_t) + v,      v ~ N(0, R)
Latent state x ∈ R^D, measurement/observation z ∈ R^E, transition function f, measurement function g.
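
To make the generative model concrete, here is a minimal simulation sketch of exactly this structure, x_t = f(x_{t-1}) + w, z_t = g(x_t) + v. The specific transition and measurement functions below are hypothetical choices for illustration only, not the systems used later in the talk.

```python
import numpy as np

def simulate_ssm(f, g, Q, R, x0, T, rng):
    """Simulate a 1-D state-space model:
    x_t = f(x_{t-1}) + w, w ~ N(0, Q); z_t = g(x_t) + v, v ~ N(0, R)."""
    x, z = np.zeros(T), np.zeros(T)
    x_prev = x0
    for t in range(T):
        x[t] = f(x_prev) + rng.normal(0.0, np.sqrt(Q))   # latent transition
        z[t] = g(x[t]) + rng.normal(0.0, np.sqrt(R))     # noisy measurement
        x_prev = x[t]
    return x, z

rng = np.random.default_rng(0)
f = lambda x: 0.9 * x + np.sin(x)    # hypothetical transition function
g = lambda x: x                      # hypothetical measurement function
x, z = simulate_ssm(f, g, Q=0.1, R=0.5, x0=0.0, T=20, rng=rng)
```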

Inference in Time Series Models: Filtering and Smoothing
Objective: the posterior distribution over the latent variables x_t.
Filtering (forward inference): compute p(x_t | z_{1:t}) for t = 1, ..., T.
Smoothing (forward-backward inference): compute p(x_t | z_{1:t}) for t = 1, ..., T (forward sweep), then p(x_t | z_{1:T}) for t = T, ..., 1 (backward sweep).
Examples:
Linear systems: Kalman filter/smoother (Kalman, 1960)
Nonlinear systems: approximate inference, e.g., extended Kalman filter/smoother, unscented Kalman filter/smoother (Julier & Uhlmann, 1997)
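
In the linear-Gaussian special case both recursions are exact; the following minimal 1-D Kalman filter with a Rauch-Tung-Striebel backward pass (a scalar system with hypothetical parameters, purely for illustration) sketches the forward and backward sweeps that the nonlinear approximations below emulate.

```python
import numpy as np

def kalman_filter_smoother(z, A, C, Q, R, mu0, P0):
    """Forward (filter) and backward (RTS smoother) sweeps for the scalar
    linear system x_t = A x_{t-1} + w, z_t = C x_t + v."""
    T = len(z)
    mu_f, P_f = np.zeros(T), np.zeros(T)   # filtered moments of p(x_t | z_{1:t})
    mu_p, P_p = np.zeros(T), np.zeros(T)   # predicted moments of p(x_t | z_{1:t-1})
    m, P = mu0, P0
    for t in range(T):
        m, P = A * m, A * P * A + Q                    # predict
        mu_p[t], P_p[t] = m, P
        K = P * C / (C * P * C + R)                    # Kalman gain
        m = m + K * (z[t] - C * m)                     # measurement update
        P = (1 - K * C) * P
        mu_f[t], P_f[t] = m, P
    mu_s, P_s = mu_f.copy(), P_f.copy()                # smoothed moments p(x_t | z_{1:T})
    for t in range(T - 2, -1, -1):                     # backward sweep
        J = P_f[t] * A / P_p[t + 1]
        mu_s[t] = mu_f[t] + J * (mu_s[t + 1] - mu_p[t + 1])
        P_s[t] = P_f[t] + J * (P_s[t + 1] - P_p[t + 1]) * J
    return mu_s, P_s

# Hypothetical linear system and data, for illustration only.
rng = np.random.default_rng(1)
z = rng.normal(size=20)
mu_s, P_s = kalman_filter_smoother(z, A=0.95, C=1.0, Q=0.1, R=0.5, mu0=0.0, P0=1.0)
```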

Filtering and Smoothing: A Machine Learning Perspective
Treat filtering/smoothing as an inference problem in a graphical model with hidden variables. This allows for efficient local message passing; the messages are unnormalized probability distributions. The posterior marginals p(x_t), t = 1, ..., T, are refined iteratively: multiple forward-backward sweeps are performed until global consistency (convergence). Here: Expectation Propagation (Minka, 2001).

Expectation Propagation
Inference in factor graphs: factors p(x_{t+1} | x_t) connect neighboring latent states x_t and x_{t+1}; factors p(z_t | x_t) and p(z_{t+1} | x_{t+1}) connect the latent states to the measurements.
p(x_t) = ∏_{i=1}^n t_i(x_t) ≈ q(x_t) = ∏_{i=1}^n t̃_i(x_t)
The approximate factors t̃_i are members of the exponential family (e.g., multinomial, gamma, Gaussian). Find a good approximation such that q ≈ p.

Expectation Propagation: Moment Matching
Figure: moment matching vs. mode matching (borrowed from Bishop, 2006).
EP locally minimizes KL(p || q), where p is the true distribution and q is an exponential-family approximation to it. EP therefore corresponds to moment matching, unlike variational Bayes (mode matching), which minimizes KL(q || p). EP exploits properties of the exponential family: the moments of a distribution can be computed via derivatives of the log-partition function.
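
As a small numerical illustration of the moment-matching projection (a 1-D toy, not from the talk: a Gaussian cavity multiplied by a non-Gaussian logistic factor, projected onto a Gaussian by matching its mean and variance on a grid):

```python
import numpy as np

# Tilted distribution: Gaussian cavity N(0, 1) times a non-Gaussian factor
# (a logistic site, chosen purely for illustration).
x = np.linspace(-10, 10, 20001)
dx = x[1] - x[0]
cavity = np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)
factor = 1.0 / (1.0 + np.exp(-3.0 * x))
tilted = cavity * factor

Z = np.sum(tilted) * dx                           # partition function
mean = np.sum(x * tilted) * dx / Z                # matched first moment
var = np.sum((x - mean) ** 2 * tilted) * dx / Z   # matched second central moment

# Within the Gaussian family, KL(p || q) is minimized exactly by the Gaussian
# q = N(mean, var), i.e., by matching these moments.
print(Z, mean, var)
```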

Expectation Propagation: Factor Graph Representation
Figure: factor graph (left) and fully factored factor graph (right).
Write down the (fully factored) factor graph:
p(x_t) = ∏_{i=1}^n t_i(x_t) ≈ q(x_t) = ∏_{i=1}^n t̃_i(x_t)
Find approximate factors t̃_i such that KL(p || q) is minimized. Multiple sweeps through the graph are performed until global consistency of the messages is assured.

Messages in a Dynamical System
Approximate (factored) marginal: q(x_t) = ∏_i t̃_i(x_t).
Here, the messages t̃_i have names: the measurement message q▲(x_t), the forward message q▶(x_t), and the backward message q◀(x_t).
Cavity distribution: q^\i(x_t) = q(x_t) / t̃_i(x_t) = ∏_{k≠i} t̃_k(x_t).
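
For Gaussian messages, the division (and multiplication) in the cavity computation is easiest in natural parameters; a minimal 1-D sketch (the helper names and numbers are illustrative, not from the talk):

```python
def to_natural(mu, var):
    """Gaussian natural parameters: precision-adjusted mean and precision."""
    lam = 1.0 / var
    return lam * mu, lam

def from_natural(eta, lam):
    var = 1.0 / lam
    return eta * var, var

def cavity(mu_q, var_q, mu_i, var_i):
    """Cavity distribution q divided by the message t~_i for 1-D Gaussians:
    subtract natural parameters."""
    eta_q, lam_q = to_natural(mu_q, var_q)
    eta_i, lam_i = to_natural(mu_i, var_i)
    return from_natural(eta_q - eta_i, lam_q - lam_i)

# Example: marginal q = N(0.5, 0.2) divided by a message t~_i = N(1.0, 1.0)
mu_cav, var_cav = cavity(0.5, 0.2, 1.0, 1.0)   # -> N(0.375, 0.25)
```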

Gaussian EP in More Detail
1. Write down the factor graph.
2. Initialize all messages t̃_i, where i ranges over the forward (▶), backward (◀), and measurement (▲) messages.
Until convergence:
3. For all latent variables x_t and corresponding messages t̃_i(x_t):
   3.1 Compute the cavity distribution q^\i(x_t) = N(x_t | µ_t^\i, Σ_t^\i) by Gaussian division.
   3.2 Compute the moments of t_i(x_t) q^\i(x_t); these are the updated moments of q(x_t).
   3.3 Compute the updated message t̃_i(x_t) = q(x_t) / q^\i(x_t).
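
The loop structure of steps 1-3 can be sketched for the simplest possible case: a single latent variable with one exact Gaussian prior factor and two non-Gaussian sites, with the moment matching done numerically on a grid. This is not the dynamical-system EP of the talk (there is no forward/backward/measurement structure here), only an illustration of the cavity / moment-match / message-update cycle; the factors are hypothetical.

```python
import numpy as np

# 1-D grid used for numerical moment matching of the tilted distributions.
xs = np.linspace(-15, 15, 30001)
dx = xs[1] - xs[0]

def moments(unnorm):
    Z = np.sum(unnorm) * dx
    mu = np.sum(xs * unnorm) * dx / Z
    var = np.sum((xs - mu) ** 2 * unnorm) * dx / Z
    return mu, var

prior = np.array([0.0, 1.0])               # exact Gaussian prior N(0, 1), natural params (eta, lam)
sites_true = [                              # non-Gaussian factors, for illustration only
    lambda x: 1.0 / (1.0 + np.exp(-(x - 1.0))),
    lambda x: 1.0 / (1.0 + np.exp(-(x + 0.5))),
]
site = np.zeros((len(sites_true), 2))       # approximate factors, initialized flat

for sweep in range(20):                     # multiple sweeps until convergence
    for i, t_i in enumerate(sites_true):
        eta_q, lam_q = prior + site.sum(axis=0)                  # q(x): product of all factors
        eta_c, lam_c = eta_q - site[i, 0], lam_q - site[i, 1]    # cavity: q / t~_i
        mu_c, var_c = eta_c / lam_c, 1.0 / lam_c
        cav = np.exp(-0.5 * (xs - mu_c) ** 2 / var_c)
        mu_new, var_new = moments(t_i(xs) * cav)                 # project t_i * cavity onto a Gaussian
        eta_new, lam_new = mu_new / var_new, 1.0 / var_new
        site[i] = [eta_new - eta_c, lam_new - lam_c]             # updated message: q_new / cavity

eta_q, lam_q = prior + site.sum(axis=0)
print("EP posterior mean, variance:", eta_q / lam_q, 1.0 / lam_q)
```

In the dynamical system, the same cycle runs over every x_t and over the three message types, with the moments obtained from the (approximated) partition function as described next.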

Updating the Measurement Message
q▲(x_t) = proj[ t▲(x_t) q^\▲(x_t) ] / q^\▲(x_t)
where t▲(x_t) = p(z_t | x_t) is the true factor and q^\▲(x_t) is the cavity distribution.
The proj[·] operator projects onto exponential-family distributions. It is implemented by taking derivatives of the log-partition function log Z, where Z = ∫ t▲(x_t) q^\▲(x_t) dx_t.
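
The "moments via derivatives of log Z" statement can be checked numerically in 1-D: with a Gaussian cavity N(m, v) and a factor t(x), the projected moments are m + v ∂log Z/∂m and v - v²[(∂log Z/∂m)² - 2 ∂log Z/∂v]. The sketch below (grid-based log Z, finite-difference derivatives, hypothetical factor) is illustrative only.

```python
import numpy as np

xs = np.linspace(-15, 15, 30001)
dx = xs[1] - xs[0]

def logZ(m, v, t):
    """log partition function of the tilted distribution t(x) N(x | m, v)."""
    cav = np.exp(-0.5 * (xs - m) ** 2 / v) / np.sqrt(2 * np.pi * v)
    return np.log(np.sum(t(xs) * cav) * dx)

t = lambda x: 1.0 / (1.0 + np.exp(-2.0 * x))   # hypothetical non-Gaussian factor
m_c, v_c = 0.3, 0.8                            # cavity moments

# Finite-difference derivatives of log Z w.r.t. the cavity moments.
eps = 1e-5
dm = (logZ(m_c + eps, v_c, t) - logZ(m_c - eps, v_c, t)) / (2 * eps)
dv = (logZ(m_c, v_c + eps, t) - logZ(m_c, v_c - eps, t)) / (2 * eps)

# Moments of the projected Gaussian, obtained purely from log Z.
m_new = m_c + v_c * dm
v_new = v_c - v_c ** 2 * (dm ** 2 - 2 * dv)
print("projected mean, variance:", m_new, v_new)
```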

Updating in Context: The Forward Message
The forward message needs to take the coupling between x_t and x_{t+1} into account, which is lost when writing down the fully factored factor graph.
Key insight: we want a close approximation
q▶(x_{t+1}) q▲(x_{t+1}) q◀(x_{t+1}) ≈ q^\▶(x_{t+1}) ∫ p(x_{t+1} | x_t) q▶(x_t) q▲(x_t) dx_t
where q▶(x_t) q▲(x_t) provides the context and q^\▶(x_{t+1}) is the cavity distribution. This is achieved by projection:
q▶(x_{t+1}) = proj[ q^\▶(x_{t+1}) t▶(x_{t+1}) ] / q^\▶(x_{t+1}),  with  t▶(x_{t+1}) = ∫ p(x_{t+1} | x_t) q▶(x_t) q▲(x_t) dx_t

Key Points and Challenge: Approximating the Partition Function
EP is based on matching the moments of t_i(x_t) q^\i(x_t). Computing the partition function
Z_i(µ_t^\i, Σ_t^\i) = ∫ t_i(x_t) q^\i(x_t) dx_t
and its derivatives with respect to µ_t^\i and Σ_t^\i is sufficient for EP (a property of the exponential family).
Tricky part: the integral is not solvable for nonlinear systems with continuous variables.

Approach: Approximating the Partition Function
Interpret the partition function Z_i as a probability distribution. Example (measurement message):
Z = ∫ t▲(x) q^\▲(x) dx = ∫ p(z | x) q^\▲(x) dx = p(z)
Idea: approximate p(z) by a (Gaussian) distribution Ẑ, then take the derivatives of log Ẑ with respect to the moments of the cavity distribution. This yields updated moments for the posterior and for the messages. It fixes the intractability problem, but we are no longer exact.

Possible Gaussian Approximations
Example (measurement message):
Z = ∫ t▲(x) q^\▲(x) dx = ∫ t▲(x) N(x | µ^\▲, Σ^\▲) dx,  t▲(x) = N(z | g(x), S)
Option 1: linearize g at µ^\▲; the integral becomes tractable.
Option 2: Gaussian moment matching: compute the mean and covariance of Z = p(z) and approximate it by a Gaussian with these moments.
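
A 1-D sketch of the two options (hypothetical g, cavity moments, and noise level; the moment-matched mean and variance of p(z) are estimated here by simple Monte Carlo rather than analytically):

```python
import numpy as np

rng = np.random.default_rng(0)
g = lambda x: np.sin(3.0 * x)          # hypothetical measurement function
mu_c, var_c = 0.4, 0.3                 # cavity moments of x
S = 0.05                               # measurement noise variance

# Option 1: linearize g at the cavity mean -> p(z) is Gaussian with
# mean g(mu_c) and variance J * var_c * J + S, where J = g'(mu_c).
eps = 1e-6
J = (g(mu_c + eps) - g(mu_c - eps)) / (2 * eps)
mean_lin, var_lin = g(mu_c), J * var_c * J + S

# Option 2: moment matching -> use the exact mean/variance of z = g(x) + v,
# x ~ N(mu_c, var_c), here estimated by sampling.
x = rng.normal(mu_c, np.sqrt(var_c), 200_000)
z = g(x) + rng.normal(0.0, np.sqrt(S), x.size)
mean_mm, var_mm = z.mean(), z.var()

print("linearization:   ", mean_lin, var_lin)
print("moment matching: ", mean_mm, var_mm)
```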

Theoretical Results: Relation to Smoothing
Z = ∫ t▲(x) q^\▲(x) dx = ∫ t▲(x) N(x | µ^\▲, Σ^\▲) dx,  t▲(x) = N(z | g(x), S)
Relation to common filters/smoothers: approximating Z by a Gaussian Ẑ is equivalent to approximating p(x, z) by a Gaussian, an approximation that is common to almost all filtering algorithms (Deisenroth & Ohlsson, ACC 2011).
Generalizing common smoothers: linearizing g(x) in Z generalizes the EKS to an iterative procedure; moment matching generalizes the ADS to an iterative procedure.

Interesting Side Effects
To minimize the KL divergence, the EP updates require the derivatives ∂ log Z / ∂µ^\ and ∂ log Z / ∂Σ^\.
The Gaussian approximation Z = p(z) ≈ N(µ_z, Σ_z) is exact if and only if there is a linear relationship between x and z, i.e., z = Jx with x ~ N(µ^\, Σ^\) for some J; in that case µ_z and Σ_z have a special form.
This linearity must be explicitly encoded in the partial derivatives. Example:
∂ log Z / ∂µ^\ = (∂ log Z / ∂µ_z)(∂µ_z / ∂µ^\) = (z - µ_z)ᵀ Σ_z^{-1} J
Even if µ_z is a general function of µ^\ and Σ^\, this dependence must be ignored; otherwise the EP updates become inconsistent. (Deisenroth & Mohamed, arXiv preprint, 2012)

Illustration: Toy Tracking Problem
Figure: state over 20 time steps; ground truth vs. EKS posterior (left) and ground truth vs. EP and EKS posteriors (right).
Iteratively improving the posteriors via EP can heal the EKS.

Gaussian Process Dynamical Systems
x_t = f(x_{t-1}) + w,  w ~ N(0, Q)
z_t = g(x_t) + v,      v ~ N(0, R)
State x (not observed), measurement/observation z. GP distribution p(f) over the transition function f; GP distribution p(g) over the measurement function g.

Gaussian Processes for Flexible Modeling
Non-parametric method: flexible, i.e., the shape of the function adapts to the data.
Probabilistic method: consistently describes uncertainty about the unknown function.
Sufficient: specification of high-level assumptions (e.g., smoothness).
Automatic trade-off between data fit and complexity of the function (Occam's razor).
Figure: GP model of the transition dynamics, x_t plotted against (x_{t-1}, u_{t-1}).

Gaussian Process Regression
Mathematically: a probability distribution over functions. Bayesian inference is tractable:
1. Specify high-level prior beliefs p(f) about the function (e.g., smoothness).
2. Observe data X, y = f(X) + ε.
3. Compute the posterior distribution p(f | X, y) over functions.
Bayes' theorem: p(f | X, y) = p(y | X, f) p(f) / p(y | X)
p(f): prior (over functions); p(y | X, f): likelihood (noise model); p(f | X, y): posterior (over functions).
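
A minimal GP-regression sketch along these lines (squared-exponential kernel with fixed, hand-picked hyperparameters and synthetic data, purely for illustration), computing the posterior mean and marginal variance at test inputs:

```python
import numpy as np

def sqexp(A, B, ell=1.0, sf2=1.0):
    """Squared-exponential kernel matrix between 1-D input sets A and B."""
    d = A[:, None] - B[None, :]
    return sf2 * np.exp(-0.5 * (d / ell) ** 2)

# Training data y = f(X) + eps (hypothetical 1-D example).
rng = np.random.default_rng(0)
X = np.linspace(-4, 4, 15)
y = np.sin(X) + 0.1 * rng.normal(size=X.size)
sn2 = 0.01                                    # noise variance

# GP posterior at test inputs Xs.
Xs = np.linspace(-5, 5, 100)
K = sqexp(X, X) + sn2 * np.eye(X.size)
Ks = sqexp(Xs, X)
alpha = np.linalg.solve(K, y)
post_mean = Ks @ alpha
post_var = np.diag(sqexp(Xs, Xs) - Ks @ np.linalg.solve(K, Ks.T))
```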

Pictorial Introduction to Gaussian Processes
Figure (three panels, f(x) over x): prior belief about the function; observing some function values; posterior belief about the function.

Gaussian Process Dynamical Systems (Recap)
x_t = f(x_{t-1}) + w,  w ~ N(0, Q);  z_t = g(x_t) + v,  v ~ N(0, R)
GP distribution p(f) over the transition function f; GP distribution p(g) over the measurement function g.
Let's talk about inference in GPDSs.

Inference in GPDS
Figure: mapping the input distribution p(x_{t-1}, u_{t-1}) through a GP to the predictive distribution p(x_t).
Objective: Gaussian approximations to the joints p(x_t, z_t | z_{1:t-1}) and p(x_{t-1}, x_t | z_{1:t-1}), which are sufficient for Gaussian filtering/smoothing (Deisenroth & Ohlsson, ACC 2011).
Mapping distributions through a GP requires approximations, e.g., linearization of the posterior GP mean function or moment matching.
Filtering/smoothing in GPDS (Deisenroth et al., ICML 2009; Deisenroth et al., IEEE-TAC, 2012): GP-EKS, GP-ADS, GP-CKS, ...
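
The two approximations for mapping a Gaussian distribution through a GP can again be sketched in 1-D. Below, a tiny GP posterior mean stands in for the transition model; linearization uses a numerical derivative, and moment matching is approximated by Monte Carlo (for the squared-exponential kernel the matched moments are actually available in closed form, per Quiñonero-Candela et al., 2003). All data and numbers are illustrative, and the GP's own predictive variance is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

# A tiny GP posterior mean (squared-exponential kernel, fixed hyperparameters,
# hypothetical training data) standing in for the transition model f.
def sqexp(A, B, ell=1.0, sf2=1.0):
    return sf2 * np.exp(-0.5 * ((A[:, None] - B[None, :]) / ell) ** 2)

X = np.linspace(-3, 3, 10)
y = np.sin(2.0 * X)
alpha = np.linalg.solve(sqexp(X, X) + 0.01 * np.eye(X.size), y)
gp_mean = lambda xq: sqexp(np.atleast_1d(xq), X) @ alpha

# Propagate a Gaussian input x ~ N(mu_x, var_x) through the GP posterior mean.
mu_x, var_x = 0.5, 0.3

# Option 1: linearize the posterior mean function around mu_x.
eps = 1e-5
J = ((gp_mean(mu_x + eps) - gp_mean(mu_x - eps)) / (2 * eps)).item()
mean_lin, var_lin = gp_mean(mu_x).item(), J * var_x * J

# Option 2: moment matching, here approximated by sampling the input distribution.
xs_in = rng.normal(mu_x, np.sqrt(var_x), 100_000)
fs = gp_mean(xs_in)
print("linearization:   ", mean_lin, var_lin)
print("moment matching: ", fs.mean(), fs.var())
```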

EP in GPDS
Generalize single-sweep forward-backward smoothing in GPDSs to an iterative procedure using EP. This is slightly more involved than EP in nonlinear systems (e.g., EP-EKS): we also have to average over the function distribution (the GP).
Key idea, same as before: approximate the partition function by a Gaussian distribution (Deisenroth & Mohamed, arXiv preprint, 2012).
Linearization of the posterior mean function (e.g., Ko & Fox, 2009): EP-GPEKS
Moment matching (e.g., Quiñonero-Candela et al., 2003): EP-GPADS

Results: Synthetic Data (1)
Figure: GP model with training set and ground truth, f(x) over x.
x_{t+1} = 4 sin(4 x_t) + w,  w ~ N(0, 0.1²)
z_t = 4 sin(4 x_t) + v,      v ~ N(0, 0.1²)
Initial state distribution p(x_1) = N(0, 1), i.e., very broad. 30 training points for the GP models, randomly selected. Tracking horizon: 20 time steps.
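
A short data-generation sketch for this synthetic setup (the slide does not specify how the 30 training inputs are "randomly selected", so the uniform selection below is an assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: 4.0 * np.sin(4.0 * x)     # transition and measurement nonlinearity
sig_w = sig_v = 0.1                     # process / measurement noise std
T = 20                                  # tracking horizon

# Trajectory from the broad initial state distribution p(x_1) = N(0, 1).
x, z = np.zeros(T), np.zeros(T)
x[0] = rng.normal(0.0, 1.0)
z[0] = f(x[0]) + rng.normal(0.0, sig_v)
for t in range(1, T):
    x[t] = f(x[t - 1]) + rng.normal(0.0, sig_w)
    z[t] = f(x[t]) + rng.normal(0.0, sig_v)

# 30 input/target pairs for training the transition GP
# (uniform random selection is an assumption for illustration).
X_train = rng.uniform(-5.0, 5.0, 30)
y_train = f(X_train) + rng.normal(0.0, sig_w, 30)
```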

Results: Synthetic Data (2)
Figure (a): posterior state trajectories with confidence bounds (true state, EP-GPADS posterior, GPADS posterior).
Figure (b): average NLL per data point as a function of the EP iteration, with standard error.
After convergence, the posterior is spot on (left). Iterating EP greatly improves predictive power (right).

Results: Pendulum Tracking
Method      NLL_x          MAE_x          LPU_x
GPEKS       0.29 ± 0.30    0.30 ± 0.02    2.76 ± 0.12
EP-GPEKS    0.24 ± 0.33    0.31 ± 0.02    2.77 ± 0.12
GPADS       0.75 ± 0.06    0.29 ± 0.02    2.52 ± 0.06
EP-GPADS    0.79 ± 0.06    0.29 ± 0.02    2.58 ± 0.04
NLL: negative log likelihood (predictive performance); MAE: mean absolute error (error of the posterior mean); LPU: log posterior uncertainty (tightness of the posterior).
Linearization-based inference: variances too small; EP makes things worse. Moment-matching-based inference: coherent estimates; EP improves the posterior.

Results: Motion Capture Data
10 trials of golf swings, recorded at 40 Hz (mocap.cs.cmu.edu). Observations z ∈ R^56, latent space x ∈ R^3. 7 training sequences, 3 test sequences. GPDS learning via the GPDM approach (Wang et al., 2008).

Summary
General framework for iterative inference in dynamical systems. Key: approximation of the partition function. Rederives classical filters/smoothers as a special case. Promising results in (GP)DS.
marc@ias.tu-darmstadt.de
http://www.ias.tu-darmstadt.de/team/marcdeisenroth

References
[1] C. M. Bishop. Pattern Recognition and Machine Learning. Information Science and Statistics. Springer-Verlag, 2006.
[2] M. P. Deisenroth, M. F. Huber, and U. D. Hanebeck. Analytic Moment-based Gaussian Process Filtering. In L. Bottou and M. L. Littman, editors, Proceedings of the 26th International Conference on Machine Learning, pages 225-232, Montreal, QC, Canada, June 2009. Omnipress.
[3] M. P. Deisenroth and S. Mohamed. Expectation Propagation in Gaussian Process Dynamical Systems, July 2012. http://arxiv.org/abs/1207.2940.
[4] M. P. Deisenroth and H. Ohlsson. A General Perspective on Gaussian Filtering and Smoothing: Explaining Current and Deriving New Algorithms. In Proceedings of the American Control Conference, 2011.
[5] M. P. Deisenroth, R. Turner, M. Huber, U. D. Hanebeck, and C. E. Rasmussen. Robust Filtering and Smoothing with Gaussian Processes. IEEE Transactions on Automatic Control, 57(7):1865-1871, 2012. doi:10.1109/TAC.2011.2179426.
[6] S. J. Julier and J. K. Uhlmann. A New Extension of the Kalman Filter to Nonlinear Systems. In Proceedings of AeroSense: 11th Symposium on Aerospace/Defense Sensing, Simulation and Controls, pages 182-193, 1997.
[7] R. E. Kalman. A New Approach to Linear Filtering and Prediction Problems. Transactions of the ASME, Journal of Basic Engineering, 82(Series D):35-45, 1960.
[8] J. Ko and D. Fox. GP-BayesFilters: Bayesian Filtering using Gaussian Process Prediction and Observation Models. Autonomous Robots, 27(1):75-90, July 2009.
[9] T. P. Minka. A Family of Algorithms for Approximate Bayesian Inference. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, January 2001.
[10] J. Quiñonero-Candela, A. Girard, J. Larsen, and C. E. Rasmussen. Propagation of Uncertainty in Bayesian Kernel Models: Application to Multiple-Step Ahead Forecasting. In IEEE International Conference on Acoustics, Speech and Signal Processing, volume 2, pages 701-704, April 2003.
[11] J. M. Wang, D. J. Fleet, and A. Hertzmann. Gaussian Process Dynamical Models for Human Motion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2):283-298, 2008.