Expectation Propagation in Dynamical Systems


Expectation Propagation in Dynamical Systems
Marc Peter Deisenroth, joint work with Shakir Mohamed (UBC)
August 10, 2012

Motivation
Figure: complex time series (motion capture, GDP, climate).
Time series in economics, robotics, motion capture, etc. have unknown dynamical structure and are high-dimensional and noisy. We therefore need flexible and accurate models: nonlinear (Gaussian process) dynamical systems (GPDS). Accurate inference in (GP)DS is important for better knowledge about the latent structure and for parameter learning.

Outline
1. Inference in Time Series Models: Filtering and Smoothing; Expectation Propagation; Approximating the Partition Function; Relation to Smoothing
2. EP in Gaussian Process Dynamical Systems: Gaussian Processes; Filtering/Smoothing in GPDS; Expectation Propagation in GPDS
3. Results

Time Series Models
Graphical model: latent chain x_{t-1} → x_t → x_{t+1} with observations z_{t-1}, z_t, z_{t+1}.
x_t = f(x_{t-1}) + w,  w ~ N(0, Q)
z_t = g(x_t) + v,      v ~ N(0, R)
Latent state x ∈ R^D, measurement/observation z ∈ R^E, transition function f, measurement function g.
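
To make the generative model concrete, here is a minimal simulation sketch of exactly this structure, x_t = f(x_{t-1}) + w, z_t = g(x_t) + v. The specific transition and measurement functions below are hypothetical choices for illustration only, not the systems used later in the talk.

```python
import numpy as np

def simulate_ssm(f, g, Q, R, x0, T, rng):
    """Simulate a 1-D state-space model:
    x_t = f(x_{t-1}) + w, w ~ N(0, Q); z_t = g(x_t) + v, v ~ N(0, R)."""
    x, z = np.zeros(T), np.zeros(T)
    x_prev = x0
    for t in range(T):
        x[t] = f(x_prev) + rng.normal(0.0, np.sqrt(Q))   # latent transition
        z[t] = g(x[t]) + rng.normal(0.0, np.sqrt(R))     # noisy measurement
        x_prev = x[t]
    return x, z

rng = np.random.default_rng(0)
f = lambda x: 0.9 * x + np.sin(x)    # hypothetical transition function
g = lambda x: x                      # hypothetical measurement function
x, z = simulate_ssm(f, g, Q=0.1, R=0.5, x0=0.0, T=20, rng=rng)
```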

Inference in Time Series Models: Filtering and Smoothing
Objective: the posterior distribution over the latent variables x_t.
Filtering (forward inference): compute p(x_t | z_{1:t}) for t = 1, ..., T.
Smoothing (forward-backward inference): compute p(x_t | z_{1:t}) for t = 1, ..., T (forward sweep), then p(x_t | z_{1:T}) for t = T, ..., 1 (backward sweep).
Examples:
Linear systems: Kalman filter/smoother (Kalman, 1960)
Nonlinear systems: approximate inference, e.g., extended Kalman filter/smoother, unscented Kalman filter/smoother (Julier & Uhlmann, 1997)
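
In the linear-Gaussian special case both recursions are exact; the following minimal 1-D Kalman filter with a Rauch-Tung-Striebel backward pass (a scalar system with hypothetical parameters, purely for illustration) sketches the forward and backward sweeps that the nonlinear approximations below emulate.

```python
import numpy as np

def kalman_filter_smoother(z, A, C, Q, R, mu0, P0):
    """Forward (filter) and backward (RTS smoother) sweeps for the scalar
    linear system x_t = A x_{t-1} + w, z_t = C x_t + v."""
    T = len(z)
    mu_f, P_f = np.zeros(T), np.zeros(T)   # filtered moments of p(x_t | z_{1:t})
    mu_p, P_p = np.zeros(T), np.zeros(T)   # predicted moments of p(x_t | z_{1:t-1})
    m, P = mu0, P0
    for t in range(T):
        m, P = A * m, A * P * A + Q                    # predict
        mu_p[t], P_p[t] = m, P
        K = P * C / (C * P * C + R)                    # Kalman gain
        m = m + K * (z[t] - C * m)                     # measurement update
        P = (1 - K * C) * P
        mu_f[t], P_f[t] = m, P
    mu_s, P_s = mu_f.copy(), P_f.copy()                # smoothed moments p(x_t | z_{1:T})
    for t in range(T - 2, -1, -1):                     # backward sweep
        J = P_f[t] * A / P_p[t + 1]
        mu_s[t] = mu_f[t] + J * (mu_s[t + 1] - mu_p[t + 1])
        P_s[t] = P_f[t] + J * (P_s[t + 1] - P_p[t + 1]) * J
    return mu_s, P_s

# Hypothetical linear system and data, for illustration only.
rng = np.random.default_rng(1)
z = rng.normal(size=20)
mu_s, P_s = kalman_filter_smoother(z, A=0.95, C=1.0, Q=0.1, R=0.5, mu0=0.0, P0=1.0)
```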

Filtering and Smoothing: A Machine Learning Perspective
Treat filtering/smoothing as an inference problem in a graphical model with hidden variables. This allows for efficient local message passing; the messages are unnormalized probability distributions. The posterior marginals p(x_t), t = 1, ..., T, are refined iteratively: multiple forward-backward sweeps are performed until global consistency (convergence). Here: Expectation Propagation (Minka, 2001).

Expectation Propagation
Inference in factor graphs: factors p(x_{t+1} | x_t) connect neighboring latent states x_t and x_{t+1}; factors p(z_t | x_t) and p(z_{t+1} | x_{t+1}) connect the latent states to the measurements.
p(x_t) = ∏_{i=1}^n t_i(x_t) ≈ q(x_t) = ∏_{i=1}^n t̃_i(x_t)
The approximate factors t̃_i are members of the exponential family (e.g., multinomial, gamma, Gaussian). Find a good approximation such that q ≈ p.

Expectation Propagation: Moment Matching
Figure: moment matching vs. mode matching (borrowed from Bishop, 2006).
EP locally minimizes KL(p || q), where p is the true distribution and q is an exponential-family approximation to it. EP therefore corresponds to moment matching, unlike variational Bayes (mode matching), which minimizes KL(q || p). EP exploits properties of the exponential family: the moments of a distribution can be computed via derivatives of the log-partition function.
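
As a small numerical illustration of the moment-matching projection (a 1-D toy, not from the talk: a Gaussian cavity multiplied by a non-Gaussian logistic factor, projected onto a Gaussian by matching its mean and variance on a grid):

```python
import numpy as np

# Tilted distribution: Gaussian cavity N(0, 1) times a non-Gaussian factor
# (a logistic site, chosen purely for illustration).
x = np.linspace(-10, 10, 20001)
dx = x[1] - x[0]
cavity = np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)
factor = 1.0 / (1.0 + np.exp(-3.0 * x))
tilted = cavity * factor

Z = np.sum(tilted) * dx                           # partition function
mean = np.sum(x * tilted) * dx / Z                # matched first moment
var = np.sum((x - mean) ** 2 * tilted) * dx / Z   # matched second central moment

# Within the Gaussian family, KL(p || q) is minimized exactly by the Gaussian
# q = N(mean, var), i.e., by matching these moments.
print(Z, mean, var)
```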

Expectation Propagation: Factor Graph Representation
Figure: factor graph (left) and fully factored factor graph (right).
Write down the (fully factored) factor graph:
p(x_t) = ∏_{i=1}^n t_i(x_t) ≈ q(x_t) = ∏_{i=1}^n t̃_i(x_t)
Find approximate factors t̃_i such that KL(p || q) is minimized. Multiple sweeps through the graph are performed until global consistency of the messages is assured.

Messages in a Dynamical System
Approximate (factored) marginal: q(x_t) = ∏_i t̃_i(x_t).
Here, the messages t̃_i have names: the measurement message q▲(x_t), the forward message q▶(x_t), and the backward message q◀(x_t).
Cavity distribution: q^\i(x_t) = q(x_t) / t̃_i(x_t) = ∏_{k≠i} t̃_k(x_t).
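
For Gaussian messages, the division (and multiplication) in the cavity computation is easiest in natural parameters; a minimal 1-D sketch (the helper names and numbers are illustrative, not from the talk):

```python
def to_natural(mu, var):
    """Gaussian natural parameters: precision-adjusted mean and precision."""
    lam = 1.0 / var
    return lam * mu, lam

def from_natural(eta, lam):
    var = 1.0 / lam
    return eta * var, var

def cavity(mu_q, var_q, mu_i, var_i):
    """Cavity distribution q divided by the message t~_i for 1-D Gaussians:
    subtract natural parameters."""
    eta_q, lam_q = to_natural(mu_q, var_q)
    eta_i, lam_i = to_natural(mu_i, var_i)
    return from_natural(eta_q - eta_i, lam_q - lam_i)

# Example: marginal q = N(0.5, 0.2) divided by a message t~_i = N(1.0, 1.0)
mu_cav, var_cav = cavity(0.5, 0.2, 1.0, 1.0)   # -> N(0.375, 0.25)
```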

Gaussian EP in More Detail
1. Write down the factor graph.
2. Initialize all messages t̃_i, where i ranges over the forward (▶), backward (◀), and measurement (▲) messages.
Until convergence:
3. For all latent variables x_t and corresponding messages t̃_i(x_t):
   3.1 Compute the cavity distribution q^\i(x_t) = N(x_t | µ_t^\i, Σ_t^\i) by Gaussian division.
   3.2 Compute the moments of t_i(x_t) q^\i(x_t); these are the updated moments of q(x_t).
   3.3 Compute the updated message t̃_i(x_t) = q(x_t) / q^\i(x_t).
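
The loop structure of steps 1-3 can be sketched for the simplest possible case: a single latent variable with one exact Gaussian prior factor and two non-Gaussian sites, with the moment matching done numerically on a grid. This is not the dynamical-system EP of the talk (there is no forward/backward/measurement structure here), only an illustration of the cavity / moment-match / message-update cycle; the factors are hypothetical.

```python
import numpy as np

# 1-D grid used for numerical moment matching of the tilted distributions.
xs = np.linspace(-15, 15, 30001)
dx = xs[1] - xs[0]

def moments(unnorm):
    Z = np.sum(unnorm) * dx
    mu = np.sum(xs * unnorm) * dx / Z
    var = np.sum((xs - mu) ** 2 * unnorm) * dx / Z
    return mu, var

prior = np.array([0.0, 1.0])               # exact Gaussian prior N(0, 1), natural params (eta, lam)
sites_true = [                              # non-Gaussian factors, for illustration only
    lambda x: 1.0 / (1.0 + np.exp(-(x - 1.0))),
    lambda x: 1.0 / (1.0 + np.exp(-(x + 0.5))),
]
site = np.zeros((len(sites_true), 2))       # approximate factors, initialized flat

for sweep in range(20):                     # multiple sweeps until convergence
    for i, t_i in enumerate(sites_true):
        eta_q, lam_q = prior + site.sum(axis=0)                  # q(x): product of all factors
        eta_c, lam_c = eta_q - site[i, 0], lam_q - site[i, 1]    # cavity: q / t~_i
        mu_c, var_c = eta_c / lam_c, 1.0 / lam_c
        cav = np.exp(-0.5 * (xs - mu_c) ** 2 / var_c)
        mu_new, var_new = moments(t_i(xs) * cav)                 # project t_i * cavity onto a Gaussian
        eta_new, lam_new = mu_new / var_new, 1.0 / var_new
        site[i] = [eta_new - eta_c, lam_new - lam_c]             # updated message: q_new / cavity

eta_q, lam_q = prior + site.sum(axis=0)
print("EP posterior mean, variance:", eta_q / lam_q, 1.0 / lam_q)
```

In the dynamical system, the same cycle runs over every x_t and over the three message types, with the moments obtained from the (approximated) partition function as described next.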

Updating the Measurement Message
q▲(x_t) = proj[ t▲(x_t) q^\▲(x_t) ] / q^\▲(x_t)
where t▲(x_t) = p(z_t | x_t) is the true factor and q^\▲(x_t) is the cavity distribution.
The proj[·] operator projects onto exponential-family distributions. It is implemented by taking derivatives of the log-partition function log Z, where Z = ∫ t▲(x_t) q^\▲(x_t) dx_t.
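
The "moments via derivatives of log Z" statement can be checked numerically in 1-D: with a Gaussian cavity N(m, v) and a factor t(x), the projected moments are m + v ∂log Z/∂m and v - v²[(∂log Z/∂m)² - 2 ∂log Z/∂v]. The sketch below (grid-based log Z, finite-difference derivatives, hypothetical factor) is illustrative only.

```python
import numpy as np

xs = np.linspace(-15, 15, 30001)
dx = xs[1] - xs[0]

def logZ(m, v, t):
    """log partition function of the tilted distribution t(x) N(x | m, v)."""
    cav = np.exp(-0.5 * (xs - m) ** 2 / v) / np.sqrt(2 * np.pi * v)
    return np.log(np.sum(t(xs) * cav) * dx)

t = lambda x: 1.0 / (1.0 + np.exp(-2.0 * x))   # hypothetical non-Gaussian factor
m_c, v_c = 0.3, 0.8                            # cavity moments

# Finite-difference derivatives of log Z w.r.t. the cavity moments.
eps = 1e-5
dm = (logZ(m_c + eps, v_c, t) - logZ(m_c - eps, v_c, t)) / (2 * eps)
dv = (logZ(m_c, v_c + eps, t) - logZ(m_c, v_c - eps, t)) / (2 * eps)

# Moments of the projected Gaussian, obtained purely from log Z.
m_new = m_c + v_c * dm
v_new = v_c - v_c ** 2 * (dm ** 2 - 2 * dv)
print("projected mean, variance:", m_new, v_new)
```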

Updating in Context: The Forward Message
The forward message needs to take the coupling between x_t and x_{t+1} into account, which is lost when writing down the fully factored factor graph.
Key insight: we want a close approximation
q▶(x_{t+1}) q▲(x_{t+1}) q◀(x_{t+1}) ≈ q^\▶(x_{t+1}) ∫ p(x_{t+1} | x_t) q▶(x_t) q▲(x_t) dx_t
where q▶(x_t) q▲(x_t) provides the context and q^\▶(x_{t+1}) is the cavity distribution. This is achieved by projection:
q▶(x_{t+1}) = proj[ q^\▶(x_{t+1}) t▶(x_{t+1}) ] / q^\▶(x_{t+1}),  with  t▶(x_{t+1}) = ∫ p(x_{t+1} | x_t) q▶(x_t) q▲(x_t) dx_t

Key Points and Challenge: Approximating the Partition Function
EP is based on matching the moments of t_i(x_t) q^\i(x_t). Computing the partition function
Z_i(µ_t^\i, Σ_t^\i) = ∫ t_i(x_t) q^\i(x_t) dx_t
and its derivatives with respect to µ_t^\i and Σ_t^\i is sufficient for EP (a property of the exponential family).
Tricky part: the integral is not solvable for nonlinear systems with continuous variables.

Approach: Approximating the Partition Function
Interpret the partition function Z_i as a probability distribution. Example (measurement message):
Z = ∫ t▲(x) q^\▲(x) dx = ∫ p(z | x) q^\▲(x) dx = p(z)
Idea: approximate p(z) by a (Gaussian) distribution Ẑ, then take the derivatives of log Ẑ with respect to the moments of the cavity distribution. This yields updated moments for the posterior and for the messages. It fixes the intractability problem, but we are no longer exact.

Possible Gaussian Approximations
Example (measurement message):
Z = ∫ t▲(x) q^\▲(x) dx = ∫ t▲(x) N(x | µ^\▲, Σ^\▲) dx,  t▲(x) = N(z | g(x), S)
Option 1: linearize g at µ^\▲; the integral becomes tractable.
Option 2: Gaussian moment matching: compute the mean and covariance of Z = p(z) and approximate it by a Gaussian with these moments.
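
A 1-D sketch of the two options (hypothetical g, cavity moments, and noise level; the moment-matched mean and variance of p(z) are estimated here by simple Monte Carlo rather than analytically):

```python
import numpy as np

rng = np.random.default_rng(0)
g = lambda x: np.sin(3.0 * x)          # hypothetical measurement function
mu_c, var_c = 0.4, 0.3                 # cavity moments of x
S = 0.05                               # measurement noise variance

# Option 1: linearize g at the cavity mean -> p(z) is Gaussian with
# mean g(mu_c) and variance J * var_c * J + S, where J = g'(mu_c).
eps = 1e-6
J = (g(mu_c + eps) - g(mu_c - eps)) / (2 * eps)
mean_lin, var_lin = g(mu_c), J * var_c * J + S

# Option 2: moment matching -> use the exact mean/variance of z = g(x) + v,
# x ~ N(mu_c, var_c), here estimated by sampling.
x = rng.normal(mu_c, np.sqrt(var_c), 200_000)
z = g(x) + rng.normal(0.0, np.sqrt(S), x.size)
mean_mm, var_mm = z.mean(), z.var()

print("linearization:   ", mean_lin, var_lin)
print("moment matching: ", mean_mm, var_mm)
```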

Theoretical Results: Relation to Smoothing
Z = ∫ t▲(x) q^\▲(x) dx = ∫ t▲(x) N(x | µ^\▲, Σ^\▲) dx,  t▲(x) = N(z | g(x), S)
Relation to common filters/smoothers: approximating Z by a Gaussian Ẑ is equivalent to approximating p(x, z) by a Gaussian, an approximation that is common to almost all filtering algorithms (Deisenroth & Ohlsson, ACC 2011).
Generalizing common smoothers: linearizing g(x) in Z generalizes the EKS to an iterative procedure; moment matching generalizes the ADS to an iterative procedure.

Interesting Side Effects
To minimize the KL divergence, the EP updates require the derivatives ∂ log Z / ∂µ^\ and ∂ log Z / ∂Σ^\.
The Gaussian approximation Z = p(z) ≈ N(µ_z, Σ_z) is exact if and only if there is a linear relationship between x and z, i.e., z = Jx with x ~ N(µ^\, Σ^\) for some J; in that case µ_z and Σ_z have a special form.
This linearity must be explicitly encoded in the partial derivatives. Example:
∂ log Z / ∂µ^\ = (∂ log Z / ∂µ_z)(∂µ_z / ∂µ^\) = (z - µ_z)ᵀ Σ_z^{-1} J
Even if µ_z is a general function of µ^\ and Σ^\, this dependence must be ignored; otherwise the EP updates become inconsistent. (Deisenroth & Mohamed, arXiv preprint, 2012)

Illustration: Toy Tracking Problem
Figure: state over 20 time steps; ground truth vs. EKS posterior (left) and ground truth vs. EP and EKS posteriors (right).
Iteratively improving the posteriors via EP can heal the EKS.

Gaussian Process Dynamical Systems
x_t = f(x_{t-1}) + w,  w ~ N(0, Q)
z_t = g(x_t) + v,      v ~ N(0, R)
State x (not observed), measurement/observation z. GP distribution p(f) over the transition function f; GP distribution p(g) over the measurement function g.

Gaussian Processes for Flexible Modeling
Non-parametric method: flexible, i.e., the shape of the function adapts to the data.
Probabilistic method: consistently describes uncertainty about the unknown function.
Sufficient: specification of high-level assumptions (e.g., smoothness).
Automatic trade-off between data fit and complexity of the function (Occam's razor).
Figure: GP model of the transition dynamics, x_t plotted against (x_{t-1}, u_{t-1}).

Gaussian Process Regression
Mathematically: a probability distribution over functions. Bayesian inference is tractable:
1. Specify high-level prior beliefs p(f) about the function (e.g., smoothness).
2. Observe data X, y = f(X) + ε.
3. Compute the posterior distribution p(f | X, y) over functions.
Bayes' theorem: p(f | X, y) = p(y | X, f) p(f) / p(y | X)
p(f): prior (over functions); p(y | X, f): likelihood (noise model); p(f | X, y): posterior (over functions).
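
A minimal GP-regression sketch along these lines (squared-exponential kernel with fixed, hand-picked hyperparameters and synthetic data, purely for illustration), computing the posterior mean and marginal variance at test inputs:

```python
import numpy as np

def sqexp(A, B, ell=1.0, sf2=1.0):
    """Squared-exponential kernel matrix between 1-D input sets A and B."""
    d = A[:, None] - B[None, :]
    return sf2 * np.exp(-0.5 * (d / ell) ** 2)

# Training data y = f(X) + eps (hypothetical 1-D example).
rng = np.random.default_rng(0)
X = np.linspace(-4, 4, 15)
y = np.sin(X) + 0.1 * rng.normal(size=X.size)
sn2 = 0.01                                    # noise variance

# GP posterior at test inputs Xs.
Xs = np.linspace(-5, 5, 100)
K = sqexp(X, X) + sn2 * np.eye(X.size)
Ks = sqexp(Xs, X)
alpha = np.linalg.solve(K, y)
post_mean = Ks @ alpha
post_var = np.diag(sqexp(Xs, Xs) - Ks @ np.linalg.solve(K, Ks.T))
```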

Pictorial Introduction to Gaussian Processes
Figure (three panels, f(x) over x): prior belief about the function; observing some function values; posterior belief about the function.

Gaussian Process Dynamical Systems (Recap)
x_t = f(x_{t-1}) + w,  w ~ N(0, Q);  z_t = g(x_t) + v,  v ~ N(0, R)
GP distribution p(f) over the transition function f; GP distribution p(g) over the measurement function g.
Let's talk about inference in GPDSs.

Inference in GPDS
Figure: mapping the input distribution p(x_{t-1}, u_{t-1}) through a GP to the predictive distribution p(x_t).
Objective: Gaussian approximations to the joints p(x_t, z_t | z_{1:t-1}) and p(x_{t-1}, x_t | z_{1:t-1}), which are sufficient for Gaussian filtering/smoothing (Deisenroth & Ohlsson, ACC 2011).
Mapping distributions through a GP requires approximations, e.g., linearization of the posterior GP mean function or moment matching.
Filtering/smoothing in GPDS (Deisenroth et al., ICML 2009; Deisenroth et al., IEEE-TAC, 2012): GP-EKS, GP-ADS, GP-CKS, ...
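
The two approximations for mapping a Gaussian distribution through a GP can again be sketched in 1-D. Below, a tiny GP posterior mean stands in for the transition model; linearization uses a numerical derivative, and moment matching is approximated by Monte Carlo (for the squared-exponential kernel the matched moments are actually available in closed form, per Quiñonero-Candela et al., 2003). All data and numbers are illustrative, and the GP's own predictive variance is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

# A tiny GP posterior mean (squared-exponential kernel, fixed hyperparameters,
# hypothetical training data) standing in for the transition model f.
def sqexp(A, B, ell=1.0, sf2=1.0):
    return sf2 * np.exp(-0.5 * ((A[:, None] - B[None, :]) / ell) ** 2)

X = np.linspace(-3, 3, 10)
y = np.sin(2.0 * X)
alpha = np.linalg.solve(sqexp(X, X) + 0.01 * np.eye(X.size), y)
gp_mean = lambda xq: sqexp(np.atleast_1d(xq), X) @ alpha

# Propagate a Gaussian input x ~ N(mu_x, var_x) through the GP posterior mean.
mu_x, var_x = 0.5, 0.3

# Option 1: linearize the posterior mean function around mu_x.
eps = 1e-5
J = ((gp_mean(mu_x + eps) - gp_mean(mu_x - eps)) / (2 * eps)).item()
mean_lin, var_lin = gp_mean(mu_x).item(), J * var_x * J

# Option 2: moment matching, here approximated by sampling the input distribution.
xs_in = rng.normal(mu_x, np.sqrt(var_x), 100_000)
fs = gp_mean(xs_in)
print("linearization:   ", mean_lin, var_lin)
print("moment matching: ", fs.mean(), fs.var())
```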

EP in GPDS
Generalize single-sweep forward-backward smoothing in GPDSs to an iterative procedure using EP. This is slightly more involved than EP in nonlinear systems (e.g., EP-EKS): we also have to average over the function distribution (the GP).
Key idea, same as before: approximate the partition function by a Gaussian distribution (Deisenroth & Mohamed, arXiv preprint, 2012).
Linearization of the posterior mean function (e.g., Ko & Fox, 2009): EP-GPEKS
Moment matching (e.g., Quiñonero-Candela et al., 2003): EP-GPADS

Results: Synthetic Data (1)
Figure: GP model with training set and ground truth, f(x) over x.
x_{t+1} = 4 sin(4 x_t) + w,  w ~ N(0, 0.1²)
z_t = 4 sin(4 x_t) + v,      v ~ N(0, 0.1²)
Initial state distribution p(x_1) = N(0, 1), i.e., very broad. 30 training points for the GP models, randomly selected. Tracking horizon: 20 time steps.
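
A short data-generation sketch for this synthetic setup (the slide does not specify how the 30 training inputs are "randomly selected", so the uniform selection below is an assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: 4.0 * np.sin(4.0 * x)     # transition and measurement nonlinearity
sig_w = sig_v = 0.1                     # process / measurement noise std
T = 20                                  # tracking horizon

# Trajectory from the broad initial state distribution p(x_1) = N(0, 1).
x, z = np.zeros(T), np.zeros(T)
x[0] = rng.normal(0.0, 1.0)
z[0] = f(x[0]) + rng.normal(0.0, sig_v)
for t in range(1, T):
    x[t] = f(x[t - 1]) + rng.normal(0.0, sig_w)
    z[t] = f(x[t]) + rng.normal(0.0, sig_v)

# 30 input/target pairs for training the transition GP
# (uniform random selection is an assumption for illustration).
X_train = rng.uniform(-5.0, 5.0, 30)
y_train = f(X_train) + rng.normal(0.0, sig_w, 30)
```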

Results: Synthetic Data (2)
Figure (a): posterior state trajectories with confidence bounds (true state, EP-GPADS posterior, GPADS posterior).
Figure (b): average NLL per data point as a function of the EP iteration, with standard error.
After convergence, the posterior is spot on (left). Iterating EP greatly improves predictive power (right).

Results: Pendulum Tracking
Method      NLL_x          MAE_x          LPU_x
GPEKS       0.29 ± 0.30    0.30 ± 0.02    2.76 ± 0.12
EP-GPEKS    0.24 ± 0.33    0.31 ± 0.02    2.77 ± 0.12
GPADS       0.75 ± 0.06    0.29 ± 0.02    2.52 ± 0.06
EP-GPADS    0.79 ± 0.06    0.29 ± 0.02    2.58 ± 0.04
NLL: negative log likelihood (predictive performance); MAE: mean absolute error (error of the posterior mean); LPU: log posterior uncertainty (tightness of the posterior).
Linearization-based inference: variances too small; EP makes things worse. Moment-matching-based inference: coherent estimates; EP improves the posterior.

Results: Motion Capture Data
10 trials of golf swings, recorded at 40 Hz (mocap.cs.cmu.edu). Observations z ∈ R^56, latent space x ∈ R^3. 7 training sequences, 3 test sequences. GPDS learning via the GPDM approach (Wang et al., 2008).

Summary
General framework for iterative inference in dynamical systems. Key: approximation of the partition function. Rederives classical filters/smoothers as a special case. Promising results in (GP)DS.
marc@ias.tu-darmstadt.de
http://www.ias.tu-darmstadt.de/team/marcdeisenroth

References
[1] C. M. Bishop. Pattern Recognition and Machine Learning. Information Science and Statistics. Springer-Verlag, 2006.
[2] M. P. Deisenroth, M. F. Huber, and U. D. Hanebeck. Analytic Moment-based Gaussian Process Filtering. In L. Bottou and M. L. Littman, editors, Proceedings of the 26th International Conference on Machine Learning, pages 225-232, Montreal, QC, Canada, June 2009. Omnipress.
[3] M. P. Deisenroth and S. Mohamed. Expectation Propagation in Gaussian Process Dynamical Systems, July 2012. http://arxiv.org/abs/1207.2940.
[4] M. P. Deisenroth and H. Ohlsson. A General Perspective on Gaussian Filtering and Smoothing: Explaining Current and Deriving New Algorithms. In Proceedings of the American Control Conference, 2011.
[5] M. P. Deisenroth, R. Turner, M. Huber, U. D. Hanebeck, and C. E. Rasmussen. Robust Filtering and Smoothing with Gaussian Processes. IEEE Transactions on Automatic Control, 57(7):1865-1871, 2012. doi:10.1109/TAC.2011.2179426.
[6] S. J. Julier and J. K. Uhlmann. A New Extension of the Kalman Filter to Nonlinear Systems. In Proceedings of AeroSense: 11th Symposium on Aerospace/Defense Sensing, Simulation and Controls, pages 182-193, 1997.
[7] R. E. Kalman. A New Approach to Linear Filtering and Prediction Problems. Transactions of the ASME, Journal of Basic Engineering, 82(Series D):35-45, 1960.
[8] J. Ko and D. Fox. GP-BayesFilters: Bayesian Filtering using Gaussian Process Prediction and Observation Models. Autonomous Robots, 27(1):75-90, July 2009.
[9] T. P. Minka. A Family of Algorithms for Approximate Bayesian Inference. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, January 2001.
[10] J. Quiñonero-Candela, A. Girard, J. Larsen, and C. E. Rasmussen. Propagation of Uncertainty in Bayesian Kernel Models: Application to Multiple-Step Ahead Forecasting. In IEEE International Conference on Acoustics, Speech and Signal Processing, volume 2, pages 701-704, April 2003.
[11] J. M. Wang, D. J. Fleet, and A. Hertzmann. Gaussian Process Dynamical Models for Human Motion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2):283-298, 2008.