Lecture 2: From Linear Regression to Kalman Filter and Beyond

Department of Biomedical Engineering and Computational Science, Aalto University
January 26, 2012

Contents
1 Batch and Recursive Estimation
2 Towards Bayesian Filtering
3 Kalman Filter and General Bayesian Optimal Filter
4 Summary and Demo

Batch Linear Regression [1/2]

[Figure: noisy measurements and the true signal over t in [0, 1]]

Consider the linear regression model

y_k = a_1 + a_2 t_k + ε_k,

with ε_k ~ N(0, σ²) and a = (a_1, a_2) ~ N(m_0, P_0). In probabilistic notation this is

p(y_k | a) = N(y_k | H_k a, σ²)
p(a) = N(a | m_0, P_0),

where H_k = (1 t_k).
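The model is easy to simulate. Below is a minimal NumPy sketch (not from the lecture) that draws synthetic measurements from this regression model; the parameter values, noise level, and time grid are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions (not from the lecture): true parameters, noise level, time grid
a_true = np.array([1.0, 0.5])                 # a = (a_1, a_2)
sigma = 0.05                                  # measurement noise standard deviation
t = np.linspace(0.0, 1.0, 50)                 # measurement times t_k

H = np.column_stack([np.ones_like(t), t])     # rows H_k = (1, t_k)
y = H @ a_true + sigma * rng.standard_normal(t.shape)   # y_k = a_1 + a_2 t_k + eps_k
```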

Batch Linear Regression [2/2]

The Bayesian batch solution by the Bayes' rule:

p(a | y_{1:N}) ∝ p(a) ∏_{k=1}^N p(y_k | a)
             = N(a | m_0, P_0) ∏_{k=1}^N N(y_k | H_k a, σ²).

The posterior is Gaussian:

p(a | y_{1:N}) = N(a | m_N, P_N).

The mean and covariance are given as

m_N = [P_0^{-1} + (1/σ²) H^T H]^{-1} [(1/σ²) H^T y + P_0^{-1} m_0]
P_N = [P_0^{-1} + (1/σ²) H^T H]^{-1},

where H_k = (1 t_k), H = (H_1; H_2; ...; H_N), and y = (y_1; ...; y_N).
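As a concrete sketch (not part of the lecture), the batch posterior mean and covariance above can be computed directly with NumPy; the prior and the simulated data below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative data from the regression model (same assumptions as the earlier sketch)
t = np.linspace(0.0, 1.0, 50)
H = np.column_stack([np.ones_like(t), t])     # H = (H_1; H_2; ...; H_N)
y = H @ np.array([1.0, 0.5]) + 0.05 * rng.standard_normal(t.shape)

sigma2 = 0.05**2                              # measurement noise variance
m0, P0 = np.zeros(2), np.eye(2)               # prior N(m_0, P_0)

# Posterior N(a | m_N, P_N) from the formulas above
P_N = np.linalg.inv(np.linalg.inv(P0) + H.T @ H / sigma2)
m_N = P_N @ (H.T @ y / sigma2 + np.linalg.inv(P0) @ m0)
```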

Recursive Linear Regression [1/3]

Assume that we have already computed the posterior distribution conditioned on the measurements up to step k-1:

p(a | y_{1:k-1}) = N(a | m_{k-1}, P_{k-1}).

Assume that we get the kth measurement y_k. Using the equations from the previous slide we get

p(a | y_{1:k}) ∝ p(y_k | a) p(a | y_{1:k-1}) ∝ N(a | m_k, P_k).

The mean and covariance are given as

m_k = [P_{k-1}^{-1} + (1/σ²) H_k^T H_k]^{-1} [(1/σ²) H_k^T y_k + P_{k-1}^{-1} m_{k-1}]
P_k = [P_{k-1}^{-1} + (1/σ²) H_k^T H_k]^{-1}.

Recursive Linear Regression [2/3]

By the matrix inversion lemma (Woodbury identity):

P_k = P_{k-1} - P_{k-1} H_k^T [H_k P_{k-1} H_k^T + σ²]^{-1} H_k P_{k-1}.

Now the equations for the mean and covariance reduce to

S_k = H_k P_{k-1} H_k^T + σ²
K_k = P_{k-1} H_k^T S_k^{-1}
m_k = m_{k-1} + K_k [y_k - H_k m_{k-1}]
P_k = P_{k-1} - K_k S_k K_k^T.

Computing these for k = 1, ..., N gives exactly the batch linear regression solution, but without an explicit matrix inversion. This is a special case of the Kalman filter.
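A minimal sketch of this recursive update, under the same illustrative assumptions as the batch sketch above; after processing the last measurement, m and P coincide with the batch m_N and P_N up to numerical error.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
H = np.column_stack([np.ones_like(t), t])
y = H @ np.array([1.0, 0.5]) + 0.05 * rng.standard_normal(t.shape)
sigma2 = 0.05**2

m, P = np.zeros(2), np.eye(2)                 # start from the prior N(m_0, P_0)
for Hk, yk in zip(H, y):                      # Hk is the row (1, t_k)
    S = Hk @ P @ Hk + sigma2                  # S_k (a scalar)
    K = P @ Hk / S                            # K_k = P_{k-1} H_k^T S_k^{-1}
    m = m + K * (yk - Hk @ m)                 # m_k = m_{k-1} + K_k [y_k - H_k m_{k-1}]
    P = P - S * np.outer(K, K)                # P_k = P_{k-1} - K_k S_k K_k^T
```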

Recursive Linear Regression [3/3]

Convergence of the recursive solution to the batch solution; on the last step the two solutions are exactly equal.

[Figure: recursive estimates E[a_1] and E[a_2] converging over time to the batch estimates of a_1 and a_2]

Batch vs. Recursive Estimation [1/2]

General batch solution:
- Specify the measurement model: p(y_{1:N} | θ) = ∏_k p(y_k | θ).
- Specify the prior distribution p(θ).
- Compute the posterior distribution by the Bayes' rule: p(θ | y_{1:N}) = (1/Z) p(θ) ∏_k p(y_k | θ).
- Compute point estimates, moments, predictive quantities etc. from the posterior distribution.

Batch vs. Recursive Estimation [2/2]

General recursive solution:
- Specify the measurement likelihood p(y_k | θ).
- Specify the prior distribution p(θ).
- Process the measurements y_1, ..., y_N one at a time, starting from the prior:

  p(θ | y_1) = (1/Z_1) p(y_1 | θ) p(θ)
  p(θ | y_{1:2}) = (1/Z_2) p(y_2 | θ) p(θ | y_1)
  ...
  p(θ | y_{1:N}) = (1/Z_N) p(y_N | θ) p(θ | y_{1:N-1}).

- The posterior at the last step is the same as the batch solution.

Advantages of Recursive Solution

- The recursive solution can be considered as the online learning solution to the Bayesian learning problem.
- Batch Bayesian inference is a special case of recursive Bayesian inference.
- The parameter can be modeled to change between the measurement steps; this is the basis of filtering theory.

Drift Model for Linear Regression [1/3]

Assume a Gaussian random walk between the measurements in the linear regression model:

p(y_k | a_k) = N(y_k | H_k a_k, σ²)
p(a_k | a_{k-1}) = N(a_k | a_{k-1}, Q)
p(a_0) = N(a_0 | m_0, P_0).

Again, assume that we already know

p(a_{k-1} | y_{1:k-1}) = N(a_{k-1} | m_{k-1}, P_{k-1}).

The joint distribution of a_k and a_{k-1} is (due to the Markovianity of the dynamics)

p(a_k, a_{k-1} | y_{1:k-1}) = p(a_k | a_{k-1}) p(a_{k-1} | y_{1:k-1}).

Drift Model for Linear Regression [2/3]

Integrating over a_{k-1} gives

p(a_k | y_{1:k-1}) = ∫ p(a_k | a_{k-1}) p(a_{k-1} | y_{1:k-1}) da_{k-1}.

This equation for Markov processes is called the Chapman-Kolmogorov equation. Because the distributions are Gaussian, the result is Gaussian,

p(a_k | y_{1:k-1}) = N(a_k | m_k^-, P_k^-),

where

m_k^- = m_{k-1}
P_k^- = P_{k-1} + Q.

Drift Model for Linear Regression [3/3]

As in the pure recursive estimation, we get

p(a_k | y_{1:k}) ∝ p(y_k | a_k) p(a_k | y_{1:k-1}) ∝ N(a_k | m_k, P_k).

After applying the matrix inversion lemma, the mean and covariance can be written as

S_k = H_k P_k^- H_k^T + σ²
K_k = P_k^- H_k^T S_k^{-1}
m_k = m_k^- + K_k [y_k - H_k m_k^-]
P_k = P_k^- - K_k S_k K_k^T.

Again, we have derived a special case of the Kalman filter. The batch version of this solution would be much more complicated.
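A minimal sketch (with illustrative values that are not from the lecture) of tracking the drifting parameters with these prediction and update formulas; Q is an assumed random-walk covariance and the data are simulated inside the loop.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 0.05**2                              # measurement noise variance (assumed)
Q = 1e-4 * np.eye(2)                          # assumed random-walk covariance

a = np.array([1.0, 0.5])                      # true, slowly drifting parameters
m, P = np.zeros(2), np.eye(2)                 # p(a_0) = N(m_0, P_0)
for tk in np.linspace(0.0, 1.0, 50):
    # Simulate the drifting parameters and a noisy measurement
    a = a + rng.multivariate_normal(np.zeros(2), Q)
    Hk = np.array([1.0, tk])
    yk = Hk @ a + np.sqrt(sigma2) * rng.standard_normal()

    # Prediction step (Chapman-Kolmogorov for the random walk)
    m_pred, P_pred = m, P + Q
    # Update step
    S = Hk @ P_pred @ Hk + sigma2             # S_k
    K = P_pred @ Hk / S                       # K_k
    m = m_pred + K * (yk - Hk @ m_pred)       # m_k
    P = P_pred - S * np.outer(K, K)           # P_k
```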

State Space Notation

In the previous section we formulated the model as

p(a_k | a_{k-1}) = N(a_k | a_{k-1}, Q)
p(y_k | a_k) = N(y_k | H_k a_k, σ²).

But in Kalman filtering and control theory, the vector of parameters a_k is usually called the state and denoted by x_k. In more standard state space notation:

p(x_k | x_{k-1}) = N(x_k | x_{k-1}, Q)
p(y_k | x_k) = N(y_k | H_k x_k, σ²)

or equivalently

x_k = x_{k-1} + q
y_k = H_k x_k + r,

where q ~ N(0, Q) and r ~ N(0, σ²).

Kalman Filter [1/2]

The canonical Kalman filtering model is

p(x_k | x_{k-1}) = N(x_k | A_{k-1} x_{k-1}, Q_{k-1})
p(y_k | x_k) = N(y_k | H_k x_k, R_k).

More often, this model is written in the form

x_k = A_{k-1} x_{k-1} + q_{k-1}
y_k = H_k x_k + r_k.

The Kalman filter actually calculates the following distributions:

p(x_k | y_{1:k-1}) = N(x_k | m_k^-, P_k^-)
p(x_k | y_{1:k}) = N(x_k | m_k, P_k).

Kalman Filter [2/2]

Prediction step of the Kalman filter:

m_k^- = A_{k-1} m_{k-1}
P_k^- = A_{k-1} P_{k-1} A_{k-1}^T + Q_{k-1}.

Update step of the Kalman filter:

S_k = H_k P_k^- H_k^T + R_k
K_k = P_k^- H_k^T S_k^{-1}
m_k = m_k^- + K_k [y_k - H_k m_k^-]
P_k = P_k^- - K_k S_k K_k^T.

These equations will be derived from the general Bayesian filtering equations in the next lecture.
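These two steps translate directly into code. Below is a minimal NumPy sketch (not from the lecture) of the prediction and update steps for the general model above; the function names and shapes are illustrative.

```python
import numpy as np

def kf_predict(m, P, A, Q):
    """Prediction step: returns the parameters of p(x_k | y_{1:k-1})."""
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    return m_pred, P_pred

def kf_update(m_pred, P_pred, y, H, R):
    """Update step: returns the parameters of p(x_k | y_{1:k})."""
    S = H @ P_pred @ H.T + R                  # innovation covariance S_k
    K = P_pred @ H.T @ np.linalg.inv(S)       # Kalman gain K_k
    m = m_pred + K @ (y - H @ m_pred)
    P = P_pred - K @ S @ K.T
    return m, P
```

Filtering a whole measurement sequence then amounts to alternating kf_predict and kf_update over k = 1, ..., N.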

Probabilistic Non-Linear Filtering [1/2]

Generic discrete-time state space models:

x_k = f(x_{k-1}, q_{k-1})
y_k = h(x_k, r_k).

Generic Markov models:

x_k ~ p(x_k | x_{k-1})
y_k ~ p(y_k | x_k).

Approximation methods: extended Kalman filters (EKF), unscented Kalman filters (UKF), and sequential Monte Carlo (SMC) filters, also known as particle filters.
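As one illustration of the SMC family, here is a minimal bootstrap particle filter step for a generic model specified by a transition sampler and a measurement likelihood; the interface and names are assumptions made for this sketch, not the lecture's notation.

```python
import numpy as np

def bootstrap_pf_step(particles, y, sample_transition, likelihood, rng):
    """One bootstrap particle filter step.

    particles:         (N, d) array of samples approximating p(x_{k-1} | y_{1:k-1})
    sample_transition: draws x_k ~ p(x_k | x_{k-1}) for every particle
    likelihood:        evaluates p(y_k | x_k) for every particle
    """
    particles = sample_transition(particles, rng)            # propagate through the dynamics
    w = likelihood(y, particles)                              # weight by the measurement model
    w = w / np.sum(w)
    idx = rng.choice(len(particles), size=len(particles), p=w)  # resample
    return particles[idx]                                     # approximates p(x_k | y_{1:k})
```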

Probabilistic Non-Linear Filtering [2/2]

In continuous-discrete filtering models, the dynamics are modeled in continuous time and the measurements are taken at discrete time steps. The continuous-time counterparts of Markov models are stochastic differential equations of the form

dx/dt = f(x, t) + w(t),

where w(t) is a continuous-time Gaussian white noise process.

Approximation methods: extended Kalman filters, unscented Kalman filters, sequential Monte Carlo, particle filters.
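In practice such continuous-discrete models are often handled by discretizing the SDE between measurement times; below is a minimal Euler-Maruyama simulation sketch, where the drift function f and the noise spectral density q are assumed inputs, not quantities from the lecture.

```python
import numpy as np

def euler_maruyama(f, x0, q, dt, n_steps, rng):
    """Approximately simulate dx/dt = f(x, t) + w(t) on a grid with spacing dt.
    q is the (assumed scalar) spectral density of the white noise w(t)."""
    x = np.array(x0, dtype=float)
    xs = [x.copy()]
    for n in range(n_steps):
        tn = n * dt
        x = x + f(x, tn) * dt + np.sqrt(q * dt) * rng.standard_normal(x.shape)
        xs.append(x.copy())
    return np.array(xs)
```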

Summary

- The linear regression problem can be solved as a batch problem or recursively; the latter solution is a special case of the Kalman filter.
- A generic Bayesian estimation problem can also be solved either as a batch problem or recursively.
- If we let the linear regression parameters change between the measurements, we get a simple linear state space model, which is again solvable with the Kalman filter.
- Generalizing this idea and its solution gives the Kalman filter algorithm.
- Further generalizing to non-Gaussian models results in a generic probabilistic state space model.

Demonstration

Batch and recursive linear regression.