Sequential Bayesian Updating

BS2 Statistical Inference, Lectures 14 and 15, Hilary Term 2009 May 28, 2009

We consider data arriving sequentially, $X_1, \dots, X_n, \dots$, and wish to update inference on an unknown parameter $\theta$ online. In a Bayesian setting, we have a prior distribution $\pi(\theta)$, and at time $n$ we have a density for the data conditional on $\theta$,
$$f(x_1, \dots, x_n \mid \theta) = f(x_1 \mid \theta) f(x_2 \mid x_1, \theta) \cdots f(x_n \mid x_{1:n-1}, \theta),$$
where we write $x_{1:i} = (x_1, \dots, x_i)$. Note that we are not assuming $X_1, \dots, X_n, \dots$ to be independent conditionally on $\theta$. At time $n$, we may have updated our distribution of $\theta$ to its posterior
$$\pi_n(\theta) = f(\theta \mid x_{1:n}) \propto \pi(\theta) f(x_{1:n} \mid \theta).$$

If we obtain a new observation $X_{n+1} = x_{n+1}$, we may either start afresh and write
$$\pi_{n+1}(\theta) = f(\theta \mid x_{1:n+1}) \propto \pi(\theta) f(x_{1:n+1} \mid \theta),$$
or we could claim that just before time $n+1$ our knowledge of $\theta$ is summarized in the distribution $\pi_n(\theta)$, so we simply use this as a prior distribution for the new piece of information and update as
$$\pi_{n+1}(\theta) \propto \pi_n(\theta) f(x_{n+1} \mid x_{1:n}, \theta).$$
Indeed, these updates are identical, since
$$\pi_n(\theta) f(x_{n+1} \mid x_{1:n}, \theta) \propto \pi(\theta) f(x_{1:n} \mid \theta) f(x_{n+1} \mid x_{1:n}, \theta) = \pi(\theta) f(x_{1:n+1} \mid \theta) \propto \pi_{n+1}(\theta).$$

We may summarize these facts by replacing the usual expression for a Bayesian updating scheme,
$$\text{posterior} \propto \text{prior} \times \text{likelihood},$$
with
$$\text{revised} \propto \text{current} \times \text{new likelihood},$$
represented by the formula
$$\pi_{n+1}(\theta) \propto \pi_n(\theta) L_{n+1}(\theta) = \pi_n(\theta) f(x_{n+1} \mid x_{1:n}, \theta).$$
In this dynamic perspective, at time $n$ we only need to keep a representation of $\pi_n$ and can otherwise ignore the past: the current $\pi_n$ contains all the information needed to revise our knowledge when confronted with the new information $L_{n+1}(\theta)$. We sometimes refer to this way of updating as recursive.
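
To make the recursion concrete, here is a minimal sketch in Python; the conjugate Beta-Bernoulli model and all numbers are our own illustration, not part of the lecture. With conditionally i.i.d. data the new likelihood is just $f(x_{n+1} \mid \theta)$, and the recursive and start-afresh updates coincide exactly:

```python
import numpy as np

a, b = 1.0, 1.0                    # prior pi(theta) = Beta(1, 1) for a Bernoulli probability
x = np.array([1, 0, 1, 1, 0, 1])   # data arriving sequentially

# Recursive update: treat the current posterior as the prior for the next observation.
for xi in x:
    a, b = a + xi, b + (1 - xi)

# Start-afresh update: combine the original prior with the full likelihood at once.
a_batch = 1.0 + x.sum()
b_batch = 1.0 + (len(x) - x.sum())

print((a, b) == (a_batch, b_batch))  # True: the two updates are identical
```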

Basic dynamic model

The previous considerations take on a particular dynamic form when the parameter, or state, $\theta$ is itself changing with time. More precisely, we consider a Markovian model for the state dynamics of the form
$$f(\theta_0) = \pi(\theta_0), \qquad f(\theta_{i+1} \mid \theta_0, \dots, \theta_i) = f(\theta_{i+1} \mid \theta_i),$$
where the evolving states $\theta_0, \theta_1, \dots$ are not directly observed; information about them is available only through sequential observations $X_i = x_i$, where
$$f(x_i \mid \theta_0, \dots, \theta_i, x_{1:i-1}) = f(x_i \mid \theta_i),$$
so the joint density of states and observations is
$$f(x_{1:n}, \theta_{0:n}) = \pi(\theta_0) \prod_{i=1}^n f(\theta_i \mid \theta_{i-1}) f(x_i \mid \theta_i).$$
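
For concreteness, states and observations can be simulated directly from this factorization. A minimal sketch, using a hypothetical Gaussian random-walk model of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, tau = 50, 1.0, 0.5

theta = np.zeros(n + 1)            # pi(theta_0): a point mass at 0 in this sketch
for i in range(1, n + 1):
    # f(theta_i | theta_{i-1}): Gaussian random walk
    theta[i] = theta[i - 1] + sigma * rng.standard_normal()

# f(x_i | theta_i): noisy observation of the current state only
x = theta[1:] + tau * rng.standard_normal(n)
```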

Fundamental tasks

This type of model is common in robotics, speech recognition, target tracking, and steering/control, for example of large ships, airplanes, and spacecraft. The natural tasks associated with inference about the evolving state $\theta_i$ are known as:

Filtering: find $f(\theta_n \mid x_{1:n})$. What is the current state?

Prediction: find $f(\theta_{n+1} \mid x_{1:n})$. What is the next state?

Smoothing: find $f(\theta_j \mid x_{1:n})$ for $j < n$. What was the past state at time $j$?

Prediction and filtering

If the filter distribution $f(\theta_n \mid x_{1:n})$ is available, we may calculate the predictive distribution as
$$f(\theta_{n+1} \mid x_{1:n}) = \int f(\theta_{n+1} \mid \theta_n) f(\theta_n \mid x_{1:n}) \, d\theta_n, \qquad (1)$$
which uses the current filter distribution and the dynamic model. When a new observation $X_{n+1} = x_{n+1}$ is obtained, we can use revised ∝ current × new likelihood to update the filter distribution as
$$f(\theta_{n+1} \mid x_{1:n+1}) \propto f(\theta_{n+1} \mid x_{1:n}) f(x_{n+1} \mid \theta_{n+1}), \qquad (2)$$
i.e. the updated filter distribution is found by combining the current predictive distribution with the incoming likelihood. The predictive distribution can then be updated in turn, yielding a general recursive scheme: predict, observe, filter, predict, observe, filter, ...
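
As a concrete illustration, here is a minimal grid-based sketch of the predict-observe-filter recursion. Discretizing $\theta$ on a grid (so the integrals in (1) and (2) become sums) is our own device for exposition, and the Gaussian random-walk model and all parameter values are hypothetical:

```python
import numpy as np
from scipy.stats import norm

grid = np.linspace(-10, 10, 401)

# trans[i, k] = f(theta_{n+1} = grid[i] | theta_n = grid[k]); columns normalized.
trans = norm.pdf(grid[:, None], loc=grid[None, :], scale=1.0)
trans /= trans.sum(axis=0)

def predict(filt):
    # Equation (1): integrate the dynamics against the current filter distribution.
    return trans @ filt

def update(pred, x, tau=0.5):
    # Equation (2): multiply by the incoming likelihood and renormalize.
    post = pred * norm.pdf(x, loc=grid, scale=tau)
    return post / post.sum()

filt = np.full(grid.size, 1.0 / grid.size)   # flat initial filter distribution
for x in [0.3, 0.8, 1.5, 1.1]:               # observations arriving sequentially
    filt = update(predict(filt), x)

print(grid[np.argmax(filt)])                 # approximate posterior mode of the current state
```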

Smoothing

When we have more time, we may similarly look retrospectively and try to reconstruct the movements of $\theta$. This calculation is slightly more subtle than filtering. We first get
$$f(\theta_{j-1} \mid x_{1:n}) = \int f(\theta_{j-1} \mid \theta_j, x_{1:n}) f(\theta_j \mid x_{1:n}) \, d\theta_j = \int f(\theta_{j-1} \mid \theta_j, x_{1:j-1}) f(\theta_j \mid x_{1:n}) \, d\theta_j,$$
where we have used that
$$f(\theta_{j-1}, \theta_j, x_{1:n}) = f(\theta_{j-1}, \theta_j, x_{1:j-1}) f(x_j, \dots, x_n \mid \theta_j, \theta_{j-1}) = f(\theta_{j-1}, \theta_j, x_{1:j-1}) f(x_j, \dots, x_n \mid \theta_j)$$
and
$$f(\theta_j, x_{1:n}) = f(\theta_j, x_{1:j-1}) f(x_j, \dots, x_n \mid \theta_j),$$
so that
$$f(\theta_{j-1} \mid \theta_j, x_{1:n}) = f(\theta_{j-1} \mid \theta_j, x_{1:j-1}).$$

Since we can think of $f(\theta_j \mid \theta_{j-1})$ as a likelihood in $\theta_{j-1}$, Bayes' rule gives
$$f(\theta_{j-1} \mid \theta_j, x_{1:j-1}) = \frac{f(\theta_j \mid \theta_{j-1}) f(\theta_{j-1} \mid x_{1:j-1})}{f(\theta_j \mid x_{1:j-1})},$$
the normalizing constant being the predictive density $f(\theta_j \mid x_{1:j-1})$, which depends on $\theta_j$ and therefore cannot be dropped inside the integral. We thus get the basic smoothing recursion
$$f(\theta_{j-1} \mid x_{1:n}) = f(\theta_{j-1} \mid x_{1:j-1}) \int \frac{f(\theta_j \mid \theta_{j-1}) f(\theta_j \mid x_{1:n})}{f(\theta_j \mid x_{1:j-1})} \, d\theta_j. \qquad (3)$$
It demands that we have stored a representation of the filter distributions $f(\theta_{j-1} \mid x_{1:j-1})$ (the predictive densities follow from these via (1)) as well as the dynamic state model.
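
Continuing the hypothetical grid sketch from above, the backward recursion (3) might be implemented as follows; the forward pass is assumed to have stored the filter and predictive distributions at every step:

```python
import numpy as np

def backward_smooth(filters, preds, trans):
    """Backward smoothing recursion (3) on a grid.

    filters[j] approximates f(theta_j | x_{1:j}),
    preds[j]   approximates f(theta_j | x_{1:j-1}) from equation (1),
    trans[i,k] = f(theta_j = grid[i] | theta_{j-1} = grid[k]).
    Returns smooth[j] approximating f(theta_j | x_{1:n}).
    """
    n = len(filters)
    smooth = [None] * n
    smooth[-1] = filters[-1]              # at j = n, smoothing equals filtering
    for j in range(n - 1, 0, -1):
        ratio = smooth[j] / preds[j]      # f(theta_j | x_{1:n}) / f(theta_j | x_{1:j-1})
        s = filters[j - 1] * (trans.T @ ratio)
        smooth[j - 1] = s / s.sum()
    return smooth
```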

Basic model

A special case of the above is traditionally attributed to Kalman, from a result in 1960, and known as the Kalman filter and smoother, but was in fact developed in full detail by the Danish statistician T. N. Thiele in 1880. It is based on the Markovian state model
$$\theta_{i+1} \mid \theta_i \sim N(\theta_i, \sigma^2_{i+1}), \qquad \theta_0 = 0,$$
and the simple observational model
$$X_i \mid \theta_i \sim N(\theta_i, \tau^2_i), \qquad i = 1, \dots,$$
where typically $\sigma^2_i = (t_i - t_{i-1}) \sigma^2$ and $\tau^2_i = \tau^2 / w_i$, with $t_i$ denoting the time of the $i$th observation. For simplicity we shall assume $t_i = i$ and $w_i = 1$ in the following.

Updating the filters

The filtering relations become particularly simple, since the conditional distributions are all normal and we are only concerned with expectations and variances. We repeat Thiele's argument as an instance of the general theory developed above. Suppose at time $n$ we have the filter distribution of $\theta_n$ as $N(\mu_n, \omega^2_n)$. Then the predictive distribution of $\theta_{n+1}$ is
$$\theta_{n+1} \mid x_{1:n} \sim N(\mu_n, \omega^2_n + \sigma^2).$$
We can think of $\mu_n$ as our current best measurement of $\theta_{n+1}$, with this variance. The contribution from the observation is a measurement of $\theta_{n+1}$ with value $x_{n+1}$ and variance $\tau^2$. The best way of combining these estimates is to take a weighted average with the inverse variances as weights.

It follows that our new filter distribution has expectation
$$\mu_{n+1} = \frac{\mu_n / (\omega^2_n + \sigma^2) + x_{n+1} / \tau^2}{(\omega^2_n + \sigma^2)^{-1} + \tau^{-2}} = \frac{\tau^2 \mu_n + (\sigma^2 + \omega^2_n) x_{n+1}}{\tau^2 + \sigma^2 + \omega^2_n}$$
and variance
$$\omega^2_{n+1} = \frac{1}{(\omega^2_n + \sigma^2)^{-1} + \tau^{-2}} = \frac{\tau^2 (\sigma^2 + \omega^2_n)}{\tau^2 + \sigma^2 + \omega^2_n}.$$
Clearly this result could also have been obtained by expanding the sum of squares in the expression for the filter distribution (2):
$$f(\theta_{n+1} \mid x_{1:n+1}) \propto \exp\left\{ -\frac{(\theta_{n+1} - \mu_n)^2}{2(\sigma^2 + \omega^2_n)} - \frac{(\theta_{n+1} - x_{n+1})^2}{2\tau^2} \right\}.$$
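
These scalar recursions translate directly into code. A minimal sketch (the observation values and parameters are made up for illustration):

```python
def kalman_1d(xs, mu0=0.0, omega2_0=0.0, sigma2=1.0, tau2=0.5):
    """Scalar Kalman filter for the random-walk model above; theta_0 = 0 exactly."""
    mu, omega2 = mu0, omega2_0
    history = []
    for x in xs:
        pred_var = omega2 + sigma2                           # predictive variance of theta_{n+1}
        mu = (tau2 * mu + pred_var * x) / (tau2 + pred_var)  # inverse-variance weighted mean
        omega2 = tau2 * pred_var / (tau2 + pred_var)         # updated filter variance
        history.append((mu, omega2))
    return history

print(kalman_1d([0.3, 0.8, 1.5, 1.1]))
```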

Correcting predictions and observations

We may elaborate the expression for $\mu_{n+1}$ and write it as a correction either of $\mu_n$ or of $x_{n+1}$:
$$\mu_{n+1} = \mu_n + \frac{\sigma^2 + \omega^2_n}{\tau^2 + \sigma^2 + \omega^2_n} (x_{n+1} - \mu_n)$$
or
$$\mu_{n+1} = x_{n+1} - \frac{\tau^2}{\tau^2 + \sigma^2 + \omega^2_n} (x_{n+1} - \mu_n),$$
showing how at each stage $n$ the filtered value is obtained by modifying the observed and predicted values when the prediction is not on target. This readily generalizes to the multivariate case and to more complex models for the state evolution and observation equations. We abstain from further details.

Geometric construction

The geometric construction of the Kalman filter and smoother shown at this point (figure not reproduced in this transcription) is taken from Thiele (1880).

Basic Monte Carlo representation

One of the most recent developments in modern statistics is the use of Monte Carlo methods for representing the predictive and filter distributions. We assume that at time $n$ we have represented the filter distribution (2) by a sample
$$f(\theta_n \mid x_{1:n}) \sim \{\theta^1_n, \dots, \theta^M_n\},$$
so that we would approximate any integral w.r.t. this density as
$$\int h(\theta_n) f(\theta_n \mid x_{1:n}) \, d\theta_n \approx \frac{1}{M} \sum_{i=1}^M h(\theta^i_n).$$
The values $\{\theta^1_n, \dots, \theta^M_n\}$ are generally referred to as particles.

More generally, we may have the particles associated with weights,
$$f(\theta_n \mid x_{1:n}) \sim \{(\theta^1_n, w^1_n), \dots, (\theta^M_n, w^M_n)\}, \qquad \sum_{i=1}^M w^i_n = 1,$$
so that the integral is approximated by
$$\int h(\theta_n) f(\theta_n \mid x_{1:n}) \, d\theta_n \approx \sum_{i=1}^M h(\theta^i_n) w^i_n. \qquad (4)$$
Typically, the weights $w^i_n$ will reflect that we have been sampling from a proposal distribution $g(\theta_n)$ rather than the target distribution $f(\theta_n \mid x_{1:n})$, so the weights are calculated as
$$w^i_n \propto f(\theta^i_n \mid x_{1:n}) / g(\theta^i_n).$$

Moving and reweighting particles

When filtering, to obtain particles representing the next stage of the filter distribution, we move each particle a random amount by drawing $\theta^i_{n+1}$ at random from a proposal distribution $g_{n+1}(\theta \mid \theta^i_n, x_{n+1})$ and subsequently reweight the particle as
$$w^i_{n+1} \propto w^i_n \, \frac{f(\theta^i_{n+1} \mid \theta^i_n) f(x_{n+1} \mid \theta^i_{n+1})}{g_{n+1}(\theta^i_{n+1} \mid \theta^i_n, x_{n+1})},$$
the numerator being proportional to $f(\theta^i_{n+1} \mid \theta^i_n, x_{n+1})$. There are many possible proposal distributions, but a common choice is a normal distribution with an approximately correct mean and a slightly enlarged variance.

Effective number of particles

For the constant function $h \equiv 1$, the approximate inverse variance of the integral (4) is
$$M_n = \frac{1}{\sum_i (w^i_n)^2},$$
which is known as the effective number of particles. It is maximized for $w^i \equiv 1/M$, which represents the weights obtained when sampling from the correct distribution. As the filtering evolves, it may happen that some weights become very small, reflecting bad particles placed in areas of small probability. This makes the effective number of particles small.
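
A two-line check of this diagnostic (the example weights are arbitrary):

```python
import numpy as np

def effective_particles(w):
    """M_n = 1 / sum_i (w_i)^2 for normalized weights."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                 # normalize so the formula applies
    return 1.0 / np.sum(w ** 2)

print(effective_particles(np.ones(100)))              # 100.0: uniform weights, the maximum
print(effective_particles([0.97, 0.01, 0.01, 0.01]))  # ~1.06: one particle dominates
```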

Resampling and replenishing

To get rid of these, $M$ new particles are resampled with replacement, the probability of choosing particle $i$ at each draw being equal to $w^i$, so that bad particles have a high probability of not being included. This creates a new set of particles which all have weight $1/M$. However, some particles will now be repeated in the sample, and when this has been done many times there may be only a few distinct particles left. Various schemes exist for replenishing the sample with new particles. This can also be done routinely at each filtering step, for example by first sampling two new particles for every existing one and subsequently resampling as above to retain exactly $M$ particles.
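
Putting the pieces together, here is a minimal bootstrap-style particle filter for the same hypothetical random-walk model used in the earlier sketches. The transition density itself serves as the proposal, so it cancels in the reweighting formula and only the likelihood remains; resampling is triggered when the effective number of particles drops below $M/2$ (the threshold and all parameter values are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
M, sigma, tau = 1000, 1.0, 0.5
xs = [0.3, 0.8, 1.5, 1.1]

particles = np.zeros(M)              # theta_0 = 0 for every particle
weights = np.full(M, 1.0 / M)

for x in xs:
    # Move: propose from the transition density f(theta_{n+1} | theta_n).
    particles = particles + sigma * rng.standard_normal(M)

    # Reweight: with this proposal only the observation likelihood remains.
    weights = weights * np.exp(-0.5 * ((x - particles) / tau) ** 2)
    weights /= weights.sum()

    # Resample with replacement when the effective number of particles is low.
    if 1.0 / np.sum(weights ** 2) < M / 2:
        idx = rng.choice(M, size=M, p=weights)
        particles = particles[idx]
        weights = np.full(M, 1.0 / M)

# Weighted average approximates E(theta_n | x_{1:n}) as in (4).
print(np.sum(weights * particles))
```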