Learning Static Parameters in Stochastic Processes


Bharath Ramsundar

December 14, 2012

1 Introduction

Consider a Markovian stochastic process X_T evolving (perhaps nonlinearly) over a time variable T. We may observe this process only through observations Y_T. The process is parameterized by a vector of parameters Θ, which is typically unknown and must be learned from the sequence {Y_T}. Learning the parameter Θ is critical to gaining an understanding of this class of stochastic processes.

Nonlinear stochastic processes arise in a variety of situations. For example, understanding the time evolution of a stock's value through a standard stochastic volatility model requires the ability to track the nonlinear evolution of the price X_T. Such problems are typically solved by Sequential Monte Carlo (SMC) methods, such as particle filtering, but only when the parameter vector Θ is provided to the Monte Carlo algorithm. Standard algorithms such as particle filtering encounter difficulties when Θ is unknown. Consequently, a large literature has developed around algorithms that learn Θ in this setting. For example, researchers in statistics and econometrics have introduced the Particle MCMC algorithm [1], which combines Markov chain Monte Carlo (MCMC) methods with Sequential Monte Carlo algorithms in order to learn the parameters Θ. Unfortunately, such algorithms tend to be slow, often requiring at least quadratic time complexity [3].

In this project, we introduce a new algorithm that learns the parameter Θ while performing inference. Namely, we modify the Decayed MCMC Filtering algorithm [4] to learn static parameters. We then present an empirical analysis showing the robustness of this algorithm in handling static parameters, and we prove preliminary correctness results. Direction and guidance for this research were provided by Professor Stuart Russell of UC Berkeley.

2 Nonlinear State Space Model

The following toy nonlinear stochastic problem [2] is routinely used to evaluate algorithms for their ability to handle nonlinear time evolution:

$$X_n = \frac{X_{n-1}}{2} + \frac{25 X_{n-1}}{1 + X_{n-1}^2} + 8 \cos(1.2 n) + V_n$$

$$Y_n = \frac{X_n^2}{20} + W_n$$

Here X_1 ~ N(0, 5), the V_n are drawn IID from N(0, σ_v²), and the W_n are drawn IID from N(0, σ_w²), where N(m, σ²) denotes the Gaussian distribution with mean m and variance σ², and IID means independent and identically distributed. We set the parameter vector Θ = (σ_v, σ_w).
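The model is easy to simulate directly from these equations. The following sketch (in Python with NumPy; the report does not name an implementation language, so that choice is an assumption) generates one state and observation trajectory. Note that N(m, σ²) is parameterized by variance, so draws use the standard deviation.

```python
import numpy as np

def simulate(S, sigma_v=3.0, sigma_w=3.0, rng=None):
    """Simulate S steps of the nonlinear state space model of Section 2."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.empty(S)
    X[0] = rng.normal(0.0, np.sqrt(5.0))  # X_1 ~ N(0, 5): variance 5
    for n in range(1, S):
        # The paper's time index is n+1 here, since arrays are 0-based.
        drift = (X[n-1] / 2 + 25 * X[n-1] / (1 + X[n-1]**2)
                 + 8 * np.cos(1.2 * (n + 1)))
        X[n] = drift + rng.normal(0.0, sigma_v)       # V_n ~ N(0, sigma_v^2)
    Y = X**2 / 20 + rng.normal(0.0, sigma_w, size=S)  # W_n ~ N(0, sigma_w^2)
    return X, Y
```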

Henceforth, we maintain the convention in figures that blue lines are observations, green lines are true states, and red lines are the products of inference.

Figure 1: Nonlinear state space model, σ_w = σ_v = 3

3 Prior Literature on Static Parameter Learning

In this section we review several algorithms that have been created for static parameter learning problems.

3.1 Particle Filtering

The particle filter does not learn static parameters, but it effectively draws samples from nonlinear processes when such parameters are given. We implemented a simple particle filter to draw samples from the nonlinear state space model. Figure 2 shows that the particle filter can almost infer the true state.

Figure 2: Nonlinear state space model, particle filtering, 50 particles, 100 time steps

3.2 Particle Markov Chain Monte Carlo

The PMCMC algorithm [1] is an extension of the Markov chain Monte Carlo (MCMC) framework that handles nonlinearities by using a particle filter to sample from nonlinear distributions. We implemented PMCMC and used it to estimate distributions for the parameters σ_v and σ_w; see Figure 3.

Figure 3: Histogram, PMCMC parameter estimation
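To make Section 3 concrete, here is a minimal bootstrap particle filter in the style of [2] for the model of Section 2, continuing the Python sketch above. Besides filtered means, it returns an estimate of the log marginal likelihood log p(Y_{1:S} | Θ), which is exactly the quantity a PMCMC sampler consumes. Multinomial resampling at every step is a simplifying assumption.

```python
import numpy as np

def bootstrap_pf(Y, sigma_v, sigma_w, N=50, rng=None):
    """Bootstrap particle filter for the Section 2 model.

    Returns the filtered means E[X_n | Y_{1:n}] and an estimate
    of the log marginal likelihood log p(Y_{1:S} | Theta).
    """
    rng = np.random.default_rng() if rng is None else rng
    S = len(Y)
    particles = rng.normal(0.0, np.sqrt(5.0), size=N)  # X_1 ~ N(0, 5)
    means = np.empty(S)
    loglik = 0.0
    for n in range(S):
        if n > 0:
            # Bootstrap proposal: propagate through the transition prior.
            particles = (particles / 2 + 25 * particles / (1 + particles**2)
                         + 8 * np.cos(1.2 * (n + 1))
                         + rng.normal(0.0, sigma_v, size=N))
        # Weight by the observation density Y_n | X_n ~ N(X_n^2/20, sigma_w^2).
        logw = -0.5 * ((Y[n] - particles**2 / 20) / sigma_w) ** 2
        w = np.exp(logw - logw.max())
        loglik += (np.log(w.mean()) + logw.max()
                   - 0.5 * np.log(2 * np.pi * sigma_w**2))
        w /= w.sum()
        means[n] = np.sum(w * particles)
        particles = rng.choice(particles, size=N, p=w)  # multinomial resampling
    return means, loglik
```

Given such a filter, the particle marginal Metropolis-Hastings form of PMCMC [1] is a short loop around it: propose new parameters, rerun the filter, and accept or reject using the likelihood estimates. The random-walk proposal, step size, starting point, and flat positive prior below are illustrative assumptions, not choices stated in this report.

```python
def pmmh(Y, n_iters=2000, step=0.25, rng=None):
    """Particle marginal Metropolis-Hastings sketch over (sigma_v, sigma_w)."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.array([1.0, 1.0])        # assumed starting point
    _, ll = bootstrap_pf(Y, *theta, rng=rng)
    chain = []
    for _ in range(n_iters):
        prop = theta + rng.normal(0.0, step, size=2)  # symmetric proposal
        if np.all(prop > 0):            # flat prior on positive values
            _, ll_prop = bootstrap_pf(Y, *prop, rng=rng)
            if np.log(rng.uniform()) < ll_prop - ll:
                theta, ll = prop, ll_prop
        chain.append(theta.copy())
    return np.array(chain)
```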

Algorithm 1: DMCMC: Decayed MCMC Filtering

Input: g(t): decay function; S: total number of time steps; K: Gibbs moves per time step
Output: approximate filtering distributions D_s

1  for s = 1 to S do
2      for i = 1 to K do
3          choose t from g(s);
4          sample X_t from P(X_t | X_{t-1}, X_{t+1}, Y_t);
5          increment count for X_s in D_s;
6  return D_1, ..., D_S;

4 SDMCMC: Decayed MCMC with Static Parameters

In this section we define a variant of the Decayed MCMC Filtering algorithm (introduced in [4]) for the static parameter estimation problem.

4.1 Decayed MCMC Algorithm

We start with a brief overview of Decayed MCMC Filtering. Assume as before hidden state variables X_1, ..., X_T and evidence variables Y_1, ..., Y_T. Decayed MCMC creates a Markov chain targeting the distribution X_T | Y_1, ..., Y_T through Gibbs updates. To save time, the algorithm spends progressively less effort updating elements of the distant past and focuses on recent history. The rate of this decay is given by a function g(t) on the window [0, T] that controls which index is sampled; for example, g(t) = 1/T recovers the standard Gibbs sampling methodology. The result achieved in [4] is that, for the decay function

$$g_{\alpha,\delta}(t) = \alpha (T - t + 1)^{-(1+\delta)},$$

Algorithm 1 converges to the true filtering distribution for X_T in time that does not depend on T.
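Choosing t from g(s) simply means drawing an index in {1, ..., s} from the polynomial decay distribution, so that recent time steps are revisited far more often than distant ones. A minimal sketch, with δ = 0.5 as an arbitrary illustrative value:

```python
import numpy as np

def sample_decay_time(s, delta=0.5, rng=None):
    """Draw a Gibbs-update index t in {1, ..., s} with probability
    proportional to (s - t + 1)^-(1+delta), as in g_{alpha,delta} of [4]."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(1, s + 1)
    p = (s - t + 1.0) ** -(1.0 + delta)
    return rng.choice(t, p=p / p.sum())  # alpha is the normalizing constant
```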

4.2 Decayed MCMC for Static Parameters

MCMC techniques have long been used in the statistical literature to estimate static parameters. We consequently propose Algorithm 2 as a method to dynamically learn static parameters while filtering.

Algorithm 2: SDMCMC: Decayed MCMC Filtering for Static Parameters

Input: g(t): decay function; S: total number of time steps; K: Gibbs moves per time step
Output: approximate filtering distributions D_s; parameter Θ

1   sample Θ from prior P_Θ;
2   for s = 1 to S do
3       for i = 1 to K do
4           choose t from g(s);
5           sample u from U([0, 1]);
6           if u < 1/S then
7               resample Θ from P(Θ | X_1, ..., X_s);
8           else
9               sample X_t from P(X_t | X_{t-1}, X_{t+1}, Y_t, Θ);
10          increment count for X_s in D_s;
11      sample X_{s+1} from P(X_{s+1} | X_s, Y_{s+1}, Θ);
12  return D_1, ..., D_S, Θ;

4.2.1 Empirical Results

Empirical results on the nonlinear state space model show that Algorithm 2 works well in practice. The effectiveness of Algorithm 2 becomes most clear when we emphasize that its inference is performed without prior knowledge of Θ = (σ_v, σ_w). For comparison, we perform inference with the particle filter, with Θ initialized according to P_Θ. From Figures 4 and 5, we see that the particle filter noticeably diverges from the true state, while SDMCMC achieves near-perfect accuracy. In fact, with growing sequence size S, Algorithm 2 does not lose accuracy, while the particle filter does.

Figure 4: Decayed MCMC, K = 3

Figure 5: Particle filter, 50 particles

Figure 6 compares the L_2 distances between the inferred and true solutions for the SDMCMC and SMC (particle filter) methods over S = 200 time steps. The figure indicates that the L_2 distance diverges with growing T for SMC methods but converges for SDMCMC.

Figure 6: Log-log comparison of L_2 distance for SDMCMC and SMC

Finally, we consider the parameter learning capabilities of SDMCMC. Figures 7 and 8 show histograms of the values of σ_v and σ_w sampled during a run of Algorithm 2. Although the distribution does not center exactly on the true parameters σ_v = σ_w = 3, it is close. The diagrams suggest that there might be some bias; further analysis is required to clarify this point.

Figure 7: σ_v histogram, true σ_v = 3

Figure 8: σ_w histogram, true σ_w = 3
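Line 7 of Algorithm 2 requires a draw from P(Θ | X_1, ..., X_s). One concrete way to implement it for this model is to place inverse-gamma priors on σ_v² and σ_w², which makes the conditional conjugate given the state trajectory. The IG(a, b) prior below is an assumption made for illustration (the report does not state its prior), and the σ_w update also conditions on the observations, in line with the identity P(Θ | X_1, ..., X_s) = P(Θ | X_1, ..., X_s, Y_1, ..., Y_s) used in the proof of Theorem 4.1 below.

```python
import numpy as np

def resample_theta(X, Y, a=2.0, b=2.0, rng=None):
    """One draw of Theta = (sigma_v, sigma_w) given a state trajectory
    X_{1:s} and observations Y_{1:s}, under assumed IG(a, b) priors
    on the variances (conjugate update)."""
    rng = np.random.default_rng() if rng is None else rng
    n = np.arange(2, len(X) + 1)  # the paper's time index for X_2, ..., X_s
    drift = X[:-1] / 2 + 25 * X[:-1] / (1 + X[:-1]**2) + 8 * np.cos(1.2 * n)
    r = X[1:] - drift             # transition residuals, ~ N(0, sigma_v^2)
    e = Y - X**2 / 20             # observation residuals, ~ N(0, sigma_w^2)
    # Posterior is IG(a + m/2, b + sum(resid^2)/2); draw via 1/Gamma.
    var_v = (b + 0.5 * (r @ r)) / rng.gamma(a + 0.5 * len(r))
    var_w = (b + 0.5 * (e @ e)) / rng.gamma(a + 0.5 * len(e))
    return np.sqrt(var_v), np.sqrt(var_w)
```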

4.2.2 Preliminary Mathematical Correctness Analysis

We now present some preliminary mathematical analysis of Algorithm 2.

Theorem 4.1. For all s < S in the outer for loop of Algorithm 2, the inner for loop defines a Markov process with stationary distribution X_1, ..., X_s, Θ | Y_1, ..., Y_s.

Proof. It suffices to show that the distribution X_1, ..., X_s, Θ | Y_1, ..., Y_s is invariant under an action of the inner for loop. To do so, we consider the two cases of the if condition separately.

Suppose u < 1/S. Note that the marginal distribution (summing out Θ) on X_1, ..., X_s is exactly the conditional X_1, ..., X_s | Y_1, ..., Y_s. Algorithm 2 samples Θ ~ P(Θ | X_1, ..., X_s). It follows that the joint distribution after the sample is

$$\begin{aligned}
P(\Theta \mid X_1, \ldots, X_s)\, P(X_1, \ldots, X_s \mid Y_1, \ldots, Y_s) &= P(\Theta \mid X_1, \ldots, X_s, Y_1, \ldots, Y_s)\, P(X_1, \ldots, X_s \mid Y_1, \ldots, Y_s) \\
&= P(\Theta, X_1, \ldots, X_s \mid Y_1, \ldots, Y_s),
\end{aligned}$$

where the first equality treats Θ as conditionally independent of Y_1, ..., Y_s given the full state trajectory X_1, ..., X_s.

Now suppose that u ≥ 1/S, and suppose that the index i is sampled according to the decay g(s). Then the marginal distribution (summing out X_i) on Θ, X_1, ..., X̂_i, ..., X_s is P(Θ, X_1, ..., X̂_i, ..., X_s | Y_1, ..., Y_s), where the hat signifies exclusion. Conditional independence shows that the joint distribution after the Gibbs sample is then

$$\begin{aligned}
P(X_i \mid \Theta, X_{i-1}, X_{i+1}, Y_i)\, P(\Theta, X_1, \ldots, \hat{X}_i, \ldots, X_s \mid Y_1, \ldots, Y_s) &= P(X_i \mid \Theta, X_1, \ldots, \hat{X}_i, \ldots, X_s, Y_1, \ldots, Y_s)\, P(\Theta, X_1, \ldots, \hat{X}_i, \ldots, X_s \mid Y_1, \ldots, Y_s) \\
&= P(\Theta, X_1, \ldots, X_s \mid Y_1, \ldots, Y_s).
\end{aligned}$$

References

[1] C. Andrieu, A. Doucet, and R. Holenstein. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B, 2010.

[2] N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F (Radar and Signal Processing), 140(2):107-113, 1993.

[3] N. Kantas. An overview of sequential Monte Carlo methods for parameter estimation in general state-space models. In Proc. IFAC Symposium on System Identification (SYSID), 2009.

[4] B. Marthi, H. Pasula, S. Russell, and Y. Peres. Decayed MCMC filtering. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, pages 319-326. Morgan Kaufmann Publishers Inc., 2002.