A new iterated filtering algorithm


A new iterated filtering algorithm
Edward Ionides, University of Michigan, Ann Arbor (ionides@umich.edu)
Statistics and Nonlinear Dynamics in Biology and Medicine, Thursday, July 31, 2014

Overview
1. Introduction to iterated filtering methodology.
2. A new iterated filtering algorithm (IF2).
3. Theoretical justification of IF2.
4. Applications of IF2.

POMP models
Data y_1, ..., y_N collected at times t_1, ..., t_N are modeled as noisy and incomplete observations of a latent Markov process {X(t)}. This is a partially observed Markov process (POMP) model, also known as a hidden Markov model (HMM) or state space model.

SMC methods for POMP models
Filtering is estimation of the latent dynamic process X(t_n) given data y_1, ..., y_N for a fixed model, parameterized by θ. Sequential Monte Carlo (SMC), also called the particle filter, is a numerical method for filtering and for evaluating the likelihood function. Filtering has extensive applications in science and engineering, and over the last 15 years SMC has become popular for filtering non-linear, non-Gaussian POMP models.
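The bootstrap particle filter described above fits in a few lines. The following Python sketch is not from the talk: the AR(1)-plus-noise model, the function name, and all parameter values are illustrative choices, used only to show the propagate/weight/resample cycle and the log-likelihood estimate that methods such as IF2 and PMCMC build on.

```python
import numpy as np

def pf_loglik(y, J, rng, phi=0.8, sig_proc=1.0, sig_obs=1.0):
    """Bootstrap particle filter log-likelihood estimate for a
    hypothetical AR(1)-plus-noise POMP (all names/values illustrative):
        X_n = phi * X_{n-1} + sig_proc * eps_n,   Y_n ~ N(X_n, sig_obs^2)."""
    # initialize particles from the stationary distribution of the AR(1) state
    x = rng.normal(0.0, sig_proc / np.sqrt(1.0 - phi**2), size=J)
    loglik = 0.0
    for yn in y:
        # propagate each particle through the latent transition density
        x = phi * x + sig_proc * rng.normal(size=J)
        # weight by the measurement density f_{Y_n|X_n}
        logw = -0.5 * np.log(2*np.pi*sig_obs**2) - 0.5 * ((yn - x) / sig_obs)**2
        # conditional log-likelihood of y_n: log of the average weight (stable form)
        c = logw.max()
        w = np.exp(logw - c)
        loglik += c + np.log(w.mean())
        # multinomial resampling
        x = x[rng.choice(J, size=J, p=w / w.sum())]
    return loglik
```

Because this toy model is linear and Gaussian, the estimate can be checked against the exact Kalman-filter log-likelihood, which is a standard sanity test for a particle filter implementation.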

What is iterated filtering?
Iterated filtering algorithms adapt SMC into a tool for inference on θ. We call IF1 the iterated filtering algorithm of Ionides, Bretó & King (2006). IF1 uses an extended POMP model where θ is replaced by a time-varying process θ(t) which follows a random walk. SMC filtering on this model can approximate the derivative of the log likelihood.
Two naive approaches that fail in all but simple problems:
- Apply a black-box optimizer such as Nelder-Mead to the SMC evaluation of the likelihood.
- Carry out Bayesian inference by SMC with θ added to the POMP as a static parameter.

IF2: iterated SMC with perturbed parameters

For m in 1:M  [M filtering iterations, with decreasing σ_m]
    Θ^{F,m}_{0,j} ~ h_0(θ | Θ^{m-1}_j ; σ_m) for j in 1:J
    X^{F,m}_{0,j} ~ f_{X_0}(x_0 ; Θ^{F,m}_{0,j}) for j in 1:J
    For n in 1:N  [SMC with J particles]
        Θ^{P,m}_{n,j} ~ h_n(θ | Θ^{F,m}_{n-1,j} ; σ_m) for j in 1:J
        X^{P,m}_{n,j} ~ f_{X_n|X_{n-1}}(x_n | X^{F,m}_{n-1,j} ; Θ^{P,m}_{n,j}) for j in 1:J
        w^m_{n,j} = f_{Y_n|X_n}(y_n | X^{P,m}_{n,j} ; Θ^{P,m}_{n,j}) for j in 1:J
        Draw k_{1:J} with P(k_j = i) = w^m_{n,i} / Σ_{u=1}^{J} w^m_{n,u}
        Θ^{F,m}_{n,j} = Θ^{P,m}_{n,k_j} and X^{F,m}_{n,j} = X^{P,m}_{n,k_j} for j in 1:J
    End For
    Set Θ^m_j = Θ^{F,m}_{N,j} for j in 1:J
End For

IF2: iterated SMC with perturbed parameters

For m in 1:M
    Θ^{F,m}_{0,j} ~ h_0(θ | Θ^{m-1}_j ; σ_m) for j in 1:J
    X^{F,m}_{0,j} ~ f_{X_0}(x_0 ; Θ^{F,m}_{0,j}) for j in 1:J
    [carry out SMC on an extended model, with the time-varying parameters
     included in the latent state, initialized at (X^{F,m}_{0,j}, Θ^{F,m}_{0,j})]
    Set Θ^m_j = Θ^{F,m}_{N,j} for j in 1:J
End For

input:
    Simulator for latent process initial density, f_{X_0}(x_0 ; θ)
    Simulator for latent process transition density, f_{X_n|X_{n-1}}(x_n | x_{n-1} ; θ), n in 1:N
    Evaluator for measurement density, f_{Y_n|X_n}(y_n | x_n ; θ), n in 1:N
    Data, y_{1:N}
    Number of iterations, M
    Number of particles, J
    Initial parameter swarm, {Θ^0_j, j in 1:J}
    Perturbation density, h_n(θ | φ ; σ), n in 1:N
    Perturbation sequence, σ_{1:M}
output:
    Final parameter swarm, {Θ^M_j, j in 1:J}

For m in 1:M
    Θ^{F,m}_{0,j} ~ h_0(θ | Θ^{m-1}_j ; σ_m) for j in 1:J
    X^{F,m}_{0,j} ~ f_{X_0}(x_0 ; Θ^{F,m}_{0,j}) for j in 1:J
    For n in 1:N
        Θ^{P,m}_{n,j} ~ h_n(θ | Θ^{F,m}_{n-1,j} ; σ_m) for j in 1:J
        X^{P,m}_{n,j} ~ f_{X_n|X_{n-1}}(x_n | X^{F,m}_{n-1,j} ; Θ^{P,m}_{n,j}) for j in 1:J
        w^m_{n,j} = f_{Y_n|X_n}(y_n | X^{P,m}_{n,j} ; Θ^{P,m}_{n,j}) for j in 1:J
        Draw k_{1:J} with P(k_j = i) = w^m_{n,i} / Σ_{u=1}^{J} w^m_{n,u}
        Θ^{F,m}_{n,j} = Θ^{P,m}_{n,k_j} and X^{F,m}_{n,j} = X^{P,m}_{n,k_j} for j in 1:J
    End For
    Set Θ^m_j = Θ^{F,m}_{N,j} for j in 1:J
End For
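As a concrete illustration of this pseudocode, here is a minimal Python sketch of the IF2 loop on a deliberately trivial one-parameter POMP, chosen here for testability (it is not an example from the talk): the latent state is the constant X_n = θ and Y_n ~ N(θ, 1), so the maximum likelihood estimate is the sample mean, and the parameter swarm should concentrate there as σ_m decreases. The function name and all tuning values are illustrative assumptions.

```python
import numpy as np

def if2(y, J=200, M=30, sigma0=0.5, a=0.9, rng=None):
    """IF2 sketch for a hypothetical toy POMP: X_n = theta (constant),
    Y_n ~ N(theta, 1).  Follows the pseudocode structure: perturb the
    parameters, filter, and resample parameters together with states,
    with a geometrically decreasing perturbation sequence sigma_m."""
    rng = rng or np.random.default_rng()
    theta = rng.normal(0.0, 2.0, size=J)          # initial parameter swarm
    for m in range(M):
        sigma = sigma0 * a**m                     # decreasing sigma_m
        theta = theta + sigma * rng.normal(size=J)   # h_0: initial perturbation
        x = theta.copy()                          # f_{X_0}: state set to theta
        for n in range(len(y)):
            theta = theta + sigma * rng.normal(size=J)  # h_n perturbation
            x = theta.copy()                      # trivial transition X_n = theta
            logw = -0.5 * (y[n] - x) ** 2         # measurement density weight
            w = np.exp(logw - logw.max())
            k = rng.choice(J, size=J, p=w / w.sum())    # resampling indices
            theta, x = theta[k], x[k]             # parameters resampled with states
    return theta
```

A real application would replace the two `x = theta.copy()` lines with simulators for the latent initial and transition densities, exactly as in the input list above; the parameter-handling skeleton is unchanged.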

Numerical examples We compare IF1, IF2 and the particle Markov chain Monte Carlo (PMCMC) method of Andrieu et al (2010). PMCMC is an SMC-based plug-and-play algorithm for full-information Bayesian inference on POMPs. Computations were carried out using the pomp R package (King et al, 2009). Data and code reproducing our results are included in a supplement to an article in review (Ionides, Nguyen, Atchadé, Stoev & King).

Toy example. X(t) = (exp{θ_1}, θ_2 exp{θ_1}), constant for all t. 100 independent observations: given X(t) = x,

    Y_n ~ Normal(x, Σ),   Σ = [[100, 0], [0, 1]].

A. IF1 point estimates from 30 replications and the MLE (green triangle).
B. IF2 point estimates from 30 replications and the MLE (green triangle).
C. Final parameter value of 30 PMCMC chains.
D. Kernel density estimates from 8 of these 30 PMCMC chains, and the true posterior distribution (dotted black line).

Why is IF2 so much better than IF1 on this problem? IF1 updates parameters by a linear combination of filtered parameter estimates for the extended model with time-varying parameters. Taking linear combinations can knock the optimizer off nonlinear ridges of the likelihood function. IF2 does not have this vulnerability.

Application to a cholera model
The study population P(t) is split into susceptibles, S(t), infecteds, I(t), and k recovered classes R_1(t), ..., R_k(t). The state process X(t) = (S(t), I(t), R_1(t), ..., R_k(t)) follows a stochastic differential equation driven by a Brownian motion {B(t)}:

    dS = { kεR_k + δ(P - S) - λ(t)S } dt + dP + (σ I/P) dB,
    dI = { λ(t)S - (m + δ + γ)I } dt,
    dR_1 = { γI - (kε + δ)R_1 } dt,
    ...
    dR_k = { kεR_{k-1} - (kε + δ)R_k } dt.

The nonlinearity arises through the force of infection, λ(t), specified as

    λ(t) = β exp{ β_trend(t - t_0) + Σ_{j=1}^{N_s} β_j s_j(t) } (I/P) + ω exp{ Σ_{j=1}^{N_s} ω_j s_j(t) },

where {s_j(t), j = 1, ..., N_s} is a periodic cubic B-spline basis. The data are monthly counts of cholera mortality, modeled as Y_n ~ Normal(M_n, τ² M_n²) for M_n = ∫_{t_{n-1}}^{t_n} m I(s) ds.
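To make the latent state process concrete, here is an Euler-Maruyama sketch of a simplified version of the SDE above. This is not the fitted model of King et al (2008): all parameter values are arbitrary illustrations, the seasonal spline terms in λ(t) are replaced by constants for brevity, and the population P is held fixed so the dP term drops out.

```python
import numpy as np

def simulate_cholera(T=10.0, dt=1/240, k=3, P=1e6, rng=None,
                     eps=1.0, delta=0.02, gamma=20.0, m_rate=0.5,
                     beta=1.2, omega=0.1, sigma_sde=0.1):
    """Euler-Maruyama sketch of a simplified cholera SDE; all parameter
    values are illustrative, not fitted.  Returns the path of I(t)."""
    rng = rng or np.random.default_rng()
    n_steps = int(round(T / dt))
    S, I = 0.9 * P, 0.01 * P
    R = np.full(k, 0.09 * P / k)          # k recovered classes
    path = np.empty(n_steps)
    for t in range(n_steps):
        lam = beta * I / P + omega        # simplified force of infection
        dB = np.sqrt(dt) * rng.normal()   # Brownian increment
        dS = (k*eps*R[-1] + delta*(P - S) - lam*S) * dt + sigma_sde*(I/P)*dB
        dI = (lam*S - (m_rate + delta + gamma)*I) * dt
        dR = np.empty(k)
        dR[0] = (gamma*I - (k*eps + delta)*R[0]) * dt
        for j in range(1, k):
            dR[j] = (k*eps*R[j-1] - (k*eps + delta)*R[j]) * dt
        S = max(S + dS, 0.0)              # clip to keep states nonnegative
        I = max(I + dI, 0.0)
        R = np.maximum(R + dR, 0.0)
        path[t] = I
    return path
```

In a plug-and-play method such as IF2, this kind of simulator is all that is required of the latent process: the algorithm never evaluates the transition density itself.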

Comparison of IF1 and IF2 on the cholera model. Algorithmic tuning parameters for both IF1 and IF2 were set at the values chosen by King et al (2008) for IF1. Log likelihoods of the parameter vector output by IF1 and IF2, both started at a uniform draw from a large hyper-rectangle. 17 poorly performing searches are off the scale of this plot (15 due to the IF1 estimate, 2 due to the IF2 estimate). Dotted lines show the maximum log likelihood.

IF2 as an iterated Bayes map
Each iteration of IF2 is a Monte Carlo approximation to a map

    T_σ f(θ_N) = ∫ l(θ_{0:N}) h(θ_{0:N} | φ ; σ) f(φ) dφ dθ_{0:N-1} / ∫ l(θ_{0:N}) h(θ_{0:N} | φ ; σ) f(φ) dφ dθ_{0:N},   (1)

where l(θ_{0:N}) is the likelihood of the data under the extended model with time-varying parameter θ_{0:N}. In (1), f and T_σ f approximate the initial and final density of the IF2 parameter swarm. When the standard deviation of the parameter perturbations is held fixed at σ_m = σ > 0, IF2 is a Monte Carlo approximation to T_σ^M f(θ). Iterated Bayes maps are not usually contractions. We study the homogeneous case, σ_m = σ. Studying the limit σ → 0 may be as appropriate as a conventional asymptotic analysis for understanding the practical properties of a procedure such as IF2, in which σ_m decreases down to some positive level σ > 0 but never completes the asymptotic limit σ_m → 0.

IF2 as a generalization of data cloning
In the case σ = 0, the iterated Bayes map corresponds to the data cloning approach of Lele et al (2007, 2010). For σ = 0, Lele et al (2007, 2010) found central limit theorems. For σ > 0, the limit as M → ∞ is not usually Gaussian. Taking σ > 0 adds numerical stability, which is necessary for convergence of the SMC approximations.
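To see the connection, a short derivation in the notation of equation (1): with σ = 0 the perturbation density collapses to a point mass, forcing θ_0 = ... = θ_N = φ, so the extended likelihood reduces to the ordinary likelihood l(θ) and the map becomes a plain Bayes update,

```latex
T_0 f(\theta) = \frac{\ell(\theta)\, f(\theta)}{\int \ell(\varphi)\, f(\varphi)\, d\varphi},
\qquad
T_0^M f(\theta) = \frac{\ell(\theta)^M f(\theta)}{\int \ell(\varphi)^M f(\varphi)\, d\varphi}.
```

T_0^M f is the posterior from M copies ("clones") of the data, which is exactly the data cloning construction; as M → ∞ it concentrates at the MLE, consistent with the Lele et al central limit theorems mentioned above.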

Theorem 1. Assuming adequate mixing conditions, there is a unique probability density f_σ with

    lim_{M→∞} T_σ^M f = f_σ,

with the limit taken in the total variation or Hilbert projective norms. The SMC approximation to T_σ^M f converges to T_σ^M f as J → ∞, uniformly in M.

The Hilbert projective metric has the nice property that T_σ^M is a contraction. Theorem 1 follows from existing results on filter stability.

Theorem 2. Under regularity conditions, f_σ approaches a point mass at the maximum likelihood estimate (MLE) as σ → 0.

Outline of proof. Trajectories in parameter space which stray away from the MLE are down-weighted by the Bayes map relative to trajectories staying close to the MLE. As σ decreases, excursions any fixed distance away from the MLE require an increasing number of iterations and therefore receive an increasing penalty from the iterated Bayes map. Bounding this penalty proves the theorem.

Conclusions
IF1 enabled previously infeasible likelihood-based inference for non-linear, non-Gaussian POMP models. We have not yet found a situation where IF2 performs substantially worse than IF1, and in complex nonlinear models we have found IF2 consistently and substantially better. In addition, IF2 is simpler, and some extensions are easier: for example, IF2 can readily handle parameters for which the information in the data is concentrated in a sub-interval. If you like IF1, you'll love IF2.

References

Andrieu, C., Doucet, A., and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. B, 72:269-342.
Ionides, E. L., Bhadra, A., Atchadé, Y., and King, A. A. (2011). Iterated filtering. Ann. Stat., 39:1776-1802.
Ionides, E. L., Bretó, C., and King, A. A. (2006). Inference for nonlinear dynamical systems. Proc. Natl. Acad. Sci. USA, 103:18438-18443.
King, A. A., Ionides, E. L., Pascual, M., and Bouma, M. J. (2008). Inapparent infections and cholera dynamics. Nature, 454:877-880.
Lele, S. R., Dennis, B., and Lutscher, F. (2007). Data cloning: easy maximum likelihood estimation for complex ecological models using Bayesian Markov chain Monte Carlo methods. Ecology Letters, 10(7):551-563.
Lele, S. R., Nadeem, K., and Schmuland, B. (2010). Estimability and likelihood inference for generalized linear mixed models using data cloning. J. Am. Stat. Assoc., 105:1617-1625.

More iterated filtering references can be found on Wikipedia: wikipedia.org/wiki/iterated_filtering

Thank you! The End.