Divide-and-Conquer Sequential Monte Carlo


Divide-and-Conquer Sequential Monte Carlo. Joint work with: John Aston, Alexandre Bouchard-Côté, Brent Kirkpatrick, Fredrik Lindsten, Christian Næsseth, Thomas Schön. University of Warwick. a.m.johansen@warwick.ac.uk http://go.warwick.ac.uk/amjohansen/talks/ Algorithms and Computationally Intensive Statistics, November 25th, 2016

Outline
- Importance Sampling to Sequential Monte Carlo (SMC)
- SMC to Divide-and-Conquer SMC (DC-SMC)
- Some Theoretical Properties of DC-SMC
- Illustrative Applications
- Conclusions and Some (Open) Questions

Essential Problem: The Abstract Problem. Given a density $\pi(x) = \gamma(x)/Z$, such that $\gamma(x)$ can be evaluated pointwise, how can we approximate $\pi$, and how about $Z$? Can we do so robustly? In a distributed setting?

Importance Sampling
Simple identity: provided $\gamma \ll \mu$,
$$Z = \int \gamma(x)\,dx = \int \frac{\gamma(x)}{\mu(x)}\,\mu(x)\,dx.$$
So, if $X_1, X_2, \dots \overset{\text{iid}}{\sim} \mu$, then:
- unbiasedness: $\mathbb{E}\left[\frac{1}{N}\sum_{i=1}^N \frac{\gamma(X_i)}{\mu(X_i)}\right] = Z$ for every $N$;
- slln: $\lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^N \frac{\gamma(X_i)}{\mu(X_i)}\,\varphi(X_i) \overset{\text{a.s.}}{=} \gamma(\varphi)$;
- clt: $\lim_{N\to\infty} \sqrt{N}\left[\frac{1}{N}\sum_{i=1}^N \frac{\gamma(X_i)}{\mu(X_i)}\,\varphi(X_i) - \gamma(\varphi)\right] \overset{d}{=} W$, where $W \sim \mathcal{N}\left(0, \mathrm{Var}\left(\frac{\gamma(X_1)}{\mu(X_1)}\varphi(X_1)\right)\right)$.
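The identity above can be sketched numerically. A minimal example, under assumed toy choices: unnormalised target $\gamma(x) = e^{-x^2/2}$ (so $Z = \sqrt{2\pi}$) and proposal $\mu = \mathcal{N}(0, 2^2)$; both estimators below are the ones on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy problem: gamma(x) = exp(-x^2/2), so Z = sqrt(2*pi);
# proposal mu = N(0, 2^2), which dominates gamma everywhere.
def gamma(x):
    return np.exp(-0.5 * x**2)

def mu_pdf(x):
    return np.exp(-0.5 * (x / 2.0)**2) / (2.0 * np.sqrt(2 * np.pi))

N = 200_000
x = rng.normal(0.0, 2.0, size=N)        # X_i ~ mu, iid
w = gamma(x) / mu_pdf(x)                # importance weights gamma(X_i)/mu(X_i)

Z_hat = w.mean()                        # unbiased estimate of Z = sqrt(2*pi)
phi_hat = np.sum(w * x**2) / np.sum(w)  # self-normalised estimate of E_pi[X^2] = 1
```

The first estimator is unbiased for $Z$; the second (self-normalised) one is only consistent, which is the distinction the SIR algorithm later exploits.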

Sequential Importance Sampling
Write $\gamma(x_{1:n}) = \gamma(x_1) \prod_{p=2}^{n} \gamma(x_p \mid x_{1:p-1})$ and define, for $p = 1, \dots, n$,
$$\gamma_p(x_{1:p}) = \gamma(x_1) \prod_{q=2}^{p} \gamma(x_q \mid x_{1:q-1});$$
then
$$\underbrace{\frac{\gamma(x_{1:n})}{\mu(x_{1:n})}}_{W_n(x_{1:n})} = \underbrace{\frac{\gamma_1(x_1)}{\mu_1(x_1)}}_{w_1(x_1)} \prod_{p=2}^{n} \underbrace{\frac{\gamma_p(x_{1:p})}{\gamma_{p-1}(x_{1:p-1})\,\mu_p(x_p \mid x_{1:p-1})}}_{w_p(x_{1:p})}$$
and we can sequentially approximate $Z_p = \int \gamma_p(x_{1:p})\,dx_{1:p}$.

Sequential Importance Resampling (SIR)
Given a sequence $\gamma_1(x_1), \gamma_2(x_{1:2}), \dots$:
Initialisation, $n = 1$:
- Sample $X_1^1, \dots, X_1^N \overset{\text{iid}}{\sim} \mu_1$.
- Compute $W_1^i = \gamma_1(X_1^i) / \mu_1(X_1^i)$.
- Obtain $\widehat{Z}_1^N = \frac{1}{N}\sum_{i=1}^N W_1^i$ and $\pi_1^N = \sum_{i=1}^N W_1^i \delta_{X_1^i} \big/ \sum_{j=1}^N W_1^j$.
[This is just (self-normalized) importance sampling.]

Iteration, $n-1 \to n$:
- Resample: sample $(X_{n,1:n-1}^1, \dots, X_{n,1:n-1}^N) \overset{\text{iid}}{\sim} \sum_{i=1}^N W_{n-1}^i \delta_{X_{n-1}^i} \big/ \sum_{j=1}^N W_{n-1}^j$.
- Sample $X_{n,n}^i \sim q_n(\,\cdot \mid X_{n,1:n-1}^i)$.
- Compute
$$W_n^i = \frac{\gamma_n(X_{n,1:n}^i)}{\gamma_{n-1}(X_{n,1:n-1}^i)\, q_n(X_{n,n}^i \mid X_{n,1:n-1}^i)}.$$
- Obtain $\widehat{Z}_n^N = \widehat{Z}_{n-1}^N \cdot \frac{1}{N}\sum_{i=1}^N W_n^i$ and $\pi_n^N = \sum_{i=1}^N W_n^i \delta_{X_n^i} \big/ \sum_{j=1}^N W_n^j$.

SIR: Theoretical Justification
Under regularity conditions we still have:
- unbiasedness: $\mathbb{E}[\widehat{Z}_n^N] = Z_n$;
- slln: $\lim_{N\to\infty} \pi_n^N(\varphi) \overset{\text{a.s.}}{=} \pi_n(\varphi)$;
- clt: for a normal random variable $W_n$ of appropriate variance, $\lim_{N\to\infty} \sqrt{N}\left[\pi_n^N(\varphi) - \pi_n(\varphi)\right] \overset{d}{=} W_n$;
although establishing this becomes a little harder (cf., e.g., Del Moral (2004); Andrieu et al. (2010)).

Simple Particle Filters: One Family of SIR Algorithms
[Figure: hidden Markov model graph; states $x_1, \dots, x_6$ form a chain, each emitting an observation $y_1, \dots, y_6$.]
Unobserved Markov chain $\{X_n\}$ with transition density $f$; observed process $\{Y_n\}$ with conditional density $g$. The joint density is available:
$$p(x_{1:n}, y_{1:n} \mid \theta) = f_1^\theta(x_1)\, g^\theta(y_1 \mid x_1) \prod_{i=2}^{n} f^\theta(x_i \mid x_{i-1})\, g^\theta(y_i \mid x_i).$$
Natural SIR target distributions:
$$\pi_n^\theta(x_{1:n}) := p(x_{1:n} \mid y_{1:n}, \theta) \propto p(x_{1:n}, y_{1:n} \mid \theta) =: \gamma_n^\theta(x_{1:n}),$$
$$Z_n^\theta = \int p(x_{1:n}, y_{1:n} \mid \theta)\, dx_{1:n} = p(y_{1:n} \mid \theta).$$

Bootstrap PFs and Similar
Choosing
$$\pi_n^\theta(x_{1:n}) := p(x_{1:n} \mid y_{1:n}, \theta) \propto p(x_{1:n}, y_{1:n} \mid \theta) =: \gamma_n^\theta(x_{1:n}), \qquad Z_n^\theta = p(y_{1:n} \mid \theta),$$
and $q_p(x_p \mid x_{1:p-1}) = f^\theta(x_p \mid x_{p-1})$ yields the bootstrap particle filter of Gordon et al. (1993), whereas $q_p(x_p \mid x_{1:p-1}) = p(x_p \mid x_{p-1}, y_p, \theta)$ yields the locally optimal particle filter.
Note: many alternative particle filters are SIR algorithms with other targets. Cf. J. and Doucet (2008); Doucet and J. (2011).
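The bootstrap choice can be sketched in a few lines. This is a minimal sketch, assuming a toy scalar model (not one from the talk): $X_1 \sim \mathcal{N}(0,1)$, $X_n \mid X_{n-1} \sim \mathcal{N}(0.9\,X_{n-1}, 1)$, $Y_n \mid X_n \sim \mathcal{N}(X_n, 1)$. Proposing from the transition means the incremental weight reduces to $g^\theta(y_n \mid x_n)$, and the running product of mean weights estimates $Z_n^\theta = p(y_{1:n})$, which we can check against the exact Kalman-filter likelihood for this linear-Gaussian case.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy model: X_1 ~ N(0,1), X_n | X_{n-1} ~ N(a X_{n-1}, 1), Y_n | X_n ~ N(X_n, 1).
a, T, N = 0.9, 25, 5000

# Simulate a synthetic data set.
x_true, y = np.zeros(T), np.zeros(T)
x_true[0] = rng.normal()
y[0] = x_true[0] + rng.normal()
for n in range(1, T):
    x_true[n] = a * x_true[n - 1] + rng.normal()
    y[n] = x_true[n] + rng.normal()

def log_g(yn, x):  # observation log-density N(yn; x, 1)
    return -0.5 * (yn - x)**2 - 0.5 * np.log(2 * np.pi)

# Bootstrap particle filter: weight by g, resample, propagate through f.
x = rng.normal(size=N)                          # draws from the initial distribution
logZ = 0.0
for n in range(T):
    logw = log_g(y[n], x)                       # bootstrap weight for y_n
    mmax = logw.max()
    w = np.exp(logw - mmax)
    logZ += mmax + np.log(w.mean())             # accumulate log p(y_{1:n})
    idx = rng.choice(N, size=N, p=w / w.sum())  # multinomial resampling
    x = a * x[idx] + rng.normal(size=N)         # propagate through the transition

# Exact log-likelihood via the Kalman filter, for comparison.
mp, Pp, kalman_ll = 0.0, 1.0, 0.0
for n in range(T):
    S = Pp + 1.0
    kalman_ll += -0.5 * ((y[n] - mp)**2 / S + np.log(2 * np.pi * S))
    K = Pp / S
    mf, Pf = mp + K * (y[n] - mp), (1 - K) * Pp
    mp, Pp = a * mf, a**2 * Pf + 1.0
```

With a few thousand particles, `logZ` lands close to `kalman_ll`, illustrating the unbiasedness property quoted on the previous slide.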

SMC Samplers: Another SIR Class
Given a sequence of targets $\pi_1, \dots, \pi_n$ on arbitrary spaces, Del Moral et al. (2006) extend the space:
$$\tilde{\pi}_n(x_{1:n}) = \pi_n(x_n) \prod_{p=1}^{n-1} L_p(x_{p+1}, x_p), \qquad \tilde{\gamma}_n(x_{1:n}) = \gamma_n(x_n) \prod_{p=1}^{n-1} L_p(x_{p+1}, x_p),$$
$$\tilde{Z}_n = \int \tilde{\gamma}_n(x_{1:n})\, dx_{1:n} = \int \gamma_n(x_n) \prod_{p=1}^{n-1} L_p(x_{p+1}, x_p)\, dx_{1:n} = \int \gamma_n(x_n)\, dx_n = Z_n,$$
since each backward kernel $L_p(x_{p+1}, \cdot)$ integrates to one.

A Simple SMC Sampler
Given $\gamma_1, \dots, \gamma_n$ on $(E, \mathcal{E})$:
For $i = 1, \dots, N$: sample $X_1^i \overset{\text{iid}}{\sim} \mu_1$, compute $W_1^i = \gamma_1(X_1^i)/\mu_1(X_1^i)$ and $\widehat{Z}_1^N = \frac{1}{N}\sum_{i=1}^N W_1^i$.
For $p = 2, \dots, n$:
- Resample: $X_{p,p-1}^{1:N} \overset{\text{iid}}{\sim} \sum_{i=1}^N W_{p-1}^i \delta_{X_{p-1}^i} \big/ \sum_{j=1}^N W_{p-1}^j$.
- Sample: $X_p^i \sim K_p(X_{p,p-1}^i, \cdot)$, where $\pi_p K_p = \pi_p$.
- Compute: $W_p^i = \gamma_p(X_{p,p-1}^i) \big/ \gamma_{p-1}(X_{p,p-1}^i)$.
Then $\widehat{Z}_n^N = \widehat{Z}_{n-1}^N \cdot \frac{1}{N}\sum_{i=1}^N W_n^i$ and $\pi_n^N = \sum_{i=1}^N W_n^i \delta_{X_n^i} \big/ \sum_{j=1}^N W_n^j$.
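The loop above can be sketched concretely. A minimal sketch under assumed toy choices: prior $\mu = \mathcal{N}(0, 3^2)$, pseudo-likelihood $L(x) = e^{-x^2/2}$, tempered targets $\gamma_p(x) = \mu(x) L(x)^{\beta_p}$ on an evenly spaced $\beta$-ladder, and a random-walk Metropolis move as the $\pi_p$-invariant kernel $K_p$. For this conjugate example the final normalising constant is available analytically, $Z_T = 1/\sqrt{1 + 3^2} = 1/\sqrt{10}$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy problem: mu = N(0, 3^2), L(x) = exp(-x^2/2),
# gamma_p(x) = mu(x) L(x)^{beta_p}; analytically Z_T = 1/sqrt(10).
sigma0 = 3.0
betas = np.linspace(0.0, 1.0, 21)
N = 20_000

def log_gamma(x, beta):                 # log gamma_p up to mu's constant
    return -0.5 * x**2 / sigma0**2 - 0.5 * beta * x**2

x = rng.normal(0.0, sigma0, size=N)     # exact draws from gamma_1 = mu
logZ = 0.0
for b_prev, b in zip(betas[:-1], betas[1:]):
    logw = -(b - b_prev) * 0.5 * x**2   # incremental weight L(x)^{beta_p - beta_{p-1}}
    m = logw.max()
    w = np.exp(logw - m)
    logZ += m + np.log(w.mean())
    x = x[rng.choice(N, size=N, p=w / w.sum())]       # resample
    # Random-walk Metropolis move, pi_p-invariant (the K_p of the slide).
    prop = x + rng.normal(0.0, 1.0, size=N)
    acc = np.log(rng.uniform(size=N)) < log_gamma(prop, b) - log_gamma(x, b)
    x = np.where(acc, prop, x)

Z_hat = np.exp(logZ)                    # estimates 1/sqrt(10)
```

Note the weight depends only on the pre-move particle, exactly as in the sampler above: the MCMC move changes the location of the particles, not their weights.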

Bayesian Inference (Chopin, 2002; Del Moral et al., 2006)
In a Bayesian context, given a prior $p(\theta)$ and likelihood $l(\theta; y_{1:m})$, one could specify:
- Data tempering: $\gamma_p(\theta) = p(\theta)\, l(\theta; y_{1:m_p})$ for $m_1 = 0 < m_2 < \dots < m_T = m$.
- Likelihood tempering: $\gamma_p(\theta) = p(\theta)\, l(\theta; y_{1:m})^{\beta_p}$ for $\beta_1 = 0 < \beta_2 < \dots < \beta_T = 1$.
- Something else?
Here $Z_T = \int p(\theta)\, l(\theta; y_{1:m})\, d\theta$ and $\gamma_T(\theta) \propto p(\theta \mid y_{1:m})$.
Specifying $(m_1, \dots, m_T)$, $(\beta_1, \dots, \beta_T)$ or $(\gamma_1, \dots, \gamma_T)$ is hard.¹
¹Just ask Lewis...

Illustrative Sequences of Targets
[Figure: two panels of normal-density sequences over $x \in [-10, 10]$, illustrating possible sequences of intermediate target distributions.]

One Adaptive Scheme (Zhou, J. & Aston, 2016) + refs
- Resample when $\mathrm{ESS}(W_n^{1:N}) = \left( \sum_{i=1}^N (W_n^i)^2 \right)^{-1}$ (with normalised weights) is below a threshold.
- Likelihood tempering: at iteration $n$, set $\beta_n$ such that
$$\frac{N \left( \sum_{j=1}^N W_{n-1}^{(j)} w_n^{(j)} \right)^2}{\sum_{k=1}^N W_{n-1}^{(k)} \left( w_n^{(k)} \right)^2} = \mathrm{CESS},$$
a chosen target value, which controls the $\chi^2$-discrepancy between successive distributions.
- Proposals: follow Jasra et al. (2010) and adapt to keep the acceptance rate about right.
Question: are there better, practical approaches to specifying a sequence of distributions?
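The two criteria above can be sketched directly, together with the usual bisection search for the next temperature. A sketch under assumed conventions: `W` are the current (pre-)weights, `w` the incremental weights, and `log_like` the log-likelihood so that $w(\beta) = l^{\beta - \beta_{n-1}}$; the CESS target value is the caller's choice.

```python
import numpy as np

def ess(W):
    """Effective sample size of (possibly unnormalised) weights W."""
    W = W / W.sum()
    return 1.0 / np.sum(W**2)

def cess(W, w):
    """Conditional ESS of incremental weights w under previous weights W."""
    W = W / W.sum()
    return len(W) * np.sum(W * w)**2 / np.sum(W * w**2)

def next_beta(x, W, beta_prev, target, log_like):
    """Bisection for the largest beta <= 1 whose CESS is near `target`."""
    lo, hi = beta_prev, 1.0
    # If even beta = 1 keeps CESS above the target, jump straight to it.
    if cess(W, np.exp((hi - beta_prev) * log_like(x))) >= target:
        return hi
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        w = np.exp((mid - beta_prev) * log_like(x))
        if cess(W, w) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The bisection relies on CESS decreasing in $\beta$, which holds for tempering-type incremental weights.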

Divide-and-Conquer (Lindsten, J. et al., 2016)
Many models admit natural decompositions:
[Figure: a three-level decomposition of a model. Level 2: leaf models over $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$; Level 1: intermediate models merging them via $x_4$; Level 0: the full model, adding $x_5$.]
To which we can apply a divide-and-conquer strategy over a tree of targets: a root target $\pi_r$, intermediate targets $\pi_t$, and leaf targets $\pi_{c_1}, \dots, \pi_{c_C}$.

A few formalities...
We use a tree, $T$, of models (with rootward variable inclusion), with a target $\pi_t$ at each node.
- Let $t \in T$ denote a node; $r \in T$ is the root.
- Let $C(t) = \{c_1, \dots, c_C\}$ denote the children of $t$.
- Let $\tilde{X}_t$ denote the space of variables included in $t$ but not its children.
dc-smc can be viewed as a recursion over this tree.

dc-smc(t): an extension of SIR
1. For $c \in C(t)$:
   1.1 $\left( \{x_c^i, w_c^i\}_{i=1}^N, \widehat{Z}_c^N \right) \leftarrow \text{dc-smc}(c)$.
   1.2 Resample $\{x_c^i, w_c^i\}_{i=1}^N$ to obtain the equally weighted particle system $\{\hat{x}_c^i, 1\}_{i=1}^N$.
2. For particle $i = 1, \dots, N$:
   2.1 If $\tilde{X}_t \neq \emptyset$, simulate $\tilde{x}_t^i \sim q_t(\,\cdot \mid \hat{x}_{c_1}^i, \dots, \hat{x}_{c_C}^i)$, where $(c_1, c_2, \dots, c_C) = C(t)$; else $\tilde{x}_t^i = \emptyset$.
   2.2 Set $x_t^i = (\hat{x}_{c_1}^i, \dots, \hat{x}_{c_C}^i, \tilde{x}_t^i)$.
   2.3 Compute $w_t^i = \dfrac{\gamma_t(x_t^i)}{\prod_{c \in C(t)} \gamma_c(\hat{x}_c^i)\; q_t(\tilde{x}_t^i \mid \hat{x}_{c_1}^i, \dots, \hat{x}_{c_C}^i)}$.
3. Compute $\widehat{Z}_t^N = \left\{ \frac{1}{N} \sum_{i=1}^N w_t^i \right\} \prod_{c \in C(t)} \widehat{Z}_c^N$.
4. Return $\left( \{x_t^i, w_t^i\}_{i=1}^N, \widehat{Z}_t^N \right)$.
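The recursive structure of steps 1-4 can be sketched on a deliberately trivial example. Assumed setup, purely for illustration: each node contributes one real coordinate, every $\gamma_t$ is a product of independent standard-normal factors over the variables seen so far, and $q_t$ proposes the new coordinate from the same standard normal. Proposal and target factor then cancel, so every weight is 1 and $Z_r = 1$; the point is only to exhibit the recursion, not a hard inference problem.

```python
import numpy as np

rng = np.random.default_rng(3)

def dc_smc(tree, N=1000):
    """Minimal dc-smc sketch. `tree` is nested lists; a leaf is [].
    Returns (particles, log-weights, log normalising-constant estimate)."""
    child_parts, logZ = [], 0.0
    for child in tree:
        xs, logw, lZ = dc_smc(child, N)          # step 1.1: recurse
        logZ += lZ
        w = np.exp(logw - logw.max())
        idx = rng.choice(N, size=N, p=w / w.sum())   # step 1.2: resample child
        child_parts.append(xs[idx])
    xt = rng.normal(size=(N, 1))                 # step 2.1: q_t draws new coordinate
    x = np.concatenate(child_parts + [xt], axis=1)   # step 2.2: merge
    # Step 2.3: here gamma_t / (prod_c gamma_c * q_t) cancels to 1 by construction.
    logw = np.zeros(N)
    m = logw.max()                               # step 3: fold in own mean weight
    logZ += m + np.log(np.mean(np.exp(logw - m)))
    return x, logw, logZ

x, logw, logZ = dc_smc([[], []])                 # root with two leaf children
```

For this degenerate choice `logZ` is exactly 0 and the root particles carry one coordinate per node of the tree; replacing `log_w`'s zero line with a real weight computation recovers the general algorithm.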

Theoretical Properties I
Unbiasedness of normalising constant estimates: provided that $\gamma_t \ll \left( \prod_{c \in C(t)} \gamma_c \right) q_t$ for every $t \in T$, and an unbiased, exchangeable resampling scheme is applied to every population at every iteration, we have, for any $N \geq 1$:
$$\mathbb{E}\left[ \widehat{Z}_r^N \right] = Z_r = \int \gamma_r(x_r)\, dx_r.$$

Strong Law of Large Numbers
Under regularity conditions, the weighted particle system $(x_r^{N,i}, w_r^{N,i})_{i=1}^N$ generated by dc-smc(r) is consistent, in that for all real-valued functions $\varphi$ satisfying certain assumptions:
$$\sum_{i=1}^N \frac{w_r^{N,i}}{\sum_{j=1}^N w_r^{N,j}}\, \varphi(x_r^{N,i}) \overset{\text{a.s.}}{\longrightarrow} \pi(\varphi) \quad \text{as } N \to \infty.$$

Some (Importance) Extensions
1. Mixture resampling: we could make use of
$$\sum_{i=1}^N \sum_{j=1}^N w_{c_1}^i w_{c_2}^j\, \frac{d\gamma_t}{d(\gamma_{c_1} \otimes \gamma_{c_2})}(x_{c_1}^i, x_{c_2}^j)\, \delta_{(x_{c_1}^i, x_{c_2}^j)}.$$
2. Tempering (Del Moral et al., 2006): between $\gamma_{c_1} \otimes \gamma_{c_2}$ and $\gamma_t$.
3. Adaptation (Zhou, J. and Aston, 2016): of the tempering sequence and MCMC kernel parameters.

An Ising Model
- $k$ indexes $v_k \in V \subset \mathbb{Z}^2$; $x_k \in \{-1, +1\}$.
- $p(x) \propto e^{\beta E(x)}$, $\beta \geq 0$, with $E(x) = \sum_{(k,l) \in E} x_k x_l$.
We consider a grid of size $64 \times 64$ with $\beta = 0.4407$ (the critical temperature).
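For concreteness, the energy $E(x) = \sum_{(k,l)\in E} x_k x_l$ over nearest-neighbour edges and the single-flip Metropolis baseline that appears on the results slide can be sketched as follows. This is a small $8 \times 8$ sketch with free boundary conditions (an assumption; the talk uses a $64 \times 64$ grid), not the experiment itself.

```python
import numpy as np

rng = np.random.default_rng(4)

L, beta = 8, 0.4407

def energy(x):
    """E(x) = sum over horizontal and vertical nearest-neighbour pairs of x_k * x_l."""
    return np.sum(x[:-1, :] * x[1:, :]) + np.sum(x[:, :-1] * x[:, 1:])

# Single-flip Metropolis targeting p(x) ∝ exp(beta * E(x)).
x = rng.choice([-1, 1], size=(L, L))
for _ in range(2000):
    i, j = rng.integers(L), rng.integers(L)
    nb = sum(x[a, b] for a, b in [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
             if 0 <= a < L and 0 <= b < L)
    dE = -2 * x[i, j] * nb          # change in E from flipping spin (i, j)
    if np.log(rng.uniform()) < beta * dE:
        x[i, j] *= -1
```

Since a flip only touches the incident edges, `dE` is local and cheap, which is exactly why single-flip MH is the natural (if slowly mixing, near criticality) baseline here.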

A sequence of decompositions

[Figure: estimates of $\log Z$ and $\mathbb{E}[E(x)]$ against wall-clock time (s) for SMC, D&C SMC (ann), D&C SMC (ann+mix), and single-flip MH.] Summaries over 50 independent runs of each algorithm.

New York Schools Maths Test: Data
Data organised into a tree $T$. A root-to-leaf path is: NYC (the root, denoted by $r \in T$), borough, school district, school, year. Each leaf $t \in T$ comes with an observation of $m_t$ exam successes out of $M_t$ trials. A total of 278,399 test instances across five boroughs (Manhattan, The Bronx, Brooklyn, Queens, Staten Island), 32 distinct districts, and 710 distinct schools.

New York Schools Maths Test: Bayesian Model
- The number of successes $m_t$ at a leaf $t$ is $\mathrm{Bin}(M_t, p_t)$, where $p_t = \mathrm{logistic}(\theta_t)$ and $\theta_t$ is a latent parameter.
- Internal nodes of the tree also have a latent $\theta_t$; we model the difference in $\theta$ along an edge $e = (t \to t')$ as $\theta_{t'} = \theta_t + \epsilon_e$, where $\epsilon_e \sim \mathcal{N}(0, \sigma_e^2)$.
- We put an improper prior (uniform on $(-\infty, \infty)$) on $\theta_r$.
- We also make the variance random, but shared across siblings: $\sigma_t^2 \sim \mathrm{Exp}(1)$.

New York Schools Maths Test: Implementation
The basic SIR implementation of dc-smc, using the natural hierarchical structure provided by the model. Given $\sigma_t^2$ and the $\theta_t$ at the leaves, the other random variables are multivariate normal, so we instantiate values for $\theta_t$ only at the leaves; at an internal node $t$, we sample only $\sigma_t^2$ and marginalise out $\theta_t$. Each step of dc-smc is therefore either:
i. At leaves: sample $p_t \sim \mathrm{Beta}(1 + m_t, 1 + M_t - m_t)$ and set $\theta_t = \mathrm{logit}(p_t)$.
ii. At internal nodes: sample $\sigma_t^2 \sim \mathrm{Exp}(1)$.
Java implementation: https://github.com/alexandrebouchard/multilevelsmc
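The leaf-level step (i) above is a one-liner worth seeing concretely. A sketch with made-up counts (the `m_t = 150`, `M_t = 400` values are hypothetical, not from the data set): sample $p_t$ from its conjugate Beta posterior under a uniform prior, then map to the latent scale with the logit.

```python
import numpy as np

rng = np.random.default_rng(5)

def sample_leaf_theta(m_t, M_t, N):
    """Step (i) of the implementation: p_t ~ Beta(1 + m_t, 1 + M_t - m_t),
    theta_t = logit(p_t), for N particles at once."""
    p = rng.beta(1 + m_t, 1 + M_t - m_t, size=N)
    return np.log(p / (1.0 - p))    # logit

# Hypothetical leaf: 150 successes out of 400 trials, 10,000 particles.
theta = sample_leaf_theta(m_t=150, M_t=400, N=10_000)
```

Because the Beta draw is exact, these leaf particles carry unit weights, and all the interesting weighting happens at the internal merge steps.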

New York Schools Maths Test: Results
[Figure: posterior density approximations of the mean parameter for NY as a whole and for each borough (Bronx, Kings, Manhattan, Queens, Richmond); DC with 10,000 particles.]
Bronx County has the highest fraction (41%) of children (under 18) living below the poverty level.² Queens has the second lowest (19.7%), after Richmond (Staten Island, 16.7%). Staten Island contains a single school district, so the posterior distribution is much flatter for this borough.
²Statistics from the New York State Poverty Report 2013, http://ams.nyscommunityaction.org/resources/documents/news/nyscaas 2013 Poverty Report.pdf

Normalising Constant Estimates 3825 Estimate of log(z) 3850 3875 3900 sampling_method 100 1000 10000 100000 1000000 Number of particles (log scale) DC STD

Distributed Implementation
[Figure: wall-clock time (s) and speedup against number of nodes (1 to 30).] Hardware: Xeon X5650 2.66GHz processors connected by a non-blocking Infiniband 4X QDR network.

Conclusions
- SMC = SIR; D&C-SMC = SIR + coalescence.
- Distributed implementation is straightforward.
- The D&C strategy can improve even serial performance.
Some questions remain unanswered:
- How can we construct (near) optimal tree decompositions?
- How much standard SMC theory can be extended to this setting?
Some application areas are appealing:
- Inference for phylogenetic trees in linguistics.
- Principled aggregation of mass univariate analyses from neuroimaging.

References
[1] C. Andrieu, A. Doucet, and R. Holenstein. Particle Markov chain Monte Carlo. Journal of the Royal Statistical Society B, 72(3):269-342, 2010.
[2] N. Chopin. A sequential particle filter method for static models. Biometrika, 89(3):539-551, 2002.
[3] P. Del Moral. Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications. Probability and Its Applications. Springer Verlag, New York, 2004.
[4] P. Del Moral, A. Doucet, and A. Jasra. Sequential Monte Carlo methods for Bayesian computation. In Bayesian Statistics 8. Oxford University Press, 2006.
[5] A. Doucet and A. M. Johansen. A tutorial on particle filtering and smoothing: Fifteen years later. In D. Crisan and B. Rozovsky, editors, The Oxford Handbook of Nonlinear Filtering, pages 656-704. Oxford University Press, 2011.
[6] N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F, 140(2):107-113, April 1993.
[7] A. Jasra, D. A. Stephens, A. Doucet, and T. Tsagaris. Inference for Lévy-driven stochastic volatility models via adaptive sequential Monte Carlo. Scandinavian Journal of Statistics, 38(1):1-22, Dec. 2010.
[8] A. M. Johansen and A. Doucet. A note on the auxiliary particle filter. Statistics and Probability Letters, 78(12):1498-1504, September 2008. URL http://dx.doi.org/10.1016/j.spl.2008.01.032.
[9] F. Lindsten, A. M. Johansen, C. A. Naesseth, B. Kirkpatrick, T. Schön, J. A. D. Aston, and A. Bouchard-Côté. Divide and conquer with sequential Monte Carlo samplers. Journal of Computational and Graphical Statistics, 2016. In press.
[10] Y. Zhou, A. M. Johansen, and J. A. D. Aston. Towards automatic model comparison: An adaptive sequential Monte Carlo approach. Journal of Computational and Graphical Statistics, 25(3):701-726, 2016. doi: 10.1080/10618600.2015.1060885.