Flexible Particle Markov chain Monte Carlo methods with an application to a factor stochastic volatility model

arXiv:1401.1667v4 [stat.CO] 23 Jan 2018

Eduardo F. Mendes, School of Applied Mathematics, Fundação Getulio Vargas
Christopher K. Carter, School of Economics, University of New South Wales
David Gunawan, School of Economics, University of New South Wales
Robert Kohn, School of Economics, University of New South Wales

Abstract

Particle Markov chain Monte Carlo methods are used to carry out inference in nonlinear and non-Gaussian state space models, where the posterior density of the states is approximated using particles. Current approaches usually perform Bayesian inference using a particle marginal Metropolis-Hastings algorithm, a particle Gibbs sampler, or a particle Metropolis within Gibbs sampler. This paper shows how the three ways of generating variables mentioned above can be combined in a flexible manner to give sampling schemes that converge to a desired target distribution. The advantage of our approach is that the sampling scheme can be tailored to obtain good results for different applications, for example when some parameters and the states are highly correlated. We investigate the properties of this flexible sampling scheme, including conditions for uniform convergence to the posterior. We illustrate our methods with a factor stochastic volatility state space model where one group of parameters can be generated in a straightforward manner in a particle Gibbs step by conditioning on the states, and a second group of parameters is generated without conditioning on the states because of the high dependence between such parameters and the states.

Keywords: Diffusion equation; Factor stochastic volatility model; Metropolis-Hastings; Particle Gibbs sampler.

1 Introduction

Our article deals with statistical inference for both the unobserved states and the parameters in a class of state space models. Its main goal is to give a flexible approach to constructing sampling schemes that converge to the posterior distribution of the states and the parameters. The sampling schemes generate particles as auxiliary variables. This work extends the methods proposed by Andrieu et al. (2010), Lindsten and Schön (2012b), Lindsten et al. (2014) and Olsson and Ryden (2011).

Andrieu et al. (2010) introduce two particle Markov chain Monte Carlo (MCMC) methods for state space models. The first is particle marginal Metropolis-Hastings (PMMH), where the parameters are generated with the states integrated out. The second is particle Gibbs (PG), which generates the parameters given the states. They show that the augmented density targeted by this algorithm has the joint posterior density of the parameters and states as a marginal density. Andrieu et al. (2010) and Andrieu and Roberts (2009) show that the law of the marginal sequence of parameters and states, sampled using either PG or PMMH, converges to the true posterior as the number of iterations increases.

Both particle MCMC methods are the focus of recent research. Olsson and Ryden (2011) and Lindsten and Schön (2012b) use backward simulation (Godsill et al., 2004) for sampling the state vector, instead of ancestral tracing (Kitagawa, 1996). These authors derive the augmented target distribution and find that, in simulation studies, their method performs better than the PMMH or its PG counterpart using ancestral tracing. Chopin and Singh (2015) show that backward simulation is more efficient than ancestral tracing in a PG setting, as well as being robust to the resampling method used, e.g. multinomial, residual or stratified resampling.
Lindsten and Schön (2012b) extend the PG sampler to a particle Metropolis within Gibbs (PMwG) sampler to deal with the case where the parameters cannot be generated exactly conditional on the states. Unless stated otherwise, we write PG to denote both the PG and PMwG samplers that generate the parameters conditional on the states. In a separate line of research, Pitt et al. (2012), Doucet et al. (2015) and Sherlock et al. (2015) show that the best tradeoff between computational cost and integrated autocorrelation time is attained when the variance of the estimated log-likelihood is between about 1.0 and 3.3, the exact value depending on the quality of the proposal in the ideal, or infinite particle, case. The better the proposal, the smaller is the optimal standard deviation. The correlated PMMH proposed by Deligiannidis et al. (2017) tolerates a much larger variance of the estimated log-likelihood without the Markov chain getting stuck, by correlating the random numbers used in constructing the estimators of the likelihood at the current and proposed values of the parameters. Deligiannidis et al. (2017) show that, by inducing a high correlation in successive iterations between the random numbers used to construct the estimates of the likelihood, it is only necessary to increase the number of particles N in proportion to T^{k/(k+1)}, where k is the state dimension and T is the number of observations. This suggests that the computational complexity of correlated PMMH is O(T^{(2k+1)/(k+1)}) up to a logarithmic factor, compared to O(T^2) for the standard PMMH. However, this also means that the correlated PMMH sampler has a distinct advantage over the standard PMMH sampler for

low values of the dimension k, but the advantage diminishes as k increases. For PG or PMwG, it is unclear what role the standard deviation of the estimated likelihood plays in choosing the number of particles.

Particle MCMC methods have two attractions. First, essentially the same code can be used for a variety of models: all that is usually necessary is to code up a new version of the observation equation and possibly the state transition equation, as well as code for generating the states. Furthermore, any Metropolis within Gibbs step used in standard MCMC simulation can be used similarly for particle based MCMC. Second, it is relatively straightforward to integrate out the states. However, since particle methods are computationally intensive, it is important to implement them efficiently, and using only Metropolis-Hastings or only Gibbs samplers may be undesirable. We know from the literature on Gaussian and conditionally Gaussian state space models that confining MCMC for state space models to Gibbs sampling or Metropolis-Hastings sampling can result in inefficient or even degenerate sampling. See, for example, Kim et al. (1998), who show for a stochastic volatility model that generating the states conditional on the parameters and the parameters conditional on the states can result in a highly inefficient sampler. See also Carter and Kohn (1996) and Gerlach et al. (2000), who demonstrate using a signal plus noise model that a Gibbs sampler for the states and the indicator variables for the structural breaks produces a degenerate sampler. A natural solution is to combine Gibbs and Metropolis-Hastings samplers. A second reason for combining Metropolis-Hastings samplers with Gibbs samplers for particle MCMC is that we may wish to generate as many of the parameters by PG as practical, because it is often difficult to obtain good proposals for the parameters using PMMH, as the likelihood and its first two derivatives can usually only be estimated.
Section 4 discusses this issue more fully. Our work extends the particle MCMC framework introduced in Andrieu et al. (2010), Lindsten and Schön (2012b) and Lindsten et al. (2014) to situations where using just PMMH or just PG is impossible or inefficient. We derive a particle sampler on the same augmented space as the PMMH and PG samplers, in which some parameters are sampled conditionally on the states and the remaining parameters are sampled with the states integrated out. We call this a PMMH+PG sampler. We show that the PMMH+PG sampler targets the same augmented density as the PMMH or PG samplers. We provide supplementary material showing that the Markov chain generated by the algorithm is uniformly ergodic under regularity conditions. This implies that the marginal law of the Markov chain at the n-th iteration of the algorithm converges to the posterior density function geometrically fast, uniformly in its starting value, as n → ∞. We use ancestral tracing in the particle Gibbs step to make the presentation accessible; the online supplementary material shows how to modify the methods proposed in the paper to incorporate auxiliary particle filters and backward simulation in the particle Gibbs step. The proofs may also be modified using arguments found in Olsson and Ryden (2011), and the same results hold. We conjecture that it is possible to show geometric ergodicity, instead of uniform ergodicity, using arguments similar to those in Propositions 7 and 9 of Andrieu and Vihola (2015), but both the notation and algebra are more involved and it is not the

main purpose of this paper.

The paper is organized as follows. Section 2 introduces the basic concepts and notation used throughout the paper and introduces the PMMH+PG sampler for estimating a single state space model and its associated parameters. Section 3 applies the methods to a factor stochastic volatility model. This is more complex than the sampling scheme introduced in Section 2 because many univariate sampling schemes are involved as part of the overall sampling scheme. Section 4 gives empirical results for the factor stochastic volatility model using both simulated and real data. The appendix in the paper presents some more details of the sampling scheme for the factor model introduced in Section 3. The paper has an online supplement which contains some further empirical results as well as technical results. We use the following notation in both the main paper and the online supplement: Equation (1), Section 1, Algorithm 1, and Sampling Scheme 1, etc., refer to the main paper, while Equation (S1), Section S1, Algorithm S1, and Sampling Scheme S1, etc., refer to the supplement.

2 The PMMH+PG sampling scheme for state space models

This section introduces a sampling scheme that combines PMMH and PG steps for the Bayesian estimation of a state space model. The first three subsections give preliminary results and examples, and Section 2.4 presents the sampling scheme. The methods and models introduced in this section are used in the more complex model in Section 3.

2.1 State space model

Define N as the set of positive integers and let {X_t}_{t ∈ N} and {Y_t}_{t ∈ N} denote X-valued and Y-valued stochastic processes, where {X_t}_{t ∈ N} is a latent Markov process with initial density f_1^θ(x) and transition density f_t^θ(x'|x), i.e.,

X_1 ~ f_1^θ(·)  and  X_t | (X_{t−1} = x) ~ f_t^θ(· | x),  t = 2, 3, ….
The latent process {X_t}_{t ∈ N} is observed only through {Y_t}_{t ∈ N}, whose value at time t depends on the value of the hidden state at time t, and is distributed according to g_t^θ(y|x):

Y_t | (X_t = x) ~ g_t^θ(· | x),  t = 1, 2, ….

The densities f_t^θ and g_t^θ are indexed by a parameter vector θ ∈ Θ, where Θ is an open subset of R^{d_θ}, and all densities are with respect to suitable dominating measures, denoted as dx and dy. The dominating measures are frequently taken to be Lebesgue measure if X ∈ B(R^{d_x}) and Y ∈ B(R^{d_y}), where B(A) is the Borel σ-algebra generated by the set A. Usually X = R^{d_x} and Y = R^{d_y}. We use the colon notation for collections of random variables, i.e., for integers r ≤ s, a_{r:s} = (a_r, …, a_s), a_t^{r:s} = (a_t^r, …, a_t^s), and for t ≤ u, a_{t:u}^{r:s} = (a_{t:u}^r, …, a_{t:u}^s). The joint probability

density function of (x_{1:T}, y_{1:T}) is

p(x_{1:T}, y_{1:T} | θ) = f_1^θ(x_1) g_1^θ(y_1 | x_1) ∏_{t=2}^T f_t^θ(x_t | x_{t−1}) g_t^θ(y_t | x_t).

We define Z_1(θ) := p(y_1 | θ) and Z_t(θ) := p(y_t | y_{1:t−1}, θ) for t ≥ 2, so the likelihood is Z_{1:T}(θ) = Z_1(θ) Z_2(θ) ⋯ Z_T(θ). The joint filtering density of X_{1:t} is

p(x_{1:t} | y_{1:t}, θ) = p(x_{1:t}, y_{1:t} | θ) / Z_{1:t}(θ).

The joint density of θ and X_{1:T} given y_{1:T} can also be factorized as

p(x_{1:T}, θ | y_{1:T}) = p(x_{1:T}, y_{1:T} | θ) p(θ) / Z_{1:T},

where the marginal likelihood Z_{1:T} = ∫_Θ Z_{1:T}(θ) p(θ) dθ = p(y_{1:T}). This factorization is used in the particle Markov chain Monte Carlo algorithms.

2.1.1 Examples

Example 1 (Univariate stochastic volatility model). As a first example we consider the univariate stochastic volatility (SV) model

y_t = exp(h_t / 2) ε_t,   (1)
h_t = μ + φ(h_{t−1} − μ) + η_t,   (2)

with h_1 ~ N(μ, τ²/(1 − φ²)), ε_t ~ N(0, 1) and η_t ~ N(0, τ²). Equations (1) and (2) are in state space form with x_{1:T} = h_{1:T}, and the three unknown parameters are μ, φ (−1 < φ < 1) and τ² > 0. The stochastic volatility model has been used extensively in the financial econometrics literature; see Durbin and Koopman (2012, pp. 216-221), who discuss the basic stochastic volatility model and some extensions.

Example 2 (Univariate Ornstein-Uhlenbeck process with closed form transition density). This example considers the continuous time Ornstein-Uhlenbeck (OU) process introduced by Stein and Stein (1991),

y_t = exp(h_t / 2) ε_t,   (3)
dh_t = α(μ − h_t) dt + τ dW_t,   (4)

where the latent volatility is driven by a stochastic differential equation and W_t is a Wiener process with Var(dW_t) = dt. The transition distribution for h_t has the closed form (Lunde et al., 2015, p. 7)

h_t | h_{t−1} ~ N( μ + exp(−α)(h_{t−1} − μ), (1 − exp(−2α)) τ² / (2α) ),   (5)

with

h_1 ~ N(μ, τ² / (2α)).   (6)

Equations (3) and (5) are in state space form with x_{1:T} = h_{1:T}, and the parameters are α > 0, μ and τ² > 0.

Example 3 (Univariate Ornstein-Uhlenbeck process with Euler approximation). Our article also studies the performance of our sampling methods under the Euler approximation scheme for the stochastic differential equation (4), because there are many models for the log-volatility that do not have closed form transition densities but where we can apply our sampling methods. The Euler scheme approximates the evolution of the log-volatilities h_t by placing M − 1 latent points between times t and t + 1. These intermediate points are evenly spaced in time. The intermediate volatility components are denoted by h_{t,1}, …, h_{t,M−1}. It will be convenient in the equations below to set h_{t,0} = h_t and h_{t,M} = h_{t+1}. The Euler evolution, starting at h_{t,0}, is (see, for example, Stramer and Bognar, 2011, p. 234)

h_{t,j} | h_{t,j−1} ~ N( h_{t,j−1} + α(μ − h_{t,j−1}) δ, τ² δ ),   (7)

for j = 1, …, M, where δ = 1/M. In our article, when we consider the Euler approach to the OU process, we take (7) as the true transition equation. Thus, in this scenario we have a state space model with observation equation (3) at the time points t = 1, …, T, and missing observations at the time points t + jδ for j = 1, …, M − 1 and t = 1, …, T − 1. It will be convenient in the developments below to express the Euler based SV model in the following state space form, which has no missing observations. Define x_1 = h_1 and x_t = (h_t, h_{t−1,M−1}, …, h_{t−1,1})ᵀ for t = 2, …, T. We then have observation densities given by Equation (3), initial state distribution given by Equation (6), and state transition densities given by

f_t^θ(x_t | x_{t−1}) = ∏_{j=1}^M f_{t−1,j}^θ(h_{t−1,j} | h_{t−1,j−1}),   t = 2, …, T,   (8)

where the densities f_{t,j}^θ(h_{t,j} | h_{t,j−1}) for j = 1, …, M and t = 1, …, T − 1 are defined by Equation (7).
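To make Examples 2 and 3 concrete, the sketch below draws from the exact OU transition of Equation (5) and from the Euler scheme of Equation (7). The parameter values are illustrative assumptions, not values used in the paper.

```python
import numpy as np

def ou_exact_step(h, alpha, mu, tau2, rng):
    """One draw from the exact OU transition density, Equation (5)."""
    mean = mu + np.exp(-alpha) * (h - mu)
    var = (1.0 - np.exp(-2.0 * alpha)) * tau2 / (2.0 * alpha)
    return mean + np.sqrt(var) * rng.standard_normal()

def ou_euler_step(h, alpha, mu, tau2, M, rng):
    """One unit-time transition via M Euler sub-steps, Equation (7)."""
    delta = 1.0 / M
    for _ in range(M):
        h = h + alpha * (mu - h) * delta + np.sqrt(tau2 * delta) * rng.standard_normal()
    return h

# Illustrative parameter values (not from the paper).
alpha, mu, tau2 = 0.5, -1.0, 0.1
rng = np.random.default_rng(0)

h0 = 0.0
exact_mean = mu + np.exp(-alpha) * (h0 - mu)
# With tau2 = 0 the Euler recursion is deterministic, so M = 100 sub-steps
# should land close to the exact conditional mean of Equation (5).
euler_det = ou_euler_step(h0, alpha, mu, 0.0, 100, rng)
```

As M grows, the deterministic part of the Euler recursion, μ + (1 − αδ)^M (h_0 − μ), converges to the exact conditional mean μ + e^{−α}(h_0 − μ), which is the sense in which Example 3 approximates Example 2.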
2.2 Target distribution for state space models

We first approximate the joint filtering densities {p(x_t | y_{1:t}, θ) : t = 1, 2, …} sequentially using particles, i.e., weighted samples (x_t^{1:N}, w_t^{1:N}), drawn from auxiliary distributions m_t^θ. This requires specifying importance densities

m_1^θ(x_1) := m_1(x_1 | Y_1 = y_1, θ)  and  m_t^θ(x_t | x_{t−1}) := m_t(x_t | X_{t−1} = x_{t−1}, Y_{1:t} = y_{1:t}, θ),

and a resampling scheme M(a_{t−1}^{1:N} | w̄_{t−1}^{1:N}), where each a_{t−1}^i = k indexes a particle in (x_{t−1}^{1:N}, w_{t−1}^{1:N}) and is sampled with probability w̄_{t−1}^k. We refer

to Doucet et al. (2000), Van Der Merwe et al. (2001), and Guo et al. (2005) for the choice of importance densities, and to Douc and Cappé (2005) for a comparison between resampling schemes. Unless stated otherwise, upper case letters indicate random variables and lower case letters indicate the corresponding values of these random variables, e.g., A_t^j and a_t^j, X_t and x_t. We denote the vector of particles by U_{1:T} := (X_1^{1:N}, …, X_T^{1:N}, A_1^{1:N}, …, A_{T−1}^{1:N}), where a_t^j is the value of the random variable A_t^j, and its sample space by U := X^{TN} × {1, …, N}^{(T−1)N}.

The sequential Monte Carlo (SMC) algorithm used here is the same one as in Section 4.1 of Andrieu et al. (2010), and is defined in Section S1 and Algorithm S1 of the supplementary material. The algorithm provides an unbiased estimate of the likelihood,

Ẑ_{1:T}(θ) = Z(u_{1:T}, θ) = ∏_{t=1}^T ( N^{−1} ∑_{i=1}^N w_t^i ),

where

w_1^i = f_1^θ(x_1^i) g_1^θ(y_1 | x_1^i) / m_1^θ(x_1^i),
w_t^i = g_t^θ(y_t | x_t^i) f_t^θ(x_t^i | x_{t−1}^{a_{t−1}^i}) / m_t^θ(x_t^i | x_{t−1}^{a_{t−1}^i})  for t = 2, …, T,

and the normalized weights are w̄_t^i = w_t^i / ∑_{j=1}^N w_t^j. The joint distribution of the particles given the parameters is

ψ(u_{1:T} | θ) := { ∏_{i=1}^N m_1^θ(x_1^i) } ∏_{t=2}^T { M(a_{t−1}^{1:N} | w̄_{t−1}^{1:N}) ∏_{i=1}^N m_t^θ(x_t^i | x_{t−1}^{a_{t−1}^i}) }.   (9)

The key idea of particle MCMC methods is to construct a target distribution on an augmented space that includes the particles U_{1:T} and has a marginal distribution equal to p(x_{1:T}, θ | y_{1:T}). This section describes the target distribution from Andrieu et al. (2010). Later sections describe current MCMC methods to sample from this distribution and hence sample from p(x_{1:T}, θ | y_{1:T}). Section S3 of the supplementary material describes other choices of target distribution and shows that it is straightforward to modify our results to apply to these distributions. The simplest way of sampling from the particulate approximation of p(x_{1:T} | y_{1:T}, θ) is called ancestral tracing. It was introduced by Kitagawa (1996) and used in Andrieu et al. (2010), and consists of sampling one particle from the final particle filter.
The method is equivalent to sampling an index J = j with probability w̄_T^j, tracing back its ancestral lineage b_{1:T}^j (with b_T^j = j and b_{t−1}^j = a_{t−1}^{b_t^j}), and choosing the particle x_{1:T}^j = (x_1^{b_1^j}, …, x_T^{b_T^j}). With some abuse of notation, for a vector a_t, denote a_t^{(−k)} = (a_t^1, …, a_t^{k−1}, a_t^{k+1}, …, a_t^N), with obvious changes for k ∈ {1, N}, and denote

u_{1:T}^{(−j)} = { x_1^{(−b_1^j)}, …, x_{T−1}^{(−b_{T−1}^j)}, x_T^{(−j)}, a_1^{(−b_2^j)}, …, a_{T−1}^{(−b_T^j)} }.

It simplifies the notation to sometimes use the one-to-one transformation

(u_{1:T}, j) ↔ (x_{1:T}^j, b_{1:T−1}^j, j, u_{1:T}^{(−j)}),

and to switch between the two representations, using whichever is more convenient. Note that the right hand expression will sometimes be written as (x_{1:T}, b_{1:T−1}, j, u_{1:T}^{(−j)}) without ambiguity. We now assume Assumptions S1 and S2, which we give in Section S1 of the online supplement. The target distribution from Andrieu et al. (2010) is

π^N(x_{1:T}, b_{1:T−1}, j, u_{1:T}^{(−j)}, θ) := [ p(x_{1:T}, θ | y_{1:T}) / N^T ] × ψ(u_{1:T} | θ) / [ m_1^θ(x_1^{b_1}) ∏_{t=2}^T w̄_{t−1}^{a_{t−1}^{b_t}} m_t^θ(x_t^{b_t} | x_{t−1}^{a_{t−1}^{b_t}}) ].   (10)

Assumption S1 ensures that π^N(u_{1:T} | θ) is absolutely continuous with respect to ψ(u_{1:T} | θ), so ψ(u_{1:T} | θ) can be used as a Metropolis-Hastings proposal density for generating from π^N(u_{1:T} | θ). From Assumption S2, Equation (10) has the marginal distribution

π^N(x_{1:T}, b_{1:T−1}, j, θ) = p(x_{1:T}, θ | y_{1:T}) / N^T,   (11)

and hence π^N(x_{1:T}, θ) = p(x_{1:T}, θ | y_{1:T}). More detail is given in the online supplement.

2.2.1 Examples continued

Equations (9), (10) and (11) apply directly to the univariate stochastic volatility model (Example 1) and also to the OU stochastic volatility model with a closed form transition equation (Example 2). We now show in more detail how these results apply to the OU model with the Euler approximation in Example 3, defined by Equations (3), (6) and (8). We use the proposal densities m_t^θ(x_t | x_{t−1}) = f_t^θ(x_t | x_{t−1}), t = 2, …, T, which can be generated using Equation (7). With the bootstrap particle filter, the importance weights are

w_1^i = g_1^θ(y_1 | h_1^i),
w_t^i = g_t^θ(y_t | h_t^i) f_t^θ(x_t^i | x_{t−1}^{a_{t−1}^i}) / m_t^θ(x_t^i | x_{t−1}^{a_{t−1}^i}) = g_t^θ(y_t | h_t^i),   i = 1, …, N; t = 2, …, T.   (12)

The simple form of the weights in Equation (12) is the reason we use the high dimensional vector state space form given in Equation (8) instead of the one dimensional state space form given in Equation (7).
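The bootstrap filter and the ancestral tracing step described above can be sketched as follows for the SV model of Example 1. This is a minimal illustration with multinomial resampling, not the paper's Algorithm S1, and all parameter values are illustrative assumptions.

```python
import numpy as np

def bootstrap_pf_sv(y, mu, phi, tau2, N, rng):
    """Bootstrap particle filter with multinomial resampling for the SV model
    of Example 1, returning the log of the unbiased likelihood estimate
    Z-hat of Section 2.2 and one trajectory drawn by ancestral tracing."""
    T = len(y)
    x = np.empty((T, N))                      # particles h_t^i
    a = np.empty((T - 1, N), dtype=int)       # ancestor indices a_{t-1}^i
    log_zhat = 0.0
    # t = 1: propose from the stationary distribution of Equation (2)
    x[0] = mu + np.sqrt(tau2 / (1.0 - phi ** 2)) * rng.standard_normal(N)
    logw = -0.5 * (np.log(2 * np.pi) + x[0] + y[0] ** 2 * np.exp(-x[0]))
    for t in range(1, T):
        m = logw.max()                        # log-sum-exp for stability
        w = np.exp(logw - m)
        log_zhat += m + np.log(w.mean())      # accumulates log N^{-1} sum_i w_t^i
        a[t - 1] = rng.choice(N, size=N, p=w / w.sum())   # multinomial resampling
        # bootstrap proposal: the state transition, Equation (2)
        x[t] = mu + phi * (x[t - 1, a[t - 1]] - mu) + np.sqrt(tau2) * rng.standard_normal(N)
        # weight: g_t(y_t | h_t) = N(y_t; 0, exp(h_t)), Equation (1)
        logw = -0.5 * (np.log(2 * np.pi) + x[t] + y[t] ** 2 * np.exp(-x[t]))
    m = logw.max()
    w = np.exp(logw - m)
    log_zhat += m + np.log(w.mean())
    # ancestral tracing: draw J with probability wbar_T^j, then b_{t-1} = a_{t-1}^{b_t}
    j = rng.choice(N, p=w / w.sum())
    traj = np.empty(T)
    traj[-1] = x[-1, j]
    for t in range(T - 2, -1, -1):
        j = a[t, j]
        traj[t] = x[t, j]
    return log_zhat, traj

# Simulate a short series from the model (illustrative parameter values).
rng = np.random.default_rng(1)
mu, phi, tau2, T = -1.0, 0.9, 0.1, 50
h = np.empty(T)
h[0] = mu + np.sqrt(tau2 / (1.0 - phi ** 2)) * rng.standard_normal()
for t in range(1, T):
    h[t] = mu + phi * (h[t - 1] - mu) + np.sqrt(tau2) * rng.standard_normal()
y = np.exp(h / 2.0) * rng.standard_normal(T)

log_zhat, traj = bootstrap_pf_sv(y, mu, phi, tau2, N=200, rng=rng)
```

The returned trajectory is one draw from the particulate approximation of p(x_{1:T} | y_{1:T}, θ); averaging exp(log_zhat) over independent runs illustrates the unbiasedness of Ẑ_{1:T}(θ).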

2.3 Conditional sequential Monte Carlo

The particle Gibbs algorithm in Andrieu et al. (2010) uses exact conditional distributions to construct a Gibbs sampler. If we use the ancestral tracing augmented distribution given in (10), then this includes the conditional distribution π^N(u_{1:T}^{(−j)} | x_{1:T}^j, b_{1:T−1}^j, j, θ), which involves constructing the particulate approximation conditional on a pre-specified path. The conditional sequential Monte Carlo (CSMC) algorithm, introduced in Andrieu et al. (2010), is a sequential Monte Carlo algorithm in which a particle X_{1:T}^J = (X_1^{B_1^J}, …, X_T^{B_T^J}) and the associated sequence of ancestral indices B_{1:T−1}^J are kept unchanged. In other words, the conditional sequential Monte Carlo algorithm is a procedure that resamples all the particles and indices except for U_{1:T}^J = (X_{1:T}^J, A_{1:T−1}^J) = (X_1^{B_1^J}, …, X_T^{B_T^J}, B_1^J, …, B_{T−1}^J). Algorithm S2 of the supplementary material describes the conditional sequential Monte Carlo algorithm as in Andrieu et al. (2010), consistent with (x_{1:T}^j, a_{1:T−1}^j, j).

2.4 Flexible sampling scheme for state space models

This sampling scheme is suitable for the state space form given in Section 2.1, where some of the parameters can be generated exactly conditional on the state vectors, but other parameters must be generated using Metropolis-Hastings proposals. Let θ := (θ_1, …, θ_p) be a partition of the parameter vector into p components, where each component may be a vector, and let 0 ≤ p_1 ≤ p. Let Θ = Θ_1 × ⋯ × Θ_p be the corresponding partition of the parameter space. We use the notation θ_{−i} := (θ_1, …, θ_{i−1}, θ_{i+1}, …, θ_p). The following sampling scheme generates the parameters θ_1, …, θ_{p_1} using PMMH steps and the parameters θ_{p_1+1}, …, θ_p using PG steps. We call this a PMMH+PG sampler. To simplify the discussion, we assume that both particle marginal Metropolis-Hastings steps and particle Gibbs steps are used, i.e., 0 < p_1 < p.
Sampling Scheme 1 (PMMH+PG sampler). Given initial values for U_{1:T}, J and θ, one iteration of the MCMC involves the following steps.

1. PMMH sampling. For i = 1, …, p_1:
Step i:
(a) Sample θ_i′ ~ q_{i,1}(· | U_{1:T}, J, θ_i, θ_{−i}).
(b) Sample U_{1:T}′ ~ ψ(· | θ_i′, θ_{−i}).
(c) Sample J′ ~ π^N(· | U_{1:T}′, θ_i′, θ_{−i}).
(d) Accept the proposed values θ_i′, U_{1:T}′ and J′ with probability

α_i(U_{1:T}, J, θ_i; U_{1:T}′, J′, θ_i′ | θ_{−i}) = 1 ∧ [ π^N(U_{1:T}′, θ_i′ | θ_{−i}) q_i(U_{1:T}, θ_i | U_{1:T}′, J′, θ_i′, θ_{−i}) ] / [ π^N(U_{1:T}, θ_i | θ_{−i}) q_i(U_{1:T}′, θ_i′ | U_{1:T}, J, θ_i, θ_{−i}) ],   (13)

where q_i(U_{1:T}′, θ_i′ | U_{1:T}, J, θ_i, θ_{−i}) = q_{i,1}(θ_i′ | U_{1:T}, J, θ_i, θ_{−i}) ψ(U_{1:T}′ | θ_i′, θ_{−i}).

2. PG sampling. For i = p_1 + 1, …, p:
Step i:
(a) Sample θ_i′ ~ q_i(· | X_{1:T}^J, B_{1:T−1}^J, J, θ_i, θ_{−i}).
(b) Accept the proposed value θ_i′ with probability

α_i(θ_i; θ_i′ | X_{1:T}^J, B_{1:T−1}^J, J, θ_{−i}) = 1 ∧ [ π^N(θ_i′ | X_{1:T}^J, B_{1:T−1}^J, J, θ_{−i}) q_i(θ_i | X_{1:T}^J, B_{1:T−1}^J, J, θ_i′, θ_{−i}) ] / [ π^N(θ_i | X_{1:T}^J, B_{1:T−1}^J, J, θ_{−i}) q_i(θ_i′ | X_{1:T}^J, B_{1:T−1}^J, J, θ_i, θ_{−i}) ].   (14)

3. Sample U_{1:T}^{(−J)} ~ π^N(· | X_{1:T}^J, B_{1:T−1}^J, J, θ) using the conditional sequential Monte Carlo (CSMC) algorithm discussed in Section 2.3.

4. Sample J ~ π^N(· | U_{1:T}, θ).

Note that Parts 2 to 4 are the same as in the particle Gibbs sampler described in Andrieu et al. (2010) or the particle Metropolis within Gibbs sampler described in Lindsten and Schön (2012a). Part 1 differs from the particle marginal Metropolis-Hastings approach discussed in Andrieu et al. (2010) by generating the variable J, which selects the trajectory. This is necessary since J is used in Part 2. A major computational cost of the algorithm is generating the particles p_1 times in Part 1 and running the CSMC algorithm in Part 3. Hence there is a computational cost in using the PMMH+PG sampler compared to a particle Gibbs sampler. Similar comments apply to a blocked PMMH sampler. Section S2 of the supplementary material discusses the convergence of Sampling Scheme 1 to its target distribution, and Section S4 of the supplementary material illustrates the sampling scheme by applying it to a univariate OU model for daily stock return data.

Remark 1. Andrieu et al. (2010) show that

π^N(U_{1:T}, θ_i | θ_{−i}) / ψ(U_{1:T} | θ_i, θ_{−i}) = Z(U_{1:T}, θ) p(θ_i | θ_{−i}) / p(y_{1:T} | θ_{−i}),   (15)

and hence the Metropolis-Hastings acceptance probability in (13) simplifies to

1 ∧ [ Z(θ_i′, θ_{−i}, U_{1:T}′) p(θ_i′ | θ_{−i}) q_{i,1}(θ_i | U_{1:T}′, J′, θ_i′, θ_{−i}) ] / [ Z(θ_i, θ_{−i}, U_{1:T}) p(θ_i | θ_{−i}) q_{i,1}(θ_i′ | U_{1:T}, J, θ_i, θ_{−i}) ].   (16)

Equation (16) shows that the PMMH steps can be viewed as involving a particulate approximation to an ideal sampler, which we use to estimate the likelihood of the model.
This version of the PMMH algorithm can also be viewed as a Metropolis-Hastings algorithm using an unbiased estimate of the likelihood; see our discussion in Section 1 on optimally selecting the number of particles in PMMH.
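Remark 1's reading of the PMMH step, Metropolis-Hastings run on an unbiased likelihood estimate, can be illustrated with a toy pseudo-marginal sampler. The sketch below uses a one-observation latent Gaussian model, whose likelihood is estimated unbiasedly by simple Monte Carlo as a stand-in for the particle filter estimate Z(U_{1:T}, θ); the model, prior, and tuning constants are all illustrative assumptions, not the paper's.

```python
import numpy as np

def loglike_hat(theta, y, N, rng):
    """Unbiased estimate of p(y | theta) for the toy model
    x ~ N(theta, 1), y | x ~ N(x, 1), by Monte Carlo over x;
    a stand-in for the particle filter estimate Z(U_{1:T}, theta)."""
    x = theta + rng.standard_normal(N)
    log_terms = -0.5 * ((y - x) ** 2 + np.log(2 * np.pi))
    m = log_terms.max()
    return m + np.log(np.mean(np.exp(log_terms - m)))

def pmmh(y, n_iter, N, step, rng):
    """Random walk pseudo-marginal MH; acceptance as in Equation (16),
    which reduces to 1 ^ [Z' p(theta')] / [Z p(theta)] for a symmetric proposal."""
    log_prior = lambda th: -0.5 * th ** 2          # N(0, 1) prior (illustrative)
    theta = 0.0
    ll = loglike_hat(theta, y, N, rng)
    draws = np.empty(n_iter)
    for it in range(n_iter):
        theta_p = theta + step * rng.standard_normal()
        ll_p = loglike_hat(theta_p, y, N, rng)     # fresh estimate at the proposal
        log_alpha = (ll_p + log_prior(theta_p)) - (ll + log_prior(theta))
        if np.log(rng.uniform()) < log_alpha:
            theta, ll = theta_p, ll_p              # the stored Z-hat is kept, never refreshed
        draws[it] = theta
    return draws

rng = np.random.default_rng(2)
y = 1.5
draws = pmmh(y, n_iter=2000, N=100, step=0.5, rng=rng)
# Under the exact likelihood N(y; theta, 2) and N(0, 1) prior,
# the posterior is N(y/3, 2/3), so the chain should center near 0.5.
```

Keeping the stored estimate Ẑ fixed until a proposal is accepted is what makes the chain target the correct posterior despite the likelihood being estimated.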

Remark 2. Part 1 of the sampling scheme is a good choice for parameters θ_i which are highly correlated with the state vector X_{1:T}. Part 2 of the sampling scheme is a good choice if the parameter θ_i is not highly correlated with the states and it is possible to sample exactly from the distribution π^N(θ_i | X_{1:T}, θ_{−i}) = p(θ_i | X_{1:T}, y_{1:T}, θ_{−i}), or a good approximation to it is available as a Metropolis-Hastings proposal. See Lindsten and Schön (2012a) for more discussion of the particle Metropolis-Hastings proposals in Part 2.

3 Sampling schemes for factor stochastic volatility models

3.1 The factor stochastic volatility model

Factor stochastic volatility (SV) models are a popular approach to jointly modeling many covarying time series, because they capture the common features in the series through a small number of latent factors; see, e.g., Chib et al. (2006) and Kastner et al. (2017). However, estimating high dimensional time-varying factor models can be very challenging, because the likelihood involves an integral over a very high dimensional latent state space and the number of parameters in the model can be large. Current approaches to estimating these models often employ MCMC samplers, see for example Chib et al. (2006) and Kastner et al. (2017), but, as we shall show, particle methods can be much more flexible in the range of factor SV models that they can estimate. We consider a factor SV model with the volatilities of the factors following a traditional SV model, as in Chib et al. (2006) and Kastner et al. (2017), while the idiosyncratic errors are independent, with log volatilities following continuous time Ornstein-Uhlenbeck (OU) processes (Stein and Stein, 1991). The log volatility of an OU process has a closed form state transition density. However, many SV diffusion processes do not, and we also apply our estimation methods when we approximate the OU transition density using an Euler approximation.
Our examples show, on both simulated and real data, that the PMMH+PG sampler works well. First, our sampling scheme generates the latent factors in the PG step and then, conditional on the latent factors, works with multiple univariate latent state space models. Second, we sample most of the parameters in a PG step and improve the mixing of a small number of parameters by generating them in a PMMH step. We note that our example is purely illustrative of our methods; our approach can easily handle multiple factors and most types of log-volatilities for the factors and idiosyncratic errors. Suppose that $P_t$ is an $S \times 1$ vector of daily stock prices and define $y_t := \log P_t - \log P_{t-1}$ as the log-return of the stocks. We model $y_t$ as the factor SV model
$$y_t = \beta f_t + V_t^{1/2}\,\epsilon_t, \qquad t = 1, \dots, T, \qquad (17)$$

where $f_t$ is a $K \times 1$ vector of latent factors with $K \ll S$, and $\beta$ is an $S \times K$ factor loading matrix of unknown parameters. Further details on the restrictions on $\beta$ are given in Appendix A.1. We model the latent factors as $f_t \sim N(0, D_t)$ and $\epsilon_t \sim N(0, I)$. The time-varying variance matrices $D_t$ and $V_t$ depend on unobserved random variables $\lambda_t = (\lambda_{1t}, \dots, \lambda_{Kt})$ and $h_t = (h_{1t}, \dots, h_{St})$ such that
$$D_t := \mathrm{diag}\big(\exp(\lambda_{1t}), \dots, \exp(\lambda_{Kt})\big), \qquad V_t := \mathrm{diag}\big(\exp(h_{1t}), \dots, \exp(h_{St})\big),$$
and therefore $y_t \mid f_t, h_t \sim N(\beta f_t, V_t)$. Each $\lambda_{kt}$ is assumed to follow an independent stochastic volatility process
$$\lambda_{kt} = \phi_k \lambda_{k,t-1} + \tau_{fk}\,\eta_{kt}, \qquad k = 1, \dots, K, \qquad (18)$$
with $\eta_{kt} \sim N(0, 1)$. This process was discussed in Example 1 in Section 2.1.1. Each $h_{st}$ is assumed to follow a Gaussian OU continuous time volatility process
$$dh_{st} = \alpha_s(\mu_s - h_{st})\,dt + \tau_{\epsilon s}\,dW_{st}, \qquad s = 1, \dots, S, \qquad (19)$$
where the $W_{st}$ are independent Brownian motions. This process was discussed in Example 2 in Section 2.1.1. The transition distribution for each $h_{st}$ has the closed form
$$h_{st} \mid h_{s,t-1} \sim N\!\left(\mu_s + \exp(-\alpha_s)\big(h_{s,t-1} - \mu_s\big),\; \frac{\big(1 - \exp(-2\alpha_s)\big)\tau^2_{\epsilon s}}{2\alpha_s}\right), \qquad s = 1, \dots, S. \qquad (20)$$
Our article also studies the performance of the PG and PMMH+PG estimators obtained under an Euler approximation of the state transition density in Equation (19), because there are many log-volatility models that do not have closed form transition densities but to which we can still apply our sampling methods. The Euler scheme approximates the evolution of the log-volatilities $h_{st}$ by placing $M - 1$ latent points between times $t$ and $t + 1$, giving the state transition equations, starting at $h_{t,0}$,
$$h_{st,j+1} \mid h_{st,j} \sim N\big(h_{st,j} + \alpha_s(\mu_s - h_{st,j})\delta,\; \tau^2_{\epsilon s}\delta\big), \qquad (21)$$
for $j = 0, \dots, M - 1$, where $\delta = 1/M$. See Example 3 in Sections 2.1.1 and 2.2.1, and in particular Equation (7), for notation and details. We denote the parameter vector of the factor stochastic volatility model given by Equations (17), (18) and (19) by $\omega = (\beta;\; \phi_k, \tau_{fk},\, k = 1, \dots, K;\; \alpha_s, \mu_s, \tau_{\epsilon s},\, s = 1, \dots, S)$.
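As a concrete illustration, the data generating process of Equations (17)-(20) can be simulated directly. The sketch below uses illustrative sizes and parameter values (not those of our empirical studies) and is only meant to make the model's structure explicit.

```python
import numpy as np

# Illustrative simulation of the factor SV model of Equations (17)-(20),
# with K = 1 factor and S = 3 series; all values are assumptions for the sketch.
rng = np.random.default_rng(0)
T, S, K = 500, 3, 1
beta = np.full((S, K), 0.8)                    # factor loading matrix
phi, tau_f = 0.98, np.sqrt(0.1)                # factor log-volatility, Eq. (18)
alpha, mu, tau_eps = 0.06, 0.0, np.sqrt(0.1)   # OU parameters, Eq. (19)

# Exact OU transition, Eq. (20): h_t | h_{t-1} ~ N(ou_mean(h_{t-1}), ou_var)
ou_mean = lambda h: mu + np.exp(-alpha) * (h - mu)
ou_var = (1.0 - np.exp(-2.0 * alpha)) * tau_eps**2 / (2.0 * alpha)

lam = np.zeros((T, K))
h = np.zeros((T, S))
lam[0] = rng.normal(0.0, tau_f / np.sqrt(1 - phi**2), K)       # stationary start
h[0] = rng.normal(mu, np.sqrt(tau_eps**2 / (2 * alpha)), S)
for t in range(1, T):
    lam[t] = phi * lam[t - 1] + tau_f * rng.normal(size=K)     # Eq. (18)
    h[t] = ou_mean(h[t - 1]) + np.sqrt(ou_var) * rng.normal(size=S)

f = np.exp(lam / 2.0) * rng.normal(size=(T, K))                # f_t ~ N(0, D_t)
y = f @ beta.T + np.exp(h / 2.0) * rng.normal(size=(T, S))     # Eq. (17)
```

The returns `y` then have conditional covariance $\beta D_t \beta^\top + V_t$, the quantity that reappears in the PMMH observation equation of Section 3.4.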
The application of the methods in the paper to the stochastic volatility factor model is given in Section 4.

3.2 Target density for the factor SV model

Although the factor model outlined in Section 3.1 can be written in the state space form of Section 2.1, it is more efficient to take advantage of the extra structure in the model and base the sampling on multiple independent univariate state space models. This section outlines the conditional independence structure and the more complex target density and sampling schemes required for this approach to estimating the posterior distribution of the factor SV model. Further details are given in Appendix A.

3.2.1 Conditional independence in the factor SV model

The key to making the estimation of the factor SV model tractable is that, given the values of $y_{1:T}$, $f_{1:T}$, $\omega$, and the conditional independence of the innovations of the returns, the factor model in Equation (17) separates into independent components consisting of $K$ univariate SV models for the latent factors and $S$ univariate OU models for the idiosyncratic errors. For $k = 1, \dots, K$, we have
$$f_{kt} \mid \lambda_{kt} \sim N\big(0, \exp(\lambda_{kt})\big), \qquad (22)$$
with the transition density given in Equation (18). For $s = 1, \dots, S$, we have
$$y_{st} \mid f_t, h_{st} \sim N\big(\beta_s f_t, \exp(h_{st})\big), \qquad (23)$$
with the exact and approximate transition densities given in Equations (20) and (21).

3.2.2 The closed form density case

This section provides an appropriate target density for the factor model with the closed form transition density given in Equation (20). The target density includes all the random variables produced by the $K + S$ univariate particle filters that generate the factor log volatilities $\lambda_{k,1:T}$ for $k = 1, \dots, K$ and the idiosyncratic log volatilities $h_{s,1:T}$ for $s = 1, \dots, S$, as well as the factors $f_{1:T}$ and the parameters $\omega$. It will be convenient in the developments below to define $\theta = (f_{1:T}, \omega)$. To specify the univariate particle filters, we use Equations (18) and (22) to generate the factor log volatilities $\lambda_{k,1:T}$ for $k = 1, \dots, K$, and Equations (20) and (23) to generate the idiosyncratic log volatilities $h_{s,1:T}$ for $s = 1, \dots, S$.
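Since Equations (20) and (21) describe the same OU dynamics exactly and approximately, a quick sanity check is that composing $M$ Euler steps reproduces the exact one-step-ahead mean and variance as $M$ grows. Because the Euler transition is linear-Gaussian, this composition is available in closed form; the sketch below verifies it with illustrative parameter values (an assumption of this sketch, not the paper's settings).

```python
import numpy as np

# Compare the exact OU transition, Eq. (20), with M composed Euler steps,
# Eq. (21): h_{j+1} | h_j ~ N(h_j + alpha*(mu - h_j)*delta, tau^2 * delta),
# delta = 1/M. All parameter values are illustrative.
alpha, mu, tau = 0.06, 0.0, np.sqrt(0.1)
h0, M = 1.0, 10
delta = 1.0 / M

# Composing M linear-Gaussian Euler steps in closed form:
a = 1.0 - alpha * delta                          # per-step AR coefficient
euler_mean = mu + a**M * (h0 - mu)
euler_var = tau**2 * delta * sum(a ** (2 * j) for j in range(M))

# Exact one-step-ahead moments from Eq. (20):
exact_mean = mu + np.exp(-alpha) * (h0 - mu)
exact_var = (1 - np.exp(-2 * alpha)) * tau**2 / (2 * alpha)
```

With $M = 10$ the Euler mean and variance already agree with the exact ones to about three decimal places for these parameter values, which is why the paper can treat the Euler target of Section 3.2.3 as a close approximation to the closed form target of Section 3.2.2.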
We denote the weighted samples by $\big(\lambda^{1:N}_{kt}, w^{1:N}_{fkt}\big)$ and $\big(h^{1:N}_{st}, w^{1:N}_{\epsilon st}\big)$. We denote the proposal densities by $m^\theta_{fk1}(\lambda_{k1})$, $m^\theta_{fkt}(\lambda_{kt} \mid \lambda_{k,t-1})$, $m^\theta_{\epsilon s1}(h_{s1})$ and $m^\theta_{\epsilon st}(h_{st} \mid h_{s,t-1})$ for $t = 2, \dots, T$. We denote the resampling schemes by $\mathcal{M}_f\big(a^{1:N}_{fk,t-1} \mid w^{1:N}_{fk,t-1}\big)$ for $k = 1, \dots, K$, where $a^i_{fk,t-1} = j$ indexes a particle in $\big(\lambda^{1:N}_{k,t-1}, w^{1:N}_{fk,t-1}\big)$ and is chosen with probability $w^j_{fk,t-1}$; the resampling scheme $\mathcal{M}_\epsilon\big(a^{1:N}_{\epsilon s,t-1} \mid w^{1:N}_{\epsilon s,t-1}\big)$ for $s = 1, \dots, S$ is defined similarly. We denote the vector of particles by
$$U_{f,1:K,1:T} = \big(\lambda^{1:N}_{1:K,1:T}, A^{1:N}_{f,1:K,1:T}\big), \qquad (24)$$

and U ɛ,1:s,1:t = h 1:N 1:S,1:T, A 1:N ɛ,1:s,1:t. 25 The joint distribution of the particles given the parameters is { N T N ψ fk U fk,1:t θ = m θ fk1 λ i k1 M f a 1:N fk,t 1 w 1:N fk,t 1 for k = 1,..., K and ψ ɛs U ɛs,1:t θ = i=1 N i=1 m θ ɛs1 for s = 1,..., S. Next, we define indices J fk t=2 i=1 { T N h i s1 M ɛ a 1:N ɛs,t 1 w 1:N ɛs,t 1 t=2 i=1 m θ fkt m θ ɛst λ i fkt λ ai fk,t 1 fkt 1 h i st h ai ɛs,t 1 s,t 1 } 26 } 27 = j for each k = 1,..., K, then trace back its ances- tral lineage b j fk,1:t b j fk,t = j, bj fk,t 1 = abj fk,t fk,t 1, and select the particle trajectory λ j k,1:t = λ bj fk1 k,1,..., λbj fkt k,t. Similarly, we define indices J ɛs = j for each s = 1,..., S, then trace back its ancestral lineage b j ɛs,1:t b j ɛs,t = j, bj ɛs,t 1 = a bj s,t ɛs,t 1, and select the particle trajectory. h j s,1:t h = bj ɛs1 s,1,..., h bj ɛst s,t The augmented target density of the factor model is defined as π N U f,1:k,1:t, U ɛ,1:s,1:t, J f, J ɛ, θ := π λ J f 1:K,1:T, hj 1:S,1:T ɛ, θ K N T K+S k=1 m θ fk1 S s=1 λ b fk1 k1 ψ fk U fk,1:t θ b T fkt t=2 wa fk,t 1 fk,t 1 mθ fkt m θ ɛs1 h b ɛs1 s1 λ b fkt kt ψ ɛs U ɛs,1:t θ ɛst T t=2 wab ɛst 1 ɛs,t 1m θ ɛst λ a b fkt fk,t 1 k,t 1 h bɛst st. 28 h abɛst ɛs,t 1 s,t 1 The first term in Equation 28 is defined using the joint distribution of the factor SV model using Equation 20 for the selected trajectories and the factors f 1:T conditional on y, ω. The prior for ω is specified by the user. The conditional independence results discussed in Section 3.2.1 show that this distribution factors into separate terms for the K + S univariate state space models given by Equations 18, 20, 22 and 23 as shown in Equation 29 below. π λ J f 1:K,1:T, hj 1:S,1:T ɛ, θ = π θ K k=1 14 π λ J fk k,1:t θ S s=1 π h J ɛs s,1:t θ. 29

3.2.3 Approximating the transition density by an Euler scheme

This section provides an appropriate target density for the factor model with the Euler approximation given in Equation (21). We follow the approach of Example 3 in Section 2.1.1 and introduce state vectors, for $s = 1, \dots, S$, defined as $x_{s1} = h_{s1}$ and $x_{st} = \big(h_{st}, h_{s,t-1,M-1}, \dots, h_{s,t-1,1}\big)^\top$ for $t = 2, \dots, T$. The state transition densities are given by
$$f^\theta_{st}\big(x_{st} \mid x_{s,t-1}\big) = \prod_{j=1}^{M} f^\theta_{s,t-1,j}\big(h_{s,t-1,j} \mid h_{s,t-1,j-1}\big), \qquad t = 2, \dots, T, \qquad (30)$$
where the densities $f^\theta_{s,t,j}\big(h_{s,t,j} \mid h_{s,t,j-1}\big)$ for $j = 1, \dots, M$, $t = 1, \dots, T-1$ and $s = 1, \dots, S$ are defined by Equation (21). We follow the approach of Example 3 in Section 2.2.1 and use the proposal densities $m^\theta_{\epsilon st}\big(x_{st} \mid x_{s,t-1}\big) = f^\theta_{st}\big(x_{st} \mid x_{s,t-1}\big)$ for $t = 2, \dots, T$ and $s = 1, \dots, S$, which can be generated using Equation (21). With these modifications, we use the same construction as in Section 3.2.2. The modifications give
$$U_{\epsilon,1:S,1:T} = \big(x^{1:N}_{1:S,1:T}, A^{1:N}_{\epsilon,1:S,1:T}\big), \qquad (31)$$
$$\psi_{\epsilon s}\big(U_{\epsilon s,1:T} \mid \theta\big) = \left\{\prod_{i=1}^{N} m^\theta_{\epsilon s1}\big(x^i_{s1}\big)\right\} \prod_{t=2}^{T}\left\{\mathcal{M}_\epsilon\big(a^{1:N}_{\epsilon s,t-1} \mid w^{1:N}_{\epsilon s,t-1}\big) \prod_{i=1}^{N} m^\theta_{\epsilon st}\big(x^i_{st} \mid x^{a^i_{\epsilon s,t-1}}_{s,t-1}\big)\right\}, \qquad (32)$$
$$\pi^N\big(U_{f,1:K,1:T}, U_{\epsilon,1:S,1:T}, J_f, J_\epsilon, \theta\big) := \frac{\pi\big(\lambda^{J_f}_{1:K,1:T}, x^{J_\epsilon}_{1:S,1:T}, \theta\big)}{N^{K+S}} \times \prod_{k=1}^{K} \frac{\psi_{fk}\big(U_{fk,1:T} \mid \theta\big)}{m^\theta_{fk1}\big(\lambda^{b_{fk1}}_{k1}\big) \prod_{t=2}^{T} w^{a^{b_{fkt}}_{fk,t-1}}_{fk,t-1}\, m^\theta_{fkt}\big(\lambda^{b_{fkt}}_{kt} \mid \lambda^{a^{b_{fkt}}_{fk,t-1}}_{k,t-1}\big)} \times \prod_{s=1}^{S} \frac{\psi_{\epsilon s}\big(U_{\epsilon s,1:T} \mid \theta\big)}{m^\theta_{\epsilon s1}\big(x^{b_{\epsilon s1}}_{s1}\big) \prod_{t=2}^{T} w^{a^{b_{\epsilon st}}_{\epsilon s,t-1}}_{\epsilon s,t-1}\, m^\theta_{\epsilon st}\big(x^{b_{\epsilon st}}_{st} \mid x^{a^{b_{\epsilon st}}_{\epsilon s,t-1}}_{s,t-1}\big)}, \qquad (33)$$
with
$$\pi\big(\lambda^{J_f}_{1:K,1:T}, x^{J_\epsilon}_{1:S,1:T}, \theta\big) = \pi(\theta) \prod_{k=1}^{K} \pi\big(\lambda^{J_{fk}}_{k,1:T} \mid \theta\big) \prod_{s=1}^{S} \pi\big(x^{J_{\epsilon s}}_{s,1:T} \mid \theta\big). \qquad (34)$$

3.3 PMMH+PG sampling scheme for the factor SV model

We illustrate our methods using the $\mathrm{PMMH}(\alpha, \tau^2_f, \tau^2_\epsilon) + \mathrm{PG}(\beta, f_{1:T}, \phi, \mu)$ sampler, which was found to give good performance in the empirical studies in Section 4. It is straightforward to modify the sampling scheme for other choices of which parameters to sample with a

PMMH step and which to sample with a PG step. Our procedure for determining an efficient sampling scheme is to first run the PG algorithm to identify which parameters have large IACT values or, in some cases, require a large amount of computational time to generate in the PG step; we then generate these parameters in the PMMH step. In particular, we can apply this procedure to the univariate SV models described in Section 2.1.1 to identify the parameters that may need to be generated by PMMH by first estimating them by PG. See, for example, our discussion of the univariate OU model in Section S4. We note that if an Euler approximation is used, then generating any parameter in the OU model is time intensive, as we need to determine, store and use the ancestor history of the entire state vector. The sampling schemes for the factor SV model with the closed form transition density given by Equation (20) and for the model with the Euler scheme given by Equation (21) have the same structure, so Sampling Scheme 2 is given below in a generic form and the appropriate state space models are used for the different cases; see Sections 3.2.2 and 3.2.3 for details. We have simplified the conditional distributions in Sampling Scheme 2 wherever possible using the conditional independence properties discussed in Section 3.2.1. The Metropolis-Hastings proposal densities for Sampling Scheme 2 are given in Section 3.3.1.

Sampling Scheme 2 $\big(\mathrm{PMMH}(\alpha, \tau^2_f, \tau^2_\epsilon) + \mathrm{PG}(\beta, f_{1:T}, \phi, \mu)\big)$. Given initial values for $U_{f,1:T}$, $U_{\epsilon,1:T}$, $J_f$, $J_\epsilon$ and $\theta$, one iteration of the MCMC involves the following steps.

1. PMMH sampling
(a) For $k = 1, \dots, K$,
i. Sample $\tau^{2*}_{fk} \sim q_{\tau^2_{fk}}\big(\cdot \mid U_{fk,1:T}, \tau^2_{fk}, \theta_{\setminus \tau^2_{fk}}\big)$
ii. Sample $U^*_{fk,1:T} \sim \psi_{fk}\big(\cdot \mid \tau^{2*}_{fk}, \theta_{\setminus \tau^2_{fk}}\big)$
iii. Sample $J^*_{fk}$ from $\pi^N\big(\cdot \mid U^*_{fk,1:T}, \tau^{2*}_{fk}, \theta_{\setminus \tau^2_{fk}}\big)$
iv.
Accept the proposed values $\tau^{2*}_{fk}$, $U^*_{fk,1:T}$, and $J^*_{fk}$ with probability
$$\alpha\big(U_{fk,1:T}, J_{fk}, \tau^2_{fk};\; U^*_{fk,1:T}, J^*_{fk}, \tau^{2*}_{fk} \mid \theta_{\setminus \tau^2_{fk}}\big) = 1 \wedge \frac{\widehat{Z}\big(U^*_{fk,1:T}, \tau^{2*}_{fk}, \theta_{\setminus \tau^2_{fk}}\big)}{\widehat{Z}\big(U_{fk,1:T}, \tau^2_{fk}, \theta_{\setminus \tau^2_{fk}}\big)}\, \frac{p\big(\tau^{2*}_{fk}\big)}{p\big(\tau^2_{fk}\big)}\, \frac{q_{\tau^2_{fk}}\big(\tau^2_{fk} \mid U^*_{fk,1:T}, \tau^{2*}_{fk}, \theta_{\setminus \tau^2_{fk}}\big)}{q_{\tau^2_{fk}}\big(\tau^{2*}_{fk} \mid U_{fk,1:T}, \tau^2_{fk}, \theta_{\setminus \tau^2_{fk}}\big)}.$$
(b) For $s = 1, \dots, S$,
i. Sample $(\alpha^*_s, \tau^{2*}_{\epsilon s}) \sim q_{\alpha_s, \tau^2_{\epsilon s}}\big(\cdot \mid U_{\epsilon s,1:T}, \alpha_s, \tau^2_{\epsilon s}, \theta_{\setminus \alpha_s, \tau^2_{\epsilon s}}\big)$
ii. Sample $U^*_{\epsilon s,1:T} \sim \psi_{\epsilon s}\big(\cdot \mid \alpha^*_s, \tau^{2*}_{\epsilon s}, \theta_{\setminus \alpha_s, \tau^2_{\epsilon s}}\big)$
iii. Sample $J^*_{\epsilon s}$ from $\pi^N\big(\cdot \mid U^*_{\epsilon s,1:T}, \alpha^*_s, \tau^{2*}_{\epsilon s}, \theta_{\setminus \alpha_s, \tau^2_{\epsilon s}}\big)$

iv. Accept the proposed values $(\alpha^*_s, \tau^{2*}_{\epsilon s})$, $U^*_{\epsilon s,1:T}$, and $J^*_{\epsilon s}$ with probability
$$\alpha\big(U_{\epsilon s,1:T}, J_{\epsilon s}, \alpha_s, \tau^2_{\epsilon s};\; U^*_{\epsilon s,1:T}, J^*_{\epsilon s}, \alpha^*_s, \tau^{2*}_{\epsilon s} \mid \theta_{\setminus \alpha_s, \tau^2_{\epsilon s}}\big) = 1 \wedge \frac{\widehat{Z}\big(U^*_{\epsilon s,1:T}, \alpha^*_s, \tau^{2*}_{\epsilon s}, \theta_{\setminus \alpha_s, \tau^2_{\epsilon s}}\big)}{\widehat{Z}\big(U_{\epsilon s,1:T}, \alpha_s, \tau^2_{\epsilon s}, \theta_{\setminus \alpha_s, \tau^2_{\epsilon s}}\big)}\, \frac{p\big(\alpha^*_s, \tau^{2*}_{\epsilon s}\big)}{p\big(\alpha_s, \tau^2_{\epsilon s}\big)}\, \frac{q_{\alpha_s, \tau^2_{\epsilon s}}\big(\alpha_s, \tau^2_{\epsilon s} \mid U^*_{\epsilon s,1:T}, \alpha^*_s, \tau^{2*}_{\epsilon s}, \theta_{\setminus \alpha_s, \tau^2_{\epsilon s}}\big)}{q_{\alpha_s, \tau^2_{\epsilon s}}\big(\alpha^*_s, \tau^{2*}_{\epsilon s} \mid U_{\epsilon s,1:T}, \alpha_s, \tau^2_{\epsilon s}, \theta_{\setminus \alpha_s, \tau^2_{\epsilon s}}\big)}.$$

2. PG sampling
(a) Sample $\beta \sim \pi\big(\cdot \mid \lambda^{J_f}_{1:T}, h^{J_\epsilon}_{1:T}, B^{J_f}_{f,1:T-1}, B^{J_\epsilon}_{\epsilon,1:T-1}, J_f, J_\epsilon, \theta_{\setminus \beta}, y_{1:T}\big)$ using Equation (38) in Appendix A.1.
(b) Redraw the diagonal elements of $\beta$ through the deep interweaving procedure described in Appendix A.2. This step is necessary to improve the mixing of the factor loading matrix $\beta$.
(c) Sample $f_{1:T} \sim \pi\big(\cdot \mid \lambda^{J_f}_{1:T}, h^{J_\epsilon}_{1:T}, B^{J_f}_{f,1:T-1}, B^{J_\epsilon}_{\epsilon,1:T-1}, J_f, J_\epsilon, \theta_{\setminus f_{1:T}}, y_{1:T}\big)$ using Equation (39) in Appendix A.3.
(d) For $k = 1, \dots, K$,
i. Sample $\phi^*_k$ from the proposal $q_{\phi_k}\big(\cdot \mid \lambda^{J_{fk}}_{k,1:T}, \theta_{\setminus \phi_k}\big)$ and accept with probability
$$1 \wedge \frac{\pi^N\big(\phi^*_k \mid \lambda^{J_{fk}}_{k,1:T}, B_{fk,1:T-1}, J_{fk}, \theta_{\setminus \phi_k}\big)}{\pi^N\big(\phi_k \mid \lambda^{J_{fk}}_{k,1:T}, B_{fk,1:T-1}, J_{fk}, \theta_{\setminus \phi_k}\big)}\, \frac{q_{\phi_k}\big(\phi_k \mid \lambda^{J_{fk}}_{k,1:T}, \theta_{\setminus \phi_k}\big)}{q_{\phi_k}\big(\phi^*_k \mid \lambda^{J_{fk}}_{k,1:T}, \theta_{\setminus \phi_k}\big)}.$$
ii. Sample $U_{fk,1:T} \sim \pi^N\big(\cdot \mid \lambda^{J_{fk}}_{k,1:T}, B_{fk,1:T-1}, J_{fk}, \theta\big)$ using the conditional sequential Monte Carlo (CSMC) algorithm discussed in Section S2.
iii. Sample $J_{fk} \sim \pi^N\big(\cdot \mid U_{fk,1:T}, \theta\big)$.
(e) For $s = 1, \dots, S$,
i. Sample $\mu^*_s$ from the proposal $q_{\mu_s}\big(\cdot \mid h^{J_{\epsilon s}}_{s,1:T}, \theta_{\setminus \mu_s}\big)$ and accept with probability
$$1 \wedge \frac{\pi^N\big(\mu^*_s \mid h^{J_{\epsilon s}}_{s,1:T}, B_{\epsilon s,1:T-1}, J_{\epsilon s}, \theta_{\setminus \mu_s}\big)}{\pi^N\big(\mu_s \mid h^{J_{\epsilon s}}_{s,1:T}, B_{\epsilon s,1:T-1}, J_{\epsilon s}, \theta_{\setminus \mu_s}\big)}\, \frac{q_{\mu_s}\big(\mu_s \mid h^{J_{\epsilon s}}_{s,1:T}, \theta_{\setminus \mu_s}\big)}{q_{\mu_s}\big(\mu^*_s \mid h^{J_{\epsilon s}}_{s,1:T}, \theta_{\setminus \mu_s}\big)}.$$
ii. Sample $U_{\epsilon s,1:T} \sim \pi^N\big(\cdot \mid h^{J_{\epsilon s}}_{s,1:T}, B_{\epsilon s,1:T-1}, J_{\epsilon s}, \theta\big)$ using the conditional sequential Monte Carlo (CSMC) algorithm discussed in Section 2.3.
iii. Sample $J_{\epsilon s} \sim \pi^N\big(\cdot \mid U_{\epsilon s,1:T}, \theta\big)$.

3.3.1 Proposal densities

This section details the proposal densities used in Sampling Scheme 2.

For $k = 1, \dots, K$, $q_{\tau^2_{fk}}$ is an adaptive random walk. For $s = 1, \dots, S$, $q_{\alpha_s, \tau^2_{\epsilon s}}$ is an adaptive random walk. For $k = 1, \dots, K$, $q_{\phi_k}\big(\cdot \mid \lambda^{J_{fk}}_{k,1:T}, \theta_{\setminus \phi_k}\big) = N\big(c_{\phi_k}, d_{\phi_k}\big)$, where
$$c_{\phi_k} = \frac{d_{\phi_k}}{\tau^2_{fk}} \sum_{t=2}^{T} \lambda_{kt}\lambda_{k,t-1}, \qquad d_{\phi_k} = \tau^2_{fk} \Big/ \sum_{t=1}^{T-1} \lambda_{kt}^2.$$
For $s = 1, \dots, S$, $q_{\mu_s}\big(\cdot \mid h^{J_{\epsilon s}}_{s,1:T}, \theta_{\setminus \mu_s}\big) = N\big(c_{\mu_s}, d_{\mu_s}\big)$, where
$$c_{\mu_s} = \frac{d_{\mu_s}}{\tau^2_{\epsilon s}} \left[2\alpha_s h_{s,1} + \frac{2\alpha_s\big(1 - \exp(-\alpha_s)\big)}{1 - \exp(-2\alpha_s)} \sum_{t=2}^{T} \big(h_{s,t} - \exp(-\alpha_s)\,h_{s,t-1}\big)\right],$$
$$d_{\mu_s} = \tau^2_{\epsilon s} \left[2\alpha_s + \frac{2\alpha_s (T-1)\big(1 - 2\exp(-\alpha_s) + \exp(-2\alpha_s)\big)}{1 - \exp(-2\alpha_s)}\right]^{-1}.$$

3.4 The PMMH sampling scheme for the factor stochastic volatility model

The PMMH method generates the parameters by integrating out all the latent factors, so that the observation equation is given by
$$y_t \mid \lambda_t, h_t, \omega \sim N\big(0,\; \beta D_t \beta^\top + V_t\big). \qquad (35)$$
The state transition equations are given by Equation (18) and either Equation (20) for the closed form case or Equation (30) for the Euler scheme.

4 Empirical Studies

This section presents empirical results for the factor SV model described in Section 3 to illustrate the flexibility of the sampling approach proposed in our article. That is, we show that it is desirable to generate parameters that are highly correlated with the states using a PMMH step that does not condition on the states. Conversely, if there is a subset of parameters that is not highly correlated with the states, then it is preferable to generate them using a particle Gibbs step, or a particle Metropolis within Gibbs step, that conditions on the states, especially when the subset is large. A simple example of the methods is given in Section S4 of the supplementary material, where Sampling Scheme 1 is applied to a univariate OU model for daily stock return data.

4.1 Preliminaries

To define our measure of the inefficiency of a sampler that takes computing time into account, we first define the integrated autocorrelation time (IACT) for a univariate parameter $\theta$,
$$\mathrm{IACT}_\theta := 1 + 2\sum_{j=1}^{\infty} \rho_{j,\theta}, \qquad (36)$$
where $\rho_{j,\theta}$ is the lag-$j$ autocorrelation of the iterates of $\theta$ in the MCMC after the chain has converged. A large IACT value for one or more of the parameters indicates that the chain does not mix well. We estimate $\mathrm{IACT}_\theta$ based on $M$ iterates $\theta^{[1]}, \dots, \theta^{[M]}$ after convergence as
$$\widehat{\mathrm{IACT}}_{\theta,M} = 1 + 2\sum_{j=1}^{L_M} \hat{\rho}_{j,\theta},$$
where $\hat{\rho}_{j,\theta}$ is the estimate of $\rho_{j,\theta}$, $L_M = \min(1000, L)$ and $L = \min\{j \le M : |\hat{\rho}_{j,\theta}| < 2/\sqrt{M}\}$, because $1/\sqrt{M}$ is approximately the standard error of the autocorrelation estimates when the series is white noise. Let $\widehat{\mathrm{IACT}}_{\mathrm{MAX}}$ and $\widehat{\mathrm{IACT}}_{\mathrm{MEAN}}$ be the maximum and mean of the estimated IACT values over all the parameters in the model, respectively. Our measure of the inefficiency of a sampler based on $\widehat{\mathrm{IACT}}_{\mathrm{MAX}}$ is the time normalized variance (TNV),
$$\mathrm{TNV}_{\mathrm{MAX}} = \widehat{\mathrm{IACT}}_{\mathrm{MAX}} \times \mathrm{CT}, \qquad (37)$$
where CT is the computing time in seconds per iteration; we define the inefficiency of a sampler based on $\widehat{\mathrm{IACT}}_{\mathrm{MEAN}}$ similarly. The relative time normalized variance (RTNV) is the TNV relative to that of the most efficient method. We use the following notation to describe the algorithms used in the examples. The basic samplers, as used in Sampling Schemes 1 and 2, are PMMH and PG. These samplers can be used alone or in combination. For example, PMMH($\theta$) means using a PMMH step to sample the parameter vector $\theta$; PMMH($\theta_1$) + PG($\theta_2$) means sampling $\theta_1$ in the PMMH step and $\theta_2$ in the PG step; and PG($\theta$) means sampling $\theta$ using the PG sampler. In all the examples, the PMMH step uses the bootstrap particle filter to sample the particles and the adaptive random walk of Roberts and Rosenthal (2009) as the proposal density for the parameters. The particle filter and the parameter samplers are implemented in Matlab.
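The estimator $\widehat{\mathrm{IACT}}_{\theta,M}$ with the truncation rule $L_M = \min(1000, L)$ can be sketched as follows. For efficiency, the sketch only computes autocorrelations up to a maximum lag; that cap is an implementation choice of this sketch, not part of the definition above.

```python
import numpy as np

# Sketch of the IACT estimate of Equation (36): sum autocorrelations up to
# L_M = min(1000, L), where L is the first lag with |rho_j| < 2/sqrt(M).
def iact(chain, max_lag=2000):
    x = np.asarray(chain, float)
    M = len(x)
    x = x - x.mean()
    denom = np.dot(x, x)
    rho = np.array([np.dot(x[:-j], x[j:]) / denom
                    for j in range(1, min(M, max_lag + 1))])
    below = np.nonzero(np.abs(rho) < 2.0 / np.sqrt(M))[0]
    L = below[0] + 1 if below.size else len(rho)   # first lag below threshold
    L_M = min(1000, L)
    return 1.0 + 2.0 * rho[:L_M].sum()

rng = np.random.default_rng(2)
white = rng.normal(size=5000)       # iid "chain": IACT should be close to 1
z = np.zeros(20000)                 # AR(1) chain with rho = 0.9:
for t in range(1, len(z)):          # true IACT = (1 + 0.9)/(1 - 0.9) = 19
    z[t] = 0.9 * z[t - 1] + rng.normal()
```

On the white-noise chain the estimate is close to 1, while on the persistent AR(1) chain it is close to the theoretical value of 19, illustrating how the IACT magnifies the effective cost per independent draw.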
We note that efficient Metropolis-Hastings schemes for standard MCMC are possible because both the likelihood and its derivatives can be computed exactly or, in the case of the derivatives, by finite differences. This is not the case for PMMH, where the likelihood is estimated and its derivatives are not available, even numerically. We therefore use the adaptive random walk in all our examples to generate the parameters, as it provides a general and flexible sampling solution that does not require the computation or estimation of the local gradient or Hessian. The random walk proposal scales the covariance matrix by the factor $2.38/\sqrt{d}$ in the non-particle case and $2.56/\sqrt{d}$ in the particle case (Sherlock et al., 2015), where $d$ is the number of parameters in the random walk. This means that for large $d$ the scale factor can be very small, and the random walk is then very inefficient because it moves slowly. Moreover, and importantly, Sherlock et al. (2015) show that in the particle case the optimal acceptance rate of random walk proposals is 7%, which suggests that it is quite inefficient. Sherlock et al. (2015) derive this acceptance rate assuming that $d \to \infty$, which also means that, at least theoretically, the step size is very small. The use of more efficient particle MALA algorithms is discussed by Nemeth et al. (2016), who write: "Our results show that the behaviour of particle MALA depends crucially on how accurately we can estimate the gradient of the log-posterior. If the error in the estimate of the gradient is not controlled sufficiently well as we increase dimension, then asymptotically there will be no advantage in using particle MALA over a particle MCMC algorithm using a random-walk proposal." However, as noted by Sherlock et al. (2015), it is, in general, even more difficult to obtain accurate estimates of the gradient of the log-posterior than it is to obtain accurate estimates of the log-posterior itself. This discussion again suggests why using PG may be preferred to PMMH whenever possible, because it may be easier to obtain good proposals within a PG framework.

4.2 Factor stochastic volatility model

4.2.1 Simulation study

We conducted a simulation study to compare several estimation approaches: PG with ancestral tracing (PGAT), PG with backward simulation (PGBS), PMMH+PG, and PMMH, with both exact and approximate transition densities. We simulated data with $T = 1{,}000$, $S = 20$, and $K = 1$ from the factor model in Equation (17). We set $\alpha_s = 0.06$ and $\tau^2_{\epsilon s} = 0.1$ for all $s$, $\phi_1 = 0.98$, $\tau^2_{f1} = 0.1$, and $\beta_s = 0.8$ for all $s$. For every unrestricted element of the factor loading matrix $\beta$, we chose independent Gaussian priors, i.e.
$\beta_{sk} \sim N(0, 1)$, and the priors for the state transition density parameters are $\alpha_s \sim IG(v_0/2, s_0/2)$, $\tau^2_{\epsilon s} \sim IG(v_0/2, s_0/2)$ and $\tau^2_{fk} \sim IG(v_0/2, s_0/2)$, where $v_0 = 10$ and $s_0 = 1$, and $\phi_k \sim U(-1, 1)$. These prior densities cover most values that arise in practice. The initial state $\lambda_{k1}$ is normally distributed, $N\big(0, \tau^2_{fk}/(1 - \phi^2_k)\big)$, for $k = 1, \dots, K$. The initial state $h_{s1}$ is also normally distributed, $N\big(\mu_s, \tau^2_{\epsilon s}/(2\alpha_s)\big)$, for $s = 1, \dots, S$. We ran all the sampling schemes for 11,000 iterations and discarded the initial 1,000 iterates as warmup. We used $M = 10$ latent points for the Euler approximations to the state transition densities. The PMMH method uses the observation density in Equation (35), which includes all $K + S$ latent log-volatilities simultaneously. This becomes a high dimensional ($K + S = 21$ dimensional) state space model. The performance of the standard PMMH sampler depends critically on the number of particles $N$ used to estimate the likelihood. Pitt et al. (2012) suggest selecting the number of particles $N$ such that the variance of the log of the estimated likelihood is around 1, to obtain an optimal tradeoff between computing time and statistical efficiency. Table 1 gives the variance of the log of the estimated likelihood for different numbers of particles for the PMMH method using the bootstrap filter, and shows that even with 5,000 particles the log of the estimated likelihood still has a large variance
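The tuning rule of Pitt et al. (2012) can be illustrated on a toy univariate SV model (not the factor model above; the model, parameter values, and run settings below are all illustrative assumptions of this sketch): repeated bootstrap-filter runs estimate $\mathrm{Var}(\log \widehat{Z})$ for different particle counts $N$, and the variance falls roughly as $1/N$, which is what makes it possible to trade computing time against statistical efficiency.

```python
import numpy as np

# Estimate Var(log Z_hat) by repeated bootstrap-filter runs on a toy
# univariate SV model; illustrative sketch of the Pitt et al. (2012) rule.
def log_lik_est(y, phi, tau, N, rng):
    lam = rng.normal(0.0, tau / np.sqrt(1 - phi**2), N)  # stationary start
    log_Z = 0.0
    for t in range(len(y)):
        # measurement density y_t | lambda_t ~ N(0, exp(lambda_t))
        logw = -0.5 * (lam + y[t] ** 2 * np.exp(-lam) + np.log(2 * np.pi))
        c = logw.max()
        w = np.exp(logw - c)
        log_Z += c + np.log(w.mean())                    # unbiased estimate
        idx = rng.choice(N, N, p=w / w.sum())            # multinomial resampling
        lam = phi * lam[idx] + tau * rng.normal(size=N)
    return log_Z

rng = np.random.default_rng(3)
phi, tau = 0.95, 0.3
lam_true = np.zeros(300)
for t in range(1, 300):
    lam_true[t] = phi * lam_true[t - 1] + tau * rng.normal()
y = np.exp(lam_true / 2) * rng.normal(size=300)

# Variance of the log-likelihood estimate for two particle counts
var_by_N = {N: np.var([log_lik_est(y, phi, tau, N, rng) for _ in range(30)])
            for N in (50, 400)}
```

Increasing $N$ shrinks the variance of $\log \widehat{Z}$, but, as Table 1 shows for the 21-dimensional factor model, the $N$ needed to bring it near 1 can be prohibitively large in high dimensions.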

and the Markov chain for the PMMH approach could get stuck. We therefore do not report results for the PMMH method, as it is computationally very expensive and its TNV would be significantly higher than those of the PG and PMMH+PG methods. The correlated PMMH proposed by Deligiannidis et al. (2017) correlates the random vectors $u$ and $u^*$, used to construct the estimators of the likelihood at the current and proposed values of the parameters, $\theta$ and $\theta^*$, to reduce the variance of the difference $\log \widehat{Z}_{1:T}(\theta^*, u^*) - \log \widehat{Z}_{1:T}(\theta, u)$ appearing in the PMMH acceptance ratio. We set the correlation between the individual elements of $u$ and $u^*$ to $\mathrm{corr}(u_i, u^*_i) = 0.999999$. We then obtained 1,000 independent estimates of $\log \widehat{Z}_{1:T}(\theta, u)$ and $\log \widehat{Z}_{1:T}(\theta, u^*)$ at the true value of $\theta$ and computed their sample correlation. The sample correlation was 0.06, showing that it is difficult to preserve the correlation in such a high dimensional state space model, and that the correlated PMMH Markov chain will still get stuck unless enough particles are used to ensure that the variance of the log of the likelihood estimator is close to 1. This is consistent with our discussion in Section 1 of the performance of the correlated PMMH in high dimensions, with the number of particles required by the correlated PMMH being $O(T^{1.95})$, almost the same as that required by the standard PMMH. A second serious drawback of PMMH is that the dimension of the parameter space in the factor SV model is large, making it hard to implement PMMH efficiently. In high dimensions it is difficult to obtain good proposals; the random walk proposal is straightforward to implement but inefficient. Although we can partition the parameter space into multiple small blocks of parameters and use a random walk within each block, this becomes computationally inefficient because it is then necessary to run the particle filter many times.
Table S3 in Section S5 of the supplement shows the IACT values for the parameters in the factor SV model estimated using three different samplers with the exact transition density: PMMH($\alpha, \tau^2_\epsilon, \tau^2_f$) + PG($\beta, \mu, \phi$), PGAT($\beta, \alpha, \tau^2_\epsilon, \tau^2_f, \phi$) and PGBS($\beta, \alpha, \tau^2_\epsilon, \tau^2_f, \phi$). All three samplers estimate the factor loading matrix $\beta$ and $\mu$ with comparable IACT values. The PMMH+PG sampler always has lower IACT values than both PG samplers for the parameters $\alpha$, $\tau^2_\epsilon$, $\tau^2_f$, and $\phi$. There are some improvements in terms of IACT from using PGBS compared to PGAT. Table 2 summarises the estimation results using the exact transition density and shows that, in terms of $\mathrm{TNV}_{\mathrm{MAX}}$, the PMMH+PG sampler is 9.25 and 4.19 times better than PGAT and PGBS, respectively, and, in terms of $\mathrm{TNV}_{\mathrm{MEAN}}$, the PMMH+PG sampler is 2.69 and 2.55 times better than PGAT and PGBS, respectively. Table S4 in Section S5 of the supplement shows the IACT values for all the parameters in the model for the three samplers using the Euler approximation of the transition density. Note that the PMMH+PG samplers with exact and approximate state transition densities have very similar IACT values for all the parameters, suggesting that the inefficiency of the PMMH+PG sampler does not deteriorate when the Euler approximation is used. However, both PG samplers, PGAT and PGBS, are significantly worse under the Euler approximation than with the exact transition density. For example, the IACT of $\tau^2_{\epsilon 4}$ in PGAT with the exact transition density is 283.23, compared to 977.93 for PGAT with the Euler approximation. Table 3 summarises the estimation results with the Euler approximation of the transition density and shows that, in terms of $\mathrm{TNV}_{\mathrm{MAX}}$, the