Recursive Kernel Density Estimation of the Likelihood for Generalized State-Space Models


A. E. Brockwell

February 28, 2005

Abstract

In time series analysis, the family of generalized state-space models is extremely rich. However, their likelihood functions are intractable, except in certain special cases, and this limits the options in analyses. In practice, a study typically (1) uses some kind of approximation to the likelihood function, for instance, one obtained analytically or by making use of the particle filter or related methods, (2) adopts a standard Markov chain Monte Carlo approach to parameter estimation, or (3) sacrifices goodness-of-fit for numerical convenience by choosing an approximating model for which the likelihood can be computed. Each of these approaches has advantages and disadvantages, but since none of them yields a consistent estimate of the likelihood, model selection remains an outstanding problem for the general family. This paper addresses this problem by introducing a recursive estimator of the log-likelihood for the generalized state-space model, which is obtained as a kernel density estimator driven by the iterations of a Markov chain. The estimator is very simple to compute, and is shown to converge almost surely to the exact log-likelihood as the number of iterations of the Markov chain approaches infinity.

Keywords: generalized state-space model, non-Gaussian, nonlinear, likelihood, recursive kernel density estimator, Markov chain, dynamic model

1 Introduction

The family of generalized state-space models (also sometimes referred to as nonlinear dynamic models, and partly discussed, for instance, in Shumway and Stoffer, 2000; Brockwell and Davis, 2002; West and Harrison, 1997) is arguably the richest family of time series models

considered in the literature. A generalized state-space model consists of two components: a latent process, called the state process, usually assumed to be Markovian, and an observation process, whose elements have conditional distributions given values of corresponding elements of the latent process. Special cases of the model include classical ARIMA models (Box and Jenkins, 1970) (multivariate and non-Gaussian varieties included), financial models such as GARCH models (Engle, 1982; Bollerslev, 1986) and stochastic volatility models (Ghysels et al., 1996; Taylor, 1994), a range of nonlinear models used in engineering (a number of interesting examples can be found in Doucet et al., 2001), models for censored time series, time series of counts (such as the neuron-spiking time series considered in Brockwell et al., 2004), and many others. Even for a number of widely studied special cases, analysis in the literature is carried out either by using approximating models with tractable likelihood functions, or by making use of likelihood approximations. This has at least two drawbacks. One is that the resulting parameter estimates (and forecasts based on those parameters) are likely to be biased. The other is that without the likelihood, model comparison and selection is difficult. Of course, in certain cases, the approximations used may be good enough that the exact likelihood is not necessary, but without a means of computing the exact likelihood (or otherwise bounding the approximation error), it is not possible to know the degree to which the approximation error matters. Some of these problems can be avoided by adopting a Bayesian approach to inference. By doing so, it is possible, at least, to estimate posterior distributions of parameters for more complicated special cases, but model comparison remains a serious problem, since effective methods for likelihood calculation have not been developed in this context.
(The key problem here is that the likelihood is in fact an integral over all possible values of the latent state process.) Thus, for the sake of (1) allowing likelihood-based analysis of a richer class of time series models than previously considered in the literature, (2) performing model comparison and selection for such models, and (3) assessing the quality of approximations to the likelihood in various special cases, it is desirable to be able to compute the likelihood for the general family of models. In this paper, we combine the techniques of Markov chain simulation and recursive kernel density estimation to obtain an estimator of the log-likelihood for the model. We then show that the estimator converges almost surely to the true log-likelihood, as the number of iterations of a particular Markov chain increases to infinity. (As is usually the case, there is a penalty to pay for the increase in generality of the family of models. In the few special cases where the likelihood can be obtained exactly, the proposed estimator is more computationally demanding than the standard expressions.) The development of the estimator and proof of the consistency result is far less straightforward than it may sound. Standard Markov chain Monte Carlo techniques such as those developed by Carlin et al. (1992) are not useful for this purpose, because they do not yield draws from the distributions required to compute the likelihood. Another nice (gradient-based) approach to parameter estimation for this class of models was developed recently by Andrieu and Doucet (2003), but their scheme requires the model to be stationary, and does not provide a consistent estimator of the likelihood for a finite-length time series. Therefore a new simulation scheme is developed, which technically is not a Markov chain Monte Carlo scheme, since the limiting distribution is unknown (even to within a constant of proportionality), even though its marginals are known. Furthermore, although consistency results have been established for recursive kernel density estimators based on iid samples (Wolverton and Wagner, 1969; Yamato, 1971; Wegman and Davies, 1979), and based on samples which are identically distributed but dependent (Masry and Györfi, 1987), such results have not appeared for recursive density estimators based on samples from an ergodic Markov chain (although it is worth noting that Yu, 1994, develops results similar to those needed here, but for non-recursive kernel density estimators). It is also worth discussing the relationship between the approach developed in this paper and particle filtering-based methods. Such methods, developed in their modern form in Kitagawa (1996) and Gordon et al. (1993), and discussed in detail in Doucet et al. (2001), are arguably some of the most important recent developments in dealing with the generalized state-space model. They rely on particle, or Monte Carlo, approximations to conditional distributions of latent states, given observations, and in theory, they could also be used along with kernel density estimation to estimate the log-likelihood for a given model. A key difference between that approach and the approach proposed in this paper is the nature of convergence of the estimator. Particle filtering schemes yield conditional distributions converging to the correct distribution as the number of particles increases to infinity. But it is not possible to increase the number of particles without re-running a particle filter, hence a process of sequentially increasing the number of particles until an estimator becomes good enough is very inefficient.
In contrast, the approach developed in this paper simply involves repeated scanning through the time series, and (almost sure) convergence occurs as the number of scans increases. The paper is organized as follows. In Section 2, we give a formal definition of the generalized state-space model, and we introduce our estimator of the log-likelihood. In Section 3 we present the relevant convergence results. In Section 4, we give a simple example of the estimator for simulated data coming from a model where the exact log-likelihood can be computed. In Section 5, we discuss additional potential applications of the results in this paper, and in the appendix, we prove the main results.

2 The Method

2.1 The Model

Formally, the generalized state-space model is defined on a probability space (\Omega, \mathcal{F}, P). The Markovian state process \{X_t \in \mathbb{R}^p, t = 1, 2, \ldots, T\} satisfies

P(X_1 \in A) = \int_A f_0(x_1) \, d\lambda(x_1), \quad \text{for all } A \in \mathcal{B}^p,
P(X_{t+1} \in A \mid X_t = x_t) = \int_A f_t(x_{t+1} \mid x_t) \, d\lambda(x_{t+1}), \quad \text{for all } A \in \mathcal{B}^p,   (1)

where \mathcal{B}^p is the Borel \sigma-field on \mathbb{R}^p and f_t(\cdot \mid \cdot) is a specified conditional probability density function (the transition density of the Markov chain) with respect to a measure \lambda on (\mathbb{R}^p, \mathcal{B}^p) (usually, but not necessarily, taken to be Lebesgue measure). The observed process \{Y_t \in \mathbb{R}^q, t = 1, 2, \ldots, T\} satisfies

P(Y_t \in A \mid \{X_t, t \in \mathbb{Z}\}, \{Y_s, s < t\}) = \int_A g_t(y_t \mid x_t) \, d\nu(y_t), \quad \text{for all } A \in \mathcal{B}^q,   (2)

where g_t(\cdot \mid \cdot) is a conditional probability density with respect to a measure \nu on (\mathbb{R}^q, \mathcal{B}^q) (also often taken to be Lebesgue measure), referred to as the observation density. For the sake of computing likelihoods and residuals, one is interested in the conditional densities \pi_t(x_t) of X_t, given observations Y_1, \ldots, Y_t, for t = 1, 2, \ldots, T, and the conditional one-step predictive densities p_t(x_t) of X_t, given observations Y_1, \ldots, Y_{t-1}, for t = 1, 2, \ldots, T. These are referred to, respectively, as the filtering densities and predictive densities, and can be obtained recursively by making use of the equalities

p_t(x_t) = \int f_{t-1}(x_t \mid x_{t-1}) \pi_{t-1}(x_{t-1}) \, d\lambda(x_{t-1})   (3)

and

\pi_t(x_t) \propto p_t(x_t) g_t(y_t \mid x_t).   (4)

(One typically starts with p_1(x_1) = f_0(x_1), then uses the two equations above to compute, in sequence, \pi_1(x_1), p_2(x_2), \pi_2(x_2), \ldots.) The predictive densities are useful, in particular, because the log-likelihood can be expressed as

l(y_1, \ldots, y_T) = \sum_{t=1}^T \log(q_t(y_t)),   (5)

where

q_t(y_t) = \int g_t(y_t \mid x_t) p_t(x_t) \, d\lambda(x_t)   (6)

is the one-step predictive density of Y_t, given Y_1 = y_1, \ldots, Y_{t-1} = y_{t-1}. Evaluation of the likelihood is generally difficult, since integrals accumulate in the recursions (3,4). An important exception is the celebrated special case of this model, the linear Gaussian state-space model, where X_{t+1} \sim N(AX_t, \Sigma) and Y_t \sim N(BX_t, \Lambda), for appropriately sized matrices A, B, \Sigma, and \Lambda. In this case, assuming that f_0(x_1) is a (multivariate) normal density, all the filtering and predictive densities turn out to be (multivariate) normal, and their means and variances can be determined using the well-known Kalman recursions (see Kalman, 1960).

2.2 The Estimator

Suppose that data \{Y_t = y_t, t = 1, 2, \ldots, T\} are observed. Roughly speaking, our estimator of the log-likelihood is obtained by generating a Markov chain \{Z_i \in \mathbb{R}^{p \times T}, i = 1, 2, \ldots\}, then generating draws \{W_i \in \mathbb{R}^{q \times T}\} conditionally on the values of \{Z_i\}, with the property that as i \to \infty, the density of the t-th component of W_i approaches q_t(\cdot), and finally, feeding the values W_i into recursive kernel density estimators. Because \{(Z_i, W_i)\} is Markovian, the update at iteration i can be made without knowledge of the values \{(Z_j, W_j), j < i-1\}; thus the estimator is truly recursive. Note that the procedure we propose to generate the Markov chain is not a Metropolis-Hastings algorithm, even though it contains superficial similarities to one. For convenience, in what follows, we will make the following assumption.

Assumption 2.1 The measure \nu(\cdot) in (2) is q-dimensional Lebesgue measure, and each density function g_t(y_t \mid \cdot) is strictly positive with a finite upper bound.

Remark: It is possible to adapt the scheme and results in this paper to handle cases where this assumption does not hold, but for the sake of clarity of exposition, we do not present those results here. The estimator is obtained as follows.
Let Z_0 = (Z_{0,1}, \ldots, Z_{0,T}) be a collection of T random vectors in \mathbb{R}^{p \times T} (representing an initial guess of the state sequence \{X_1, \ldots, X_T\}). Then construct a Markov chain

\{Z_i \in \mathbb{R}^{p \times T}, i = 1, 2, \ldots\}, \quad Z_i = (Z_{i,1}, \ldots, Z_{i,T}),

and a sequence

\{W_i \in \mathbb{R}^{q \times T}, i = 1, 2, \ldots\}, \quad W_i = (W_{i,1}, \ldots, W_{i,T}),

using the following procedure.
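An aside before stating the procedure: when the state dimension p is very small, the recursions (3)-(4) and the likelihood (5)-(6) can also be evaluated by brute-force numerical integration, which is useful for checking any estimator on toy problems. The sketch below (function and argument names are my own, not from the paper) discretizes a scalar state on a uniform grid; its cost grows exponentially in p, which is one motivation for the simulation-based estimator defined next.

```python
import numpy as np

def grid_log_likelihood(y, f0, f_trans, g_obs, grid):
    """Brute-force evaluation of the recursions (3)-(4) and the
    log-likelihood (5)-(6) for a scalar-state model, replacing each
    integral with a Riemann sum over a fixed state grid.

    f0(x)           : initial state density f_0, vectorized over the grid
    f_trans(x2, x1) : transition density f(x_{t+1} | x_t)
    g_obs(y, x)     : observation density g(y | x)
    grid            : uniform 1-d grid, assumed wide enough to cover the
                      region where the densities are non-negligible
    """
    dx = grid[1] - grid[0]
    p = f0(grid)                                   # p_1 = f_0
    loglik = 0.0
    for yt in y:
        g = g_obs(yt, grid)
        q = np.sum(g * p) * dx                     # (6): predictive density of Y_t
        loglik += np.log(q)                        # accumulate (5)
        pi = g * p / (np.sum(g * p) * dx)          # (4): normalized filtering density
        F = f_trans(grid[:, None], grid[None, :])  # F[j, k] = f(grid[j] | grid[k])
        p = F @ pi * dx                            # (3): next predictive density
    return loglik
```

For a Gaussian model this agrees with the exact predictive density to the accuracy of the grid.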

Procedure 2.2 (Markov transition from Z_{i-1} to Z_i) For t = 1, 2, \ldots, T, carry out the following steps.

1. If t = 1, draw Q from f_0(\cdot); otherwise (if t > 1) draw Q from the transition density f_{t-1}(\cdot \mid Z_{i,t-1}).

2. Draw W_{i,t} from the density g_t(\cdot \mid Q).

3. Compute \alpha = \min(1, g_t(y_t \mid Q) / g_t(y_t \mid Z_{i-1,t})). With probability \alpha, set Z_{i,t} = Q. Otherwise set Z_{i,t} = Z_{i-1,t}.

All draws of Q, W_{i,t}, and acceptance decisions (Step 3), over differing values of t and i, are required to be mutually independent, conditioned on the observations used to determine their distributions. The draws W_{i,t} obtained in Procedure 2.2, for large i, can be regarded as (dependent) draws from a distribution with density approaching q_t(\cdot). Hence it would be possible to construct estimators of the one-step predictive densities q_t(y_t), using the standard kernel density estimators

q_t^{(n)}(y_t) = \frac{1}{n} \sum_{i=1}^n \frac{1}{h_n^q} K\!\left(\frac{y_t - W_{i,t}}{h_n}\right),   (7)

where K(\cdot) is a kernel function and \{h_i, i = 1, 2, \ldots\} is a sequence of bandwidths. (See Scott, 1992, for more details on kernel density estimation.) Both the function K(\cdot) and the bandwidths h_i are required to satisfy certain standard conditions (stated in the next section). The drawback is that to compute each of these estimates, one must keep track of the entire set of draws W_{i,t}, i = 1, 2, \ldots, n. Therefore it is much more convenient to use the recursive form of the kernel density estimator. This has the same form as a standard kernel density estimator, but in (7) the bandwidths are indexed by i, rather than n. In other words, we write

\hat q_t^{(n)}(y_t) = \frac{1}{n} \sum_{i=1}^n \frac{1}{h_i^q} K\!\left(\frac{y_t - W_{i,t}}{h_i}\right).   (8)

This allows the estimator to be expressed in the recursive form

\hat q_t^{(n)}(y_t) = \frac{n-1}{n} \hat q_t^{(n-1)}(y_t) + \frac{1}{n h_n^q} K\!\left(\frac{y_t - W_{n,t}}{h_n}\right).   (9)

A natural choice for a recursive estimator of the log-likelihood of the generalized state-space model (c.f. (5)) is then

\hat l_n(y_1, \ldots, y_T) = \sum_{t=1}^T \log(\hat q_t^{(n)}(y_t)).   (10)
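Procedure 2.2 together with the recursive update (9) and the sum (10) is straightforward to implement. The following sketch assumes scalar states and observations (p = q = 1), a Gaussian kernel K, and user-supplied samplers; all function and parameter names are mine, and the bandwidth constants are illustrative, not prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def mcmc_loglik_estimate(y, draw_f0, draw_f, draw_g, g_obs, n_iter,
                         h0=0.5, alpha=0.2):
    """Sketch of Procedure 2.2 combined with the recursive estimator
    (9)-(10), for p = q = 1 and a Gaussian kernel K.

    draw_f0()   : draw from the initial state density f_0
    draw_f(x)   : draw from the transition density f(. | x)
    draw_g(x)   : draw from the observation density g(. | x)
    g_obs(y, x) : evaluate g(y | x); an unnormalized version suffices,
                  since only ratios enter the acceptance probability
    """
    T = len(y)
    z = np.empty(T)                     # arbitrary initial states Z_0,
    z[0] = draw_f0()                    # here drawn by a forward pass
    for t in range(1, T):
        z[t] = draw_f(z[t - 1])
    qhat = np.zeros(T)                  # recursive estimates of q_t(y_t)
    for n in range(1, n_iter + 1):
        h = h0 * n ** (-alpha)          # bandwidth h_n = h_0 * n^(-alpha)
        for t in range(T):
            # Step 1: propose from the state transition density; note that
            # z[t-1] has already been updated in this sweep.
            q = draw_f0() if t == 0 else draw_f(z[t - 1])
            # Step 2: draw W_{n,t} from the observation density given Q.
            w = draw_g(q)
            # Step 3: accept or reject the proposed state.
            if rng.random() < min(1.0, g_obs(y[t], q) / g_obs(y[t], z[t])):
                z[t] = q
            # Recursive kernel update (9), standard normal kernel.
            K = np.exp(-0.5 * ((y[t] - w) / h) ** 2) / np.sqrt(2.0 * np.pi)
            qhat[t] = (n - 1) / n * qhat[t] + K / (n * h)
    return float(np.sum(np.log(qhat)))  # the estimator (10)
```

Note that the running state z and the running estimates qhat are all that must be stored; this is precisely the recursive property emphasized above.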

3 Analysis

In this section, we state our two main results. We will use terminology as in Meyn and Tweedie (1993); in particular, a Markov chain \{X_t\} taking values in \mathbb{R}^d will be said to be uniformly ergodic if it has a unique invariant distribution \nu (also referred to as a limiting distribution) such that

\sup_{x \in \mathbb{R}^d} \| P^n(x, \cdot) - \nu \|_{TV} \to 0, \quad \text{as } n \to \infty,

where P^n(x, \cdot) denotes the n-step transition kernel of the chain (P(x, A) = \Pr(X_{n+1} \in A \mid X_n = x)), and \| \cdot \|_{TV} denotes the total variation norm on signed measures. (Note that uniform ergodicity implies irreducibility and aperiodicity.) The first result says that the simulation procedure yields draws from distributions converging to the filtering distributions.

Theorem 3.1 Suppose that for the model (1,2), Assumption 2.1 holds. Suppose also that an arbitrary initial set of states Z_0 is chosen, and that Procedure 2.2 is used to generate a Markov chain \{Z_i, i = 1, 2, \ldots\}. Then \{Z_i\} is a uniformly ergodic Markov chain with a limiting distribution \nu on (\mathbb{R}^{p \times T}, \mathcal{B}^{p \times T}), and the marginal distributions of \nu have densities \nu_1, \ldots, \nu_T, with respect to \lambda, given by \nu_t(x) = \pi_t(x), where \pi_t denotes the conditional density of X_t, given Y_1 = y_1, \ldots, Y_t = y_t.

Remark: Although the scheme contains marginal updates which are consistent with Metropolis-Hastings updates, the overall scheme is not a Metropolis-Hastings scheme. The limiting distribution of the chain \{Z_i\} is not known (at least to the author, at the time of writing this paper); only its marginals are known.

We next impose standard restrictions on the kernel function K(\cdot) and the bandwidths \{h_n\}.

Assumption 3.2 The kernel function K(\cdot) is bounded, integrable with \int K(u) \, du = 1, satisfies \int u K(u) \, du = 0, and has an integrable radial majorant \psi(x) = \sup_{\|y\| \ge \|x\|} |K(y)|. Furthermore, the sequence of bandwidths \{h_n\} is given by

h_n = h_0 n^{-\alpha},   (11)

for some h_0 > 0 and 0 < \alpha < 1/q.
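The bandwidth sequence (11) is easy to experiment with. As a small illustration of Assumption 3.2 in action (the constants below are my own illustrative choices), the recursive estimator applied to iid N(0,1) draws, with q = 1 and h_n = h_0 n^{-\alpha}, settles near the true density value:

```python
import numpy as np

# Feed iid N(0,1) draws into the recursive kernel estimator and watch the
# estimate at a point approach the true density.  With q = 1, any
# 0 < alpha < 1 satisfies Assumption 3.2; h0 = 1.06 and alpha = 1/5 are
# common rule-of-thumb values, used here purely for illustration.
rng = np.random.default_rng(42)
h0, alpha, y0 = 1.06, 0.2, 0.0
qhat = 0.0
for n in range(1, 50001):
    h = h0 * n ** (-alpha)                   # h_n = h_0 * n^(-alpha)
    w = rng.normal()                         # here the W's are simply iid N(0,1)
    K = np.exp(-0.5 * ((y0 - w) / h) ** 2) / np.sqrt(2.0 * np.pi)
    qhat = (n - 1) / n * qhat + K / (n * h)  # recursive update (9)

true_density = 1.0 / np.sqrt(2.0 * np.pi)    # N(0,1) density at y0 = 0
```

After 50,000 draws the estimate is within about one percent of the true value in this run.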

Remark: It is possible to allow a more general functional form for h_n, but the expression in (11) is widely used, and convenient to compute. In fact, it has been shown (see, e.g., Scott, 1992) that under certain assumptions, in the case where q = 1, choosing h_0 to be 1.06 times the standard deviation of the distribution whose density is being estimated, and \alpha = 1/5, is a good choice for the non-recursive form of the estimator.

The next result establishes consistency of the estimator as the number of iterations of the Markov chain increases. It relies on an extension of a consistency result of Masry and Györfi (1987), given in the appendix.

Theorem 3.3 Suppose that the conditions of Theorem 3.1 are satisfied. Let \hat l_n(y_1, \ldots, y_T) be the recursive estimator of the log-likelihood, given by (9,10), with kernel function and bandwidths satisfying Assumption 3.2. Then, for almost all (y_1, \ldots, y_T),

\lim_{n \to \infty} \hat l_n(y_1, \ldots, y_T) = l(y_1, \ldots, y_T),

almost surely.

Remark: In Theorem 3.3, almost sure convergence does not occur in the usual sense of convergence as the amount of data becomes infinitely large. Rather, it occurs as the number of Markov chain iterations grows. Thus, arbitrarily precise estimates can be obtained for a given model, and given observed data \{y_1, \ldots, y_T\}, simply by allowing the number of iterations of the Markov chain \{Z_i\} to increase.

4 Example

As a simple illustration of the procedure, consider a Gaussian first-order autoregressive process with additive Gaussian noise. To be more precise, let the model be given by

X_{t+1} = 0.5 X_t + Z_t, \quad t = 1, 2, \ldots, \quad \{Z_t\} \sim \text{iid } N(0,1),
Y_t = X_t + W_t, \quad \{W_t\} \sim \text{iid } N(0,1),

with X_1 \sim N(0, 4/3), where the processes \{W_t\} and \{Z_t\} are independent of each other and of X_1. We simulate observations \{y_1, \ldots, y_{100}\} from this model, and then it is a simple matter to compute the exact log-likelihood using the Kalman filter, and the estimates \hat l_n(y_1, \ldots, y_{100}) (in both cases, assuming parameters are known).
Figure 1 shows the resulting estimate \hat l_n(\ldots) as a function of n, with the true log-likelihood shown as a horizontal line.
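A scaled-down version of the experiment behind Figure 1 can be written in a few lines. The sketch below is my own code, not the paper's: it shortens the series to T = 20, uses a Gaussian kernel with illustrative bandwidth constants, simulates the model, computes the exact log-likelihood with the scalar Kalman recursions, and runs Procedure 2.2 with the recursive estimator (9)-(10).

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate the example model (shortened to T = 20 to keep the demo fast).
T = 20
x = np.empty(T)
x[0] = rng.normal(0.0, np.sqrt(4.0 / 3.0))        # X_1 ~ N(0, 4/3)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.normal()          # X_{t+1} = 0.5 X_t + Z_t
y = x + rng.normal(size=T)                        # Y_t = X_t + W_t

def kalman_ll(y):
    """Exact log-likelihood via the scalar Kalman recursions."""
    m, v, ll = 0.0, 4.0 / 3.0, 0.0                # predictive moments of X_1
    for yt in y:
        s = v + 1.0                               # predictive variance of Y_t
        ll += -0.5 * (np.log(2.0 * np.pi * s) + (yt - m) ** 2 / s)
        k = v / s                                 # Kalman gain
        mf, vf = m + k * (yt - m), (1.0 - k) * v  # filtering moments
        m, v = 0.5 * mf, 0.25 * vf + 1.0          # predict the next state
    return ll

def mc_ll(y, n_iter, h0=0.5, alpha=0.2):
    """Procedure 2.2 plus the recursive estimator (9)-(10), Gaussian kernel."""
    Tn = len(y)
    z = np.zeros(Tn)                              # arbitrary initial Z_0
    qhat = np.zeros(Tn)
    for n in range(1, n_iter + 1):
        h = h0 * n ** (-alpha)
        for t in range(Tn):
            prop = (rng.normal(0.0, np.sqrt(4.0 / 3.0)) if t == 0
                    else 0.5 * z[t - 1] + rng.normal())
            w = prop + rng.normal()               # W_{n,t} ~ g(. | proposal)
            accept = np.exp(-0.5 * ((y[t] - prop) ** 2 - (y[t] - z[t]) ** 2))
            if rng.random() < min(1.0, accept):
                z[t] = prop
            K = np.exp(-0.5 * ((y[t] - w) / h) ** 2) / np.sqrt(2.0 * np.pi)
            qhat[t] = (n - 1) / n * qhat[t] + K / (n * h)
    return float(np.sum(np.log(qhat)))

exact = kalman_ll(y)
est = mc_ll(y, 5000)
```

With a few thousand iterations the estimate typically lands within a few percent of the exact value, consistent with the behaviour reported for Figure 1.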

[Figure 1: the kernel density estimate of the log-likelihood ("Kernel Density Estimator") and the exact log-likelihood, plotted against Markov chain iteration (in units of 10^4).]

Figure 1: Comparison of the estimator \hat l_n(\ldots) as n increases, for a simulated time series of length 100. In this case, since the model is linear and Gaussian, the true log-likelihood, shown as the horizontal line, can be calculated using the Kalman filter.

In this particular example, of course, it is far quicker to use the Kalman filter to compute the log-likelihood. However, the Kalman filter is not applicable in the more general case covered in this paper. Notice also that reasonably good estimates of the log-likelihood are obtained in this example after only a few thousand iterations of the Markov chain.

5 Discussion

In this paper, we have introduced a Markov chain simulation procedure which yields draws from distributions approaching the filtering and one-step predictive distributions as the number of Markov chain iterations increases, and shown that the draws from the Markov chain can be used in recursive kernel density estimators to obtain consistent estimates of the log-likelihood. A potentially useful by-product of this procedure, not considered in this paper, is sets of residuals. Since the procedure yields consistent estimates of the one-step predictive densities q_t(\cdot), it can also be readily adapted to give estimates of the one-step predictive cumulative distribution functions Q_t(y_t) = \int_{-\infty}^{y_t} q_t(y) \, dy. In the one-dimensional case, these can be trivially converted into residuals by the transformation R_t = \Phi^{-1}(Q_t(y_t)), where \Phi^{-1}(\cdot) denotes the inverse of the standard normal cumulative distribution function. If the model were indeed correct, then these residuals would be realizations of a set of iid standard normal random variables. Thus checks for model goodness-of-fit could be carried out using standard batteries of tests for iid normal random variables. Another aspect worth further study is the asymptotic distributional properties of the estimators. While consistency is nice, it would be useful, if carrying out optimization over parameter space, for instance, to be able to estimate the error in the log-likelihood estimator, as a function of the number of Markov chain iterations.

6 Acknowledgments

The author is grateful to Arnaud Doucet, Larry Wasserman and Peter Spirtes for valuable comments and discussion related to various aspects of parts of this work.

A Proofs

First we give the proof that Procedure 2.2 generates a Markov chain with a limiting distribution whose marginal distributions are the filtering distributions.

Proof of Theorem 3.1: The proof consists of two main parts. In the first part, we show that \{Z_i\} is a uniformly ergodic Markov chain, while in the second we show that the marginal distributions of the limiting distribution match the desired filtering distributions. It will be useful to define, for m = 1, 2, \ldots, T,

Z_i^{(m)} = (Z_{i,1}, \ldots, Z_{i,m}) \in \mathbb{R}^{p \times m}.

(Note that Z_i^{(T)} = Z_i.)

Part 1. Consider the sequence \{Z_i^{(m)}\}, for some fixed m \in \{1, \ldots, T\}. It is easily verified that the sequence is a Markov chain. Let its transition kernel be denoted by

R_m^k(x, A) = \Pr(Z_{i+k}^{(m)} \in A \mid Z_i^{(m)} = x), \quad x \in \mathbb{R}^{p \times m}, \; A \in \mathcal{B}^{p \times m}.

Let \mu_X^{(m)} : \mathcal{B}^{p \times m} \to \mathbb{R} denote the distribution of the process \{X_1, \ldots, X_m\}. Then

R_m^1(x, A) \ge \int_{z \in A} \alpha_m(x, z) \, d\mu_X^{(m)}(z),   (12)

where \alpha_m(x, z) is the probability of accepting all proposals from t = 1 to t = m, given by

\alpha_m(x, z) = \prod_{t=1}^m \min\!\left(1, \frac{g_t(y_t \mid z_t)}{g_t(y_t \mid x_t)}\right).   (13)

This expression can be bounded below by a function of z,

\alpha_m(x, z) \ge \underline{\alpha}_m(z) := \prod_{t=1}^m \min\!\left(1, \frac{g_t(y_t \mid z_t)}{\sup_{x \in \mathbb{R}^p} g_t(y_t \mid x)}\right) > 0.   (14)

(The fact that \underline{\alpha}_m(z) > 0 follows immediately from Assumption 2.1.) Combining (12) and (14), we see that for every A such that \mu_X^{(m)}(A) > 0,

R_m^1(x, A) \ge \int_A \underline{\alpha}_m(z) \, d\mu_X^{(m)}(z) := \zeta^{(m)}(A) > 0.   (15)

It follows directly from inequality (15), since the measure \zeta^{(m)} is non-trivial and doesn't depend on x, that the entire state space \mathbb{R}^{p \times m} constitutes a small set (see Meyn and Tweedie, 1993, for the definition of a small set). Thus by Theorem (v) of Meyn and Tweedie (1993), the Markov chain is uniformly ergodic. In other words, it has a limiting distribution \nu^{(m)}, and (see Theorem of Meyn and Tweedie, 1993) for some constants 0 < c_m < \infty and 0 < \rho_m < 1,

\sup_{x \in \mathbb{R}^{p \times m}} \| R_m^k(x, \cdot) - \nu^{(m)}(\cdot) \| \le c_m \rho_m^k,   (16)

where \| \cdot \| denotes the total variation norm of a signed measure.

Part 2. It remains to be established that the corresponding marginal distributions match the filtering distributions \pi_t. This can be done inductively. First consider \{Z_i^{(1)}, i = 1, 2, \ldots\}. We know from the previous part that this is a uniformly ergodic Markov chain. It is easily checked that it is also a Metropolis-Hastings chain with proposal density f_0(\cdot) and a limiting distribution with density \pi_1(\cdot). Therefore \nu^{(1)} has density \pi_1. Next, suppose that the marginal densities of \nu^{(m)} are \nu_t^{(m)}(\cdot) = \pi_t(\cdot), for t = 1, 2, \ldots, m. We know that \nu^{(m+1)} exists, and since the first m components of Z_i^{(m+1)} are exactly equal to Z_i^{(m)}, the first m marginal densities of \nu^{(m+1)} must be \nu_t^{(m+1)}(\cdot) = \pi_t(\cdot), t = 1, 2, \ldots, m. Now suppose that for some i, Z_i^{(m+1)} \sim \nu^{(m+1)}.
Then Z_{i+1,m} \sim \pi_m, and is independent of Z_{i,m+1}, so the proposal Q generated for Z_{i+1,m+1} has density p_{m+1}(\cdot) (recall the definition (3)), and is independent of Z_{i,m+1}. The acceptance probability is \min(1, g_{m+1}(y_{m+1} \mid Q)/g_{m+1}(y_{m+1} \mid Z_{i,m+1})). This means that Z_{i+1,m+1} can be regarded as the result of an application of a Metropolis-Hastings transition kernel to the state Z_{i,m+1}, for which the limiting distribution is proportional to p_{m+1}(\cdot) g_{m+1}(y_{m+1} \mid \cdot) \propto \pi_{m+1}(\cdot).

Thus there is only one density which both Z_{i,m+1} and Z_{i+1,m+1} can have (since this hypothetical Metropolis-Hastings transition kernel has a unique invariant distribution), and that must be \pi_{m+1}(\cdot). In other words, the marginal density \nu_{m+1}^{(m+1)}(\cdot) must be equal to \pi_{m+1}(\cdot). We have just shown that if the marginal densities of \nu^{(m)} are \pi_1, \ldots, \pi_m, then the (m+1)st marginal density of \nu^{(m+1)} must be \pi_{m+1}. Therefore, by induction, the marginal densities of \nu (which is the same as \nu^{(T)}) must be \pi_1, \ldots, \pi_T. This completes the proof.

Lemma A.1 Suppose that the conditions of Theorem 3.3 hold. Then the sequence \{(Z_i, W_i), i = 1, 2, \ldots\} is a uniformly ergodic Markov chain taking values in \mathbb{R}^{pT+qT}, with a limiting distribution for which the marginal distributions of the components corresponding to W_{i,1}, \ldots, W_{i,T} have densities, with respect to Lebesgue measure, equal to the one-step predictive densities q_1, \ldots, q_T.

Proof: It is easily verified (from inspection of Procedure 2.2) that the distribution of (Z_i, W_i), given \{(Z_j, W_j), j < i\}, depends only on Z_{i-1}. Thus \{(Z_i, W_i)\} is a Markov chain. Let its transition kernel be denoted by

S^1(x, C) = \Pr((Z_i, W_i) \in C \mid (Z_{i-1}, W_{i-1}) = x), \quad x \in \mathbb{R}^{pT+qT}, \; C \in \mathcal{B}^{pT+qT}.

Consider a point x = (z, w) \in \mathbb{R}^{pT+qT} and a product set C = A \times B, with A \in \mathcal{B}^{p \times T}, B \in \mathcal{B}^{q \times T}. Then

S^1(x, C) = \Pr(Z_i \in A, W_i \in B \mid Z_{i-1} = z)
= \int_A \Pr(W_i \in B \mid Z_i = u) \Pr(Z_i \in du \mid Z_{i-1} = z)
= \int_A \Pr(W_i \in B \mid Z_i = u) R_T^1(z, du)
\ge \int_A \Pr(W_i \in B \mid Z_i = u) \, d\zeta^{(T)}(u)
= \int_{A \times B} g_{1:T}(v \mid u) \, d(\mathrm{Leb} \times \zeta^{(T)})(v, u),   (17)

where

g_{1:T}(v \mid u) = \prod_{i=1}^T g_i(v_i \mid u_i).

Since g_{1:T} is a strictly positive function, the last expression in (17) defines a measure on the family of product sets A \times B. By the extension theorem, there is a unique extension of this measure to \mathcal{B}^{pT+qT}. Let this measure be denoted by \xi(\cdot). Furthermore, note that \xi(\cdot) does not depend on x, and \xi(A \times B) > 0 whenever \mu_X(A) > 0 and the Lebesgue measure of B is positive. Thus

S^1(x, C) \ge \xi(C)   (18)

for all x, and \xi(\cdot) is a non-trivial measure. It follows that the entire state space \mathbb{R}^{pT+qT} constitutes a small set for the Markov chain \{(Z_i, W_i)\}, and hence, by Theorem (v) of Meyn and Tweedie (1993), that \{(Z_i, W_i)\} is a uniformly ergodic Markov chain. From Theorem 3.1, we know that \{Z_i\} has an invariant distribution \nu whose marginal distributions have densities \pi_1, \ldots, \pi_T. Thus the marginal distributions of the Z-component of the invariant distribution of the Markov chain \{(Z_i, W_i)\} are also \pi_1, \ldots, \pi_T. Now suppose that for some i, Z_{i,t-1} has density \pi_{t-1}(\cdot). It follows from equations (3) and (6), and Steps 1 and 2 of Procedure 2.2, that W_{i,t} must have density q_t(\cdot). Thus the marginal distributions of the W-component of the invariant distribution of \{(Z_i, W_i)\} must have densities q_1, \ldots, q_T. This completes the proof.

Lemma A.2 Suppose that the conditions of Theorem 3.3 hold. Let \mu_{i,t} denote the distribution of W_{i,t}, and \mu_t denote the corresponding marginal distribution of the limiting distribution of the Markov chain \{(Z_i, W_i)\}. Then for almost all y \in \mathbb{R}^q, we have

\lim_{i \to \infty} \int \frac{1}{h_i^q} K\!\left(\frac{y - u}{h_i}\right) d\mu_{i,t}(u) = q_t(y).

Proof: For convenience, define K_i(x) = h_i^{-q} K(x/h_i). Write

\int K_i(y - u) \, d\mu_{i,t}(u) = \int K_i(y - u) \, d\mu_t(u) + \int K_i(y - u) \, d\nu_{i,t}(u),

where \nu_{i,t} = \mu_{i,t} - \mu_t. Taking limits on both sides, and applying Lemma 3.1 of Masry and Györfi (1987) (since \mu_t has density q_t), we get, for almost all y \in \mathbb{R}^q,

\lim_{i \to \infty} \int K_i(y - u) \, d\mu_{i,t}(u) = q_t(y) + \lim_{i \to \infty} \int K_i(y - u) \, d\nu_{i,t}(u).   (19)

We can bound the last term in this expression as follows. Let \bar k denote the upper bound for K(\cdot). Then

\left| \int K_i(y - u) \, d\nu_{i,t}(u) \right| = \left| \frac{1}{h_i^q} \int K((y - u)/h_i) \, d\nu_{i,t}(u) \right| \le \frac{\bar k}{h_i^q} \int |d\nu_{i,t}(u)| = \frac{\bar k}{h_i^q} \| \mu_{i,t} - \mu_t \|,   (20)

where \| \cdot \| denotes the total variation norm. But by Theorem of Meyn and Tweedie (1993), since \{(Z_i, W_i)\} is a uniformly ergodic Markov chain, the total variation distance

in (20) is bounded (c.f. (16)) by a sequence converging geometrically to zero; that is, for some constants c > 0 and 0 < \rho < 1,

\frac{\bar k}{h_i^q} \| \mu_{i,t} - \mu_t \| \le \frac{\bar k}{h_i^q} c \rho^i = \frac{\bar k c \rho^i}{h_0^q i^{-\alpha q}} = \frac{\bar k c}{h_0^q} \, i^{\alpha q} \rho^i \to 0, \quad \text{as } i \to \infty.   (21)

Combining (19), (20), and (21), we get the desired result. (Note that for the absolute error term in (21) to converge to zero, we require that the Markov chain converge in total variation norm faster than the bandwidths h_i shrink to zero.)

The following result can be regarded as an extension of a special case of Theorem 2.1 of Masry and Györfi (1987).

Lemma A.3 Suppose that the conditions of Theorem 3.3 hold. Then for t = 1, 2, \ldots, T,

\frac{n^{(1-\alpha q)/2}}{(\log n)(\log_2 n)^{1+\delta}} \left[ \hat q_t^{(n)}(y) - E\, \hat q_t^{(n)}(y) \right] \to 0

almost surely, for almost all y \in \mathbb{R}^q, for any \delta > 0.

Proof: By Theorem of Meyn and Tweedie (1993), the sequence \{W_{i,t}, i = 1, 2, \ldots\} is asymptotically uncorrelated. The desired result then follows directly from an application of Theorem 2.1 of Masry and Györfi (1987) (using, in their notation, u_n^2 = [(\log n)(\log_2 n)]^{-1}). The fact that the theorem still applies in our case, where samples are not identically distributed, but instead only have distributions converging geometrically in total variation distance, can be established by replicating the proof given in Masry and Györfi (1987), replacing use of their Lemma 3.1 with our Lemma A.2.

We are now in a position to give the proof of the main convergence result.

Proof of Theorem 3.3: For fixed t \in \{1, \ldots, T\}, by Lemma A.3,

\hat q_t^{(n)}(y) - E[\hat q_t^{(n)}(y)] \to 0, \quad \text{as } n \to \infty,

for almost all y \in \mathbb{R}^q. Furthermore, since \hat q_t^{(n)}(y) = n^{-1} \sum_{i=1}^n K_i(y - W_{i,t}) and \mu_{i,t} is the distribution of W_{i,t}, it follows from Lemma A.2 that

E\, \hat q_t^{(n)}(y) \to q_t(y).

Thus \hat q_t^{(n)}(y) must converge almost surely, for almost all y \in \mathbb{R}^q, to q_t(y), and consequently \log(\hat q_t^{(n)}(y)) \to \log(q_t(y)) almost surely, for almost all y \in \mathbb{R}^q. The desired result then follows directly upon examination of definitions (5) and (10).

References

C. Andrieu and A. Doucet. Online expectation-maximization type algorithms. In Proc. IEEE ICASSP, 2003.

T. Bollerslev. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31, 1986.

G. Box and G. Jenkins. Time Series Analysis, Forecasting and Control. Holden-Day, 1970.

A. Brockwell, A. Rojas, and R. Kass. Recursive Bayesian decoding of motor cortical signals by particle filtering. Journal of Neurophysiology, 91, 2004.

P.J. Brockwell and R.A. Davis. Introduction to Time Series and Forecasting. Springer, second edition, 2002.

B.P. Carlin, N.G. Polson, and D.S. Stoffer. A Monte Carlo approach to nonnormal and nonlinear state-space modeling. Journal of the American Statistical Association, 87(418), 1992.

A. Doucet, N. de Freitas, and N. Gordon, editors. Sequential Monte Carlo Methods in Practice. Springer, New York, 2001.

R.F. Engle. Autoregressive conditional heteroscedasticity with estimates of the variance of UK inflation. Econometrica, 50, 1982.

E. Ghysels, A.C. Harvey, and E. Renault. Stochastic volatility. In Handbook of Statistics, volume 14. Elsevier, 1996.

N.J. Gordon, D.J. Salmond, and A.F.M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc. F, 140, 1993.

R.E. Kalman. A new approach to linear prediction and filtering problems. Journal of Basic Engineering (ASME), 82D:35-45, 1960.

G. Kitagawa. Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of Computational and Graphical Statistics, 5(1):1-25, 1996.

E. Masry and L. Györfi. Strong consistency and rates for recursive probability density estimators of stationary processes. Journal of Multivariate Analysis, 22:79-93, 1987.

S.P. Meyn and R.L. Tweedie. Markov Chains and Stochastic Stability. Springer, 1993.

D.W. Scott. Multivariate Density Estimation. Wiley, 1992.

R.H. Shumway and D.S. Stoffer. Time Series Analysis and its Applications. Springer, 2000.

J.S. Taylor. Modeling stochastic volatility: a review and comparative study. Mathematical Finance, 4, 1994.

E.J. Wegman and H.I. Davies. Remarks on some recursive estimators of a probability density. Annals of Statistics, 7(2), 1979.

M. West and J. Harrison. Bayesian Forecasting and Dynamic Models. Springer, New York, second edition, 1997.

C.T. Wolverton and T.J. Wagner. Asymptotically optimal discriminant functions for pattern classification. IEEE Transactions on Information Theory, IT-15, 1969.

H. Yamato. Sequential estimation of a continuous probability density function and mode. Bull. Math. Statistics, 14:1-12, 1971.

B. Yu. Estimating the L_1 error of kernel estimators for Markov samplers. Technical Report 409, Dept. of Statistics, U.C. Berkeley, 1994.

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information