Recursive Kernel Density Estimation of the Likelihood for Generalized State-Space Models
A.E. Brockwell

February 28, 2005

Abstract

In time series analysis, the family of generalized state-space models is extremely rich. However, their likelihood functions are intractable except in certain special cases, and this limits the options in analyses. In practice, a study typically (1) uses some kind of approximation to the likelihood function, for instance one obtained analytically or by making use of the particle filter or related methods, (2) adopts a standard Markov chain Monte Carlo approach to parameter estimation, or (3) sacrifices goodness-of-fit for numerical convenience by choosing an approximating model for which the likelihood can be computed. Each of these approaches has advantages and disadvantages, but since none of them yields a consistent estimate of the likelihood, model selection remains an outstanding problem for the general family. This paper addresses this problem by introducing a recursive estimator of the log-likelihood for the generalized state-space model, obtained as a kernel density estimator driven by the iterations of a Markov chain. The estimator is very simple to compute, and is shown to converge almost surely to the exact log-likelihood as the number of iterations of the Markov chain approaches infinity.

Keywords: generalized state-space model, non-Gaussian, nonlinear, likelihood, recursive, kernel density estimator, Markov chain, dynamic model

1 Introduction

The family of generalized state-space models (also sometimes referred to as nonlinear dynamic models, and partly discussed, for instance, in Shumway and Stoffer, 2000; Brockwell and Davis, 2002; West and Harrison, 1997) is arguably the richest family of time series models
considered in the literature. A generalized state-space model consists of two components: a latent process, called the state process, usually assumed to be Markovian, and an observation process, whose elements have conditional distributions given the values of corresponding elements of the latent process. Special cases of the model include classical ARIMA models (Box and Jenkins, 1970), multivariate and non-Gaussian varieties included, financial models such as GARCH models (Engle, 1982; Bollerslev, 1986) and stochastic volatility models (Ghysels et al., 1996; Taylor, 1994), a range of nonlinear models used in engineering (a number of interesting examples can be found in Doucet et al., 2001), models for censored time series, time series of counts (such as the neuron-spiking time series considered in Brockwell et al., 2004), and many others.

Even for a number of widely-studied special cases, analysis in the literature is carried out either by using approximating models with tractable likelihood functions, or by making use of likelihood approximations. This has at least two drawbacks. One is that the resulting parameter estimates (and forecasts based on those parameters) are likely to be biased. The other is that without the likelihood, model comparison and selection is difficult. Of course, in certain cases the approximations used may be good enough that the exact likelihood is not necessary, but without a means of computing the exact likelihood (or otherwise bounding the approximation error), it is not possible to know the degree to which the approximation error matters. Some of these problems can be avoided by adopting a Bayesian approach to inference. By doing so it is possible, at least, to estimate posterior distributions of parameters for more complicated special cases, but model comparison remains a serious problem, since effective methods for likelihood calculation have not been developed in this context.
(The key problem here is that the likelihood is in fact an integral over all possible values of the latent state process.) Thus, for the sake of (1) allowing likelihood-based analysis of a richer class of time series models than previously considered in the literature, (2) performing model comparison and selection for such models, and (3) assessing the quality of approximations to the likelihood in various special cases, it is desirable to be able to compute the likelihood for the general family of models.

In this paper, we combine the techniques of Markov chain simulation and recursive kernel density estimation to obtain an estimator of the log-likelihood for the model. We then show that the estimator converges almost surely to the true log-likelihood as the number of iterations of a particular Markov chain increases to infinity. (As is usually the case, there is a penalty to pay for the increase in generality of the family of models. In the few special cases where the likelihood can be obtained exactly, the proposed estimator is more computationally demanding than the standard expressions.)

The development of the estimator and the proof of the consistency result are far less straightforward than they may sound. Standard Markov chain Monte Carlo techniques such as those developed by Carlin et al. (1992) are not useful for this purpose, because they do not yield draws from the distributions required to compute the likelihood. Another attractive (gradient-based) approach to parameter estimation for this class of models was developed recently by Andrieu and Doucet (2003), but their scheme requires the model to be stationary, and does not provide a consistent estimator of the likelihood for a
finite-length time series. Therefore a new simulation scheme is developed, which technically is not a Markov chain Monte Carlo scheme, since the limiting distribution is unknown (even to within a constant of proportionality), even though its marginals are known. Furthermore, although consistency results have been established for recursive kernel density estimators based on iid samples (Wolverton and Wagner, 1969; Yamato, 1971; Wegman and Davies, 1979), and based on samples which are identically distributed but dependent (Masry and Györfi, 1987), such results have not appeared for recursive density estimators based on samples from an ergodic Markov chain (although it is worth noting that Yu, 1994, develops results similar to those needed here, but for non-recursive kernel density estimators).

It is also worth discussing the relationship between the approach developed in this paper and particle filtering-based methods. Such methods, developed in their modern form by Gordon et al. (1993) and Kitagawa (1996), and discussed in detail in Doucet et al. (2001), are arguably among the most important recent developments in dealing with the generalized state-space model. They rely on particle, or Monte Carlo, approximations to conditional distributions of latent states given observations, and in theory they could also be used along with kernel density estimation to estimate the log-likelihood for a given model. A key difference between that approach and the approach proposed in this paper is the nature of convergence of the estimator. Particle filtering schemes yield conditional distributions converging to the correct distributions as the number of particles increases to infinity. But it is not possible to increase the number of particles without re-running the particle filter, so a process of sequentially increasing the number of particles until an estimator becomes good enough is very inefficient.
In contrast, the approach developed in this paper simply involves repeated scanning through the time series, and (almost sure) convergence occurs as the number of scans increases.

The paper is organized as follows. In Section 2, we give a formal definition of the generalized state-space model, and we introduce our estimator of the log-likelihood. In Section 3 we present the relevant convergence results. In Section 4, we give a simple example of the estimator for simulated data coming from a model where the exact log-likelihood can be computed. In Section 5, we discuss additional potential applications of the results in this paper, and in the appendix, we prove the main results.
2 The Method

2.1 The Model

Formally, the generalized state-space model is defined on a probability space $(\Omega, \mathcal{F}, P)$. The Markovian state process $\{X_t \in \mathbb{R}^p,\ t = 1, 2, \ldots, T\}$ satisfies

$$P(X_1 \in A) = \int_A f_0(x_1)\,d\lambda(x_1), \quad \text{for all } A \in \mathcal{B}^p,$$
$$P(X_{t+1} \in A \mid X_t = x_t) = \int_A f_t(x_{t+1} \mid x_t)\,d\lambda(x_{t+1}), \quad \text{for all } A \in \mathcal{B}^p, \qquad (1)$$

where $\mathcal{B}^p$ is the Borel $\sigma$-field on $\mathbb{R}^p$ and $f_t(\cdot \mid \cdot)$ is a specified conditional probability density function (the transition density of the Markov chain) with respect to a measure $\lambda$ on $(\mathbb{R}^p, \mathcal{B}^p)$ (usually, but not necessarily, taken to be Lebesgue measure). The observed process $\{Y_t \in \mathbb{R}^q,\ t = 1, 2, \ldots, T\}$ satisfies

$$P(Y_t \in A \mid \{X_t, t \in \mathbb{Z}\}, \{Y_s, s < t\}) = \int_A g_t(y_t \mid x_t)\,d\nu(y_t), \quad \text{for all } A \in \mathcal{B}^q, \qquad (2)$$

where $g_t(\cdot \mid \cdot)$ is a conditional probability density with respect to a measure $\nu$ on $(\mathbb{R}^q, \mathcal{B}^q)$ (also often taken to be Lebesgue measure), referred to as the observation density.

For the sake of computing likelihoods and residuals, one is interested in the conditional densities $\pi_t(x_t)$ of $X_t$, given observations $Y_1, \ldots, Y_t$, for $t = 1, 2, \ldots, T$, and the conditional one-step predictive densities $p_t(x_t)$ of $X_t$, given observations $Y_1, \ldots, Y_{t-1}$, for $t = 1, 2, \ldots, T$. These are referred to, respectively, as the filtering densities and predictive densities, and can be obtained recursively by making use of the equalities

$$p_t(x_t) = \int f_{t-1}(x_t \mid x_{t-1})\,\pi_{t-1}(x_{t-1})\,d\lambda(x_{t-1}) \qquad (3)$$

and

$$\pi_t(x_t) \propto p_t(x_t)\,g_t(y_t \mid x_t). \qquad (4)$$

(One typically starts with $p_1(x_1) = f_0(x_1)$, then uses the two equations above to compute, in sequence, $\pi_1(x_1), p_2(x_2), \pi_2(x_2), \ldots$.) The predictive densities are useful, in particular, because the log-likelihood can be expressed as

$$\ell(y_1, \ldots, y_T) = \sum_{t=1}^T \log(q_t(y_t)), \qquad (5)$$

where

$$q_t(y_t) = \int g_t(y_t \mid x_t)\,p_t(x_t)\,d\lambda(x_t) \qquad (6)$$
is the one-step predictive density of $Y_t$, given $Y_1 = y_1, \ldots, Y_{t-1} = y_{t-1}$.

Evaluation of the likelihood is generally difficult, since integrals accumulate in the recursions (3), (4). An important exception is the celebrated special case of this model, the linear Gaussian state-space model, where $X_{t+1} \sim N(AX_t, \Sigma)$ and $Y_t \sim N(BX_t, \Lambda)$, for appropriately sized matrices $A$, $B$, $\Sigma$, and $\Lambda$. In this case, assuming that $f_0(x_1)$ is a (multivariate) normal density, all the filtering and predictive densities turn out to be (multivariate) normal, and their means and variances can be determined using the well-known Kalman recursions (see Kalman, 1960).

2.2 The Estimator

Suppose that data $\{Y_t = y_t,\ t = 1, 2, \ldots, T\}$ are observed. Roughly speaking, our estimator of the log-likelihood is obtained by generating a Markov chain $\{Z_i \in \mathbb{R}^{pT},\ i = 1, 2, \ldots\}$, then generating draws $\{W_i \in \mathbb{R}^{qT}\}$ conditionally on the values of $\{Z_i\}$, with the property that as $i \to \infty$, the density of the $t$-th component of $W_i$ approaches $q_t(\cdot)$, and finally feeding the values $W_i$ into recursive kernel density estimators. Because $\{(Z_i, W_i)\}$ is Markovian, the update at iteration $i$ can be made without knowledge of the values $\{(Z_j, W_j),\ j < i-1\}$; thus the estimator is truly recursive. Note that the procedure we propose to generate the Markov chain is not a Metropolis-Hastings algorithm, even though it bears superficial similarities to one.

For convenience, in what follows, we will make the following assumption.

Assumption 2.1 The measure $\nu(\cdot)$ in (2) is $q$-dimensional Lebesgue measure, and each density function $g_t(y_t \mid \cdot)$ is strictly positive with a finite upper bound.

Remark: It is possible to adapt the scheme and results in this paper to handle cases where this assumption does not hold, but for the sake of clarity of exposition, we do not present those results here.

The estimator is obtained as follows.
Let $Z_0 = (Z_{0,1}, \ldots, Z_{0,T})$ be a collection of $T$ random vectors in $\mathbb{R}^{pT}$ (representing an initial guess of the state sequence $\{X_1, \ldots, X_T\}$). Then construct a Markov chain

$$\{Z_i \in \mathbb{R}^{pT},\ i = 1, 2, \ldots\}, \quad Z_i = (Z_{i,1}, \ldots, Z_{i,T}),$$

and a sequence

$$\{W_i \in \mathbb{R}^{qT},\ i = 1, 2, \ldots\}, \quad W_i = (W_{i,1}, \ldots, W_{i,T}),$$

using the following procedure.
Procedure 2.2 (Markov transition from $Z_{i-1}$ to $Z_i$) For $t = 1, 2, \ldots, T$, carry out the following steps.

1. If $t = 1$, draw $Q$ from $f_0(\cdot)$; otherwise (if $t > 1$) draw $Q$ from the transition density $f_{t-1}(\cdot \mid Z_{i,t-1})$.
2. Draw $W_{i,t}$ from the density $g_t(\cdot \mid Q)$.
3. Compute $\alpha = \min\left(1,\ g_t(y_t \mid Q)/g_t(y_t \mid Z_{i-1,t})\right)$. With probability $\alpha$, set $Z_{i,t} = Q$. Otherwise set $Z_{i,t} = Z_{i-1,t}$.

All draws of $Q$, $W_{i,t}$, and acceptance decisions (Step 3), over differing values of $t$ and $i$, are required to be mutually independent, conditioned on the observations used to determine their distributions.

The draws $W_{i,t}$ obtained in Procedure 2.2 can, for large $i$, be regarded as (dependent) draws from a distribution with density approaching $q_t(\cdot)$. Hence it would be possible to construct estimators of the one-step predictive densities $q_t(y_t)$ using the standard kernel density estimators

$$q_t^{(n)}(y_t) = \frac{1}{n} \sum_{i=1}^n \frac{1}{h_n^q} K\left(\frac{y_t - W_{i,t}}{h_n}\right), \qquad (7)$$

where $K(\cdot)$ is a kernel function and $\{h_i,\ i = 1, 2, \ldots\}$ is a sequence of bandwidths. (See Scott, 1992, for more details on kernel density estimation.) Both the function $K(\cdot)$ and the bandwidths $h_i$ are required to satisfy certain standard conditions (stated in the next section). The drawback is that to compute each of these estimates, one must keep track of the entire set of draws $W_{i,t},\ i = 1, 2, \ldots, n$. Therefore it is much more convenient to use the recursive form of the kernel density estimator. This has the same form as a standard kernel density estimator, but in (7) the bandwidths are indexed by $i$ rather than $n$. In other words, we write

$$\hat{q}_t^{(n)}(y_t) = \frac{1}{n} \sum_{i=1}^n \frac{1}{h_i^q} K\left(\frac{y_t - W_{i,t}}{h_i}\right). \qquad (8)$$

This allows the estimator to be expressed in the recursive form

$$\hat{q}_t^{(n)}(y_t) = \frac{n-1}{n}\,\hat{q}_t^{(n-1)}(y_t) + \frac{1}{n h_n^q}\,K\left(\frac{y_t - W_{n,t}}{h_n}\right). \qquad (9)$$

A natural choice for a recursive estimator of the log-likelihood of the generalized state-space model (c.f. (5)) is then

$$\hat{\ell}_n(y_1, \ldots, y_T) = \sum_{t=1}^T \log\left(\hat{q}_t^{(n)}(y_t)\right). \qquad (10)$$
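To make Procedure 2.2 and the recursive estimator (9)-(10) concrete, the following is a minimal sketch for the scalar AR(1)-plus-noise model used later in Section 4 ($X_{t+1} = 0.5 X_t + N(0,1)$, $Y_t = X_t + N(0,1)$, $X_1 \sim N(0, 4/3)$, $q = 1$). The Gaussian kernel, the values of $h_0$ and $\alpha$, and the function names are illustrative choices (consistent with Assumption 3.2), not prescriptions from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(y, x):                      # observation density g_t(y | x), here N(x, 1)
    return np.exp(-0.5 * (y - x) ** 2) / np.sqrt(2 * np.pi)

def kernel(u):                    # Gaussian kernel K (bounded, integrates to 1)
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def estimate_loglik(y, n_iter, h0=1.0, alpha=0.2):
    T = len(y)
    z = np.zeros(T)               # Z_0: arbitrary initial state sequence
    qhat = np.zeros(T)            # running values of qhat_t^{(n)}(y_t)
    for n in range(1, n_iter + 1):
        h = h0 * n ** (-alpha)    # bandwidth h_n = h0 * n^{-alpha}, cf. (11)
        z_new = np.empty(T)
        for t in range(T):
            # Step 1: propose Q from the state transition (initial density at t = 1)
            mean = 0.0 if t == 0 else 0.5 * z_new[t - 1]
            var = 4.0 / 3.0 if t == 0 else 1.0
            Q = mean + np.sqrt(var) * rng.standard_normal()
            # Step 2: draw W_{n,t} from g_t(. | Q)
            w = Q + rng.standard_normal()
            # Step 3: accept/reject against the observation density
            a = min(1.0, g(y[t], Q) / g(y[t], z[t]))
            z_new[t] = Q if rng.uniform() < a else z[t]
            # Recursive kernel density update, eq. (9)
            qhat[t] = (n - 1) / n * qhat[t] + kernel((y[t] - w) / h) / (n * h)
        z = z_new
    return np.sum(np.log(qhat))   # eq. (10)

y = np.array([0.3, -1.2, 0.8])
print(estimate_loglik(y, n_iter=20000))
```

Note that, as in Step 2 of the procedure, each $W_{n,t}$ is generated from the proposal $Q$ rather than from the accepted state, and only the previous iteration's states are needed, so the memory cost does not grow with $n$.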
3 Analysis

In this section, we state our two main results. We will use terminology as in Meyn and Tweedie (1993); in particular, a Markov chain $\{X_t\}$ taking values in $\mathbb{R}^d$ will be said to be uniformly ergodic if it has a unique invariant distribution $\nu$ (also referred to as a limiting distribution) such that

$$\sup_{x \in \mathbb{R}^d} \|P^n(x, \cdot) - \nu\|_{TV} \to 0, \quad \text{as } n \to \infty,$$

where $P^n(x, A) = \Pr(X_{n+1} \in A \mid X_1 = x)$ is the $n$-step transition kernel of the chain, and $\|\cdot\|_{TV}$ denotes the total variation norm on signed measures. (Note that uniform ergodicity implies irreducibility and aperiodicity.)

The first result says that the simulation procedure yields draws from distributions converging to the filtering distributions.

Theorem 3.1 Suppose that for the model (1), (2), Assumption 2.1 holds. Suppose also that an arbitrary initial set of states $Z_0$ is chosen, and that Procedure 2.2 is used to generate a Markov chain $\{Z_i,\ i = 1, 2, \ldots\}$. Then $\{Z_i\}$ is a uniformly ergodic Markov chain with a limiting distribution $\nu$ on $(\mathbb{R}^{pT}, \mathcal{B}^{pT})$, and the marginal distributions of $\nu$ have densities $\nu_1, \ldots, \nu_T$ with respect to $\lambda$, given by $\nu_t(x) = \pi_t(x)$, where $\pi_t$ denotes the conditional density of $X_t$, given $Y_1 = y_1, \ldots, Y_t = y_t$.

Remark: Although the scheme contains marginal updates which are consistent with Metropolis-Hastings updates, the overall scheme is not a Metropolis-Hastings scheme. The limiting distribution of the chain $\{Z_i\}$ is not known (at least to the author, at the time of writing this paper); only its marginals are known.

We next impose standard restrictions on the kernel function $K(\cdot)$ and the bandwidths $\{h_n\}$.

Assumption 3.2 The kernel function $K(\cdot)$ is bounded, integrable with $\int K(u)\,du = 1$, satisfies $\int u K(u)\,du = 0$, and has an integrable radial majorant $\psi(x) = \sup_{\|y\| \geq \|x\|} K(y)$. Furthermore, the sequence of bandwidths $\{h_j\}$ is given by

$$h_n = h_0 n^{-\alpha}, \qquad (11)$$

for some $h_0 > 0$ and $0 < \alpha < 1/q$.
Remark: It is possible to allow a more general functional form for $h_n$, but the expression in (11) is widely used and convenient to compute. In fact, it has been shown (see, e.g., Scott, 1992) that under certain assumptions, in the case where $q = 1$, choosing $h_0$ to be 1.06 times the standard deviation of the distribution whose density is being estimated, with $\alpha = 1/5$, is a good choice for the non-recursive form of the estimator.

The next result establishes consistency of the estimator as the number of iterations of the Markov chain increases. It relies on an extension of a consistency result of Masry and Györfi (1987), given in the appendix.

Theorem 3.3 Suppose that the conditions of Theorem 3.1 are satisfied. Let $\hat{\ell}_n(y_1, \ldots, y_T)$ be the recursive estimator of the log-likelihood, given by (9), (10), with kernel function and bandwidths satisfying Assumption 3.2. Then, for almost all $(y_1, \ldots, y_T)$,

$$\lim_{n \to \infty} \hat{\ell}_n(y_1, \ldots, y_T) = \ell(y_1, \ldots, y_T),$$

almost surely.

Remark: In Theorem 3.3, almost sure convergence does not occur in the usual sense of convergence as the amount of data becomes infinitely large. Rather, it occurs as the number of Markov chain iterations grows. Thus, arbitrarily precise estimates can be obtained for a given model and given observed data $\{y_1, \ldots, y_T\}$, simply by allowing the number of iterations of the Markov chain $\{Z_i\}$ to increase.

4 Example

As a simple illustration of the procedure, consider a Gaussian first-order autoregressive process with additive Gaussian noise. To be more precise, let the model be given by

$$X_{t+1} = 0.5 X_t + Z_t, \quad t = 1, 2, \ldots, \quad \{Z_t\} \sim \text{iid } N(0, 1),$$
$$Y_t = X_t + W_t, \quad \{W_t\} \sim \text{iid } N(0, 1),$$

with $X_1 \sim N(0, 4/3)$, where the processes $\{W_t\}$ and $\{Z_t\}$ are independent of each other and of $X_1$. We simulate observations $\{y_1, \ldots, y_{100}\}$ from this model, and then it is a simple matter to compute the exact log-likelihood using the Kalman filter, and the estimates $\hat{\ell}_n(y_1, \ldots, y_{100})$ (in both cases, assuming parameters are known).
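For this linear Gaussian example, the exact log-likelihood reduces to the standard scalar Kalman recursions: predict the state, score the observation against its one-step predictive normal density, then update. A minimal sketch (the simulation seed and function names are illustrative, not from the paper):

```python
import numpy as np

def kalman_loglik(y, phi=0.5, state_var=1.0, obs_var=1.0):
    # Exact log-likelihood for X_{t+1} = phi X_t + N(0, state_var),
    # Y_t = X_t + N(0, obs_var), X_1 ~ N(0, state_var / (1 - phi^2)).
    m, v = 0.0, state_var / (1.0 - phi ** 2)   # prior mean/variance of X_1
    loglik = 0.0
    for yt in y:
        s = v + obs_var                        # predictive variance of Y_t
        loglik += -0.5 * (np.log(2 * np.pi * s) + (yt - m) ** 2 / s)
        k = v / s                              # Kalman gain
        m_f = m + k * (yt - m)                 # filtered mean of X_t
        v_f = v * (1.0 - k)                    # filtered variance of X_t
        m, v = phi * m_f, phi ** 2 * v_f + state_var   # predict X_{t+1}
    return loglik

# Simulate a length-100 series from the model and evaluate its log-likelihood.
rng = np.random.default_rng(1)
x = rng.normal(0.0, np.sqrt(4.0 / 3.0))
y = []
for _ in range(100):
    y.append(x + rng.standard_normal())        # Y_t = X_t + W_t
    x = 0.5 * x + rng.standard_normal()        # X_{t+1} = 0.5 X_t + Z_t
print(kalman_loglik(np.array(y)))
```

The value produced this way plays the role of the horizontal reference line in Figure 1.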
Figure 1 shows the resulting estimate $\hat{\ell}_n(\cdot)$ as a function of $n$, with the true log-likelihood shown as a horizontal line.
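For a one-dimensional state, the recursions (3)-(6) can also be evaluated directly by numerical integration on a grid, giving an independent check on the exact value. The following sketch does this for the example model; the grid bounds, resolution, and helper names are illustrative choices, not part of the paper.

```python
import numpy as np

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def grid_loglik(y, grid):
    # Grid-based evaluation of (3)-(6) for X_{t+1} = 0.5 X_t + N(0,1),
    # Y_t = X_t + N(0,1), X_1 ~ N(0, 4/3).
    dx = grid[1] - grid[0]
    p = normal_pdf(grid, 0.0, 4.0 / 3.0)      # p_1 = f_0, the initial density
    loglik = 0.0
    for yt in y:
        g = normal_pdf(yt, grid, 1.0)         # g_t(y_t | x) along the grid
        q = np.sum(g * p) * dx                # (6): predictive density of Y_t
        loglik += np.log(q)                   # (5): accumulate log q_t(y_t)
        pi = g * p / q                        # (4): normalized filtering density
        # (3): next predictive density as a discretized integral over x_{t-1}
        trans = normal_pdf(grid[:, None], 0.5 * grid[None, :], 1.0)
        p = trans @ pi * dx
    return loglik

grid = np.linspace(-10.0, 10.0, 401)
print(grid_loglik(np.array([0.3, -1.2, 0.8]), grid))
```

This brute-force route is only feasible in low dimensions, which is precisely the limitation the kernel-density estimator of Section 2.2 avoids.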
[Figure 1: Comparison of the estimator $\hat{\ell}_n(\cdot)$ as $n$ increases, for a simulated time series of length 100. Since the model is linear and Gaussian, the true log-likelihood, shown as a horizontal line, can be calculated using the Kalman filter. Axes: log-likelihood against Markov chain iteration ($\times 10^4$).]

In this particular example, of course, it is far quicker to use the Kalman filter to compute the log-likelihood. However, the Kalman filter is not applicable in the more general case covered in this paper. Notice also that reasonably good estimates of the log-likelihood are obtained in this example after only a few thousand iterations of the Markov chain.

5 Discussion

In this paper, we have introduced a Markov chain simulation procedure which yields draws from distributions approaching the filtering and one-step predictive distributions as the number of Markov chain iterations increases, and shown that the draws from the Markov chain can be used in recursive kernel density estimators to obtain consistent estimates of the log-likelihood.

A potentially useful by-product of this procedure, not considered in this paper, is sets of residuals. Since the procedure yields consistent estimates of the one-step predictive densities $q_t(\cdot)$, it can also be readily adapted to give estimates of the one-step predictive cumulative distribution functions $Q_t(y_t) = \int_{-\infty}^{y_t} q_t(y)\,dy$. In the one-dimensional case, these can be trivially converted into residuals by the transformation $R_t = \Phi^{-1}(Q_t(y_t))$, where $\Phi^{-1}(\cdot)$ denotes the
inverse of the standard normal cumulative distribution function. If the model were indeed correct, then these residuals would be realizations of a set of iid standard normal random variables. Thus checks for model goodness-of-fit could be carried out using standard batteries of tests for iid normal random variables.

Another aspect worth further study is the asymptotic distributional properties of the estimators. While consistency is nice, it would be useful, if carrying out optimization over parameter space, for instance, to be able to estimate the error in the log-likelihood estimator as a function of the number of Markov chain iterations.

6 Acknowledgments

The author is grateful to Arnaud Doucet, Larry Wasserman and Peter Spirtes for valuable comments and discussion related to various aspects of this work.

A Proofs

First we give the proof that Procedure 2.2 generates a Markov chain with a limiting distribution whose marginal distributions are the filtering distributions.

Proof of Theorem 3.1: The proof consists of two main parts. In the first part, we show that $\{Z_i\}$ is a uniformly ergodic Markov chain, while in the second we show that the marginal distributions of the limiting distribution match the desired filtering distributions. It will be useful to define, for $m = 1, 2, \ldots, T$,

$$Z_i^{(m)} = (Z_{i,1}, \ldots, Z_{i,m}) \in \mathbb{R}^{pm}.$$

(Note that $Z_i^{(T)} = Z_i$.)

Part 1. Consider the sequence $\{Z_i^{(m)}\}$, for some fixed $m \in \{1, \ldots, T\}$. It is easily verified that the sequence is a Markov chain. Let its transition kernel be denoted by

$$R_m^k(x, A) = \Pr(Z_{i+k}^{(m)} \in A \mid Z_i^{(m)} = x), \quad x \in \mathbb{R}^{pm},\ A \in \mathcal{B}^{pm}.$$

Let $\mu_X^{(m)}: \mathcal{B}^{pm} \to \mathbb{R}$ denote the distribution of the process $\{X_1, \ldots, X_m\}$. Then

$$R_m^1(x, A) \geq \int_{z \in A} \alpha_m(x, z)\,d\mu_X^{(m)}(z), \qquad (12)$$
where $\alpha_m(x, z)$ is the probability of accepting all proposals from $t = 1$ to $t = m$, given by

$$\alpha_m(x, z) = \prod_{t=1}^m \min\left(1,\ \frac{g_t(y_t \mid z_t)}{g_t(y_t \mid x_t)}\right). \qquad (13)$$

This expression can be bounded below by a function of $z$ alone,

$$\alpha_m(x, z) \geq \underline{\alpha}_m(z) := \prod_{t=1}^m \min\left(1,\ \frac{g_t(y_t \mid z_t)}{\sup_{x \in \mathbb{R}^p} g_t(y_t \mid x)}\right) > 0. \qquad (14)$$

(The fact that $\underline{\alpha}_m(z) > 0$ follows immediately from Assumption 2.1.) Combining (12) and (14), we see that for every $A$ such that $\mu_X^{(m)}(A) > 0$,

$$R_m^1(x, A) \geq \int_A \underline{\alpha}_m(z)\,d\mu_X^{(m)}(z) =: \zeta^{(m)}(A) > 0. \qquad (15)$$

It follows directly from inequality (15), since the measure $\zeta^{(m)}$ is non-trivial and does not depend on $x$, that the entire state space $\mathbb{R}^{pm}$ constitutes a small set (see Meyn and Tweedie, 1993, for the definition of a small set). Thus, by part (v) of a theorem of Meyn and Tweedie (1993), the Markov chain is uniformly ergodic. In other words, it has a limiting distribution $\nu^{(m)}$, and (see Meyn and Tweedie, 1993) for some constants $0 < c_m < \infty$ and $0 < \rho_m < 1$,

$$\sup_{x \in \mathbb{R}^{pm}} \|R_m^k(x, \cdot) - \nu^{(m)}(\cdot)\| \leq c_m \rho_m^k, \qquad (16)$$

where $\|\cdot\|$ denotes the total variation norm of a signed measure.

Part 2. It remains to be established that the corresponding marginal distributions match the filtering distributions $\pi_t$. This can be done inductively. First consider $\{Z_i^{(1)},\ i = 1, 2, \ldots\}$. We know from the previous part that this is a uniformly ergodic Markov chain. It is easily checked that it is also a Metropolis-Hastings chain with proposal density $f_0(\cdot)$ and a limiting distribution with density $\pi_1(\cdot)$. Therefore $\nu^{(1)}$ has density $\pi_1$.

Next, suppose that the marginal densities of $\nu^{(m)}$ are $\nu_t^{(m)}(\cdot) = \pi_t(\cdot)$, for $t = 1, 2, \ldots, m$. We know that $\nu^{(m+1)}$ exists, and since the first $m$ components of $Z_i^{(m+1)}$ are exactly equal to $Z_i^{(m)}$, the first $m$ marginal densities of $\nu^{(m+1)}$ must be $\nu_t^{(m+1)}(\cdot) = \pi_t(\cdot)$, $t = 1, 2, \ldots, m$. Now suppose that for some $i$, $Z_i^{(m+1)} \sim \nu^{(m+1)}$.
Then $Z_{i+1,m} \sim \pi_m$, and is independent of $Z_{i,m+1}$, so the proposal $Q$ generated for $Z_{i+1,m+1}$ has density $p_{m+1}(\cdot)$ (recall the definition (3)), and is independent of $Z_{i,m+1}$. The acceptance probability is $\min\left(1,\ g_{m+1}(y_{m+1} \mid Q)/g_{m+1}(y_{m+1} \mid Z_{i,m+1})\right)$. This means that $Z_{i+1,m+1}$ can be regarded as the result of an application of a Metropolis-Hastings transition kernel to the state $Z_{i,m+1}$, for which the limiting distribution is proportional to $p_{m+1}(\cdot)\,g_{m+1}(y_{m+1} \mid \cdot) \propto \pi_{m+1}(\cdot)$.
Thus there is only one density which both $Z_{i,m+1}$ and $Z_{i+1,m+1}$ can have (since this hypothetical Metropolis-Hastings transition kernel has a unique invariant distribution), and that must be $\pi_{m+1}(\cdot)$. In other words, the marginal density $\nu_{m+1}^{(m+1)}(\cdot)$ must be equal to $\pi_{m+1}(\cdot)$.

We have just shown that if the marginal densities of $\nu^{(m)}$ are $\pi_1, \ldots, \pi_m$, then the $(m+1)$st marginal density of $\nu^{(m+1)}$ must be $\pi_{m+1}$. Therefore, by induction, the marginal densities of $\nu$ (which is the same as $\nu^{(T)}$) must be $\pi_1, \ldots, \pi_T$. This completes the proof.

Lemma A.1 Suppose that the conditions of Theorem 3.3 hold. Then the sequence $\{(Z_i, W_i),\ i = 1, 2, \ldots\}$ is a uniformly ergodic Markov chain taking values in $\mathbb{R}^{pT+qT}$, with a limiting distribution for which the marginal distributions of the components corresponding to $W_{i,1}, \ldots, W_{i,T}$ have densities, with respect to Lebesgue measure, equal to the one-step predictive densities $q_1, \ldots, q_T$.

Proof: It is easily verified (from inspection of Procedure 2.2) that the distribution of $(Z_i, W_i)$, given $\{(Z_j, W_j),\ j < i\}$, depends only on $Z_{i-1}$. Thus $\{(Z_i, W_i)\}$ is a Markov chain. Let its transition kernel be denoted by

$$S^1(x, C) = \Pr((Z_i, W_i) \in C \mid (Z_{i-1}, W_{i-1}) = x), \quad x \in \mathbb{R}^{pT+qT},\ C \in \mathcal{B}^{pT+qT}.$$

Consider a point $x = (z, w) \in \mathbb{R}^{pT+qT}$ and a particular product set $C = A \times B$, with $A \in \mathcal{B}^{pT}$, $B \in \mathcal{B}^{qT}$. Then

$$S^1(x, C) = \Pr(Z_i \in A,\ W_i \in B \mid Z_{i-1} = z)$$
$$= \int_A \Pr(W_i \in B \mid Z_i = u)\,\Pr(Z_i \in du \mid Z_{i-1} = z)$$
$$= \int_A \Pr(W_i \in B \mid Z_i = u)\,R_T^1(z, du)$$
$$\geq \int_A \Pr(W_i \in B \mid Z_i = u)\,d\zeta^{(T)}(u)$$
$$= \int_{A \times B} g_{1:T}(v \mid u)\,d(\lambda \times \zeta^{(T)})(v, u), \qquad (17)$$

where

$$g_{1:T}(v \mid u) = \prod_{i=1}^T g_i(v_i \mid u_i).$$

Since $g_{1:T}$ is a strictly positive function, the last expression in (17) defines a measure on the family of product sets $A \times B$. By the extension theorem, there is a unique extension of this measure to $\mathcal{B}^{pT+qT}$. Let this measure be denoted by $\xi(\cdot)$. Furthermore, note that $\xi(\cdot)$ does not depend on $x$, and $\xi(A \times B) > 0$ whenever $\mu_X(A) > 0$ and the Lebesgue measure of $B$ is positive. Thus

$$S^1(x, C) \geq \xi(C) \qquad (18)$$
for all $x$, and $\xi(\cdot)$ is a non-trivial measure. It follows that the entire state space $\mathbb{R}^{pT+qT}$ constitutes a small set for the Markov chain $\{(Z_i, W_i)\}$, and hence, by part (v) of a theorem of Meyn and Tweedie (1993), that $\{(Z_i, W_i)\}$ is a uniformly ergodic Markov chain.

From Theorem 3.1, we know that $\{Z_i\}$ has an invariant distribution $\nu$ whose marginal distributions have densities $\pi_1, \ldots, \pi_T$. Thus the marginal distributions of the $Z$-component of the invariant distribution of the Markov chain $\{(Z_i, W_i)\}$ are also $\pi_1, \ldots, \pi_T$. Now suppose that for some $i$, $Z_{i,t-1}$ has density $\pi_{t-1}(\cdot)$. It follows from equations (3) and (6), and Steps 1 and 2 of Procedure 2.2, that $W_{i,t}$ must have density $q_t(\cdot)$. Thus the marginal distributions of the $W$-component of the invariant distribution of $\{(Z_i, W_i)\}$ must have densities $q_1, \ldots, q_T$. This completes the proof.

Lemma A.2 Suppose that the conditions of Theorem 3.3 hold. Let $\mu_{i,t}$ denote the distribution of $W_{i,t}$, and $\mu_t$ denote the corresponding marginal distribution of the limiting distribution of the Markov chain $\{(Z_i, W_i)\}$. Then for almost all $y \in \mathbb{R}^q$, we have

$$\lim_{i \to \infty} \int \frac{1}{h_i^q}\,K\left(\frac{y - u}{h_i}\right) d\mu_{i,t}(u) = q_t(y).$$

Proof: For convenience, define $K_i(x) = h_i^{-q} K(x/h_i)$. Write

$$\int K_i(y - u)\,d\mu_{i,t}(u) = \int K_i(y - u)\,d\mu_t(u) + \int K_i(y - u)\,d\nu_{i,t}(u),$$

where $\nu_{i,t} = \mu_{i,t} - \mu_t$. Taking limits on both sides, and applying Lemma 3.1 of Masry and Györfi (1987) (since $\mu_t$ has density $q_t$), we get, for almost all $y \in \mathbb{R}^q$,

$$\lim_{i \to \infty} \int K_i(y - u)\,d\mu_{i,t}(u) = q_t(y) + \lim_{i \to \infty} \int K_i(y - u)\,d\nu_{i,t}(u). \qquad (19)$$

We can bound the last term in this expression as follows. Let $\bar{k}$ denote the upper bound for $K(\cdot)$. Then

$$\left|\int K_i(y - u)\,d\nu_{i,t}(u)\right| = \frac{1}{h_i^q}\left|\int K((y - u)/h_i)\,d\nu_{i,t}(u)\right| \leq \frac{1}{h_i^q}\,\bar{k}\int |d\nu_{i,t}(u)| = \frac{\bar{k}}{h_i^q}\,\|\mu_{i,t} - \mu_t\|, \qquad (20)$$

where $\|\cdot\|$ denotes the total variation norm. But by a theorem of Meyn and Tweedie (1993), since $\{(Z_i, W_i)\}$ is a uniformly ergodic Markov chain, the total variation distance
in (20) is bounded (c.f. (16)) by a sequence converging geometrically to zero; that is, for some constants $c > 0$ and $0 < \rho < 1$,

$$\frac{\bar{k}}{h_i^q}\,\|\mu_{i,t} - \mu_t\| \leq \frac{\bar{k}}{h_i^q}\,c\rho^i = \frac{\bar{k}\,c\,\rho^i\,i^{\alpha q}}{h_0^q} \to 0, \quad \text{as } i \to \infty. \qquad (21)$$

Combining (19), (20), and (21), we get the desired result. (Note that for the absolute error term in (21) to converge to zero, we require that the Markov chain converge in total variation norm faster than the bandwidths $h_i$ shrink to zero.)

The following result can be regarded as an extension of a special case of Theorem 2.1 of Masry and Györfi (1987).

Lemma A.3 Suppose that the conditions of Theorem 3.3 hold. Then for $t = 1, 2, \ldots, T$,

$$\frac{n^{(1-\alpha q)/2}}{(\log n)(\log_2 n)^{1+\delta}}\left[\hat{q}_t^{(n)}(y) - E\,\hat{q}_t^{(n)}(y)\right] \to 0$$

almost surely, for almost all $y \in \mathbb{R}^q$, for any $\delta > 0$.

Proof: By a theorem of Meyn and Tweedie (1993), the sequence $\{W_{i,t},\ i = 1, 2, \ldots\}$ is asymptotically uncorrelated. The desired result then follows directly from an application of Theorem 2.1 of Masry and Györfi (1987) (using, in their notation, $u_n^2 = [(\log n)(\log_2 n)]^{-1}$). The fact that the theorem still applies in our case, where samples are not identically distributed, but instead only have distributions converging geometrically in total variation distance, can be established by replicating the proof given in Masry and Györfi (1987), replacing the use of their Lemma 3.1 with our Lemma A.2.

We are now in a position to give the proof of the main convergence result.

Proof of Theorem 3.3: For fixed $t \in \{1, \ldots, T\}$, by Lemma A.3,

$$\hat{q}_t^{(n)}(y) - E[\hat{q}_t^{(n)}(y)] \to 0, \quad \text{as } n \to \infty,$$

for almost all $y \in \mathbb{R}^q$. Furthermore, since $\hat{q}_t^{(n)}(y) = n^{-1}\sum_{i=1}^n K_i(y - W_{i,t})$ and $\mu_{i,t}$ is the distribution of $W_{i,t}$, it follows from Lemma A.2 that

$$E\,\hat{q}_t^{(n)}(y) \to q_t(y).$$

Thus $\hat{q}_t^{(n)}(y)$ must converge almost surely, for almost all $y \in \mathbb{R}^q$, to $q_t(y)$, and consequently $\log(\hat{q}_t^{(n)}(y)) \to \log(q_t(y))$ almost surely, for almost all $y \in \mathbb{R}^q$. The desired result then follows directly upon examination of definitions (5) and (10).
References

C. Andrieu and A. Doucet. Online expectation-maximization type algorithms. In Proc. IEEE ICASSP, 2003.

T. Bollerslev. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31, 1986.

G. Box and G. Jenkins. Time Series Analysis, Forecasting and Control. Holden-Day, 1970.

A. Brockwell, A. Rojas, and R. Kass. Recursive Bayesian decoding of motor cortical signals by particle filtering. Journal of Neurophysiology, 91, 2004.

P.J. Brockwell and R.A. Davis. Introduction to Time Series and Forecasting. Springer, second edition, 2002.

B.P. Carlin, N.G. Polson, and D.S. Stoffer. A Monte Carlo approach to nonnormal and nonlinear state-space modeling. Journal of the American Statistical Association, 87(418), 1992.

A. Doucet, N. de Freitas, and N. Gordon, editors. Sequential Monte Carlo Methods in Practice. Springer, New York, 2001.

R.F. Engle. Autoregressive conditional heteroscedasticity with estimates of the variance of UK inflation. Econometrica, 50, 1982.

E. Ghysels, A.C. Harvey, and E. Renault. Stochastic volatility. In Handbook of Statistics, volume 14. Elsevier, 1996.

N.J. Gordon, D.J. Salmond, and A.F.M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F, 140, 1993.

R.E. Kalman. A new approach to linear filtering and prediction problems. Journal of Basic Engineering (ASME), 82D:35-45, 1960.

G. Kitagawa. Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of Computational and Graphical Statistics, 5(1):1-25, 1996.

E. Masry and L. Györfi. Strong consistency and rates for recursive probability density estimators of stationary processes. Journal of Multivariate Analysis, 22:79-93, 1987.

S.P. Meyn and R.L. Tweedie. Markov Chains and Stochastic Stability. Springer, 1993.

D.W. Scott. Multivariate Density Estimation. Wiley, 1992.

R.H. Shumway and D.S. Stoffer. Time Series Analysis and its Applications. Springer, 2000.
S.J. Taylor. Modeling stochastic volatility: a review and comparative study. Mathematical Finance, 4, 1994.

E.J. Wegman and H.I. Davies. Remarks on some recursive estimators of a probability density. Annals of Statistics, 7(2), 1979.

M. West and J. Harrison. Bayesian Forecasting and Dynamic Models. Springer, New York, second edition, 1997.

C.T. Wolverton and T.J. Wagner. Asymptotically optimal discriminant functions for pattern classification. IEEE Transactions on Information Theory, IT-15, 1969.

H. Yamato. Sequential estimation of a continuous probability density function and mode. Bulletin of Mathematical Statistics, 14:1-12, 1971.

B. Yu. Estimating the L1 error of kernel estimators for Markov samplers. Technical Report 409, Dept. of Statistics, U.C. Berkeley, 1994.