On an adaptive preconditioned Crank-Nicolson algorithm for infinite dimensional Bayesian inferences


Noname manuscript No. (will be inserted by the editor)

On an adaptive preconditioned Crank-Nicolson algorithm for infinite dimensional Bayesian inferences

Zixi Hu · Zhewei Yao · Jinglai Li

Received: date / Accepted: date

Abstract The preconditioned Crank-Nicolson (pCN) method is an MCMC algorithm for implementing Bayesian inferences in function spaces. A remarkable feature of the algorithm is that, unlike many usual MCMC algorithms, which become arbitrarily slow under mesh refinement, the efficiency of the pCN algorithm is dimension independent. In this work we develop an adaptive version of the pCN algorithm, where the proposal is adaptively improved based on the sample history. Under the chosen parametrization of the proposal distribution, the proposal parameters can be efficiently updated in our algorithm. We show that the resulting adaptive pCN algorithm is dimension independent and has the correct ergodicity properties. Finally we provide numerical examples to demonstrate the efficiency of the proposed algorithm.

Keywords Bayesian inference · covariance operator · dimension independence · Markov Chain Monte Carlo

Mathematics Subject Classification (2010) 62F15 · 65C05

1 Introduction

Many scientific problems, such as nonparametric regression [12] and inverse problems [14, 26], require performing Bayesian inferences in function spaces. In

This work was supported by the NSFC. ZH and ZY contributed equally to the work.

Z. Hu · Z. Yao
Department of Mathematics and Zhiyuan College, Shanghai Jiao Tong University, 800 Dongchuan Rd, Shanghai 200240, China.

J. Li
Institute of Natural Sciences, Department of Mathematics, and the MOE Key Laboratory of Scientific and Engineering Computing, Shanghai Jiao Tong University, 800 Dongchuan Rd, Shanghai 200240, China. jinglaili@sjtu.edu.cn

practice, the posterior distributions often do not admit a closed form and need to be computed numerically. Specifically, one first represents the unknown function with a finite-dimensional parametrization, for example by discretizing the function on a pre-determined mesh grid, and then solves the resulting finite dimensional inference problem with Markov Chain Monte Carlo (MCMC) simulations. It is well known that standard MCMC algorithms, such as the random walk Metropolis-Hastings (RWMH), can become arbitrarily slow as the discretization mesh of the unknown is refined [21, 23, 4, 18]. That is, the mixing time of an algorithm can increase to infinity as the dimension of the discretized parameter approaches infinity, in which case the algorithm is said to be dimension-dependent. To this end, a very interesting line of research is to develop dimension-independent MCMC algorithms by requiring the algorithms to be well-defined in the function spaces. In particular, a family of dimension-independent MCMC algorithms was presented in [7] by constructing a Crank-Nicolson discretization of a stochastic partial differential equation (SPDE) that preserves the reference measure. Just as in finite dimensional problems, one can improve the sampling efficiency of infinite dimensional MCMC by incorporating the data information in the proposal design. To this end, a very popular class of methods guides the proposal with the local derivative information of the likelihood function. Such derivative based methods include: the stochastic Newton MCMC [17, 19], the operator-weighted proposal method [16], the infinite-dimensional Metropolis-adjusted Langevin algorithm (MALA) [5, 3], the dimension-independent likelihood-informed (DILI) MCMC [8], and the generalized preconditioned Crank-Nicolson (gpCN) algorithm [24], just to name a few. In this work, we focus on an alternative way to utilize the data information, i.e., adaptive MCMC (c.f.
[1, 2, 22] and the references therein), which adjusts the proposal based on the sample history. A major advantage of adaptive methods is that they do not require knowledge of the gradient, which makes them particularly convenient for problems with black-box models. In a recent work [10], we developed an adaptive independence sampler MCMC algorithm for infinite dimensional problems. A major limitation of independence sampler MCMC algorithms is that their efficiency depends critically on the ability of the chosen proposal, often in a parametrized form, to approximate the posterior in the entire state space, and the algorithm may perform very poorly if the proposal cannot approximate the posterior distribution well. In this respect, random walk based algorithms may be advantageous as they do not require such a global proposal. In this work, we present an adaptive random walk MCMC based on the preconditioned Crank-Nicolson (pCN) algorithm of [7]. Specifically, we adaptively adjust the preconditioning operator in the pCN algorithm to improve the sampling efficiency. We parametrize the preconditioning operator in a specific form that has been used in [20, 10], and we provide an algorithm that can efficiently update the parameter values as the iteration proceeds. By design, the acceptance probability is well defined and thus the algorithm is dimension independent. In addition, our algorithm ensures that the acceptance probability

is the same as that in the standard pCN algorithm, which is independent of the proposal distribution. Finally we note that an important issue in designing an adaptive MCMC algorithm is to preserve ergodicity while allowing the proposal distribution to vary during the iterations. Following the roadmap outlined in [10], we provide some theoretical results regarding the ergodicity of the proposed algorithm. We note that two methods similar to our work are the gpCN in [24] and the dimension independent adaptive Metropolis (DIAM) proposed in [6]. Compared to the gpCN method, our algorithm uses a specific parametrized form of the proposal, and as a result the parameters can be updated very efficiently, which makes an adaptive algorithm feasible. The DIAM is also an adaptive MCMC algorithm, and the major difference between it and our method is that, by design, our method preserves an important feature of the standard pCN algorithm, namely that the acceptance probability is independent of the proposal distribution. It should also be noted that our algorithm is specifically designed for Gaussian priors; there are works concerning MCMC algorithms for non-Gaussian priors [27, 28]. The rest of the paper is organized as follows. In Section 2 we describe the setup of infinite dimensional inference problems and present our adaptive pCN MCMC algorithm in detail. In Section 3 we provide several numerical examples to demonstrate the performance of the proposed algorithm. Finally we offer some concluding remarks in Section 4.

2 The adaptive preconditioned Crank-Nicolson algorithm

2.1 Problem setup

We present the standard setup of the problem following [26]. We consider a separable Hilbert space X with inner product ⟨·, ·⟩_X. Our goal is to estimate the unknown u ∈ X from data y ∈ Y, where Y is the data space and y is related to u via a likelihood function L^y(u).
In the Bayesian inference we assume that the prior µ_0 of u is a (without loss of generality) zero-mean Gaussian measure defined on X with covariance operator C_0, i.e. µ_0 = N(0, C_0). Note that C_0 is symmetric, positive and of trace class. The range of C_0^{1/2},

E = {u = C_0^{1/2} x : x ∈ X} ⊂ X,

which is a Hilbert space equipped with the inner product [9]

⟨·, ·⟩_E = ⟨C_0^{-1/2} ·, C_0^{-1/2} ·⟩_X,

is called the Cameron-Martin space of the measure µ_0. In this setting, the posterior measure µ^y of u conditional on data y is given by the Radon-Nikodym derivative:

dµ^y/dµ_0 (u) = L^y(u),    (2.1)

which can be interpreted as the Bayes rule in the infinite dimensional setting. In a standard setting, the likelihood function takes the form

L^y(u) = (1/Z) exp(−Φ^y(u)),    (2.2)

where Z is a normalization constant. In what follows, without causing any ambiguity, we shall drop the superscript y in Φ^y, L^y and µ^y for simplicity, while keeping in mind that these functions depend on the data y. For the inference problem to be well-posed, one typically requires the functional Φ to satisfy Assumptions 6.1 in [7]. Finally we quote the following lemma ([9], Chapter 1), which will be useful later:

Lemma 1 There exists a complete orthonormal basis {e_j}_{j∈N} on X and a sequence of non-negative numbers {α_j}_{j∈N} such that C_0 e_j = α_j e_j and Σ_{j=1}^∞ α_j < ∞, i.e., {e_j}_{j∈N} and {α_j}_{j∈N} are the eigenfunctions and eigenvalues of C_0 respectively.

2.2 The Crank-Nicolson algorithms

We start by briefly reviewing the family of Crank-Nicolson algorithms for infinite dimensional Bayesian inferences, developed in [7]. Simply speaking, the algorithms are based on the stochastic partial differential equation (SPDE)

du/ds = −KLu + √(2K) db/ds,    (2.3)

where L = C_0^{−1} is the precision operator for µ_0, K is a positive operator, and b is a Brownian motion in X with the identity covariance operator. The proposal is then derived by applying the Crank-Nicolson (CN) scheme to the SPDE (2.3), yielding

v = u − (δ/2) KL(u + v) + √(2δK) ξ_0,    (2.4)

for a white noise ξ_0 and δ ∈ (0, 2). In [7], two choices of K are proposed, resulting in two different algorithms. First, one can choose K = I, the identity, obtaining

(2C_0 + δI)v = (2C_0 − δI)u + √(8δ) C_0 ξ_0,

which is known as the plain Crank-Nicolson (CN) algorithm. Alternatively one can choose K = C_0, resulting in the preconditioned Crank-Nicolson (pCN) proposal:

v = (1 − β²)^{1/2} u + β w,    (2.5)

where w ∼ N(0, C_0) and

β = √(8δ)/(2 + δ).

It is easy to see that β ∈ [0, 1]. In both the CN and pCN algorithms, the acceptance probability is

a(v, u) = min{1, L(v)/L(u)}.    (2.6)
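As a concrete illustration, a finite-dimensional pCN update in the coordinates of the eigenbasis of the lemma above can be sketched as follows. This is not the authors' code: the truncation dimension d, the eigenvalue decay and the misfit functional Φ are all illustrative placeholders.

```python
import numpy as np

# A minimal finite-dimensional pCN sketch in Karhunen-Loeve coordinates.
# The prior is N(0, C0) with eigenvalues alpha_j; Phi is a hypothetical
# misfit functional standing in for the real negative log-likelihood.

rng = np.random.default_rng(0)
d = 50                                # truncation dimension (illustrative)
alpha = 1.0 / np.arange(1, d + 1)**2  # summable eigenvalues of C0

def Phi(u):
    # placeholder misfit; replace with the real data misfit functional
    return 0.5 * np.sum((u[:5] - 1.0)**2)

def pcn_step(u, delta):
    """One pCN update: v = sqrt(1 - beta^2) u + beta w, with w ~ N(0, C0)."""
    beta = np.sqrt(8.0 * delta) / (2.0 + delta)
    w = np.sqrt(alpha) * rng.standard_normal(d)   # draw w ~ N(0, C0)
    v = np.sqrt(1.0 - beta**2) * u + beta * w
    a = min(1.0, np.exp(Phi(u) - Phi(v)))         # L(v)/L(u) = exp(Phi(u)-Phi(v))
    return (v, True) if rng.uniform() < a else (u, False)

u = np.sqrt(alpha) * rng.standard_normal(d)       # start from a prior draw
for _ in range(1000):
    u, _ = pcn_step(u, delta=0.25)
```

Since Φ is the negative log-likelihood up to an additive constant, the ratio L(v)/L(u) reduces to exp(Φ(u) − Φ(v)), which is what the sketch uses in the accept/reject step.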

2.3 Parametrizing the operator K

A natural extension of the CN and pCN algorithms (which is also proposed in [7]) is to consider other choices of the operator K to improve the algorithm efficiency. To this end, we first rewrite the proposal Eq. (2.4) as

v = (I + (δ/2)KL)^{−1} (I − (δ/2)KL) u + (I + (δ/2)KL)^{−1} √(2δK) ξ_0.    (2.7)

Before discussing specific choices of the operator K, we present the following proposition regarding the acceptance probability:

Proposition 1 Suppose the operator K is symmetric, positive and of trace class. Let q(u, ·) be the proposal distribution associated to Eq. (2.7). Define the measures η(du, dv) = q(u, dv)µ(du) and η⊥(du, dv) = q(v, du)µ(dv) on X × X. If K commutes with C_0, then η⊥ is absolutely continuous with respect to η, and

dη⊥/dη (u, v) = L(v)/L(u).

Proof Define η_0(du, dv) = q(u, dv)µ_0(du). The measure η_0 is Gaussian. Since K and C_0 commute, we have

E^{η_0}[v ⊗ v] = (I + (δ/2)KL)^{−2} [(I − (δ/2)KL)² C_0 + 2δK] = C_0 = E^{η_0}[u ⊗ u].

Hence η_0 is symmetric in (u, v). Now η(du, dv) = q(u, dv)µ(du), η_0(du, dv) = q(u, dv)µ_0(du), and µ, µ_0 are equivalent. It follows that η and η_0 are equivalent and

dη/dη_0 (u, v) = dµ/dµ_0 (u) = L(u).

Since η_0 is symmetric in (u, v), we also have that η⊥ and η_0 are equivalent and that dη⊥/dη_0 (u, v) = L(v). Since equivalence of measures is transitive, it follows that η⊥ and η are equivalent and

dη⊥/dη (u, v) = L(v)/L(u).
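The covariance identity at the heart of this proof can be checked numerically mode by mode: in the common eigenbasis of K and C_0, writing g_j = (δ/2)λ_j/α_j for the eigenvalues of (δ/2)KL, the proposal variance ((1 − g_j)² α_j + 2δλ_j)/(1 + g_j)² collapses to α_j. A short check with illustrative numbers:

```python
import numpy as np

# Numerical check of the covariance identity in the proof of Proposition 1:
# when K and C0 share the eigenfunctions e_j, the proposal maps N(0, C0)
# back to N(0, C0) mode by mode, i.e.
#   ((1 - g_j)^2 * alpha_j + 2*delta*lambda_j) / (1 + g_j)^2 = alpha_j,
# with g_j = (delta/2) * lambda_j / alpha_j.  All values are illustrative.

rng = np.random.default_rng(0)
delta = 0.3
alpha = 1.0 / np.arange(1, 21)**2        # eigenvalues of C0
lam = alpha + 0.1 * rng.random(20)       # eigenvalues of K = C0 + H
g = 0.5 * delta * lam / alpha
prop_var = ((1.0 - g)**2 * alpha + 2.0 * delta * lam) / (1.0 + g)**2
assert np.allclose(prop_var, alpha)      # the prior covariance is preserved
```

The identity holds exactly for any positive λ_j, which is why the commutativity assumption, rather than a particular choice of K, is what drives the proposition.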

Now we discuss how to specify the operator K, and we start by assuming K has an appropriately parametrized form. Note that an essential condition in Proposition 1 is that K must commute with C_0. To satisfy this condition, it is convenient to design a K that has common eigenfunctions with C_0. Namely, we write K in the form

K = C_0 + H,    (2.8a)

where H is defined as

H = Σ_{j=1}^J h_j ⟨e_j, ·⟩ e_j,    (2.8b)

with h_j being coefficients. Here J is a prescribed positive integer that is smaller than or equal to the dimensionality of the problem. It is easy to see that K is a symmetric operator with eigenvalue-eigenfunction pairs {λ_j, e_j}_{j=1}^∞, where λ_j = α_j + h_j for j = 1...J and λ_j = α_j for j = J+1, ..., which implies that K and C_0 commute.

2.4 The adaptive pCN algorithm

A well-adopted rule in designing efficient MCMC algorithms is that the proposal covariance should be close to the covariance operator of the posterior [23, 11]. Next we give a heuristic argument for our method to determine the operator K. For small δ, the proposal (2.7) is approximately

v ≈ u + √(2δ) w, where w ∼ N(0, K),

which implies that K provides an approximation to the proposal covariance in this case. Thus we shall require K to be close to the posterior covariance. Note that such an approximation is only valid for small δ in principle, and thus we recommend not to use a very large δ in the proposed algorithm. Now suppose the posterior covariance is C; then one can determine K by

min_{{h_j}_{j=1}^J} ‖K − C‖,    (2.9)

where ‖·‖ is the Hilbert-Schmidt norm and K is given by Eq. (2.8). By some basic algebra, we can show that the optimal solution of Eq. (2.9) is

h_j = ⟨C e_j, e_j⟩ − α_j, or equivalently λ_j = ⟨C e_j, e_j⟩, for j = 1...J.

Since C is the posterior covariance, for any v and v′ ∈ X, we have [9]

⟨Cv, v′⟩ = ∫ ⟨v, u − m⟩ ⟨v′, u − m⟩ µ(du),    (2.10)

where m is the mean of µ. Using Eq. (2.10), we can derive that

h_j = ∫ (u_j − x_j)² dµ(u) − α_j, or λ_j = ∫ (u_j − x_j)² dµ(u),    (2.11)

where x_j = ⟨m, e_j⟩ and u_j = ⟨u, e_j⟩ for j = 1...J. In practice, the posterior covariance C is not directly available, and so here we determine the operator K with an adaptive MCMC algorithm. Simply speaking, the adaptive algorithm starts with an initial guess of K and then adaptively updates K based on the sample history of the posterior. The essential part of the algorithm is to update K, i.e. to estimate the values of h_j, from the posterior samples. To this end, suppose we have a set of posterior samples {u^i}_{i=0}^n; the values of the parameters h_j are estimated using the sample average approximation of Eq. (2.11):

x_j^n = 1/(n+1) Σ_{i=0}^n ⟨u^i, e_j⟩,    (2.12a)

s_j^n = Σ_{i=0}^n (u_j^i)²,    (2.12b)

h_j^n = s_j^n/(n+1) − (x_j^n)² − α_j + ε,    (2.12c)

for j = 1...J. Here ε is a small positive constant, introduced to ensure the stability of the algorithm, i.e., to prevent the eigenvalues λ_j^n = α_j + h_j^n from becoming arbitrarily close to zero. For efficiency's sake, we can rewrite Eq. (2.12) in a recursive form:

x_j^n = n/(n+1) x_j^{n−1} + 1/(n+1) ⟨u^n, e_j⟩,    (2.13a)

s_j^n = s_j^{n−1} + (u_j^n)²,    (2.13b)

h_j^n = s_j^n/(n+1) − (x_j^n)² − α_j + ε,    (2.13c)

for j = 1...J and n > 0. Let us denote the operator K resulting from {h_j^n}_{j=1}^J by K_n; it is easy to see that K_n is symmetric, positive and of trace class. As a result we can rewrite the proposal as

v = (I + (δ/2)K_n L)^{−1} (I − (δ/2)K_n L) u + (I + (δ/2)K_n L)^{−1} √(2δ) w,    (2.14)

where w ∼ N(0, K_n). Finally we note that it is not robust to estimate the parameter values with a very small number of samples; to address this issue, we first draw a certain number of samples with the standard pCN algorithm and then start the adaptive algorithm. We describe the complete adaptive pCN (A-pCN) algorithm in Algorithm 1.
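As a quick sanity check, the recursive form of these updates reproduces the batch sample averages exactly. The sketch below uses synthetic coordinates u_j^i and an illustrative reading of the h_j update (running second moment minus squared running mean, minus α_j, plus ε), which is an assumption on our part:

```python
import numpy as np

# Check that the recursive running-mean / running-sum-of-squares updates
# agree with the batch estimates.  All numbers here are illustrative.

rng = np.random.default_rng(1)
N, J = 200, 4
alpha = 1.0 / np.arange(1, J + 1)**2     # prior eigenvalues alpha_j
eps = 1e-4                               # stabilizing constant
uj = rng.standard_normal((N, J))         # synthetic samples u_j^i = <u^i, e_j>

x = uj[0].copy()                         # n = 0 initialization
s = uj[0]**2
for n in range(1, N):
    x = n / (n + 1) * x + uj[n] / (n + 1)    # recursive running mean
    s = s + uj[n]**2                         # recursive sum of squares
h = s / N - x**2 - alpha + eps               # assumed form of the h_j update

# batch versions of the same estimates
assert np.allclose(x, uj.mean(axis=0))
assert np.allclose(s, (uj**2).sum(axis=0))
assert np.allclose(h, (uj**2).sum(axis=0) / N - uj.mean(axis=0)**2 - alpha + eps)
```

The recursion costs O(J) per iteration regardless of the chain length, which is what makes the adaptation cheap enough to run at every step.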

Algorithm 1 The adaptive pCN algorithm
1: Initialize u^0 ∈ S;
2: for n = 0 to n_0 − 1 do
3:   Propose v using Eq. (2.5);
4:   Draw ρ ∼ U[0, 1];
5:   Let a := min{1, L(v)/L(u^n)};
6:   if ρ ≤ a then
7:     u^{n+1} = v;
8:   else
9:     u^{n+1} = u^n;
10:  end if
11: end for
12: Compute {x_j^{n_0}, s_j^{n_0}, h_j^{n_0}}_{j=1}^J using Eq. (2.12) and the samples {u^i}_{i=0}^{n_0};
13: for n = n_0 to N do
14:   Compute K_n from Eq. (2.8) with {h_j^n}_{j=1}^J;
15:   Propose v using Eq. (2.14);
16:   Draw ρ ∼ U[0, 1];
17:   Let a := min{1, L(v)/L(u^n)};
18:   if ρ ≤ a then
19:     u^{n+1} = v;
20:   else
21:     u^{n+1} = u^n;
22:   end if
23:   Compute {h_j^{n+1}}_{j=1}^J using Eqs. (2.13);
24: end for

2.5 Ergodicity analysis

As has been mentioned, an important issue for an adaptive MCMC algorithm is to verify that it has the correct ergodic properties. Directly proving the ergodicity property in the infinite dimensional setting is rather challenging. However, as eventually the algorithm must be implemented in a finite dimensional setting, it is reasonable to consider the ergodic properties of the finite dimensional implementation instead. Namely, we first approximate u with a d-dimensional representation, say z = P_d u. In this case, the state space X becomes R^d and the prior µ_0(dz) of z reduces to a d-variate Gaussian distribution over R^d. We shall now perform our ergodicity analysis on this finite dimensional problem. In particular we follow the analysis outlined in [10], which requires a small modification to the likelihood function (2.2):

L_S(z) = (1/Z) exp(−Φ(z)) if z ∈ S, and L_S(z) = 0 if z ∉ S.    (2.15)

Here S = {z ∈ R^d : ‖z‖_2² < R}, where R > 0 is a positive constant that can be chosen arbitrarily. The posterior of z becomes dµ^d/dµ_0 (z) = L_S(z).
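For concreteness, the finite dimensional implementation of Algorithm 1 analyzed in this section can be sketched in the eigenbasis coordinates, where C_0, K_n and L are all diagonal. The misfit Φ, the tuning constants, and the positivity safeguard on the eigenvalues below are illustrative assumptions, not the authors' choices:

```python
import numpy as np

# Sketch of the two phases of the A-pCN algorithm in KL coordinates.
# Phi and all tuning constants are illustrative placeholders.

rng = np.random.default_rng(2)
d, J, delta, eps = 40, 10, 0.2, 1e-4
alpha = 1.0 / np.arange(1, d + 1)**2               # prior eigenvalues
Phi = lambda u: 0.5 * np.sum((u[:3] - 0.5)**2)     # hypothetical misfit

def accept(u, v):
    return rng.uniform() < min(1.0, np.exp(Phi(u) - Phi(v)))

u = np.sqrt(alpha) * rng.standard_normal(d)
samples = []
# phase 1: standard pCN to build up a sample history
beta = np.sqrt(8.0 * delta) / (2.0 + delta)
for n in range(500):
    v = np.sqrt(1.0 - beta**2) * u + beta * np.sqrt(alpha) * rng.standard_normal(d)
    u = v if accept(u, v) else u
    samples.append(u)

# phase 2: adaptive pCN with the preconditioner K_n updated recursively
U = np.array(samples)
x = U[:, :J].mean(axis=0)                          # running means
s = (U[:, :J]**2).sum(axis=0)                      # running sums of squares
for n in range(len(U), 2000):
    lam = alpha.copy()
    lam[:J] = np.maximum(s / n - x**2 + eps, eps)  # lambda_j = alpha_j + h_j; floor is a safeguard
    g = 0.5 * delta * lam / alpha                  # eigenvalues of (delta/2) K_n L
    w = np.sqrt(lam) * rng.standard_normal(d)      # w ~ N(0, K_n)
    v = ((1.0 - g) * u + np.sqrt(2.0 * delta) * w) / (1.0 + g)
    u = v if accept(u, v) else u
    x = n / (n + 1) * x + u[:J] / (n + 1)
    s = s + u[:J]**2
```

Because everything is diagonal in the KL basis, one adaptive step costs the same O(d) work as a plain pCN step plus an O(J) parameter update.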

We emphasize that modifying the likelihood function is only for the convenience of the proof (as the technique employed in [10] requires the posterior support to be bounded), and clearly the modified likelihood function approximates the original one well, provided a sufficiently large R is chosen. In this setting we have the following theorem indicating the ergodicity of our algorithm:

Theorem 1 The chain {z^n} generated by Algorithm 1, with any initial distribution (the distribution of u^0) on S, properly simulates the target distribution µ^d: for any bounded and µ^d-measurable function f : S → R, the equality

lim_{n→∞} 1/(n+1) Σ_{i=0}^n f(z^i) = E^{µ^d}[f(z)]

holds almost surely.

We leave the proof to Appendix A.

3 Numerical examples

3.1 An ODE example

Our first example is a simple inverse problem where the forward model is governed by an ordinary differential equation (ODE):

dx(t)/dt = −u(t) x(t),

with a prescribed initial condition. We assume that we observe the solution x(t) several times in the interval [0, T], and we want to infer the unknown coefficient u(t). In our experiments, we let the initial condition be x(0) = 1 and T = 1. Now suppose that the solution is measured every T/10 time units from 0 to T, and the error in each measurement is assumed to be an independent Gaussian N(0, 0.05²). The data is generated by applying the forward model to a true coefficient u and then adding noise to the result. The data and the truth that is used to generate the data are shown in Fig. 1. In the inference, 200 equally spaced grid points are used to represent the unknown u. The prior is chosen to be a zero-mean Gaussian measure on X with an exponential covariance function: K(t_1, t_2) = exp(−|t_1 − t_2|/2). We sample the posterior with both the pCN and the A-pCN algorithms, each with 10^6 samples. In the pCN we choose β = 1/5, and in the A-pCN we choose δ = 1/4. These parameter values are chosen such that the two algorithms result in reasonable acceptance probabilities.
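A sketch of the forward model and data generation for this example is below; the true coefficient, the grid size, the observation spacing and the noise level are illustrative stand-ins rather than the exact values used in the experiments:

```python
import numpy as np

# Forward model for the ODE example: integrate dx/dt = -u(t) x(t), x(0) = 1,
# on [0, T] and record x at the observation times.  The grid size, the noise
# level and the "true" coefficient below are illustrative placeholders.

T, n_grid = 1.0, 200
t = np.linspace(0.0, T, n_grid)

def forward(u, x0=1.0):
    """Closed form x(t) = x0 * exp(-int_0^t u ds), via the trapezoidal rule."""
    integral = np.concatenate(
        ([0.0], np.cumsum(0.5 * (u[1:] + u[:-1]) * np.diff(t))))
    return x0 * np.exp(-integral)

u_true = np.sin(2.0 * np.pi * t)               # hypothetical true coefficient
x = forward(u_true)                            # noiseless solution path
obs_idx = np.arange(0, n_grid, n_grid // 10)   # roughly one observation per T/10
data = x[obs_idx] + 0.05 * np.random.standard_normal(obs_idx.size)
```

With the forward map in hand, Φ(u) would be the usual Gaussian misfit between `forward(u)` at the observation times and `data`, plugged into the pCN/A-pCN acceptance ratio.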
Moreover, in the A-pCN algorithm, we choose J = 10 and ε = 10^{−4}. The average acceptance probability of the pCN is 28% and that of the A-pCN is 30%. First we shall show

Fig. 1 (for the ODE example) Left: the true coefficient. Right: the data generated with the true coefficient: the blue solid line is the simulated data without observation noise and the red dashed line is the simulated data with observation noise.

that the adaptation diminishes as the number of iterations increases. Thus, in Fig. 2, we plot the estimated values of λ_1 and λ_10 as functions of the number of iterations, and we can see from the plots that the values of these two parameters converge as the iterations proceed. Next we shall compare the performance of the two algorithms; a commonly used performance indicator is the autocorrelation function (ACF). We particularly consider the unknown at t = 0.2, 0.5 and 0.8, and we plot the ACF for all three points in Fig. 3. One can see from the figure that, for all three points, the ACF of the chain generated by the A-pCN decreases much faster than that of the standard pCN, suggesting that the A-pCN method achieves a significantly higher efficiency. Alternatively, we compute the ACF of lag 1 at all the grid points, which is plotted in Fig. 4 (left), and we can see that the ACF of the chain generated by the A-pCN is much lower than that of the standard pCN at all the grid points. The effective sample size (ESS) is another popular measure of the sampling efficiency of MCMC [15]. The ESS is computed by

ESS = N/(1 + 2τ),

where τ is the integrated autocorrelation time and N is the total sample size, and it gives an estimate of the number of effectively independent draws in the chain. We compute the ESS of the unknown u at each grid point and show the results in Fig. 4 (right). The results show that the A-pCN algorithm produces many more effectively independent samples than the standard pCN.

3.2 Estimating the Robin coefficient

In this example we consider the one dimensional heat conduction equation in the region x ∈ [0, L]:

∂u/∂t (x, t) = ∂²u/∂x² (x, t),    (3.1a)

u(x, 0) = g(x),    (3.1b)

Fig. 2 (for the ODE example) The estimates of λ_1 (left) and λ_10 (right) plotted as functions of the number of iterations.

Fig. 3 (for the ODE example) Autocorrelation functions (ACF) for the pCN and the A-pCN methods at different grid points: from left to right, at t = 0.2, t = 0.5 and t = 0.8.

Fig. 4 (for the ODE example) Left: the lag 1 ACF for u at each grid point, for the pCN and the A-pCN methods. Right: the ESS of u at each grid point.

with the following Robin boundary conditions:

−∂u/∂x (0, t) + ρ(t) u(0, t) = h_0(t),    (3.1c)

∂u/∂x (L, t) + ρ(t) u(L, t) = h_1(t).    (3.1d)

Suppose the functions g(x), h_0(t) and h_1(t) are all known, and we want to estimate the unknown Robin coefficient ρ(t) from certain measurements of the temperature u(x, t). The Robin coefficient ρ(t) characterizes thermal properties of the conductive medium on the interface, which in turn provide information on certain physical processes near the boundary, e.g., corrosion [13]. In this example we choose L = 1, T = 1, and the functions to be

g(x) = x² + 1, h_0(t) = t(2t + 1), h_1(t) = 2 + t(2t + 2).

The solution is measured every T/50 time units from 0 to T, and the error in each measurement is assumed to be an independent Gaussian N(0, 0.05²). The true Robin coefficient and the resulting data are shown in Fig. 5.

Fig. 5 (for the Robin example) Left: the true Robin coefficient. Right: the data generated with the true coefficient.

Fig. 6 (for the Robin example) The estimates of λ_1 (left) and λ_10 (right) plotted as functions of the number of iterations.

In the computation, 100 equally spaced grid points are used to represent the unknown. Moreover, the prior is the same as that used in the ODE example. We sample the posterior with both the pCN and the A-pCN algorithms, each with 10^6 samples. In the pCN we choose β = 1/4, and in the A-pCN we choose δ = 2/10. We choose J = 10 and ε = 10^{−4} in the A-pCN algorithm. The average acceptance probability of the pCN is 28% and that of the A-pCN is 30%. As in the ODE example, we first plot the estimated values of λ_1 and λ_10 as functions of the number of iterations in Fig. 6, where we can observe the convergence of the two parameters. We then plot the ACF for the unknown at the grid points t = 0.2, 0.5 and 0.8 in Fig. 7. Next we compute the ACF of lag 1 at all the grid points, and plot the results in Fig. 8 (left). In all these

Fig. 7 (for the Robin example) Autocorrelation functions (ACF) for the pCN and the A-pCN methods at different grid points: from left to right, at t = 0.2, t = 0.5 and t = 0.8.

Fig. 8 (for the Robin example) Left: the lag 1 ACF for u at each grid point, for the pCN and the A-pCN methods. Right: the ESS of u at each grid point.

ACF plots, we can see that the results of our A-pCN algorithm are significantly better than those of the standard pCN method. Finally we compute the ESS of the unknown u at each grid point and show the results in Fig. 8 (right), which once again indicates that the A-pCN algorithm evidently outperforms the standard pCN.

4 Conclusions

In summary, we consider MCMC simulations for Bayesian inferences in function spaces. In particular, we develop an adaptive version of the pCN algorithm to improve the sampling efficiency. The implementation of the A-pCN algorithm is rather simple, without requiring any information about the underlying models, and during the iteration the proposal can be efficiently updated with explicit formulas. We also show that the adaptive algorithm has the correct ergodicity property. Finally we demonstrate the effectiveness and efficiency of the A-pCN algorithm with several numerical examples. We expect the A-pCN algorithm to be of use in many practical problems, especially those involving black-box models. It should be noted that, in the present work, we consider the ergodicity properties of the finite dimensional approximation of the algorithm. It is cer-

tainly desirable to ensure that the infinite dimensional MCMC algorithm itself has the correct ergodicity properties, which may require certain modifications of the present adaptive algorithm. We plan to work on this problem in the future.

A Proof of Theorem 1

Recall that, in the finite dimensional setting, our target distribution µ^d is supported on S. Let M(S) denote the set of finite measures on S; the norm on M(S) is the total variation norm. Let K_n(z^0, ..., z^{n−2}, z) be the operator K at step n computed from z^0, ..., z^{n−2}, z (i.e., z = z^{n−1}). For simplicity, let ζ^{n−2} = (z^0, ..., z^{n−2}) and K_{n,ζ^{n−2}}(z) = K_n(z^0, ..., z^{n−2}, z). Let q_{n,ζ^{n−2}}(z; ·) be the proposal distribution given by

v = (I + (δ/2) K_{n,ζ^{n−2}}(z) L)^{−1} (I − (δ/2) K_{n,ζ^{n−2}}(z) L) z + (I + (δ/2) K_{n,ζ^{n−2}}(z) L)^{−1} √(2δ) w,

where w ∼ N(0, K_{n,ζ^{n−2}}(z)). It should be noted that all the operators reduce to matrices in the finite dimensional setting. Then define

Q_{n,ζ^{n−2}}(z; dv) = acc(z, v) q_{n,ζ^{n−2}}(z; dv) + δ_z(dv) (1 − ∫ acc(z, x) q_{n,ζ^{n−2}}(z; dx))

as the transition probability at step n, where δ_z(·) is a point mass and the acceptance probability is acc(z, v) = min{1, L_S(v)/L_S(z)}. Also define Q_n(z^0, ..., z^{n−2}, z; dv) = Q_{n,ζ^{n−2}}(z; dv) as the transition probability from (z^0, ..., z^{n−2}, z) to v. Let T be a transition probability on S and set

Γ(T) = sup_{µ_1, µ_2} ‖µ_1 T − µ_2 T‖ / ‖µ_1 − µ_2‖,

where the supremum is taken over distinct probability measures µ_1, µ_2 on S. Now we introduce some notation. Following [10], we use νT to denote the measure A ↦ ∫_S T(z; A) ν(dz), and for bounded measurable functions we write T f(z) = ∫_S T(z; dy) f(y) as well as νf = ∫_S f(y) ν(dy). Then we have the following proposition:

Proposition 2 The transition probabilities (Q_n) satisfy the following three conditions:

I. There is a constant γ_1 ∈ (0, 1) such that Γ(Q_{n,ζ^{n−2}}) ≤ γ_1 < 1, for all ζ^{n−2} ∈ S^{n−1} and n ≥ 2.

II.
There is a fixed positive constant γ_2 such that

‖Q_{n,ζ^{n−2}} − Q_{n+k,ζ^{n+k−2}}‖_{M(S)→M(S)} ≤ γ_2 k/n,

where n, k ≥ 1 and one assumes that ζ^{n+k−2} is a direct continuation of ζ^{n−2}.

III. There is a constant γ_3 such that

‖µ^d Q_{n,ζ^{n−2}} − µ^d‖ ≤ γ_3/n,

for all ζ^{n−2} ∈ S^{n−1} and n ≥ 2.

Proof

I. Let A_{n,ζ^{n−2}}(z) = I + (δ/2) K_{n,ζ^{n−2}}(z) L and B_{n,ζ^{n−2}}(z) = I − (δ/2) K_{n,ζ^{n−2}}(z) L. Define, for j = 1...d,

a_{n,ζ^{n−2},j}(z) = 1 + (δ/2) λ_{n,ζ^{n−2},j}(z) α_j^{−1} and b_{n,ζ^{n−2},j}(z) = 1 − (δ/2) λ_{n,ζ^{n−2},j}(z) α_j^{−1},

where λ_{n,ζ^{n−2},j}(z) is the j-th eigenvalue of K_{n,ζ^{n−2}}(z). Obviously, a_{n,ζ^{n−2},j}(z) and b_{n,ζ^{n−2},j}(z) are the eigenvalues of A_{n,ζ^{n−2}}(z) and B_{n,ζ^{n−2}}(z) respectively, and we know that, for j = 1...d, 1 < a_{n,ζ^{n−2},j}(z) < M_1 and |b_{n,ζ^{n−2},j}(z)| < M_1 for a positive constant M_1. According to the proposal,

q_{n,ζ^{n−2}}(z; ·) = N(A_{n,ζ^{n−2}}(z)^{−1} B_{n,ζ^{n−2}}(z) z, 2δ A_{n,ζ^{n−2}}(z)^{−2} K_{n,ζ^{n−2}}(z)).

Since 1 < a_{n,ζ^{n−2},j}(z) < M_1 and, by design, M_2 ≤ λ_{n,ζ^{n−2},j}(z) ≤ M_3 for some constants M_2, M_3 > 0, we have

M_4 I ≤ 2δ A_{n,ζ^{n−2}}(z)^{−2} K_{n,ζ^{n−2}}(z) ≤ M_5 I,

for some constants M_4, M_5 > 0. And for any z ∈ S, there exists a constant M_6 > 0 such that

‖A_{n,ζ^{n−2}}(z)^{−1} B_{n,ζ^{n−2}}(z) z‖² = Σ_{j=1}^d a_{n,ζ^{n−2},j}(z)^{−2} b_{n,ζ^{n−2},j}(z)² ⟨z, e_j⟩² ≤ M_6.

Thus the density of q_{n,ζ^{n−2}}(z; ·) is bounded below on S. Then it is trivial that q_{n,ζ^{n−2}}(z; A) ≥ c µ_0(A) for all z ∈ S, all A ⊂ S, and a constant c > 0, and we conclude that Γ(Q_{n,ζ^{n−2}}) ≤ γ_1 < 1 (c.f. [?]).

II. For any given ζ^{n−2}, one has

‖Q_{n,ζ^{n−2}} − Q_{n+k,ζ^{n+k−2}}‖_{M(S)→M(S)} ≤ 2 sup_{z∈S, A⊂S} |Q_{n,ζ^{n−2}}(z; A) − Q_{n+k,ζ^{n+k−2}}(z; A)|.

We can then show that

|Q_{n,ζ^{n−2}}(z; A) − Q_{n+k,ζ^{n+k−2}}(z; A)| ≤ 2 ∫_{R^d} |q_{n,ζ^{n−2}}(z; v) − q̄(v)| dv + 2 ∫_{R^d} |q̄(v) − q_{n+k,ζ^{n+k−2}}(z; v)| dv,    (A.1)

where q̄ is the Gaussian density that has the same mean as q_{n,ζ^{n−2}}(z; ·) and the same covariance as q_{n+k,ζ^{n+k−2}}(z; ·). Let

I_1 = ∫_{R^d} |q_{n,ζ^{n−2}}(z; v) − q̄(v)| dv and I_2 = ∫_{R^d} |q̄(v) − q_{n+k,ζ^{n+k−2}}(z; v)| dv.

Let β_{n,j} = 2δ a_{n,ζ^{n−2},j}(z)^{−2} λ_{n,ζ^{n−2},j}(z); then the β_{n,j} are the eigenvalues of the covariance of q_{n,ζ^{n−2}}(z; ·). It is easy to see that

|λ_{n,ζ^{n−2},j}(z) − λ_{n+k,ζ^{n+k−2},j}(z)| ≤ M_21 k/n,    (A.2)

for a constant M_21 > 0, and it follows that

|β_{n,j} − β_{n+k,j}| ≤ M_22 k/n,    (A.3)

for a constant M_22 > 0. Obviously, there is also a positive constant M_23 such that β_{n,j}, β_{n+k,j} ≥ M_23. We first consider I_1. Since q̄ and q_{n,ζ^{n−2}}(z; ·) share the same mean, we have

I_1 ≤ ∫_{R^d} | Π_{j=1}^d (2πβ_{n,j})^{−1/2} exp(−z_j²/(2β_{n,j})) − Π_{j=1}^d (2πβ_{n+k,j})^{−1/2} exp(−z_j²/(2β_{n+k,j})) | dz_1 ... dz_d.

Thanks to Eq. (A.3), by some elementary calculations we can show that I_1 ≤ M_24 k/n for some constant M_24 > 0. We now consider I_2. Let

z̄ = A_{n,ζ^{n−2}}(z)^{−1} B_{n,ζ^{n−2}}(z) z − A_{n+k,ζ^{n+k−2}}(z)^{−1} B_{n+k,ζ^{n+k−2}}(z) z.

Here we have

I_2 ≤ ∫_{R^d} | Π_{j=1}^d (2πβ_{n+k,j})^{−1/2} exp(−(z_j − ⟨z̄, e_j⟩)²/(2β_{n+k,j})) − Π_{j=1}^d (2πβ_{n+k,j})^{−1/2} exp(−z_j²/(2β_{n+k,j})) | dz_1 ... dz_d.

Using Eq. (A.2), we have

|a_{n,ζ^{n−2},j}(z) − a_{n+k,ζ^{n+k−2},j}(z)| < M_26 k/n and |b_{n,ζ^{n−2},j}(z) − b_{n+k,ζ^{n+k−2},j}(z)| < M_26 k/n,

for some constant M_26 > 0. Thus we have

|⟨z̄, e_j⟩| = |a_{n+k,ζ^{n+k−2},j}(z)^{−1} b_{n+k,ζ^{n+k−2},j}(z) − a_{n,ζ^{n−2},j}(z)^{−1} b_{n,ζ^{n−2},j}(z)| |⟨z, e_j⟩| ≤ M_27 k/n,

and so I_2 ≤ M_28 k/n for a constant M_28 > 0. We thus come to the conclusion that

‖Q_{n,ζ^{n−2}} − Q_{n+k,ζ^{n+k−2}}‖_{M(S)→M(S)} ≤ γ_2 k/n,

for some constant γ_2 > 0.

III. Let K* = K_{n,ζ^{n−3}}(z^{n−2}). Define q*(z; dv) to be the proposal kernel given by

(I + (δ/2) K* L) v = (I − (δ/2) K* L) z + √(2δ) N(0, K*),

and let

Q*(z; dv) = acc(z, v) q*(z; dv) + δ_z(dv) (1 − ∫ q*(z; dx) acc(z, x)).

It is easy to see that the transition kernel Q* satisfies the detailed balance condition, and thus we have µ^d Q* = µ^d. Since

|λ_{n,ζ^{n−3},j}(z^{n−2}) − λ_{n,ζ^{n−2},j}(z)| ≤ M_31/n

for j = 1...d and a constant M_31 > 0, and since there exist M_32, M_33 > 0 such that M_32 < λ_{n,ζ^{n−3},j}(z^{n−2}), λ_{n,ζ^{n−2},j}(z) < M_33, by a procedure similar to that of condition (II) we can obtain

‖Q_{n,ζ^{n−2}} − Q*‖_{M(S)→M(S)} ≤ M_34/n,

for some constant M_34 > 0. It follows that

‖µ^d Q_{n,ζ^{n−2}} − µ^d‖ = ‖µ^d (Q_{n,ζ^{n−2}} − Q*)‖ ≤ γ_3/n,

for some constant γ_3 > 0.

Having proved Proposition 2 for our algorithm, Theorem 1 follows immediately from Theorem 2 in [10]. Finally, it is worth noting that, following the analysis of [25], it may be possible to relax the requirement that the posterior must have bounded support. Nevertheless, the investigation of unbounded support is not within the scope of the present work.

References

1. Christophe Andrieu and Johannes Thoms, A tutorial on adaptive MCMC, Statistics and Computing, 18 (2008).
2. Yves Atchade, Gersende Fort, Eric Moulines, and Pierre Priouret, Adaptive Markov chain Monte Carlo: theory and methods, Preprint, (2009).
3. Alexandros Beskos, A stable manifold MCMC method for high dimensions, Statistics & Probability Letters, 90 (2014).
4. Alexandros Beskos, Gareth Roberts, Andrew Stuart, et al., Optimal scalings for local Metropolis-Hastings chains on nonproduct targets in high dimensions, The Annals of Applied Probability, 19 (2009).
5. Alexandros Beskos, Gareth Roberts, Andrew Stuart, and Jochen Voss, MCMC methods for diffusion bridges, Stochastics and Dynamics, 8 (2008).
6. Yuxin Chen, David Keyes, Kody JH Law, and Hatem Ltaief, Accelerated dimension-independent adaptive Metropolis, arXiv preprint arXiv:1506.05741, (2015).
7.
Simon L Cotter, Gareth O Roberts, AM Stuart, David White, et al., MCMC methods for functions: modifying old algorithms to make them faster, Statistical Science, 28 (2013).
8. Tiangang Cui, Kody JH Law, and Youssef M Marzouk, Dimension-independent likelihood-informed MCMC, arXiv preprint arXiv:1411.3688, (2014).
9. Giuseppe Da Prato, An introduction to infinite-dimensional analysis, Springer, 2006.
10. Zhe Feng and Jinglai Li, An adaptive independence sampler MCMC algorithm for infinite dimensional Bayesian inferences, arXiv preprint, (2015).
11. Heikki Haario, Eero Saksman, and Johanna Tamminen, An adaptive Metropolis algorithm, Bernoulli, 7 (2001).
12. Nils Lid Hjort, Chris Holmes, Peter Müller, and Stephen G Walker, Bayesian nonparametrics, vol. 28, Cambridge University Press, 2010.
13. Gabriele Inglese, An inverse problem in corrosion detection, Inverse Problems, 13 (1997), p. 977.

14. Jari Kaipio and Erkki Somersalo, Statistical and Computational Inverse Problems, vol. 160, Springer, 2005.
15. Robert E. Kass, Bradley P. Carlin, Andrew Gelman, and Radford M. Neal, Markov chain Monte Carlo in practice: a roundtable discussion, The American Statistician, 52 (1998), pp. 93-100.
16. Kody J. H. Law, Proposals which speed up function-space MCMC, Journal of Computational and Applied Mathematics, 262 (2014).
17. James Martin, Lucas C. Wilcox, Carsten Burstedde, and Omar Ghattas, A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion, SIAM Journal on Scientific Computing, 34 (2012), pp. A1460-A1487.
18. Jonathan C. Mattingly, Natesh S. Pillai, and Andrew M. Stuart, Diffusion limits of the random walk Metropolis algorithm in high dimensions, The Annals of Applied Probability, 22 (2012), pp. 881-930.
19. Noemi Petra, James Martin, Georg Stadler, and Omar Ghattas, A computational framework for infinite-dimensional Bayesian inverse problems, part II: stochastic Newton MCMC with application to ice sheet flow inverse problems, SIAM Journal on Scientific Computing, 36 (2014), pp. A1525-A1555.
20. Frank J. Pinski, Gideon Simpson, Andrew M. Stuart, and Hendrik Weber, Algorithms for Kullback-Leibler approximation of probability measures in infinite dimensions, arXiv preprint arXiv:1408.1920, (2014).
21. Gareth O. Roberts, Andrew Gelman, and Walter R. Gilks, Weak convergence and optimal scaling of random walk Metropolis algorithms, The Annals of Applied Probability, 7 (1997), pp. 110-120.
22. Gareth O. Roberts and Jeffrey S. Rosenthal, Examples of adaptive MCMC, Journal of Computational and Graphical Statistics, 18 (2009), pp. 349-367.
23. Gareth O. Roberts and Jeffrey S. Rosenthal, Optimal scaling for various Metropolis-Hastings algorithms, Statistical Science, 16 (2001), pp. 351-367.
24. Daniel Rudolf and Björn Sprungk, On a generalization of the preconditioned Crank-Nicolson Metropolis algorithm, arXiv preprint arXiv:1504.03461, (2015).
25. Eero Saksman and Matti Vihola, On the ergodicity of the adaptive Metropolis algorithm on unbounded domains, The Annals of Applied Probability, 20 (2010), pp. 2178-2203.
26. A. M. Stuart, Inverse problems: a Bayesian perspective, Acta Numerica, 19 (2010), pp. 451-559.
27. Sebastian J. Vollmer, Dimension-independent MCMC sampling for inverse problems with non-Gaussian priors, arXiv preprint arXiv:1302.2213, (2013).
28. Zhewei Yao, Zixi Hu, and Jinglai Li, A TV-Gaussian prior for infinite-dimensional Bayesian inverse problems and its numerical implementations, arXiv preprint arXiv:1510.05239, (2015).
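Remark. For readers who wish to experiment numerically, the sketch below implements one step of the plain (non-adaptive) pCN Metropolis algorithm in a finite-dimensional discretization, using the standard $\beta$-parametrization of the proposal. It is an illustrative sketch only, not the adaptive algorithm analyzed above; the potential `phi` and the prior covariance square root `C_sqrt` are assumed quantities chosen for the toy example.

```python
import numpy as np

def pcn_step(u, phi, C_sqrt, beta, rng):
    """One plain pCN Metropolis step targeting mu(du) proportional to
    exp(-phi(u)) N(0, C)(du), with C = C_sqrt @ C_sqrt.T.

    u      : current state, shape (d,)
    phi    : negative log-likelihood potential (illustrative assumption)
    C_sqrt : square root of the prior covariance, so C_sqrt @ xi ~ N(0, C)
    beta   : step-size parameter in (0, 1]
    """
    xi = rng.standard_normal(u.shape)
    # Prior-preserving proposal: if u ~ N(0, C), then v ~ N(0, C) as well.
    v = np.sqrt(1.0 - beta**2) * u + beta * (C_sqrt @ xi)
    # The acceptance probability involves only the likelihood potential,
    # which is the source of the dimension-independent behavior.
    acc = min(1.0, np.exp(phi(u) - phi(v)))
    if rng.random() < acc:
        return v, True
    return u, False

# Toy run: d = 50, prior N(0, I), and a mild Gaussian-like potential.
rng = np.random.default_rng(0)
phi = lambda u: 0.5 * (u @ u) / 10.0
u = np.zeros(50)
n_acc = 0
for _ in range(2000):
    u, accepted = pcn_step(u, phi, np.eye(50), beta=0.2, rng=rng)
    n_acc += accepted
print(n_acc / 2000)  # empirical acceptance rate
```

Unlike a standard random walk proposal, whose acceptance rate degenerates as the discretization is refined, the acceptance rate of this step stays bounded away from zero as $d$ grows, since the prior part of the target is preserved exactly by the proposal.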


More information

Examples of Adaptive MCMC

Examples of Adaptive MCMC Examples of Adaptive MCMC by Gareth O. Roberts * and Jeffrey S. Rosenthal ** (September, 2006.) Abstract. We investigate the use of adaptive MCMC algorithms to automatically tune the Markov chain parameters

More information

Kernel Adaptive Metropolis-Hastings

Kernel Adaptive Metropolis-Hastings Kernel Adaptive Metropolis-Hastings Arthur Gretton,?? Gatsby Unit, CSML, University College London NIPS, December 2015 Arthur Gretton (Gatsby Unit, UCL) Kernel Adaptive Metropolis-Hastings 12/12/2015 1

More information

Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements

Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Jeffrey N. Rouder Francis Tuerlinckx Paul L. Speckman Jun Lu & Pablo Gomez May 4 008 1 The Weibull regression model

More information

ICES REPORT March Tan Bui-Thanh And Mark Andrew Girolami

ICES REPORT March Tan Bui-Thanh And Mark Andrew Girolami ICES REPORT 4- March 4 Solving Large-Scale Pde-Constrained Bayesian Inverse Problems With Riemann Manifold Hamiltonian Monte Carlo by Tan Bui-Thanh And Mark Andrew Girolami The Institute for Computational

More information

Partially Collapsed Gibbs Samplers: Theory and Methods. Ever increasing computational power along with ever more sophisticated statistical computing

Partially Collapsed Gibbs Samplers: Theory and Methods. Ever increasing computational power along with ever more sophisticated statistical computing Partially Collapsed Gibbs Samplers: Theory and Methods David A. van Dyk 1 and Taeyoung Park Ever increasing computational power along with ever more sophisticated statistical computing techniques is making

More information

Lecture 4: Dynamic models

Lecture 4: Dynamic models linear s Lecture 4: s Hedibert Freitas Lopes The University of Chicago Booth School of Business 5807 South Woodlawn Avenue, Chicago, IL 60637 http://faculty.chicagobooth.edu/hedibert.lopes hlopes@chicagobooth.edu

More information

On Reparametrization and the Gibbs Sampler

On Reparametrization and the Gibbs Sampler On Reparametrization and the Gibbs Sampler Jorge Carlos Román Department of Mathematics Vanderbilt University James P. Hobert Department of Statistics University of Florida March 2014 Brett Presnell Department

More information

Bayesian Inverse problem, Data assimilation and Localization

Bayesian Inverse problem, Data assimilation and Localization Bayesian Inverse problem, Data assimilation and Localization Xin T Tong National University of Singapore ICIP, Singapore 2018 X.Tong Localization 1 / 37 Content What is Bayesian inverse problem? What is

More information