On an adaptive preconditioned Crank-Nicolson algorithm for infinite dimensional Bayesian inferences
Noname manuscript No. (will be inserted by the editor)

On an adaptive preconditioned Crank-Nicolson algorithm for infinite dimensional Bayesian inferences

Zixi Hu · Zhewei Yao · Jinglai Li

Received: date / Accepted: date

This work was supported by the NSFC. ZH and ZY contributed equally to this work.
Z. Hu, Z. Yao: Department of Mathematics and Zhiyuan College, Shanghai Jiao Tong University, 800 Dongchuan Rd, Shanghai 200240, China.
J. Li: Institute of Natural Sciences, Department of Mathematics, and the MOE Key Laboratory of Scientific and Engineering Computing, Shanghai Jiao Tong University, 800 Dongchuan Rd, Shanghai 200240, China. jinglaili@sjtu.edu.cn

Abstract The preconditioned Crank-Nicolson (pCN) method is an MCMC algorithm for implementing Bayesian inferences in function spaces. A remarkable feature of the algorithm is that, unlike many usual MCMC algorithms, which become arbitrarily slow under mesh refinement, the efficiency of the pCN algorithm is dimension independent. In this work we develop an adaptive version of the pCN algorithm, in which the proposal is adaptively improved based on the sample history. Under the chosen parametrization of the proposal distribution, the proposal parameters can be efficiently updated in our algorithm. We show that the resulting adaptive pCN algorithm is dimension independent and has the correct ergodicity properties. Finally we provide numerical examples to demonstrate the efficiency of the proposed algorithm.

Keywords Bayesian inference · covariance operator · dimension independence · Markov Chain Monte Carlo

Mathematics Subject Classification (2010) 62F15 · 65C05

1 Introduction

Many scientific problems, such as nonparametric regression [12] and inverse problems [14, 26], require performing Bayesian inferences in function spaces. In
practice, the posterior distribution often does not admit a closed form and needs to be computed numerically. Specifically, one first represents the unknown function with a finite-dimensional parametrization, for example by discretizing the function on a pre-determined mesh grid, and then solves the resulting finite dimensional inference problem with Markov Chain Monte Carlo (MCMC) simulations. It is well known that standard MCMC algorithms, such as random walk Metropolis-Hastings (RWMH), can become arbitrarily slow as the discretization mesh of the unknown is refined [21, 23, 4, 18]. That is, the mixing time of an algorithm can increase to infinity as the dimension of the discretized parameter approaches infinity, in which case the algorithm is said to be dimension-dependent. To this end, a very interesting line of research is to develop dimension-independent MCMC algorithms by requiring the algorithms to be well defined in function spaces. In particular, a family of dimension-independent MCMC algorithms was presented in [7], constructed from a Crank-Nicolson discretization of a stochastic partial differential equation (SPDE) that preserves the reference measure. Just as in finite dimensional problems, one can improve the sampling efficiency of infinite dimensional MCMC by incorporating the data information in the proposal design. To this end, a very popular class of methods guides the proposal with local derivative information of the likelihood function. Such derivative based methods include: the stochastic Newton MCMC [17, 19], the operator-weighted proposal method [16], the infinite-dimensional Metropolis-adjusted Langevin algorithm (MALA) [5, 3], the dimension-independent likelihood-informed (DILI) MCMC [8], and the generalized preconditioned Crank-Nicolson (gpCN) algorithm [24], just to name a few. In this work, we focus on an alternative way of utilizing the data information, namely adaptive MCMC (cf.
[1, 2, 22] and the references therein), which adjusts the proposal based on the sample history. A major advantage of adaptive methods is that they do not require knowledge of the gradient, which makes them particularly convenient for problems with black-box models. In a recent work [10], we developed an adaptive independence sampler MCMC algorithm for infinite dimensional problems. A major limitation of independence sampler MCMC algorithms is that their efficiency depends critically on the ability of the chosen proposal, often in a parametrized form, to approximate the posterior in the entire state space, and such an algorithm may perform very poorly if the proposal cannot approximate the posterior distribution well. In this respect, random walk based algorithms may be advantageous, as they do not require such a global proposal. In this work, we present an adaptive random walk MCMC method based on the preconditioned Crank-Nicolson (pCN) algorithm of [7]. Specifically, we adaptively adjust the preconditioning operator in the pCN algorithm to improve the sampling efficiency. We parametrize the preconditioning operator in a specific form that has been used in [2, 10], and we provide an algorithm that can efficiently update the parameter values as the iteration proceeds. By design, the acceptance probability is well defined and thus the algorithm is dimension independent. In addition, our algorithm ensures that the acceptance probability
is the same as in the standard pCN algorithm, which is independent of the proposal distribution. Finally we note that an important issue in designing an adaptive MCMC algorithm is to preserve the ergodicity while allowing the proposal distribution to vary during the iterations. Following the roadmap outlined in [10], we provide some theoretical results regarding the ergodicity of the proposed algorithm. We note that two methods similar to ours are the gpCN in [24] and the dimension independent adaptive Metropolis (DIAM) proposed in [6]. Compared to the gpCN method, our algorithm uses a specific parametrized form of the proposal, and as a result the parameters can be updated very efficiently, which makes an adaptive algorithm feasible. The DIAM is also an adaptive MCMC algorithm, and the major difference between it and our method is that, by design, our method preserves an important feature of the standard pCN algorithm, namely that the acceptance probability is independent of the proposal distribution. It should also be noted that our algorithm is specifically designed for Gaussian priors; there are also works concerning MCMC algorithms for non-Gaussian priors [27, 28]. The rest of the paper is organized as follows. In Section 2 we describe the setup of infinite dimensional inference problems and present our adaptive pCN algorithm in detail. In Section 3 we provide several numerical examples to demonstrate the performance of the proposed algorithm. Finally we offer some concluding remarks in Section 4.

2 The adaptive preconditioned Crank-Nicolson algorithm

2.1 Problem setup

We present the standard setup of the problem following [26]. We consider a separable Hilbert space X with inner product ⟨·,·⟩_X. Our goal is to estimate an unknown u ∈ X from data y ∈ Y, where Y is the data space and y is related to u via a likelihood function L^y(u).
In the Bayesian inference we assume that the prior µ_0 of u is a (without loss of generality) zero-mean Gaussian measure defined on X with covariance operator C_0, i.e., µ_0 = N(0, C_0). Note that C_0 is symmetric, positive and of trace class. The range of C_0^{1/2},
E = {u = C_0^{1/2} x : x ∈ X} ⊂ X,
which is a Hilbert space equipped with the inner product [9]
⟨·,·⟩_E = ⟨C_0^{−1/2}·, C_0^{−1/2}·⟩_X,
is called the Cameron-Martin space of the measure µ_0. In this setting, the posterior measure µ^y of u conditional on the data y is given by the Radon-Nikodym derivative
dµ^y/dµ_0 (u) = L^y(u),   (2.1)
which can be interpreted as the Bayes rule in the infinite dimensional setting. In a standard setting, the likelihood function takes the form
L^y(u) = (1/Z) exp(−Φ^y(u)),   (2.2)
where Z is a normalization constant. In what follows, without causing any ambiguity, we shall drop the superscript y in Φ^y, L^y and µ^y for simplicity, while keeping in mind that these functions depend on the data y. For the inference problem to be well-posed, one typically requires the functional Φ to satisfy the Assumptions (6.1) in [7]. Finally we quote the following lemma ([9], Chapter 1), which will be useful later:

Lemma 1 There exist a complete orthonormal basis {e_j}_{j∈N} on X and a sequence of non-negative numbers {α_j}_{j∈N} such that C_0 e_j = α_j e_j and Σ_{j=1}^∞ α_j < ∞; i.e., {e_j}_{j∈N} and {α_j}_{j∈N} are the eigenfunctions and eigenvalues of C_0, respectively.

2.2 The Crank-Nicolson algorithms

We start by briefly reviewing the family of Crank-Nicolson algorithms for infinite dimensional Bayesian inferences developed in [7]. Simply speaking, the algorithms are based on the stochastic partial differential equation (SPDE)
du/ds = −K L u + √(2K) db/ds,   (2.3)
where L = C_0^{−1} is the precision operator of µ_0, K is a positive operator, and b is a Brownian motion in X whose covariance operator is the identity. The proposal is then derived by applying the Crank-Nicolson (CN) scheme to the SPDE (2.3), yielding
v = u − (δ/2) K L (u + v) + √(2Kδ) ξ,   (2.4)
for a white noise ξ and δ ∈ (0, 2). In [7], two choices of K are proposed, resulting in two different algorithms. First, one can choose K = I, the identity, obtaining
(2C_0^{−1} + δI) v = (2C_0^{−1} − δI) u + √(8δ) ξ,
which is known as the plain Crank-Nicolson (CN) algorithm. Alternatively one can choose K = C_0, resulting in the preconditioned Crank-Nicolson (pCN) proposal:
v = (1 − β²)^{1/2} u + β w,   (2.5)
where w ~ N(0, C_0) and
β = √(8δ) / (2 + δ).
It is easy to see that β ∈ [0, 1]. In both the CN and pCN algorithms, the acceptance probability is
a(v, u) = min{1, L(v)/L(u)}.   (2.6)
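The pCN update (2.5)-(2.6) is straightforward to implement once a prior sample can be drawn via the eigenpairs of Lemma 1. The following is a minimal sketch: the sine eigenbasis, the eigenvalues α_j = j^{-2}, and the toy Gaussian log-likelihood are illustrative assumptions of ours, not the setup used in the paper.

```python
import numpy as np

def sample_prior(alpha, basis, rng):
    """Draw w ~ N(0, C_0) via the Karhunen-Loeve expansion
    w = sum_j sqrt(alpha_j) xi_j e_j, using the eigenpairs of Lemma 1."""
    xi = rng.standard_normal(len(alpha))
    return basis @ (np.sqrt(alpha) * xi)

def pcn_step(u, log_like, alpha, basis, beta, rng):
    """One pCN step, Eqs. (2.5)-(2.6): propose
    v = sqrt(1 - beta^2) u + beta w with w ~ N(0, C_0),
    and accept with probability min{1, L(v)/L(u)}."""
    w = sample_prior(alpha, basis, rng)
    v = np.sqrt(1.0 - beta**2) * u + beta * w
    if np.log(rng.uniform()) < log_like(v) - log_like(u):
        return v, True
    return u, False

# Illustrative setup (not the paper's): sine eigenbasis on [0, 1],
# eigenvalues alpha_j = j^{-2}, and a toy Gaussian log-likelihood.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
j = np.arange(1, 51)
alpha = 1.0 / j**2
basis = np.sqrt(2.0) * np.sin(np.pi * np.outer(t, j))
log_like = lambda u: -0.5 * np.sum((u - 1.0)**2) / 101
u = np.zeros_like(t)
for _ in range(200):
    u, _accepted = pcn_step(u, log_like, alpha, basis, 0.2, rng)
```

Note that, as in the algorithm itself, the likelihood enters only through the acceptance ratio; the proposal uses the prior alone.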
2.3 Parametrizing the operator K

A natural extension of the CN and pCN algorithms (also proposed in [7]) is to consider other choices of the operator K to improve the algorithm efficiency. To this end, we first rewrite the proposal (2.4) as
v = (I + (δ/2) K L)^{−1} (I − (δ/2) K L) u + (I + (δ/2) K L)^{−1} √(2δK) ξ.   (2.7)
Before discussing specific choices of the operator K, we present the following proposition regarding the acceptance probability:

Proposition 1 Suppose the operator K is symmetric, positive and of trace class. Let q(u, ·) be the proposal distribution associated to Eq. (2.7). Define the measures η(du, dv) = q(u, dv) µ(du) and η′(du, dv) = q(v, du) µ(dv) on X × X. If K commutes with C_0, then η′ is absolutely continuous with respect to η, and
dη′/dη (u, v) = L(v)/L(u).

Proof Define η_0(du, dv) = q(u, dv) µ_0(du). The measure η_0 is Gaussian. Since K and C_0 commute, we have
E^{η_0}[v ⊗ v] = (I + (δ/2) K L)^{−2} [(I − (δ/2) K L)² C_0 + 2δK] = C_0 = E^{η_0}[u ⊗ u].
Hence η_0 is symmetric in (u, v). Now η(du, dv) = q(u, dv) µ(du), η_0(du, dv) = q(u, dv) µ_0(du), and µ, µ_0 are equivalent. It follows that η and η_0 are equivalent and
dη/dη_0 (u, v) = dµ/dµ_0 (u) = L(u).
Since η_0 is symmetric in (u, v), we also have that η′ and η_0 are equivalent and that
dη′/dη_0 (u, v) = L(v).
Since equivalence of measures is transitive, it follows that η′ and η are equivalent and
dη′/dη (u, v) = L(v)/L(u).
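Because K and C_0 share eigenfunctions, the covariance identity in the proof can be checked one eigencomponent at a time: with a_j = 1 + (δ/2)λ_j/α_j and b_j = 1 − (δ/2)λ_j/α_j, the proposal (2.7) maps a component with variance α_j to one with variance (b_j² α_j + 2δλ_j)/a_j², which equals α_j. A quick numerical check of this scalar identity, with arbitrary illustrative values of our own choosing:

```python
import numpy as np

# Componentwise check that the proposal (2.7) preserves the prior variance:
# Var(v_j) = (b_j^2 alpha_j + 2 delta lam_j) / a_j^2 should equal alpha_j.
delta, alpha, lam = 0.3, 2.0, 0.7        # arbitrary illustrative values
a = 1.0 + 0.5 * delta * lam / alpha      # eigenvalue of I + (delta/2) K L
b = 1.0 - 0.5 * delta * lam / alpha      # eigenvalue of I - (delta/2) K L
var_v = (b**2 * alpha + 2.0 * delta * lam) / a**2
print(np.isclose(var_v, alpha))          # prints True: prior variance preserved
```

This is exactly the statement E^{η_0}[v ⊗ v] = C_0 restricted to a single eigendirection; it holds for any positive δ, α_j and λ_j.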
Now we discuss how to specify the operator K, and we start by assuming for K an appropriate parametrized form. Note that an essential condition in Proposition 1 is that K must commute with C_0. To satisfy this condition, it is convenient to design a K that has common eigenfunctions with C_0. Namely, we write K in the form
K = C_0 + H,   (2.8a)
where H is defined as
H = Σ_{j=1}^J h_j ⟨e_j, ·⟩ e_j,   (2.8b)
with h_j being coefficients. Here J is a prescribed positive integer that is smaller than or equal to the dimensionality of the problem. It is easy to see that K is a symmetric operator with eigenvalue-eigenfunction pairs {λ_j, e_j}_{j=1}^∞, where λ_j = α_j + h_j for j = 1, ..., J and λ_j = α_j for j = J+1, ..., which implies that K and C_0 commute.

2.4 The adaptive algorithm

A well-adopted rule in designing efficient MCMC algorithms is that the proposal covariance should be close to the covariance operator of the posterior [23, 11]. Next we give a heuristic argument for our method of determining the operator K. In the case of small δ, the proposal (2.7) is approximately equal to
v ≈ u + √(2δ) w, where w ~ N(0, K),
which implies that K provides an approximation to the proposal covariance in this case. Thus we shall require K to be close to the posterior covariance. Note that such an approximation is only valid for small δ in principle, and thus we recommend not using a very large δ in the proposed algorithm. Now suppose the posterior covariance is C; one can then determine K by
min_{{h_j}_{j=1}^J} ‖K − C‖,   (2.9)
where ‖·‖ is the Hilbert-Schmidt norm and K is given by Eq. (2.8). By some basic algebra, we can show that the optimal solution of Eq. (2.9) is
h_j = ⟨C e_j, e_j⟩ − α_j, or equivalently λ_j = ⟨C e_j, e_j⟩, for j = 1, ..., J.
Since C is the posterior covariance, for any v and v′ ∈ X we have [9]
⟨C v, v′⟩ = ∫ ⟨v, u − m⟩ ⟨v′, u − m⟩ µ(du),   (2.10)
where m is the mean of µ. Using Eq. (2.10), we can derive that
h_j = ∫ (x_j − u_j)² dµ − α_j, or λ_j = ∫ (x_j − u_j)² dµ,   (2.11)
where x_j = ⟨m, e_j⟩ and u_j = ⟨u, e_j⟩ for j = 1, ..., J. In practice, the posterior covariance C is not directly available, and so here we determine the operator K with an adaptive MCMC algorithm. Simply speaking, the adaptive algorithm starts with an initial guess of K and then adaptively updates K based on the sample history of the posterior. The essential part of the algorithm is to update K, i.e., to estimate the values of h_j, from the posterior samples. To this end, suppose we have a set of posterior samples {u^i}_{i=0}^n; the values of the parameters h_j are then estimated using the sample average approximation of Eq. (2.11):
x_j^n = (1/(n+1)) Σ_{i=0}^n ⟨u^i, e_j⟩,   (2.12a)
s_j^n = Σ_{i=0}^n (u_j^i)²,   (2.12b)
h_j^n = s_j^n/(n+1) − (x_j^n)² + ε − α_j,   (2.12c)
for j = 1, ..., J. Here ε is a small positive constant, introduced to ensure the stability of the algorithm, i.e., to prevent the eigenvalues λ_j^n = α_j + h_j^n from degenerating. For efficiency's sake, we can rewrite Eq. (2.12) in the recursive form
x_j^n = (n/(n+1)) x_j^{n−1} + (1/(n+1)) ⟨u^n, e_j⟩,   (2.13a)
s_j^n = s_j^{n−1} + (u_j^n)²,   (2.13b)
h_j^n = s_j^n/(n+1) − (x_j^n)² + ε − α_j,   (2.13c)
for j = 1, ..., J and n > 0. Let us denote the operator K resulting from {h_j^n}_{j=1}^J by K_n; it is easy to see that K_n is symmetric, positive and of trace class. As a result we can rewrite the proposal as
v = (I + (δ/2) K_n L)^{−1} (I − (δ/2) K_n L) u + √(2δ) (I + (δ/2) K_n L)^{−1} w,   (2.14)
where w ~ N(0, K_n). Finally we note that it is not robust to estimate the parameter values from a very small number of samples; to address this issue, we first draw a certain number of samples with the standard pCN algorithm and only then start the adaptation. We describe the complete adaptive pCN (ApCN) algorithm in Algorithm 1.
Algorithm 1 The adaptive pCN algorithm
1: Initialize u^0 ∈ S;
2: for n = 0 to n_0 − 1 do
3:   Propose v using Eq. (2.5);
4:   Draw ρ ~ U[0, 1];
5:   Let a := min{1, L(v)/L(u^n)};
6:   if ρ ≤ a then
7:     u^{n+1} = v;
8:   else
9:     u^{n+1} = u^n;
10:  end if
11: end for
12: Compute {x_j^{n_0}, s_j^{n_0}, h_j^{n_0}}_{j=1}^J using Eq. (2.12) and the samples {u^i}_{i=0}^{n_0};
13: for n = n_0 to N − 1 do
14:   Compute K_n from Eq. (2.8) with {h_j^n}_{j=1}^J;
15:   Propose v using Eq. (2.14);
16:   Draw ρ ~ U[0, 1];
17:   Let a := min{1, L(v)/L(u^n)};
18:   if ρ ≤ a then
19:     u^{n+1} = v;
20:   else
21:     u^{n+1} = u^n;
22:   end if
23:   Compute {h_j^{n+1}}_{j=1}^J using Eqs. (2.13);
24: end for

2.5 Ergodicity analysis

As has been mentioned, an important issue for an adaptive MCMC algorithm is to verify that it has the correct ergodic properties. Directly proving the ergodicity in the infinite dimensional setting is rather challenging. However, as the algorithm must eventually be implemented in a finite dimensional setting, it is reasonable to consider the ergodic properties of the finite dimensional implementation instead. Namely, we first approximate u with a d-dimensional representation, say z = P_d u. In this case, the state space X becomes R^d and the prior µ_0(dz) of z reduces to a d-variate Gaussian distribution over R^d. We shall now perform the ergodicity analysis on this finite dimensional problem. In particular we follow the analysis outlined in [10], which requires a small modification of the likelihood function (2.2):
L_S(z) = (1/Z) exp(−Φ(z)) for z ∈ S, and L_S(z) = 0 for z ∉ S.   (2.15)
Here S = {z ∈ R^d : ‖z‖_2² < R}, where R > 0 is a positive constant that can be chosen arbitrarily. The posterior of z becomes dµ^d/dµ_0(z) = L_S(z).
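Before turning to the theory, note that the adaptive phase of Algorithm 1 is easy to sketch in the shared eigenbasis of C_0 and K_n, where the proposal (2.14) acts componentwise. The following is a minimal sketch under that diagonal representation; the function and variable names are ours, not the paper's.

```python
import numpy as np

def apcn_proposal(u, alpha, lam, delta, rng):
    """Proposal (2.14) in the shared eigenbasis of C_0 and K_n:
    componentwise v_j = (b_j u_j + sqrt(2 delta) w_j) / a_j, with
    a_j = 1 + (delta/2) lam_j/alpha_j, b_j = 1 - (delta/2) lam_j/alpha_j,
    and w_j ~ N(0, lam_j)."""
    a = 1.0 + 0.5 * delta * lam / alpha
    b = 1.0 - 0.5 * delta * lam / alpha
    w = np.sqrt(lam) * rng.standard_normal(u.shape)
    return (b * u + np.sqrt(2.0 * delta) * w) / a

def update_params(n, u, x, s, alpha, eps):
    """Recursive updates (2.13): running mean x_j, running second
    moment s_j, and the coefficients h_j = s/(n+1) - x^2 + eps - alpha,
    so that lam_j = alpha_j + h_j stays bounded away from zero."""
    x = n / (n + 1.0) * x + u / (n + 1.0)
    s = s + u**2
    h = s / (n + 1.0) - x**2 + eps - alpha
    return x, s, h
```

As a sanity check, feeding `update_params` i.i.d. samples of known variance drives λ_j = α_j + h_j toward that variance (plus ε), which is exactly the target λ_j = ∫(x_j − u_j)² dµ of Eq. (2.11).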
We emphasize that modifying the likelihood function is only for the convenience of the proof (as the technique employed in [10] requires the posterior support to be bounded), and clearly the modified likelihood function approximates the original one well provided a sufficiently large R is chosen. In this setting we have the following theorem, indicating the ergodicity of our algorithm:

Theorem 1 The chain {z^n} generated by Algorithm 1, with any initial distribution (the distribution of z^0) on S, simulates properly the target distribution µ^d: for any bounded and µ^d-measurable function f : S → R, the equality
lim_{n→∞} (1/(n+1)) Σ_{i=0}^n f(z^i) = E_{µ^d}[f(z)]
holds almost surely.

We leave the proof to Appendix A.

3 Numerical examples

3.1 An ODE example

Our first example is a simple inverse problem where the forward model is governed by an ordinary differential equation (ODE):
dx/dt = u(t) x(t),
with a prescribed initial condition. We assume that we observe the solution x(t) several times in the interval [0, T], and we want to infer the unknown coefficient u(t). In our experiments, we let the initial condition be x(0) = 1 and T = 1. Now suppose that the solution is measured every T/10 time units from 0 to T, and the error in each measurement is assumed to be an independent Gaussian N(0, 0.5²). The data is generated by applying the forward model to a true coefficient and then adding noise to the result. The data and the truth used to generate the data are shown in Fig. 1. In the inference, the unknown u is represented on a set of equally spaced grid points. The prior is chosen to be a zero-mean Gaussian measure on X with an exponential covariance function: K(t_1, t_2) = exp(−|t_1 − t_2|/2). We sample the posterior with both the pCN and the ApCN algorithms, each with 10^6 samples. In the pCN, we choose β = 1/5 and in the ApCN we choose δ = 1/4. These parameter values are chosen so that the two algorithms result in reasonable acceptance probabilities.
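The forward model of this example and the generation of the synthetic data can be sketched as follows. The ODE and x(0) = 1 follow our reading of the text; the trapezoidal quadrature and the noise level passed in below are our own illustrative choices.

```python
import numpy as np

def forward(u_vals, t_grid, x0=1.0):
    """Solve dx/dt = u(t) x(t), x(0) = x0, on t_grid. Since the ODE is
    linear in x, x(t) = x0 * exp(int_0^t u(s) ds); the integral is
    approximated with the trapezoidal rule on the grid."""
    increments = np.diff(t_grid) * 0.5 * (u_vals[:-1] + u_vals[1:])
    return x0 * np.exp(np.concatenate(([0.0], np.cumsum(increments))))

def make_data(u_true, t_grid, obs_idx, sigma, rng):
    """Generate noisy observations y_k = x(t_k) + noise,
    noise ~ N(0, sigma^2) i.i.d., at the grid indices obs_idx."""
    x = forward(u_true, t_grid)
    return x[obs_idx] + sigma * rng.standard_normal(len(obs_idx))
```

For a constant coefficient u ≡ c the sketch reproduces the exact solution x(T) = exp(cT), which gives a convenient correctness check.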
Moreover, in the ApCN algorithm, we choose J = 10 and ε = 10^{−4}. The average acceptance probability of pCN is 28% and that of ApCN is 30%. First we shall show
Fig. 1 (for the ODE example) Left: the true coefficient. Right: the data generated with the true coefficient: the blue solid line is the simulated data without observation noise and the red dashed line is the simulated data with observation noise.

that the adaptation diminishes as the number of iterations increases. Thus, in Fig. 2 we plot the estimated values of two of the parameters λ_j as functions of the number of iterations, and we can see from the plots that the values of these two parameters converge as the iterations proceed. Next we shall compare the performance of the two algorithms; a commonly used performance indicator is the autocorrelation function (ACF). We particularly consider the unknown at t = 0.2, 0.5 and 0.8, and we plot the ACF for all three points in Fig. 3. One can see from the figure that, for all three points, the ACF of the chain generated by the ApCN decreases much faster than that of the standard pCN, suggesting that the ApCN method achieves a significantly higher efficiency. Alternatively, we compute the lag-1 ACF at all the grid points, which is plotted in Fig. 4 (left), and we can see that the ACF of the chain generated by the ApCN is much lower than that of the standard pCN at all the grid points. The effective sample size (ESS) is another popular measure of the sampling efficiency of MCMC [15]. The ESS is computed by
ESS = N / (1 + 2τ),
where τ is the integrated autocorrelation time and N is the total sample size; it gives an estimate of the number of effectively independent draws in the chain. We compute the ESS of the unknown u at each grid point and show the results in Fig. 4 (right). The results show that the ApCN algorithm produces many more effectively independent samples than the standard pCN.

3.2 Estimating the Robin coefficient

In this example we consider the one dimensional heat conduction equation in the region x ∈ [0, L]:
∂u/∂t (x, t) = ∂²u/∂x² (x, t),   (3.1a)
u(x, 0) = g(x),   (3.1b)
Fig. 2 (for the ODE example) The estimates of two of the parameters λ_j plotted as functions of the number of iterations.

Fig. 3 (for the ODE example) Autocorrelation functions (ACF) for the pCN and the ApCN methods at different grid points: from left to right, t = 0.2, t = 0.5 and t = 0.8.

Fig. 4 (for the ODE example) Left: the lag-1 ACF for u at each grid point. Right: the effective sample size (ESS) for u at each grid point.

with the following Robin boundary conditions:
−∂u/∂x (0, t) + ρ(t) u(0, t) = h_0(t),   (3.1c)
∂u/∂x (L, t) + ρ(t) u(L, t) = h_1(t).   (3.1d)
Suppose the functions g(x), h_0(t) and h_1(t) are all known, and we want to estimate the unknown Robin coefficient ρ(t) from certain measurements of the temperature u(x, t). The Robin coefficient ρ(t) characterizes the thermal properties of the conductive medium on the interface, which in turn provides information on certain physical processes near the boundary, e.g., corrosion [13]. In this example we choose L = 1, T = 1 and the functions
g(x) = x² + 1, h_0(t) = t(2t + 1), h_1(t) = 2 + t(2t + 2).
The solution is measured every T/5 time units from 0 to T, and the error in each measurement is assumed to be an independent Gaussian N(0, 0.5²). The true Robin coefficient and the resulting data are shown in Fig. 5. In the computation, the unknown is represented on a set of equally spaced grid points. Moreover, the prior is the same as that used in the ODE example. We sample the posterior with both the pCN and the ApCN algorithms, each with 10^6 samples. In the pCN, we choose β = 1/4 and in the ApCN we choose δ = 2/10. We choose J = 10 and ε = 10^{−4} in the ApCN algorithm. The average acceptance probability of pCN is 28% and that of ApCN is 30%.

Fig. 5 (for the Robin example) Left: the true Robin coefficient. Right: the data generated with the true coefficient: the solid line is the simulated data without observation noise and the dashed line is the simulated data with observation noise.

Fig. 6 (for the Robin example) The estimates of two of the parameters λ_j plotted as functions of the number of iterations.

As in the ODE example, we first plot the estimated values of two of the parameters λ_j as functions of the number of iterations in Fig. 6, where we can observe the convergence of the two parameters. We then plot the ACF for the unknown at the grid points t = 0.2, 0.5 and 0.8 in Fig. 7. Next we compute the lag-1 ACF at all the grid points, and plot the results in Fig. 8 (left). In all these
Fig. 7 (for the Robin example) Autocorrelation functions (ACF) for the pCN and the ApCN methods at different grid points: from left to right, t = 0.2, t = 0.5 and t = 0.8.

Fig. 8 (for the Robin example) Left: the lag-1 ACF for u at each grid point. Right: the effective sample size (ESS) for u at each grid point.

ACF plots, we can see that the results of our ApCN algorithm are significantly better than those of the standard pCN method. Finally we compute the ESS of the unknown u at each grid point and show the results in Fig. 8 (right), which once again indicates that the ApCN algorithm clearly outperforms the standard pCN.

4 Conclusions

In summary, we consider MCMC simulations for Bayesian inferences in function spaces. In particular, we develop an adaptive version of the pCN algorithm to improve the sampling efficiency. The implementation of the ApCN algorithm is rather simple, requiring no information about the underlying models, and during the iteration the proposal can be efficiently updated with explicit formulas. We also show that the adaptive algorithm has the correct ergodicity property. Finally we demonstrate the effectiveness and efficiency of the ApCN algorithm with several numerical examples. We expect the ApCN algorithm to be of use in many practical problems, especially those involving black-box models. It should be noted that, in the present work, we consider the ergodicity properties of the finite dimensional approximation of the algorithm. It is certainly desirable to ensure that the infinite dimensional MCMC algorithm itself has the correct ergodicity properties, which may require certain modifications of the present adaptive algorithm. We plan to work on this problem in the future.

A Proof of Theorem 1

Recall that, in the finite dimensional setting, our target distribution µ^d is supported on S. Let M(S) denote the set of finite measures on S; the norm on M(S) is the total variation norm. Let K_n(z^0, ..., z^{n−2}, z) be the operator K at step n computed from z^0, ..., z^{n−2}, z (i.e., z = z^{n−1}). For simplicity, let ζ_{n−2} = (z^0, ..., z^{n−2}) and K_{n,ζ_{n−2}}(z) = K_n(z^0, ..., z^{n−2}, z). Let q_{n,ζ_{n−2}}(z; ·) be the proposal distribution given by
v = (I + (δ/2) K_{n,ζ_{n−2}}(z) L)^{−1} (I − (δ/2) K_{n,ζ_{n−2}}(z) L) z + √(2δ) (I + (δ/2) K_{n,ζ_{n−2}}(z) L)^{−1} w,
where w ~ N(0, K_{n,ζ_{n−2}}(z)). It should be noted that all the operators reduce to matrices in the finite dimensional setting. Then define
Q_{n,ζ_{n−2}}(z; dv) = acc(z, v) q_{n,ζ_{n−2}}(z, dv) + δ_z(dv) (1 − ∫ acc(z, x) q_{n,ζ_{n−2}}(z, dx))
as the transition probability at step n, where δ_z(·) is a point mass and the acceptance probability is acc(z, v) = min{1, L_S(v)/L_S(z)}. Also define Q_n(z^0, ..., z^{n−2}, z; dv) = Q_{n,ζ_{n−2}}(z; dv) as the transition probability from (z^0, ..., z^{n−2}, z) to v. Let T be a transition probability on S and set
Γ(T) = sup_{µ_1 ≠ µ_2} ‖µ_1 T − µ_2 T‖ / ‖µ_1 − µ_2‖,
where the supremum is taken over distinct probability measures µ_1, µ_2 on S. Now we introduce some notation. First, following [10], we use νT to denote the measure A ↦ ∫_S T(z; A) ν(dz), and for bounded measurable functions we write Tf(z) = ∫_S T(z; dy) f(y) as well as νf = ∫_S f(y) ν(dy). Then we have the following proposition:

Proposition 2 The transition probabilities (Q_n) satisfy the following three conditions:
I. There is a constant γ_1 ∈ (0, 1) such that Γ(Q_{n,ζ_{n−2}}) ≤ γ_1 < 1, for all ζ_{n−2} ∈ S^{n−1} and n ≥ 2.
II. There is a fixed positive constant γ_2 such that ‖Q_{n,ζ_{n−2}} − Q_{n+k,ζ_{n+k−2}}‖_{M(S)→M(S)} ≤ γ_2 k/n, where n, k ≥ 1 and one assumes that ζ_{n+k−2} is a direct continuation of ζ_{n−2}.
III. There is a constant γ_3 such that ‖µ^d Q_{n,ζ_{n−2}} − µ^d‖ ≤ γ_3/n, for all ζ_{n−2} ∈ S^{n−1} and n ≥ 2.
Proof (I). Let
A_{n,ζ_{n−2}}(z) = I + (δ/2) K_{n,ζ_{n−2}}(z) L, B_{n,ζ_{n−2}}(z) = I − (δ/2) K_{n,ζ_{n−2}}(z) L.
Define, for j = 1, ..., d,
a_{n,ζ_{n−2},j}(z) = 1 + (δ/2) λ_{n,ζ_{n−2},j}(z)/α_j and b_{n,ζ_{n−2},j}(z) = 1 − (δ/2) λ_{n,ζ_{n−2},j}(z)/α_j,
where λ_{n,ζ_{n−2},j}(z) are the eigenvalues of K_{n,ζ_{n−2}}(z). Obviously, a_{n,ζ_{n−2},j}(z) and b_{n,ζ_{n−2},j}(z) are the eigenvalues of A_{n,ζ_{n−2}}(z) and B_{n,ζ_{n−2}}(z), respectively, and for j = 1, ..., d we have 1 < a_{n,ζ_{n−2},j}(z) < M_1 and |b_{n,ζ_{n−2},j}(z)| < M_1 for a positive constant M_1. According to the proposal,
q_{n,ζ_{n−2}}(z; ·) = N(A_{n,ζ_{n−2}}(z)^{−1} B_{n,ζ_{n−2}}(z) z, 2δ A_{n,ζ_{n−2}}(z)^{−2} K_{n,ζ_{n−2}}(z)).
Since 1 < a_{n,ζ_{n−2},j}(z) < M_1 and, by design, M_2 ≤ λ_{n,ζ_{n−2},j}(z) ≤ M_3 for some constants M_2, M_3 > 0, we have
M_4 I ≤ 2δ A_{n,ζ_{n−2}}(z)^{−2} K_{n,ζ_{n−2}}(z) ≤ M_5 I
for some constants M_4, M_5 > 0. Moreover, for any z ∈ S there exists a constant M_6 > 0 such that
‖A_{n,ζ_{n−2}}(z)^{−1} B_{n,ζ_{n−2}}(z) z‖² = Σ_{j=1}^d a_{n,ζ_{n−2},j}(z)^{−2} b_{n,ζ_{n−2},j}(z)² ⟨z, e_j⟩² ≤ M_6.
Thus the density of q_{n,ζ_{n−2}}(z; ·) is bounded below on S. It then follows that q_{n,ζ_{n−2}}(z; A) ≥ c µ_0(A) for all z ∈ S, all A ⊂ S, and a constant c > 0, and hence Γ(Q_{n,ζ_{n−2}}) ≤ γ_1 < 1 (cf. [?]).

(II). For any given ζ_{n−2}, one has
‖Q_{n,ζ_{n−2}} − Q_{n+k,ζ_{n+k−2}}‖_{M(S)→M(S)} ≤ 2 sup_{z∈S, A⊂S} |Q_{n,ζ_{n−2}}(z; A) − Q_{n+k,ζ_{n+k−2}}(z; A)|.
We then can show that
|Q_{n,ζ_{n−2}}(z; A) − Q_{n+k,ζ_{n+k−2}}(z; A)| ≤ 2 ∫_{R^d} |q_{n,ζ_{n−2}}(z; v) − q̄(v)| dv + 2 ∫_{R^d} |q̄(v) − q_{n+k,ζ_{n+k−2}}(z; v)| dv,   (A.1)
where q̄ is the Gaussian density that has the same mean as q_{n,ζ_{n−2}}(z; ·) and the same covariance as q_{n+k,ζ_{n+k−2}}(z; ·). Let
I_1 = ∫_{R^d} |q_{n,ζ_{n−2}}(z; v) − q̄(v)| dv and I_2 = ∫_{R^d} |q̄(v) − q_{n+k,ζ_{n+k−2}}(z; v)| dv.
Let β_{n,j} = 2δ a_{n,ζ_{n−2},j}(z)^{−2} λ_{n,ζ_{n−2},j}(z). Then the β_{n,j} are the eigenvalues of the covariance of q_{n,ζ_{n−2}}(z, ·). It is easy to see that
|λ_{n,ζ_{n−2},j}(z) − λ_{n+k,ζ_{n+k−2},j}(z)| ≤ M_21 k/n,   (A.2)
for a constant M_21 > 0, and it follows that
|β_{n,j} − β_{n+k,j}| ≤ M_22 k/n,   (A.3)
for a constant M_22 > 0. Obviously, there is also a positive constant M_23 such that β_{n,j}, β_{n+k,j} ≥ M_23. We first consider I_1. Since the two densities share the same mean, we may center them and write
I_1 ≤ ∫_{R^d} | Π_{j=1}^d (2πβ_{n,j})^{−1/2} exp(−z_j²/(2β_{n,j})) − Π_{j=1}^d (2πβ_{n+k,j})^{−1/2} exp(−z_j²/(2β_{n+k,j})) | dz_1 ... dz_d.
Thanks to Eq. (A.3), by some elementary calculations we can show that I_1 ≤ M_24 k/n for some constant M_24 > 0. We now consider I_2, where the two densities share the same covariance and differ only in their means. Let
z̄ = A_{n,ζ_{n−2}}(z)^{−1} B_{n,ζ_{n−2}}(z) z − A_{n+k,ζ_{n+k−2}}(z)^{−1} B_{n+k,ζ_{n+k−2}}(z) z.
Here we have
I_2 ≤ ∫_{R^d} | Π_{j=1}^d (2πβ_{n+k,j})^{−1/2} exp(−(z_j − ⟨z̄, e_j⟩)²/(2β_{n+k,j})) − Π_{j=1}^d (2πβ_{n+k,j})^{−1/2} exp(−z_j²/(2β_{n+k,j})) | dz_1 ... dz_d.
Using Eq. (A.2), we have
|a_{n,ζ_{n−2},j}(z) − a_{n+k,ζ_{n+k−2},j}(z)| < M_26 k/n and |b_{n,ζ_{n−2},j}(z) − b_{n+k,ζ_{n+k−2},j}(z)| < M_26 k/n,
for some constant M_26 > 0. Thus we have
|⟨z̄, e_j⟩| = | ( b_{n,ζ_{n−2},j}(z)/a_{n,ζ_{n−2},j}(z) − b_{n+k,ζ_{n+k−2},j}(z)/a_{n+k,ζ_{n+k−2},j}(z) ) ⟨z, e_j⟩ | ≤ M_27 k/n,
and so I_2 ≤ M_28 k/n for a constant M_28 > 0. We thus conclude that
‖Q_{n,ζ_{n−2}} − Q_{n+k,ζ_{n+k−2}}‖_{M(S)→M(S)} ≤ γ_2 k/n
for some constant γ_2 > 0.
III. Assume that $\hat{K} = K_{n,\zeta_{n-3}}(z_{n-2})$. Define $\hat{q}(z; dv)$ to be the transition kernel associated with
\[
\Big(I - \tfrac{1}{2}\delta \hat{K}\mathcal{L}\Big)\, v = \Big(I + \tfrac{1}{2}\delta \hat{K}\mathcal{L}\Big)\, z + \sqrt{2\delta}\,\mathcal{N}(0, \hat{K}),
\]
and let
\[
\hat{Q}(z; dv) = acc(z, v)\,\hat{q}(z; dv) + \delta_z(dv)\Big(1 - \int acc(z, x)\,\hat{q}(z; dx)\Big).
\]
It is easy to see that the transition kernel $\hat{Q}$ satisfies the detailed balance condition, and thus $\mu_d \hat{Q} = \mu_d$. Note that $|\lambda_{n,\zeta_{n-3},j}(z_{n-2}) - \lambda_{n,\zeta_{n-2},j}(u)| \le M_{31}/n$ for $j = 1, \dots, d$ and a constant $M_{31} > 0$; moreover, there exist constants $M_{32}, M_{33} > 0$ such that $M_{32} < \lambda_{n,\zeta_{n-3},j}(z_{n-2}),\ \lambda_{n,\zeta_{n-2},j}(z) < M_{33}$. By a procedure similar to that used for condition (II), we obtain
\[
\| Q_{n,\zeta_{n-2}} - \hat{Q} \|_{M(s)} \le M_{34}/n
\]
for some constant $M_{34} > 0$. It follows that
\[
\| \mu_d Q_{n,\zeta_{n-2}} - \mu_d \| = \| \mu_d (Q_{n,\zeta_{n-2}} - \hat{Q}) \| \le \gamma_3/n
\]
for some constant $\gamma_3 > 0$.

Now we have verified Proposition 2 for our algorithm, and thus Theorem 1 follows immediately from Theorem 2 in [10]. Finally, it is worth noting that, following the analysis of [25], it may be possible to relax the requirement that the posterior must have bounded support. Nevertheless, the investigation of the unbounded-support case is beyond the scope of the present work.

References

1. Christophe Andrieu and Johannes Thoms, A tutorial on adaptive MCMC, Statistics and Computing, 18 (2008), pp. 343-373.
2. Yves Atchade, Gersende Fort, Eric Moulines, and Pierre Priouret, Adaptive Markov chain Monte Carlo: theory and methods, Preprint, (2009).
3. Alexandros Beskos, A stable manifold MCMC method for high dimensions, Statistics & Probability Letters, 90 (2014).
4. Alexandros Beskos, Gareth Roberts, and Andrew Stuart, Optimal scalings for local Metropolis-Hastings chains on nonproduct targets in high dimensions, The Annals of Applied Probability, 19 (2009).
5. Alexandros Beskos, Gareth Roberts, Andrew Stuart, and Jochen Voss, MCMC methods for diffusion bridges, Stochastics and Dynamics, 8 (2008).
6. Yuxin Chen, David Keyes, Kody J. H. Law, and Hatem Ltaief, Accelerated dimension-independent adaptive Metropolis, arXiv preprint arXiv:1506.05741, (2015).
7. Simon L. Cotter, Gareth O. Roberts, Andrew M. Stuart, and David White, MCMC methods for functions: modifying old algorithms to make them faster, Statistical Science, 28 (2013), pp. 424-446.
8. Tiangang Cui, Kody J. H. Law, and Youssef M. Marzouk, Dimension-independent likelihood-informed MCMC, arXiv preprint arXiv:1411.3688, (2014).
9. Giuseppe Da Prato, An Introduction to Infinite-Dimensional Analysis, Springer, 2006.
10. Zhe Feng and Jinglai Li, An adaptive independence sampler MCMC algorithm for infinite dimensional Bayesian inferences, arXiv preprint, (2015).
11. Heikki Haario, Eero Saksman, and Johanna Tamminen, An adaptive Metropolis algorithm, Bernoulli, 7 (2001), pp. 223-242.
12. Nils Lid Hjort, Chris Holmes, Peter Muller, and Stephen G. Walker, Bayesian Nonparametrics, vol. 28, Cambridge University Press, 2010.
13. Gabriele Inglese, An inverse problem in corrosion detection, Inverse Problems, 13 (1997), p. 977.
14. Jari Kaipio and Erkki Somersalo, Statistical and Computational Inverse Problems, vol. 160, Springer, 2005.
15. Robert E. Kass, Bradley P. Carlin, Andrew Gelman, and Radford M. Neal, Markov chain Monte Carlo in practice: a roundtable discussion, The American Statistician, 52 (1998), pp. 93-100.
16. Kody J. H. Law, Proposals which speed up function-space MCMC, Journal of Computational and Applied Mathematics, 262 (2014).
17. James Martin, Lucas C. Wilcox, Carsten Burstedde, and Omar Ghattas, A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion, SIAM Journal on Scientific Computing, 34 (2012), pp. A1460-A1487.
18. Jonathan C. Mattingly, Natesh S. Pillai, and Andrew M. Stuart, Diffusion limits of the random walk Metropolis algorithm in high dimensions, The Annals of Applied Probability, 22 (2012).
19. Noemi Petra, James Martin, Georg Stadler, and Omar Ghattas, A computational framework for infinite-dimensional Bayesian inverse problems, part II: stochastic Newton MCMC with application to ice sheet flow inverse problems, SIAM Journal on Scientific Computing, 36 (2014), pp. A1525-A1555.
20. Frank J. Pinski, Gideon Simpson, Andrew M. Stuart, and Hendrik Weber, Algorithms for Kullback-Leibler approximation of probability measures in infinite dimensions, arXiv preprint arXiv:1408.1920, (2014).
21. Gareth O. Roberts, Andrew Gelman, and Walter R. Gilks, Weak convergence and optimal scaling of random walk Metropolis algorithms, The Annals of Applied Probability, 7 (1997), pp. 110-120.
22. Gareth O. Roberts and Jeffrey S. Rosenthal, Examples of adaptive MCMC, Journal of Computational and Graphical Statistics, 18 (2009), pp. 349-367.
23. Gareth O. Roberts and Jeffrey S. Rosenthal, Optimal scaling for various Metropolis-Hastings algorithms, Statistical Science, 16 (2001), pp. 351-367.
24. Daniel Rudolf and Bjorn Sprungk, On a generalization of the preconditioned Crank-Nicolson Metropolis algorithm, arXiv preprint arXiv:1504.03461, (2015).
25. Eero Saksman and Matti Vihola, On the ergodicity of the adaptive Metropolis algorithm on unbounded domains, The Annals of Applied Probability, 20 (2010).
26. A. M. Stuart, Inverse problems: a Bayesian perspective, Acta Numerica, 19 (2010), pp. 451-559.
27. Sebastian J. Vollmer, Dimension-independent MCMC sampling for inverse problems with non-Gaussian priors, arXiv preprint arXiv:1302.2213, (2013).
28. Zhewei Yao, Zixi Hu, and Jinglai Li, A TV-Gaussian prior for infinite-dimensional Bayesian inverse problems and its numerical implementations, arXiv preprint arXiv:1510.05239, (2015).
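Appendix: a numerical illustration. The transition kernel analyzed in condition III above, i.e. a Crank-Nicolson discretized proposal $(I - \tfrac{1}{2}\delta K\mathcal{L})v = (I + \tfrac{1}{2}\delta K\mathcal{L})z + \sqrt{2\delta}\,\mathcal{N}(0,K)$ combined with a Metropolis-Hastings accept/reject step, can be sketched in a finite-dimensional toy setting. This is a minimal illustration, not the algorithm of the paper: the names `cn_proposal`, `mh_step`, and `Phi` are ours, the acceptance function `acc` is supplied by the user (its exact form depends on the choice of the preconditioner $K$), and in the toy example we take $\mathcal{L}$ to be the negative prior precision so that the proposal preserves the Gaussian prior and the acceptance ratio takes the standard pCN form; the paper's sign convention for $\mathcal{L}$ may differ.

```python
import numpy as np

def cn_proposal(z, K, L, delta, rng):
    # Draw v from the Crank-Nicolson discretized proposal
    # (I - (delta/2) K L) v = (I + (delta/2) K L) z + sqrt(2 delta) N(0, K).
    d = z.size
    I = np.eye(d)
    A = I - 0.5 * delta * K @ L
    B = I + 0.5 * delta * K @ L
    xi = np.linalg.cholesky(K) @ rng.standard_normal(d)  # one draw of N(0, K)
    return np.linalg.solve(A, B @ z + np.sqrt(2.0 * delta) * xi)

def mh_step(z, K, L, delta, acc, rng):
    # One step of the Metropolis-Hastings kernel
    #   Q(z, dv) = acc(z, v) q(z, dv) + delta_z(dv) (1 - \int acc(z, x) q(z, dx)):
    # propose v from q, accept it with probability acc(z, v), otherwise stay at z.
    v = cn_proposal(z, K, L, delta, rng)
    return v if rng.uniform() < acc(z, v) else z

# Toy example (our assumptions): Gaussian prior N(0, C) with K = C and
# L = -C^{-1}, so the proposal preserves the prior exactly; the target is then
# proportional to exp(-Phi(u)) N(0, C) with acc(z, v) = min(1, exp(Phi(z) - Phi(v))).
rng = np.random.default_rng(0)
C = np.array([[1.0, 0.3], [0.3, 0.5]])
K, L = C, -np.linalg.inv(C)
Phi = lambda u: 0.5 * np.sum((u - 1.0) ** 2)          # a toy likelihood potential
acc = lambda z, v: min(1.0, np.exp(Phi(z) - Phi(v)))
z = np.zeros(2)
samples = []
for _ in range(2000):
    z = mh_step(z, K, L, 0.5, acc, rng)
    samples.append(z)
samples = np.array(samples)
```

With $K = C$ and $\mathcal{L} = -C^{-1}$ the update reduces to $v = a z + \sqrt{1-a^2}\, C^{1/2}\xi$ with $a = (1 - \delta/2)/(1 + \delta/2)$, which is exactly the prior-preserving pCN map; the chain is therefore reversible with respect to the toy posterior, mirroring the detailed-balance argument used in the proof.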