Reversible Markov chains: Variational representations and ordering

Chris Sherlock

Abstract

This pedagogical document explains three variational representations that are useful when comparing the efficiencies of reversible Markov chains: (i) the Dirichlet form and the associated variational representations of the spectral gaps; (ii) a variational representation of the asymptotic variance of an ergodic average; and (iii) the conductance, and the equivalence of a non-zero conductance to a non-zero right spectral gap.

Introduction

This document relates to variational representations of aspects of a reversible Markov kernel, P, which has a limiting (hence, stationary) distribution, π. It is a record of my own learning, but I hope it might be useful to others. The central idea is the Dirichlet form, which can be thought of as a generalisation of expected squared jumping distance to cover all functions that are square-integrable with respect to π. The minimum (over all such functions with an expectation of 0 and a variance of 1) of the Dirichlet forms is the right spectral gap.

A key quantity of interest to researchers in MCMC is the asymptotic variance, which relates to the variance of an ergodic average, Var[(1/n) Σ_{i=1}^n f(X_i)], as n → ∞, where X_1, ..., X_n arise from successive applications of P, and f is some square-integrable function. A second variational representation, of ⟨f, (I − P)⁻¹f⟩, allows us to relate a limit of the variance of an ergodic average to the Dirichlet form. Finally, a third variational quantity, the conductance of a Markov kernel, is introduced, and is also related to the Dirichlet form and hence to the variance of an ergodic average.

I found these variational representations particularly useful in two pieces of research that I performed, mostly in 2016. Sherlock et al. (2017) uses the first two representations, while Sherlock and Lee (2017) uses conductance.
In both pieces of work the variational representations allowed us to compare pairs of Markov kernels: if one Markov kernel has a particular property, such as the variance of an ergodic average taking a particular value, and if we can relate aspects of this Markov kernel, such as its Dirichlet form or its conductance, to a second Markov kernel, then we can often obtain a bound on the property of interest for the second kernel. This is not the only use of variational representations; e.g. in Lawler and Sokal (1988) conductance is used directly to obtain bounds on the spectral gap of several discrete-statespace Markov chains.

The most natural framework for representing the kernel is that of a bounded, self-adjoint operator on a Hilbert space. I was almost entirely unfamiliar with this before I embarked on the above pieces of research, and so I will start by setting down the key aspects of this.

Preliminaries

Hilbert space

Let L²(π) be the Hilbert space of real functions that are square-integrable with respect to some probability measure, π: ∫ π(dx) f(x)² < ∞ for f ∈ L²(π), equipped with the inner product (finite through Cauchy–Schwarz) and associated norm:

⟨f, g⟩ = ∫ π(dx) f(x) g(x),   ‖f‖ = √⟨f, f⟩.

Let L²_0(π) ⊆ L²(π) be the Hilbert space that uses the same inner product but includes only functions with E_π[f] = 0: ∫ π(dx) f(x) = ⟨f, 1⟩ = 0. For these functions, ⟨f, g⟩ = Cov_π[f, g] and ⟨f, f⟩ = Var_π[f].

Markov kernel and detailed balance

Let {X_t}_{t=0}^∞ be a Markov chain on a statespace X with a kernel P(x, dy) which satisfies detailed balance with respect to π:

π(dx) P(x, dy) = π(dy) P(y, dx).
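The kernels in this document are abstract, but everything can be made concrete on a finite statespace, where P is a transition matrix and π a probability vector. The following minimal sketch (my own illustration; the 3-point target π and the Metropolis construction are arbitrary choices, not from the text) checks detailed balance and the resulting stationarity numerically:

```python
import numpy as np

pi = np.array([0.2, 0.3, 0.5])             # illustrative target
n = len(pi)

# Metropolis kernel with a uniform proposal over all n states:
# off-diagonal P(x, y) = (1/n) * min(1, pi(y)/pi(x)); the rejection
# mass goes on the diagonal.
P = np.zeros((n, n))
for x in range(n):
    for y in range(n):
        if y != x:
            P[x, y] = min(1.0, pi[y] / pi[x]) / n
    P[x, x] = 1.0 - P[x].sum()

# Detailed balance: pi(x) P(x, y) = pi(y) P(y, x) for every pair.
flux = pi[:, None] * P
assert np.allclose(flux, flux.T)

# Integrating out x on both sides gives stationarity: pi P = pi.
assert np.allclose(pi @ P, pi)
```

Any reversible kernel would do here; the Metropolis construction is simply a convenient way to manufacture one for a given π.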
For any measure, ν, we define

νP := ∫ ν(dx) P(x, dy).

Then (πP)(dy) = ∫_x π(dx) P(x, dy) = ∫_x π(dy) P(y, dx) = π(dy), so π is stationary for P.

P is bounded and self-adjoint

Given a kernel (or "operator") P we use the shorthand

Pf(x) = (Pf)(x) := ∫ P(x, dy) f(y).

Jensen's inequality gives

[(Pf)(x)]² = [∫ P(x, dy) f(y)]² ≤ ∫ P(x, dy) f(y)² = (Pf²)(x).

Hence, because π is stationary for P,

‖Pf‖² = ∫ π(dx) [(Pf)(x)]² ≤ ∫ π(dx) (Pf²)(x) = ∫ π(dx) P(x, dy) f(y)² = ∫ π(dy) f(y)² = ‖f‖².

Thus ‖Pf‖/‖f‖ ≤ 1, and P is a bounded linear operator. Further, if P satisfies detailed balance with respect to π,

⟨Pf, g⟩ = ∫ π(dx) P(x, dy) f(y) g(x) = ∫ π(dy) P(y, dx) f(y) g(x) = ∫ π(dx) P(x, dy) f(x) g(y) = ⟨f, Pg⟩;

P is self-adjoint.

The spectrum of a bounded, self-adjoint operator

The spectrum of P in H is {ρ : P − ρI is not invertible in H}; i.e., the set of ρ for which there is at least one g ∈ H such that there is no unique f ∈ H with (P − ρI)f = g. The spectrum of a bounded, self-adjoint operator is (accept it, but see below) a closed, bounded set on the real line; let
the upper and lower bounds be λ_max and λ_min. When P is a self-adjoint Markov kernel and H = L²(π), λ_max = 1 and λ_min ≥ −1. The spectral decomposition theorem for bounded, self-adjoint operators states that there is a finite positive measure, a(dλ), with support contained in the real interval [λ_min, λ_max], such that

⟨f, Pⁿf⟩ = ∫_{λ_min}^{λ_max} λⁿ a(dλ).

This decomposition is used twice hereafter; however, it may be unfamiliar, so the remainder of this section provides an intuition in terms of the eigenfunctions and eigenvalues of P.

Let P be a bounded, self-adjoint operator. A right eigenfunction e of P is a function that satisfies Pe = λe for some scalar, λ, the corresponding eigenvalue. Let e_0(x), e_1(x), ... be a set of eigenfunctions of P, scaled so that ‖e_i‖² = ⟨e_i, e_i⟩ = 1, and with corresponding eigenvalues λ_0, λ_1, .... Since, by definition, (P − λ_i I)e_i = 0, the spectrum is a superset of the set of eigenvalues. The intuition below comes from the case where the spectrum is precisely the set of eigenvalues, and the eigenfunctions span H.

1. Just as for the eigenvectors of a finite self-adjoint matrix, it is possible to choose the eigenfunctions such that ⟨e_i, e_j⟩ = 0 (i ≠ j);
2. moreover, as with self-adjoint matrices, all of the eigenvalues are real;
3. furthermore, since P is bounded, all of the eigenvalues satisfy |λ| ≤ 1.

If the eigenfunctions of an operator, P, span the Hilbert space, H, then for any f ∈ H,

f = Σ_{i=0}^∞ a_i e_i, where a_i = ⟨f, e_i⟩,

from which

⟨f, Pⁿf⟩ = Σ_{i=0}^∞ Σ_{j=0}^∞ a_i a_j ⟨e_i, Pⁿe_j⟩ = Σ_{i=0}^∞ a_i² λ_iⁿ.

With n = 0 we obtain Σ_{i=0}^∞ a_i² = ‖f‖² < ∞. Thus, if the eigenfunctions span the space, then a is discrete with mass a_i² at λ_i. For a Markov kernel, P1 = 1, where 1 is the constant function, which is, thus, a right eigenfunction with an eigenvalue of 1.
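For a finite statespace the eigen-intuition above can be verified directly. In the sketch below (my own illustration; the 3-state Metropolis kernel is an arbitrary choice), reversibility makes D^{1/2} P D^{−1/2} symmetric, so the eigenvalues are real and lie in [−1, 1], the eigenfunctions are orthonormal in ⟨·,·⟩, and the constant function is an eigenfunction with eigenvalue 1:

```python
import numpy as np

pi = np.array([0.2, 0.3, 0.5])
n = len(pi)
P = np.zeros((n, n))
for x in range(n):
    for y in range(n):
        if y != x:
            P[x, y] = min(1.0, pi[y] / pi[x]) / n   # Metropolis, uniform proposal
    P[x, x] = 1.0 - P[x].sum()

# Reversibility makes S = D^{1/2} P D^{-1/2} symmetric (D = diag(pi)),
# and S shares P's eigenvalues, so the spectrum of P is real.
d = np.sqrt(pi)
S = (d[:, None] * P) / d[None, :]
assert np.allclose(S, S.T)

lam, V = np.linalg.eigh(S)                 # real eigenvalues, ascending order
assert np.all(np.abs(lam) <= 1 + 1e-12)    # ||P|| <= 1
assert np.isclose(lam[-1], 1.0)            # P 1 = 1

# Eigenfunctions e_i(x) = v_i(x) / sqrt(pi(x)) are orthonormal in
# <f, g> = sum_x pi(x) f(x) g(x); the top one is the constant function.
E = V / d[:, None]
gram = E.T @ (pi[:, None] * E)
assert np.allclose(gram, np.eye(n))
assert np.allclose(E[:, -1], E[0, -1])
```

The similarity transform is the standard trick for finite reversible chains: it turns statements about a self-adjoint operator on L²(π) into statements about a symmetric matrix.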
Spectral gaps and the Dirichlet form

Spectrum of P in L²_0(π); spectral gaps and geometric ergodicity

For any kernel P, we define P^k f = P^{k−1}(Pf), recursively. Any function f(x) ∈ L²(π) can be written as a_0 1 + f_0(x), where f_0 ∈ L²_0(π). Here a_0 = ⟨f, 1⟩ = E_π[f(X)]. Thus Pⁿf = E_π[f(X)] + Pⁿf_0(x) → E_π[f(X)] if P is ergodic. To bound the size of the remainder term, consider the spectrum of P restricted to functions in L²_0(π). This must be confined to [λ⁰_min, λ⁰_max] with −1 ≤ λ⁰_min ≤ λ⁰_max ≤ 1, where (by definition)

λ⁰_max := sup_{f ∈ L²_0(π)} ⟨f, Pf⟩/⟨f, f⟩ = sup_{f ∈ L²_0(π): ⟨f,f⟩=1} ⟨f, Pf⟩   and   λ⁰_min := inf_{f ∈ L²_0(π): ⟨f,f⟩=1} ⟨f, Pf⟩.

Let λ⁰ = max(|λ⁰_min|, |λ⁰_max|). The (squared) size of the remainder is then

‖Pⁿf_0‖² = ⟨Pⁿf_0, Pⁿf_0⟩ = ⟨f_0, P²ⁿf_0⟩ = ∫_{λ⁰_min}^{λ⁰_max} λ²ⁿ a(dλ) ≤ (λ⁰)²ⁿ ∫_{λ⁰_min}^{λ⁰_max} a(dλ) = (λ⁰)²ⁿ ‖f_0‖².

Thus ‖Pⁿf_0‖ → 0 geometrically quickly provided λ⁰ < 1; i.e., provided the inequality λ⁰ ≤ 1 is strict. P is then called geometrically ergodic. The right spectral gap is ρ_right := 1 − λ⁰_max and the left spectral gap is ρ_left := 1 + λ⁰_min. Both must be non-zero for geometric ergodicity. Henceforth, for notational simplicity, we drop the superscript 0 in λ⁰_max/min.

Aside: when one or both of the spectral gaps is zero (e.g. the spectrum is [−1, 1]), then for any fixed n we can always find functions, f, with ‖f‖ = 1 (albeit fewer and fewer as n → ∞) such that ‖Pⁿf‖ > 0.1, say.

Dirichlet form, E_P(f)

The concept of a spectral gap motivates the Dirichlet form for P and f,

E_P(f) := ⟨f, (I − P)f⟩ = ⟨f, f⟩ − ⟨f, Pf⟩,

since

ρ_right = inf_{f ∈ L²_0(π): ⟨f,f⟩=1} E_P(f)   and   ρ_left = 2 − sup_{f ∈ L²_0(π): ⟨f,f⟩=1} E_P(f).
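The variational characterisation ρ_right = inf E_P(f) is easy to probe numerically on a finite statespace. A sketch (my own; the 3-state Metropolis kernel and random test functions are illustrative choices):

```python
import numpy as np

pi = np.array([0.2, 0.3, 0.5])
n = len(pi)
P = np.zeros((n, n))
for x in range(n):
    for y in range(n):
        if y != x:
            P[x, y] = min(1.0, pi[y] / pi[x]) / n   # Metropolis, uniform proposal
    P[x, x] = 1.0 - P[x].sum()

def dirichlet(f):
    """E_P(f) = 0.5 * sum_{x,y} pi(x) P(x,y) (f(y) - f(x))^2."""
    diff = f[None, :] - f[:, None]                  # diff[x, y] = f(y) - f(x)
    return 0.5 * np.sum(pi[:, None] * P * diff ** 2)

# Spectrum of P: real by reversibility; the largest eigenvalue, 1,
# belongs to the constant function, so the top of the spectrum on
# L^2_0(pi) is the second-largest eigenvalue.
d = np.sqrt(pi)
lam = np.linalg.eigvalsh((d[:, None] * P) / d[None, :])
rho_right = 1.0 - lam[-2]
assert rho_right > 0

# Every centred, unit-variance f has E_P(f) >= rho_right, and the
# quadratic-form identity E_P(f) = <f, (I - P) f> holds exactly.
rng = np.random.default_rng(1)
for _ in range(200):
    f = rng.normal(size=n)
    f = f - np.dot(pi, f)                           # E_pi[f] = 0
    f = f / np.sqrt(np.dot(pi, f ** 2))             # Var_pi[f] = 1
    assert np.isclose(dirichlet(f), np.dot(pi * f, f - P @ f))
    assert dirichlet(f) >= rho_right - 1e-9
```

The infimum is attained at the eigenfunction corresponding to the second-largest eigenvalue, which is why random test functions can only approach ρ_right from above.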
Directly from the definition we have, for two kernels P_1 and P_2 and some γ > 0:

E_{P_1}(f) ≥ γ E_{P_2}(f) ∀ f ∈ L²_0(π)  ⇒  ρ_right(P_1) ≥ γ ρ_right(P_2). (1)

An alternative expression for the Dirichlet form provides a very natural intuition:

E_P(f) = ⟨f, f⟩ − ⟨f, Pf⟩
= ∫ π(dx) P(x, dy) f(x)[f(x) − f(y)]
= ∫ π(dx) P(x, dy) f(y)[f(y) − f(x)]
= (1/2) ∫ π(dx) P(x, dy) [f(y) − f(x)]², (2)

where the penultimate line follows because P satisfies detailed balance with respect to π, and the final line arises from the average of the two preceding lines. The Dirichlet form can, therefore, be thought of as a generalisation of the expected squared jumping distance of the i-th component of x, ∫ π(dx) P(x, dy) (y_i − x_i)², to consider the expected squared changes for any f ∈ L²_0(π).

Variance of an ergodic average

Suppose that we are interested in E_π[h(X)] for some h ∈ L²(π) and we estimate it by an average of the values in the Markov chain: ĥ_n := (1/n) Σ_{i=1}^n h(X_i). Typically Var[ĥ_n] → 0 as n → ∞, but scaling by n should keep it O(1). We are, therefore, interested in

Var(P, h) := lim_{n→∞} n Var[ĥ_n].

So as to just consider mixing, we assume X_0 ~ π. Without loss of generality we may assume h ∈ L²_0(π) (else just subtract its expectation). Since P is time-homogeneous,

Var[Σ_{i=1}^n h(X_i)] = Σ_{i=1}^n E[h(X_i)²] + 2 Σ_{i=1}^{n−1} E[h(X_i)h(X_{i+1})] + 2 Σ_{i=1}^{n−2} E[h(X_i)h(X_{i+2})] + ⋯ + 2 E[h(X_1)h(X_n)]
= n E[h(X)²] + 2(n−1) E[h(X_1)h(X_2)] + ⋯ + 2 E[h(X_1)h(X_n)]
= n ⟨h, h⟩ + 2(n−1) ⟨h, Ph⟩ + 2(n−2) ⟨h, P²h⟩ + ⋯ + 2 ⟨h, P^{n−1}h⟩.
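Before simplifying this expression, it can be checked numerically: on a small chain the autocovariances ⟨h, P^i h⟩ are exact matrix computations, so n Var[ĥ_n] can be evaluated for each n and watched settling to a limit. A sketch (my own; the kernel and h are arbitrary illustrative choices):

```python
import numpy as np

pi = np.array([0.2, 0.3, 0.5])
m = len(pi)
P = np.zeros((m, m))
for x in range(m):
    for y in range(m):
        if y != x:
            P[x, y] = min(1.0, pi[y] / pi[x]) / m   # Metropolis, uniform proposal
    P[x, x] = 1.0 - P[x].sum()

h = np.array([1.0, -2.0, 3.0])
h = h - np.dot(pi, h)                               # centre: h in L^2_0(pi)

def autocov(i):
    """<h, P^i h> = Cov[h(X_1), h(X_{1+i})] at stationarity."""
    return np.dot(pi, h * (np.linalg.matrix_power(P, i) @ h))

def n_var(n):
    """n Var[h_bar_n] = <h,h> + 2 sum_{i=1}^{n-1} (1 - i/n) <h, P^i h>."""
    return autocov(0) + 2 * sum((1 - i / n) * autocov(i) for i in range(1, n))

# <h, P^i h> decays geometrically here, so the limit Var(P, h) is well
# approximated by truncating the series, and n Var[h_bar_n] approaches it.
var_limit = autocov(0) + 2 * sum(autocov(i) for i in range(1, 200))
assert var_limit > 0
assert abs(n_var(500) - var_limit) < 0.05 * var_limit
```

Dividing the displayed expression by n gives n Var[ĥ_n] = ⟨h, h⟩ + 2 Σ_{i=1}^{n−1} (1 − i/n) ⟨h, P^i h⟩, which is the formula the sketch evaluates.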
Using (I − P)⁻¹ to denote I + P + P² + P³ + ⋯, we obtain

Var(P, h) = ⟨h, h⟩ + 2 Σ_{i=1}^∞ ⟨h, P^i h⟩ = 2 Σ_{i=0}^∞ ⟨h, P^i h⟩ − ⟨h, h⟩ = 2⟨h, (I − P)⁻¹h⟩ − ⟨h, h⟩.

Writing ⟨h, h⟩ = ⟨h, (I − P)(I − P)⁻¹h⟩ gives an equivalent form:

Var(P, h) = ⟨h, (I + P)(I − P)⁻¹h⟩.

The spectral decomposition of P in L²_0(π) gives

Var(P, h) = ∫_{λ_min}^{λ_max} [(1 + λ)/(1 − λ)] a(dλ) ≤ [(1 + λ_max)/(1 − λ_max)] ∫_{λ_min}^{λ_max} a(dλ) = [(1 + λ_max)/(1 − λ_max)] Var_π[h], (3)

where the inequality follows since (1 + λ)/(1 − λ) is an increasing function of λ. The supremum of the spectrum of P in L²_0(π) thus provides an efficiency bound over all h ∈ L²_0(π).

Variance bounding kernels

The bound on the variance in (3) is in terms of λ_max, not λ_min. Even if there exists an eigenvalue λ_min = −1, so the chain never converges (as opposed to the more usual case where the spectrum of P is [−1, a], but there is no eigenvalue at −1), the asymptotic variance can be finite. As an extreme example, consider the Markov chain with a transition matrix of

( 0 1 )
( 1 0 ).

After an even number of iterations, n, this has been in state 1 exactly 1/2 = π({1}) of the time.

Roberts and Rosenthal (2008) realised that in almost all applications of MCMC it was the variance of the ergodic average that was important, and not (directly) convergence to the target. Specifically, it did not matter if ρ_left = 0. A kernel where ρ_right > 0 was termed variance bounding, and this was shown to be equivalent to Var(P, f) < ∞ for all f ∈ L²(π); i.e. if [hypothetical!] simple Monte Carlo would lead to the standard √n rate of convergence of the ergodic average, a √n rate would also be observed from the MCMC kernel. This is easy to see if the left and right spectral gaps of P are non-zero. When at least one gap is zero, for any β < 1 we have

Var(βP, h) = lim_{n→∞} {⟨h, h⟩ + 2 Σ_{i=1}^{n−1} (1 − i/n) β^i ⟨h, P^i h⟩} ≤ ⟨h, h⟩ + 2 Σ_{i=1}^∞ β^i |⟨h, P^i h⟩| < ∞.

Letting β ↑ 1 then gives the result, even though Var(P, h) may be infinite.
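The extreme example above is easily simulated (my own sketch): the flip chain has an eigenvalue at −1, so its marginal distribution never converges, yet its ergodic averages are essentially perfect:

```python
import numpy as np

P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
lam = np.linalg.eigvals(P)
assert np.isclose(min(lam.real), -1.0)     # eigenvalue -1: rho_left = 0

# Deterministic alternation: the marginal distribution oscillates for
# ever, but the occupancy of state 1 is within 1/(2n) of pi({1}) = 1/2
# after every n steps, whatever the starting state.
x, visits = 0, 0
for n in range(1, 1001):
    visits += (x == 1)
    x = 1 - x
    assert abs(visits / n - 0.5) <= 1 / (2 * n) + 1e-12
```

Here the deviation of the ergodic average is O(1/n) rather than the Monte Carlo O(1/√n), so the asymptotic variance is in fact zero.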
Variational representation of ⟨f, Q⁻¹f⟩, and a key ordering

Firstly, notice that for real numbers f, g and q > 0, sup_g {2fg − g²q} is f²/q, which is achieved when g = f/q. The same holds for the operator Q = I − P when f and g are functions (see Appendix A for a proof):

⟨f, (I − P)⁻¹f⟩ = sup_{g ∈ L²_0(π)} {2⟨f, g⟩ − ⟨g, (I − P)g⟩} = sup_{g ∈ L²_0(π)} {2⟨f, g⟩ − E_P(g)}.

To see this, think of a reversible matrix, P, and diagonalise it; the result for such matrices is just equivalent to the scalar result applied to each eigenvalue. This leads to the alternative representation:

Var(P, h) = sup_{g ∈ L²_0(π)} {4⟨h, g⟩ − 2E_P(g)} − ⟨h, h⟩.

Clearly,

E_{P_1}(g) ≥ E_{P_2}(g) ∀ g ∈ L²_0(π)  ⇒  Var(P_1, h) ≤ Var(P_2, h),

giving a much simpler proof of a result which was originally proved for finite-statespace Markov chains in Peskun (1973) and then generalised to general statespaces by Tierney (1998). But we can go further: suppose that E_{P_1}(g) ≥ γE_{P_2}(g) for all g ∈ L²_0(π) and some γ > 0; then, for any g ∈ L²_0(π),

4⟨h, g⟩ − 2E_{P_1}(g) ≤ 4⟨h, g⟩ − 2γE_{P_2}(g) = (1/γ){4⟨h, g′⟩ − 2E_{P_2}(g′)},

where g′ := γg ∈ L²_0(π). So

sup_{g ∈ L²_0(π)} {4⟨h, g⟩ − 2E_{P_1}(g)} ≤ (1/γ) sup_{g ∈ L²_0(π)} {4⟨h, g⟩ − 2E_{P_2}(g)},

and hence

E_{P_1}(f) ≥ γE_{P_2}(f) ∀ f ∈ L²_0(π)  ⇒  Var(P_1, h) + ⟨h, h⟩ ≤ (1/γ){Var(P_2, h) + ⟨h, h⟩}. (4)

This result appears as Lemma 3 in Andrieu et al. (2016) (see also Caracciolo et al., 1990).

Propose-accept-reject kernels

Many kernels consist of making a proposal, which is then either accepted or rejected:

P(x, dy) := q(x, dy)α(x, y) + (1 − ᾱ(x))δ_x(dy),
where ᾱ(x) := ∫ q(x, dy)α(x, y) is the average acceptance probability from x. Then, from (2),

E_P(f) = (1/2) ∫ π(dx) q(x, dy) α(x, y) {f(y) − f(x)}².

The right spectral gap is, therefore,

ρ_right = (1/2) inf_{f ∈ L²_0(π): ⟨f,f⟩=1} ∫ π(dx) q(x, dy) α(x, y) {f(y) − f(x)}².

There is a similar formula for the left spectral gap. Results (1) and (4) then lead directly to the following: if two propose-accept-reject kernels, both reversible with respect to π, satisfy

q_1(x, dy)α_1(x, y) ≥ γ q_2(x, dy)α_2(x, y) ∀ x, y,

then ρ_right(P_1) ≥ γρ_right(P_2) and Var(P_1, h) + Var_π[h] ≤ (1/γ){Var(P_2, h) + Var_π[h]}.

From (4), if for some γ > 0, E_{P_1}(f) ≥ γE_{P_2}(f) ∀ f ∈ L²_0(π) and P_2 is variance bounding, then so is P_1. Unfortunately, it is rare that we can be sure of the ordering of the Dirichlet forms for all f; indeed, we might be sure that there is no fixed γ for which the ordering does hold. When we cannot simply resort to a uniform ordering of Dirichlet forms, the elegant concept of conductance can come to our aid.

Conductance

Consider any measurable set, A ⊆ X. The conductance of A is defined as

κ_P(A) := (1/π(A)) ∫_{x∈A, y∈A^c} π(dx) P(x, dy),

where π(A) := ∫_A π(dx). Loosely speaking, κ_P(A) is the probability of moving to A^c conditional on the chain currently following the stationary distribution truncated to A: π(dx) 1_{x∈A} / π(A). Our analysis of the properties of κ_P(A) will be via the symmetric quantity

κ̃_P(A) = κ̃_P(A^c) := (1/(π(A)π(A^c))) ∫_{x∈A, y∈A^c} π(dx) P(x, dy) = κ_P(A)/π(A^c).
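A concrete instance of the propose-accept-reject ordering above (my own sketch, not from the text): with the same uniform proposal, the Metropolis acceptance probability min(1, π(y)/π(x)) dominates the Barker acceptance probability π(y)/(π(x) + π(y)) pointwise, so the Metropolis kernel should have the larger Dirichlet forms and the smaller asymptotic variances:

```python
import numpy as np

pi = np.array([0.2, 0.3, 0.5])
m = len(pi)

def kernel(accept):
    """Propose-accept-reject kernel: uniform proposal, acceptance accept(pi_x, pi_y)."""
    P = np.zeros((m, m))
    for x in range(m):
        for y in range(m):
            if y != x:
                P[x, y] = accept(pi[x], pi[y]) / m
        P[x, x] = 1.0 - P[x].sum()
    return P

P_met = kernel(lambda px, py: min(1.0, py / px))   # Metropolis
P_bar = kernel(lambda px, py: py / (px + py))      # Barker

# Pointwise domination of q(x,dy) alpha(x,y), i.e. gamma = 1 off the diagonal.
off = ~np.eye(m, dtype=bool)
assert np.all(P_met[off] >= P_bar[off])

def asy_var(P, h):
    """Var(P, h) = <h,h> + 2 sum_{i>=1} <h, P^i h>, truncated (both chains mix fast)."""
    h = h - np.dot(pi, h)
    total, Pih = np.dot(pi, h * h), h.copy()
    for _ in range(500):
        Pih = P @ Pih
        total += 2 * np.dot(pi, h * Pih)
    return total

h = np.array([0.0, 1.0, 4.0])
assert asy_var(P_met, h) <= asy_var(P_bar, h) + 1e-9
```

Since min(1, r) ≥ r/(1 + r) ≥ min(1, r)/2 for r > 0, the reverse ordering also holds with γ = 1/2, so each kernel's asymptotic variance bounds the other's up to the factor in (4).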
The first equality follows because the kernel is reversible with respect to π, and this also implies that π(A)κ_P(A) = π(A^c)κ_P(A^c). Since κ_P(A^c) ≤ 1, κ_P(A) ≤ π(A^c)/π(A), which can be arbitrarily small. To define the conductance of the kernel P we therefore only consider sets A with π(A) ≤ 1/2:

κ_P := inf_{A: π(A)≤1/2} κ_P(A).

No such restriction is required for:

κ̃_P := inf_A κ̃_P(A).

Further, since π(A) ≤ 1/2 ⇒ 1/2 ≤ π(A^c) ≤ 1, we have κ_P ≤ κ̃_P ≤ 2κ_P.

Setting f_A(x) = [1_{x∈A} − π(A)]/√(π(A)π(A^c)) (so that f_A ∈ L²_0(π) and ⟨f_A, f_A⟩ = 1) in the expression for the Dirichlet form (2) gives

E_P(f_A) = (1/(2π(A)π(A^c))) {∫_{x∈A, y∈A^c} π(dx)P(x, dy) + ∫_{x∈A^c, y∈A} π(dx)P(x, dy)} = κ̃_P(A) = κ_P(A)/π(A^c).

So

ρ_right = inf_{f ∈ L²_0(π): ⟨f,f⟩=1} E_P(f) ≤ inf_A E_P(f_A) = κ̃_P ≤ 2κ_P. (5)

Hence, if P has a right spectral gap (so it is variance bounding) then its conductance is non-zero. Amazingly, the converse is also true: if the conductance of P is non-zero then P has a right spectral gap:

ρ_right ≥ κ̃_P²/8 ≥ κ_P²/8. (6)

Thus, non-zero conductance is equivalent to a non-zero right spectral gap, which is equivalent to variance bounding. Equations (5) and (6) together are sometimes called Cheeger bounds. A proof of (6) for finite Markov chains is given in Diaconis and Stroock (1991). A more accessible and, as far as I can see, more general proof of the inequality ρ_right ≥ κ̃_P²/8 is given in Lawler and Sokal (1988). This has a standard bit, which uses the Cauchy–Schwarz inequality to obtain an inequality for ρ_right; a beautiful bit, which relates the expression from the standard bit to conductance; and then an ugly bit, which proves that the final expression is always greater than 0 if κ̃_P > 0. I will follow Lawler and Sokal (1988) for the first two parts and then provide a neater solution to the final part.

Denote the symmetric measure π(dx)P(x, dy) by µ(dx, dy). We also set g(x) = f(x) + c for some (currently) arbitrary real constant, c.
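Before the proof, the Cheeger bounds can be checked by brute force on a finite statespace, enumerating every non-trivial set A (my own sketch; the 3-state Metropolis kernel is an arbitrary illustration):

```python
import numpy as np
from itertools import combinations

pi = np.array([0.2, 0.3, 0.5])
m = len(pi)
P = np.zeros((m, m))
for x in range(m):
    for y in range(m):
        if y != x:
            P[x, y] = min(1.0, pi[y] / pi[x]) / m   # Metropolis, uniform proposal
    P[x, x] = 1.0 - P[x].sum()

# kappa_tilde(A) = mu(A x A^c) / (pi(A) pi(A^c)); take the infimum over
# every non-empty proper subset A of the statespace.
kappa_tilde = np.inf
for r in range(1, m):
    for A in combinations(range(m), r):
        inA = np.zeros(m, dtype=bool)
        inA[list(A)] = True
        flow = np.sum(pi[inA][:, None] * P[inA][:, ~inA])
        pA = pi[inA].sum()
        kappa_tilde = min(kappa_tilde, flow / (pA * (1.0 - pA)))

# Right spectral gap from the (real) spectrum of the reversible P.
d = np.sqrt(pi)
lam = np.linalg.eigvalsh((d[:, None] * P) / d[None, :])
rho_right = 1.0 - lam[-2]

# Cheeger bounds: kappa_tilde^2 / 8 <= rho_right <= kappa_tilde.
assert kappa_tilde ** 2 / 8 <= rho_right <= kappa_tilde + 1e-12
```

Brute-force enumeration is exponential in the number of states, of course; the point of the bounds is precisely that on large or continuous spaces one can often control the conductance of arbitrary sets analytically when the spectrum itself is out of reach.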
The standard bit. Cauchy–Schwarz (twice) and the symmetry of µ give:

E_µ[|g(X)² − g(Y)²|] = E_µ[|g(X) − g(Y)| |g(X) + g(Y)|]
≤ √(E_µ[{g(X) − g(Y)}²]) √(E_µ[{g(X) + g(Y)}²])
≤ √(E_µ[{f(X) − f(Y)}²]) √(2E_µ[g(X)² + g(Y)²])
= 2√(E_µ[{f(X) − f(Y)}²] E_π[g(X)²]).

So

E_P(f) = (1/2) E_µ[{f(X) − f(Y)}²] ≥ (1/(8E_π[g(X)²])) E_µ[|g(X)² − g(Y)²|]².

The beautiful bit. We now relate the numerator of the above expression to the conductance. The proof is symmetrical; the first half manipulates the expectation so that conductance may be used, with the second half reversing the route of the first. Set A_t := {x : g(x)² ≤ t}. Then, by the symmetry of µ,

∫_X ∫_X µ(dx, dy) |g(y)² − g(x)²| = 2 ∫_X ∫_X µ(dx, dy) 1_{g(x)² < g(y)²} {g(y)² − g(x)²}
= 2 ∫_X ∫_X µ(dx, dy) ∫_{t=0}^∞ dt 1_{g(x)² ≤ t < g(y)²}
= 2 ∫_{t=0}^∞ dt ∫_{A_t × A_t^c} µ(dx, dy)
≥ 2 κ̃_P ∫_{t=0}^∞ dt π(A_t) π(A_t^c)
= κ̃_P ∫_X ∫_X π(dx) π(dy) |g(y)² − g(x)²|.

So

E_P(f) ≥ (κ̃_P²/(8E_π[g(X)²])) E_{π⊗π}[|g(X)² − g(Y)²|]².

But since this is true for all c, we have

ρ_right ≥ (κ̃_P²/8) inf_{f ∈ L²_0(π): ⟨f,f⟩=1} sup_c (E_{π⊗π}[|{f(Y) + c}² − {f(X) + c}²|])² / E_π[{f(X) + c}²].

A less ugly bit. Since E_π[{f(X) + c}²] = 1 + c², we need to show that

inf_{f ∈ L²_0(π): ⟨f,f⟩=1} sup_c (E_{π⊗π}[|{f(Y) + c}² − {f(X) + c}²|])² / (1 + c²) ≥ 1.
Let A = f(X) and B = f(Y) be independent and identically distributed with an expectation of 0 and a variance of 1. We are interested in

E[|{A + c}² − {B + c}²|] / √(1 + c²). (7)

Below, I will show that

E[|A² − B²|] ≥ 2 − 4E[|A − B|]². (8)

Then, as in Lawler and Sokal (1988), consider letting c → ∞ or setting c = 0. With the former:

lim_{c→∞} E[|{A + c}² − {B + c}²|] / √(1 + c²) = 2E[|A − B|],

so if E[|A − B|] > 1/2 we are done. If not, setting c = 0 in (7) and using (8) leaves us:

E[|A² − B²|] ≥ 2 − 4 × (1/2)² = 1.

To prove (8):

E[|A² − B²|] = ∫_0^∞ P(|A² − B²| > t) dt = ∫_0^∞ {P(A² > t + B²) + P(B² > t + A²)} dt.

But

∫_0^∞ P(B² > t + A²) dt = E[ ∫_0^∞ P(B² > t + a² | A = a) dt ],

and, for any fixed a,

∫_0^∞ P(B² > t + a²) dt = ∫_{a²}^∞ P(B² > v) dv = E[B²] − ∫_0^{a²} P(B² > v) dv,

while, substituting v = u²,

∫_0^{a²} P(B² > v) dv = ∫_0^{|a|} 2u P(|B| > u) du ≤ 2|a| ∫_0^∞ P(|B| > u) du = 2|a| E[|B|].

Combining the two end results gives E[|A² − B²|] ≥ 2E[B²] − 4E[|A|]E[|B|]; i.e., E[|A² − B²|] ≥ 2 − 4E[|A|]². However, Jensen's inequality provides

E[|A − B|] = E[E[|A − B| | A]] ≥ E[|A − E[B]|] = E[|A|],

and (8) follows.

Acknowledgements

I am grateful to Dr. Daniel Elton for providing the proof in Appendix A and to Dr. Dootika Vats for spotting an error in an earlier version of this document.
A  Variational representation of ⟨f, Q⁻¹f⟩

Let H be a Hilbert space and let Q be a positive operator on H (i.e., an operator that has a square root). Then for f ∈ H,

⟨f, Q⁻¹f⟩ = sup_{g ∈ H} {2Re(⟨f, g⟩) − ⟨g, Qg⟩}.

(Our Hilbert space is real, so we do not need the Re(·) function.)

Proof. Let φ = Q⁻¹f, so f = Qφ. The left-hand side is then

⟨Qφ, φ⟩ = ⟨Q^{1/2}φ, Q^{1/2}φ⟩ = ‖Q^{1/2}φ‖²
≥ ‖Q^{1/2}φ‖² − ‖Q^{1/2}(φ − g)‖²
= 2Re(⟨Q^{1/2}φ, Q^{1/2}g⟩) − ‖Q^{1/2}g‖²
= 2Re(⟨Qφ, g⟩) − ⟨g, Qg⟩
= 2Re(⟨f, g⟩) − ⟨g, Qg⟩.

The construction of the inequality shows that the supremum is achieved at g = Q⁻¹f.

References

Andrieu, C., Lee, A., and Vihola, M. (2016). Uniform ergodicity of the iterated conditional SMC and geometric ergodicity of particle Gibbs samplers. Bernoulli. To appear.

Caracciolo, S., Pelissetto, A., and Sokal, A. D. (1990). Nonlocal Monte Carlo algorithm for self-avoiding walks with fixed endpoints. Journal of Statistical Physics, 60(1):1-53.

Diaconis, P. and Stroock, D. (1991). Geometric bounds for eigenvalues of Markov chains. Ann. Appl. Probab., 1(1).

Lawler, G. F. and Sokal, A. D. (1988). Bounds on the L² spectrum for Markov chains and Markov processes: a generalization of Cheeger's inequality. Trans. Amer. Math. Soc., 309(2).

Peskun, P. H. (1973). Optimum Monte-Carlo sampling using Markov chains. Biometrika, 60.

Roberts, G. O. and Rosenthal, J. S. (2008). Variance bounding Markov chains. Ann. Appl. Probab., 18(3).
Sherlock, C. and Lee, A. (2017). Variance bounding of delayed-acceptance kernels. ArXiv e-prints.

Sherlock, C., Thiery, A., and Lee, A. (2017). Pseudo-marginal Metropolis-Hastings using averages of unbiased estimators. Biometrika. Accepted subject to minor revisions.

Tierney, L. (1998). A note on Metropolis-Hastings kernels for general state spaces. Ann. Appl. Probab., 8(1).
More informationMarkov chain Monte Carlo
Markov chain Monte Carlo Karl Oskar Ekvall Galin L. Jones University of Minnesota March 12, 2019 Abstract Practically relevant statistical models often give rise to probability distributions that are analytically
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos Contents Markov Chain Monte Carlo Methods Sampling Rejection Importance Hastings-Metropolis Gibbs Markov Chains
More informationMarkov chain Monte Carlo Lecture 9
Markov chain Monte Carlo Lecture 9 David Sontag New York University Slides adapted from Eric Xing and Qirong Ho (CMU) Limitations of Monte Carlo Direct (unconditional) sampling Hard to get rare events
More informationSimultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms
Simultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms Yan Bai Feb 2009; Revised Nov 2009 Abstract In the paper, we mainly study ergodicity of adaptive MCMC algorithms. Assume that
More informationInference in state-space models with multiple paths from conditional SMC
Inference in state-space models with multiple paths from conditional SMC Sinan Yıldırım (Sabancı) joint work with Christophe Andrieu (Bristol), Arnaud Doucet (Oxford) and Nicolas Chopin (ENSAE) September
More informationMARKOV CHAIN MONTE CARLO
MARKOV CHAIN MONTE CARLO RYAN WANG Abstract. This paper gives a brief introduction to Markov Chain Monte Carlo methods, which offer a general framework for calculating difficult integrals. We start with
More informationModern Discrete Probability Spectral Techniques
Modern Discrete Probability VI - Spectral Techniques Background Sébastien Roch UW Madison Mathematics December 22, 2014 1 Review 2 3 4 Mixing time I Theorem (Convergence to stationarity) Consider a finite
More informationMath 5520 Homework 2 Solutions
Math 552 Homework 2 Solutions March, 26. Consider the function fx) = 2x ) 3 if x, 3x ) 2 if < x 2. Determine for which k there holds f H k, 2). Find D α f for α k. Solution. We show that k = 2. The formulas
More informationGeometric Convergence Rates for Time-Sampled Markov Chains
Geometric Convergence Rates for Time-Sampled Markov Chains by Jeffrey S. Rosenthal* (April 5, 2002; last revised April 17, 2003) Abstract. We consider time-sampled Markov chain kernels, of the form P µ
More informationControlled sequential Monte Carlo
Controlled sequential Monte Carlo Jeremy Heng, Department of Statistics, Harvard University Joint work with Adrian Bishop (UTS, CSIRO), George Deligiannidis & Arnaud Doucet (Oxford) Bayesian Computation
More information16 : Approximate Inference: Markov Chain Monte Carlo
10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution
More informationGeometric ρ-mixing property of the interarrival times of a stationary Markovian Arrival Process
Author manuscript, published in "Journal of Applied Probability 50, 2 (2013) 598-601" Geometric ρ-mixing property of the interarrival times of a stationary Markovian Arrival Process L. Hervé and J. Ledoux
More informationThe Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space
The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space Robert Jenssen, Deniz Erdogmus 2, Jose Principe 2, Torbjørn Eltoft Department of Physics, University of Tromsø, Norway
More informationGEOMETRIC ERGODICITY AND HYBRID MARKOV CHAINS
Elect. Comm. in Probab. 2 (1997) 13 25 ELECTRONIC COMMUNICATIONS in PROBABILITY GEOMETRIC ERGODICITY AND HYBRID MARKOV CHAINS GARETH O. ROBERTS Statistical Laboratory, University of Cambridge Cambridge
More informationMonte Carlo Methods. Leon Gu CSD, CMU
Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte
More informationKernel Sequential Monte Carlo
Kernel Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) * equal contribution April 25, 2016 1 / 37 Section
More informationMarkov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation. Luke Tierney Department of Statistics & Actuarial Science University of Iowa
Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation Luke Tierney Department of Statistics & Actuarial Science University of Iowa Basic Ratio of Uniforms Method Introduced by Kinderman and
More informationMarkov Chain Monte Carlo
Markov Chain Monte Carlo Recall: To compute the expectation E ( h(y ) ) we use the approximation E(h(Y )) 1 n n h(y ) t=1 with Y (1),..., Y (n) h(y). Thus our aim is to sample Y (1),..., Y (n) from f(y).
More informationOptimum Monte-Carlo sampling using Markov chains
Biometrika (1973), 60, 3, p. 607 607 Printed in Great Britain Optimum Monte-Carlo sampling using Markov chains BY P. H. PESKUN York University, Toronto SUMMABY The sampling method proposed by Metropolis
More informationThe Pennsylvania State University The Graduate School RATIO-OF-UNIFORMS MARKOV CHAIN MONTE CARLO FOR GAUSSIAN PROCESS MODELS
The Pennsylvania State University The Graduate School RATIO-OF-UNIFORMS MARKOV CHAIN MONTE CARLO FOR GAUSSIAN PROCESS MODELS A Thesis in Statistics by Chris Groendyke c 2008 Chris Groendyke Submitted in
More informationComputer Practical: Metropolis-Hastings-based MCMC
Computer Practical: Metropolis-Hastings-based MCMC Andrea Arnold and Franz Hamilton North Carolina State University July 30, 2016 A. Arnold / F. Hamilton (NCSU) MH-based MCMC July 30, 2016 1 / 19 Markov
More information08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms
(February 24, 2017) 08a. Operators on Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/real/notes 2016-17/08a-ops
More informationarxiv:math/ v1 [math.pr] 31 Jan 2001
Chin. Sci. Bulletin, 1999, 44(23), 2465 2470 (Chinese Ed.); 2000, 45:9 (English Ed.), 769 774 Eigenvalues, inequalities and ergodic theory arxiv:math/0101257v1 [math.pr] 31 Jan 2001 Mu-Fa Chen (Beijing
More informationMarkov chain Monte Carlo
Markov chain Monte Carlo Probabilistic Models of Cognition, 2011 http://www.ipam.ucla.edu/programs/gss2011/ Roadmap: Motivation Monte Carlo basics What is MCMC? Metropolis Hastings and Gibbs...more tomorrow.
More informationKernel adaptive Sequential Monte Carlo
Kernel adaptive Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) December 7, 2015 1 / 36 Section 1 Outline
More informationEmbeddings of finite metric spaces in Euclidean space: a probabilistic view
Embeddings of finite metric spaces in Euclidean space: a probabilistic view Yuval Peres May 11, 2006 Talk based on work joint with: Assaf Naor, Oded Schramm and Scott Sheffield Definition: An invertible
More informationMarkov Chain Monte Carlo (MCMC)
School of Computer Science 10-708 Probabilistic Graphical Models Markov Chain Monte Carlo (MCMC) Readings: MacKay Ch. 29 Jordan Ch. 21 Matt Gormley Lecture 16 March 14, 2016 1 Homework 2 Housekeeping Due
More informationSecond Order Elliptic PDE
Second Order Elliptic PDE T. Muthukumar tmk@iitk.ac.in December 16, 2014 Contents 1 A Quick Introduction to PDE 1 2 Classification of Second Order PDE 3 3 Linear Second Order Elliptic Operators 4 4 Periodic
More informationMarkov chain Monte Carlo
Markov chain Monte Carlo Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Revised on April 24, 2017 Today we are going to learn... 1 Markov Chains
More informationMath Stochastic Processes & Simulation. Davar Khoshnevisan University of Utah
Math 5040 1 Stochastic Processes & Simulation Davar Khoshnevisan University of Utah Module 1 Generation of Discrete Random Variables Just about every programming language and environment has a randomnumber
More informationAPPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.
APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product
More informationThe following definition is fundamental.
1. Some Basics from Linear Algebra With these notes, I will try and clarify certain topics that I only quickly mention in class. First and foremost, I will assume that you are familiar with many basic
More informationFall TMA4145 Linear Methods. Solutions to exercise set 9. 1 Let X be a Hilbert space and T a bounded linear operator on X.
TMA445 Linear Methods Fall 26 Norwegian University of Science and Technology Department of Mathematical Sciences Solutions to exercise set 9 Let X be a Hilbert space and T a bounded linear operator on
More informationOn the Applicability of Regenerative Simulation in Markov Chain Monte Carlo
On the Applicability of Regenerative Simulation in Markov Chain Monte Carlo James P. Hobert 1, Galin L. Jones 2, Brett Presnell 1, and Jeffrey S. Rosenthal 3 1 Department of Statistics University of Florida
More informationMarkov chain Monte Carlo
Markov chain Monte Carlo Peter Beerli October 10, 2005 [this chapter is highly influenced by chapter 1 in Markov chain Monte Carlo in Practice, eds Gilks W. R. et al. Chapman and Hall/CRC, 1996] 1 Short
More information6.842 Randomness and Computation March 3, Lecture 8
6.84 Randomness and Computation March 3, 04 Lecture 8 Lecturer: Ronitt Rubinfeld Scribe: Daniel Grier Useful Linear Algebra Let v = (v, v,..., v n ) be a non-zero n-dimensional row vector and P an n n
More informationLecture 12: Detailed balance and Eigenfunction methods
Lecture 12: Detailed balance and Eigenfunction methods Readings Recommended: Pavliotis [2014] 4.5-4.7 (eigenfunction methods and reversibility), 4.2-4.4 (explicit examples of eigenfunction methods) Gardiner
More informationComputer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo
Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain
More informationOne Pseudo-Sample is Enough in Approximate Bayesian Computation MCMC
Biometrika (?), 99, 1, pp. 1 10 Printed in Great Britain Submitted to Biometrika One Pseudo-Sample is Enough in Approximate Bayesian Computation MCMC BY LUKE BORNN, NATESH PILLAI Department of Statistics,
More informationLECTURE 7. k=1 (, v k)u k. Moreover r
LECTURE 7 Finite rank operators Definition. T is said to be of rank r (r < ) if dim T(H) = r. The class of operators of rank r is denoted by K r and K := r K r. Theorem 1. T K r iff T K r. Proof. Let T
More information25.1 Markov Chain Monte Carlo (MCMC)
CS880: Approximations Algorithms Scribe: Dave Andrzejewski Lecturer: Shuchi Chawla Topic: Approx counting/sampling, MCMC methods Date: 4/4/07 The previous lecture showed that, for self-reducible problems,
More informationBayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence
Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns
More informationSpring 2012 Math 541B Exam 1
Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote
More informationNormed and Banach spaces
Normed and Banach spaces László Erdős Nov 11, 2006 1 Norms We recall that the norm is a function on a vectorspace V, : V R +, satisfying the following properties x + y x + y cx = c x x = 0 x = 0 We always
More informationPCMI LECTURE NOTES ON PROPERTY (T ), EXPANDER GRAPHS AND APPROXIMATE GROUPS (PRELIMINARY VERSION)
PCMI LECTURE NOTES ON PROPERTY (T ), EXPANDER GRAPHS AND APPROXIMATE GROUPS (PRELIMINARY VERSION) EMMANUEL BREUILLARD 1. Lecture 1, Spectral gaps for infinite groups and non-amenability The final aim of
More informationd(x n, x) d(x n, x nk ) + d(x nk, x) where we chose any fixed k > N
Problem 1. Let f : A R R have the property that for every x A, there exists ɛ > 0 such that f(t) > ɛ if t (x ɛ, x + ɛ) A. If the set A is compact, prove there exists c > 0 such that f(x) > c for all x
More informationUniversity of Toronto Department of Statistics
Optimal Proposal Distributions and Adaptive MCMC by Jeffrey S. Rosenthal Department of Statistics University of Toronto Technical Report No. 0804 June 30, 2008 TECHNICAL REPORT SERIES University of Toronto
More informationExamples of Adaptive MCMC
Examples of Adaptive MCMC by Gareth O. Roberts * and Jeffrey S. Rosenthal ** (September, 2006.) Abstract. We investigate the use of adaptive MCMC algorithms to automatically tune the Markov chain parameters
More informationComputational statistics
Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated
More informationMarkov Chain Monte Carlo Inference. Siamak Ravanbakhsh Winter 2018
Graphical Models Markov Chain Monte Carlo Inference Siamak Ravanbakhsh Winter 2018 Learning objectives Markov chains the idea behind Markov Chain Monte Carlo (MCMC) two important examples: Gibbs sampling
More information