Reversible Markov chains


Variational representations and ordering

Chris Sherlock

Abstract

This pedagogical document explains three variational representations that are useful when comparing the efficiencies of reversible Markov chains: (i) the Dirichlet form and the associated variational representations of the spectral gaps; (ii) a variational representation of the asymptotic variance of an ergodic average; and (iii) the conductance, and the equivalence of a non-zero conductance to a non-zero right spectral gap.

Introduction

This document relates to variational representations of aspects of a reversible Markov kernel, $P$, which has a limiting (hence, stationary) distribution, $\pi$. It is a record of my own learning, but I hope it might be useful to others.

The central idea is the Dirichlet form, which can be thought of as a generalisation of the expected squared jumping distance to cover all functions that are square-integrable with respect to $\pi$. The minimum (over all such functions with an expectation of $0$ and a variance of $1$) of the Dirichlet forms is the right spectral gap.

A key quantity of interest to researchers in MCMC is the asymptotic variance, which relates to the variance of an ergodic average, $\mathrm{Var}\left[\frac{1}{n}\sum_{i=1}^{n} f(X_i)\right]$ as $n \to \infty$, where $X_1, \ldots, X_n$ arise from successive applications of $P$, and $f$ is some square-integrable function. A second variational representation, of $\langle f, (I-P)^{-1} f \rangle$, allows us to relate a limit of the variance of an ergodic average to the Dirichlet form. Finally, a third variational quantity, the conductance of a Markov kernel, is introduced, and is also related to the Dirichlet form and hence to the variance of an ergodic average.

I found these variational representations particularly useful in two pieces of research that I performed, mostly in 2016. Sherlock et al. (2017) uses the first two representations, while Sherlock and Lee (2017) uses conductance. In both pieces of work the variational representations allowed us to compare pairs of Markov kernels: if one Markov kernel has a particular property, such as the variance of an ergodic average being a particular value, and if we can relate aspects of this Markov kernel, such as its Dirichlet form or its conductance, to a second Markov kernel, then we can often obtain a bound on the property of interest for the second kernel.

This is not the only use of variational representations; e.g. in Lawler and Sokal (1988) conductance is used directly to obtain bounds on the spectral gap of several discrete-statespace Markov chains.

The most natural framework for representing the kernel is that of a bounded, self-adjoint operator on a Hilbert space. I was almost entirely unfamiliar with this before I embarked on the above pieces of research, and so I will start by setting down the key aspects of this.

Preliminaries

Hilbert space

Let $L^2(\pi)$ be the Hilbert space of real functions that are square integrable with respect to some probability measure, $\pi$:

$$\int \pi(dx)\, f(x)^2 < \infty \quad \forall f \in L^2(\pi),$$

equipped with the inner product (finite, by Cauchy-Schwarz) and associated norm:

$$\langle f, g \rangle = \int \pi(dx)\, f(x) g(x), \qquad \|f\| = \sqrt{\langle f, f \rangle}.$$

Let $L_0^2(\pi) \subseteq L^2(\pi)$ be the Hilbert space that uses the same inner product but includes only functions with $\mathbb{E}_\pi[f] = 0$:

$$\int \pi(dx)\, f(x) = \langle f, 1 \rangle = 0.$$

For these functions, $\langle f, g \rangle = \mathrm{Cov}_\pi[f, g]$ and $\langle f, f \rangle = \mathrm{Var}_\pi[f]$.

Markov kernel and detailed balance

Let $\{X_t\}_{t=0}^{\infty}$ be a Markov chain on a statespace $\mathcal{X}$ with a kernel of $P(x, dy)$ which satisfies detailed balance with respect to $\pi$:

$$\pi(dx) P(x, dy) = \pi(dy) P(y, dx).$$
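As a concrete illustration (my own, not from the original document), the sketch below checks detailed balance for a small hypothetical 3-state kernel: on a finite statespace, detailed balance says exactly that the "flux" matrix $\mathrm{diag}(\pi)\, P$ is symmetric.

```python
import numpy as np

# A small hypothetical reversible kernel on the statespace {0, 1, 2}.
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])

# Stationary distribution pi: the left eigenvector of P for eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
pi = pi / pi.sum()                      # here pi = (1/4, 1/2, 1/4)

# Detailed balance: pi(dx) P(x, dy) = pi(dy) P(y, dx),
# i.e. the flux matrix diag(pi) @ P is symmetric.
flux = np.diag(pi) @ P
assert np.allclose(flux, flux.T)

# Summing the balance equation over x gives stationarity, pi P = pi,
# exactly as derived in the next paragraph.
assert np.allclose(pi @ P, pi)
```

The same kernel and stationary distribution are reused in the later sketches.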

For any measure, $\nu$, we define

$$\nu P := \int \nu(dx) P(x, dy).$$

Then

$$\pi P = \int_x \pi(dx) P(x, dy) = \int_x \pi(dy) P(y, dx) = \pi(dy),$$

so $\pi$ is stationary for $P$.

P is bounded and self-adjoint

Given a kernel (or "operator") $P$ we use the shorthand

$$Pf(x) = (Pf)(x) := \int P(x, dy) f(y).$$

Jensen's inequality gives

$$[(Pf)(x)]^2 = \left[\int P(x, dy) f(y)\right]^2 \le \int P(x, dy) f(y)^2 = (Pf^2)(x).$$

Hence, because $\pi$ is stationary for $P$,

$$\|Pf\|^2 = \int \pi(dx)\, [(Pf)(x)]^2 \le \int \pi(dx)\, (Pf^2)(x) = \int \pi(dx) P(x, dy) f(y)^2 = \int \pi(dy) f(y)^2 = \|f\|^2.$$

Thus $\|Pf\|/\|f\| \le 1$, and $P$ is a bounded linear operator. Further, if $P$ satisfies detailed balance with respect to $\pi$, then

$$\langle Pf, g \rangle = \int \pi(dx) P(x, dy)\, f(y) g(x) = \int \pi(dy) P(y, dx)\, f(y) g(x) = \int \pi(dx) P(x, dy)\, f(x) g(y) = \langle f, Pg \rangle;$$

$P$ is self-adjoint.

The spectrum of a bounded, self-adjoint operator

The spectrum of $P$ in $H$ is $\{\rho : P - \rho I \text{ is not invertible in } H\}$; i.e., there is at least one $g \in H$ such that there is no unique $f \in H$ with $(P - \rho I)f = g$. The spectrum of a bounded, self-adjoint operator is (accept it, but see below) a closed, bounded set on the real line; let the upper and lower bounds be $\lambda_{\max}$ and $\lambda_{\min}$.
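On a finite statespace this can be made concrete (a sketch, using the same hypothetical kernel as before): reversibility makes $S = D^{1/2} P D^{-1/2}$ symmetric, where $D = \mathrm{diag}(\pi)$, and $S$ has the same eigenvalues as $P$, so the spectrum is real; boundedness then confines it to $[-1, 1]$.

```python
import numpy as np

# Same hypothetical reversible kernel and stationary distribution as before.
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
pi = np.array([0.25, 0.5, 0.25])

# Reversibility makes S = D^{1/2} P D^{-1/2} symmetric (D = diag(pi));
# S is similar to P, so it has the same eigenvalues.
S = np.diag(np.sqrt(pi)) @ P @ np.diag(1.0 / np.sqrt(pi))
assert np.allclose(S, S.T)

# The spectrum is therefore real and, since ||Pf|| <= ||f||, lies in [-1, 1].
lam = np.linalg.eigvalsh(S)
print(lam)                              # [0.0, 0.5, 1.0] for this kernel
assert np.all(lam >= -1 - 1e-12) and np.all(lam <= 1 + 1e-12)
```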

When $P$ is a self-adjoint Markov kernel and $H = L^2(\pi)$, $\lambda_{\max} = 1$ and $\lambda_{\min} \ge -1$. The spectral decomposition theorem for bounded, self-adjoint operators states that there is a finite positive measure, $a_f(d\lambda)$, with support contained in the real interval $[\lambda_{\min}, \lambda_{\max}]$, such that

$$\langle f, P^n f \rangle = \int_{\lambda_{\min}}^{\lambda_{\max}} \lambda^n\, a_f(d\lambda).$$

This decomposition is used twice hereafter; however, it may be unfamiliar, so the remainder of this section provides an intuition in terms of the eigenfunctions and eigenvalues of $P$.

Let $P$ be a bounded, self-adjoint operator. A right eigenfunction $e$ of $P$ is a function that satisfies $Pe = \lambda e$ for some scalar, $\lambda$, the corresponding eigenvalue. Let $e_0(x), e_1(x), \ldots$ be a set of eigenfunctions of $P$, scaled so that $\|e_i\|^2 = \langle e_i, e_i \rangle = 1$, and with corresponding eigenvalues $\lambda_0, \lambda_1, \ldots$. Since, by definition, $(P - \lambda_i I) e_i = 0$, the spectrum is a superset of the set of eigenvalues. The intuition below comes from the case where the spectrum is precisely the set of eigenvalues, and the eigenfunctions span $H$.

1. Just as for the eigenvectors of a finite self-adjoint matrix, it is possible to choose the eigenfunctions such that $\langle e_i, e_j \rangle = 0$ ($i \ne j$);
2. moreover, as with self-adjoint matrices, all of the eigenvalues are real;
3. furthermore, since $P$ is bounded, all of the eigenvalues satisfy $|\lambda| \le 1$.

If the eigenfunctions of an operator, $P$, span the Hilbert space, $H$, then for any $f \in H$,

$$f = \sum_{i=0}^{\infty} a_i e_i,$$

where $a_i = \langle f, e_i \rangle$, and from which

$$\langle f, P^n f \rangle = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} a_i a_j \langle e_i, P^n e_j \rangle = \sum_{i=0}^{\infty} a_i^2 \lambda_i^n.$$

With $n = 0$ we obtain $\sum_{i=0}^{\infty} a_i^2 = \|f\|^2 < \infty$. Thus, if the eigenfunctions span the space, then $a_f$ is discrete with mass $a_i^2$ at $\lambda_i$.

For a Markov kernel, $P1 = 1$, where $1$ is the constant function, which is, thus, a right eigenfunction with an eigenvalue of $1$.
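For a finite chain the decomposition can be verified directly. The sketch below (same hypothetical kernel) builds $\pi$-orthonormal eigenfunctions $e_i$ and checks $\langle f, P^n f \rangle = \sum_i a_i^2 \lambda_i^n$, so the spectral measure $a_f$ places mass $a_i^2$ at $\lambda_i$.

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
pi = np.array([0.25, 0.5, 0.25])

# Eigen-decompose the symmetrised matrix; map back to eigenfunctions of P
# that are orthonormal in L2(pi).
S = np.diag(np.sqrt(pi)) @ P @ np.diag(1.0 / np.sqrt(pi))
lam, V = np.linalg.eigh(S)
E = np.diag(1.0 / np.sqrt(pi)) @ V      # columns e_i satisfy P e_i = lam_i e_i

f = np.array([1.0, -2.0, 3.0])          # an arbitrary test function
a = E.T @ (pi * f)                      # a_i = <f, e_i> = sum_x pi(x) f(x) e_i(x)
assert np.isclose(np.sum(a**2), pi @ f**2)   # n = 0: sum_i a_i^2 = ||f||^2

for n in range(5):
    Pn_f = np.linalg.matrix_power(P, n) @ f
    lhs = pi @ (f * Pn_f)               # <f, P^n f>
    rhs = np.sum(a**2 * lam**n)         # spectral measure: mass a_i^2 at lam_i
    assert np.isclose(lhs, rhs)
```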

Spectral gaps and the Dirichlet form

Spectrum of $P$ in $L_0^2(\pi)$; spectral gaps and geometric ergodicity

For any kernel $P$, we define $P^k f = P^{k-1}(Pf)$, recursively. Any function $f(x) \in L^2(\pi)$ can be written as $a\,1 + f_0(x)$ where $f_0 \in L_0^2(\pi)$. Here $a = \langle f, 1 \rangle = \mathbb{E}_\pi[f(X)]$. Thus

$$P^n f = \mathbb{E}_\pi[f(X)] + P^n f_0(x) \to \mathbb{E}_\pi[f(X)]$$

if $P$ is ergodic. To bound the size of the remainder term, consider the spectrum of $P$ restricted to functions in $L_0^2(\pi)$. This must be confined to $[\lambda_{\min}^0, \lambda_{\max}^0]$ with $-1 \le \lambda_{\min}^0 \le \lambda_{\max}^0 \le 1$, where (by definition)

$$\lambda_{\max}^0 := \sup_{f \in L_0^2(\pi)} \frac{\langle f, Pf \rangle}{\langle f, f \rangle} = \sup_{f \in L_0^2(\pi):\, \langle f, f \rangle = 1} \langle f, Pf \rangle \quad \text{and} \quad \lambda_{\min}^0 := \inf_{f \in L_0^2(\pi):\, \langle f, f \rangle = 1} \langle f, Pf \rangle.$$

Let $\lambda_* = \max(|\lambda_{\min}^0|, \lambda_{\max}^0)$. The (squared) size of the remainder is then

$$\|P^n f_0\|^2 = \langle P^n f_0, P^n f_0 \rangle = \langle f_0, P^{2n} f_0 \rangle = \int_{\lambda_{\min}^0}^{\lambda_{\max}^0} \lambda^{2n}\, a_{f_0}(d\lambda) \le \lambda_*^{2n} \int_{\lambda_{\min}^0}^{\lambda_{\max}^0} a_{f_0}(d\lambda) = \lambda_*^{2n} \|f_0\|^2.$$

Thus $\|P^n f_0\| \to 0$ geometrically quickly provided $\lambda_* < 1$; i.e., provided the inequality $\lambda_* \le 1$ is strict. $P$ is then called geometrically ergodic. The right spectral gap is $\rho_{\mathrm{right}} := 1 - \lambda_{\max}^0$ and the left spectral gap is $\rho_{\mathrm{left}} := 1 + \lambda_{\min}^0$. Both must be non-zero for geometric ergodicity. Henceforth, for notational simplicity, we drop the superscript in $\lambda_{\max/\min}^0$.

Aside: when one or both of the spectral gaps is zero (e.g. the spectrum is $[-1, 1]$), then for any fixed $n$ we can always find functions, $f$, with $\|f\| = 1$ (albeit fewer and fewer as $n \to \infty$) such that $\|P^n f\| > 0.1$, say.

Dirichlet form, $\mathcal{E}_P(f)$

The concept of a spectral gap motivates the Dirichlet form for $P$ and $f$,

$$\mathcal{E}_P(f) := \langle f, (I - P)f \rangle = \langle f, f \rangle - \langle f, Pf \rangle,$$

since

$$\rho_{\mathrm{right}} = \inf_{f \in L_0^2(\pi):\, \langle f, f \rangle = 1} \mathcal{E}_P(f) \quad \text{and} \quad \rho_{\mathrm{left}} = 2 - \sup_{f \in L_0^2(\pi):\, \langle f, f \rangle = 1} \mathcal{E}_P(f).$$
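Continuing the finite hypothetical example, the gaps, $\lambda_*$, the geometric decay of $\|P^n f_0\|$, and the Dirichlet-form characterisation of $\rho_{\mathrm{right}}$ can all be checked directly:

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
pi = np.array([0.25, 0.5, 0.25])

S = np.diag(np.sqrt(pi)) @ P @ np.diag(1.0 / np.sqrt(pi))
lam = np.sort(np.linalg.eigvalsh(S))    # [0.0, 0.5, 1.0] here
lam_min, lam_max = lam[0], lam[-2]      # spectrum restricted to L2_0(pi)
rho_right, rho_left = 1.0 - lam_max, 1.0 + lam_min
lam_star = max(abs(lam_min), lam_max)
print(rho_right, rho_left, lam_star)    # 0.5 1.0 0.5

# Geometric ergodicity: ||P^n f0|| <= lam_star^n ||f0|| for f0 in L2_0(pi).
norm = lambda g: np.sqrt(pi @ g**2)
f0 = np.array([1.0, -2.0, 3.0])
f0 = f0 - pi @ f0                       # centre, so that <f0, 1> = 0
g = f0.copy()
for n in range(1, 10):
    g = P @ g                           # g = P^n f0
    assert norm(g) <= lam_star**n * norm(f0) + 1e-12

# rho_right = inf E_P(f) over unit-norm f in L2_0(pi), attained at the
# (pi-normalised) eigenfunction for lam_max.
lam_all, V = np.linalg.eigh(S)
e = (np.diag(1.0 / np.sqrt(pi)) @ V)[:, -2]
dirichlet = lambda f: pi @ (f * f) - pi @ (f * (P @ f))
assert np.isclose(dirichlet(e), rho_right)
```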

Directly from the definition we have, for two kernels $P_1$ and $P_2$ and some $\gamma > 0$:

$$\mathcal{E}_{P_1}(f) \ge \gamma\, \mathcal{E}_{P_2}(f)\ \ \forall f \in L_0^2(\pi) \implies \rho_{\mathrm{right}}^{P_1} \ge \gamma\, \rho_{\mathrm{right}}^{P_2}. \tag{1}$$

An alternative expression for the Dirichlet form provides a very natural intuition:

$$\mathcal{E}_P(f) = \langle f, f \rangle - \langle f, Pf \rangle = \int \pi(dx) P(x, dy)\, f(x)[f(x) - f(y)]$$
$$= \int \pi(dx) P(x, dy)\, f(y)[f(y) - f(x)] = \frac{1}{2} \int \pi(dx) P(x, dy)\, [f(y) - f(x)]^2, \tag{2}$$

where the penultimate expression follows because $P$ satisfies detailed balance with respect to $\pi$, and the final expression arises from the average of the two preceding ones. The Dirichlet form can, therefore, be thought of as a generalisation of the expected squared jumping distance of the $i$th component of $x$, $\int \pi(dx) P(x, dy)\, (y_i - x_i)^2$, to consider the expected squared changes for any $f \in L_0^2(\pi)$.
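The identity (2) is easy to verify numerically; the sketch below (same hypothetical kernel as before) computes the Dirichlet form both as $\langle f, f \rangle - \langle f, Pf \rangle$ and as the expected squared jump.

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
pi = np.array([0.25, 0.5, 0.25])

inner = lambda g, h: pi @ (g * h)
f = np.array([0.3, -1.0, 2.0])
f = f - pi @ f                          # work in L2_0(pi)

# E_P(f) = <f, f> - <f, P f> ...
dirichlet_inner = inner(f, f) - inner(f, P @ f)

# ... equals the expected squared jumping "distance" of f, as in (2).
diffs = f[None, :] - f[:, None]         # diffs[x, y] = f(y) - f(x)
dirichlet_jump = 0.5 * np.sum(pi[:, None] * P * diffs**2)

assert np.isclose(dirichlet_inner, dirichlet_jump)
```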

Variance of an ergodic average

Suppose that we are interested in $\mathbb{E}_\pi[h(X)]$ for some $h \in L^2(\pi)$, and we estimate it by an average of the values in the Markov chain: $\hat{h}_n := \frac{1}{n} \sum_{i=1}^{n} h(X_i)$. Typically $\mathrm{Var}[\hat{h}_n] \to 0$ as $n \to \infty$, but scaling by $n$ should keep it $O(1)$. We are, therefore, interested in

$$\mathrm{Var}(P, h) := \lim_{n \to \infty} n\, \mathrm{Var}[\hat{h}_n].$$

So as to just consider mixing, we assume $X_1 \sim \pi$. Without loss of generality we may assume $h \in L_0^2(\pi)$ (else just subtract its expectation). Since $P$ is time-homogeneous,

$$\mathrm{Var}\left[\sum_{i=1}^{n} h(X_i)\right] = \sum_{i=1}^{n} \mathbb{E}\left[h^2(X_i)\right] + 2 \sum_{i=1}^{n-1} \mathbb{E}[h(X_i) h(X_{i+1})] + 2 \sum_{i=1}^{n-2} \mathbb{E}[h(X_i) h(X_{i+2})] + \cdots + 2\, \mathbb{E}[h(X_1) h(X_n)]$$
$$= n\, \mathbb{E}\left[h^2(X)\right] + 2(n-1)\, \mathbb{E}[h(X_1) h(X_2)] + \cdots + 2\, \mathbb{E}[h(X_1) h(X_n)]$$
$$= n \langle h, h \rangle + 2(n-1) \langle h, Ph \rangle + 2(n-2) \langle h, P^2 h \rangle + \cdots + 2 \langle h, P^{n-1} h \rangle.$$

Using $(I - P)^{-1}$ to denote $I + P + P^2 + P^3 + \cdots$, we obtain

$$\mathrm{Var}(P, h) = \langle h, h \rangle + 2 \sum_{i=1}^{\infty} \langle h, P^i h \rangle = 2 \sum_{i=0}^{\infty} \langle h, P^i h \rangle - \langle h, h \rangle = 2 \langle h, (I - P)^{-1} h \rangle - \langle h, h \rangle.$$

Writing $\langle h, h \rangle = \langle h, (I - P)(I - P)^{-1} h \rangle$ gives an equivalent form:

$$\mathrm{Var}(P, h) = \langle h, (I + P)(I - P)^{-1} h \rangle.$$

The spectral decomposition of $P$ in $L_0^2(\pi)$ gives

$$\mathrm{Var}(P, h) = \int_{\lambda_{\min}}^{\lambda_{\max}} \frac{1 + \lambda}{1 - \lambda}\, a_h(d\lambda) \le \frac{1 + \lambda_{\max}}{1 - \lambda_{\max}} \int_{\lambda_{\min}}^{\lambda_{\max}} a_h(d\lambda) = \frac{1 + \lambda_{\max}}{1 - \lambda_{\max}}\, \mathrm{Var}_\pi[h], \tag{3}$$

where the inequality follows since $(1 + \lambda)/(1 - \lambda)$ is an increasing function of $\lambda$. The supremum of the spectrum on $L_0^2(\pi)$ therefore provides an efficiency bound over all $h \in L_0^2(\pi)$.

Variance bounding kernels

The bound on the variance in (3) is in terms of $\lambda_{\max}$, not $\lambda_{\min}$. Even if there exists an eigenvalue $\lambda_{\min} = -1$, so the chain never converges (as opposed to the more usual case where the spectrum of $P$ is $[-1, a]$ with $a < 1$, but there is no eigenvalue at $-1$), the asymptotic variance can be finite. As an extreme example, consider the Markov chain with a transition matrix of

$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

After $2n$ iterations this has been in state $1$ exactly $1/2 = \pi(\{1\})$ of the time.

Roberts and Rosenthal (2008) realised that in almost all applications of MCMC it was the variance of the ergodic average that was important, and not (directly) convergence to the target. Specifically, it did not matter if $\rho_{\mathrm{left}} = 0$. A kernel where $\rho_{\mathrm{right}} > 0$ was termed variance bounding, and this was shown to be equivalent to $\mathrm{Var}(P, f) < \infty$ for all $f \in L_0^2(\pi)$; i.e. if [hypothetical!] simple Monte Carlo would lead to the standard $\sqrt{n}$ rate of convergence of the ergodic average, a $\sqrt{n}$ rate would also be observed from the MCMC kernel. This is easy to see if the left and right spectral gaps of $P$ are non-zero. When at least one gap is zero, for any $\beta < 1$ we have

$$\mathrm{Var}(\beta P, h) = \lim_{n \to \infty} \left\{ \langle h, h \rangle + 2(1 - 1/n)\beta \langle h, Ph \rangle + 2(1 - 2/n)\beta^2 \langle h, P^2 h \rangle + \cdots + 2(1 - (n-1)/n)\beta^{n-1} \langle h, P^{n-1} h \rangle \right\} \le \langle h, h \rangle + 2 \sum_{i=1}^{\infty} \beta^i \left|\langle h, P^i h \rangle\right| < \infty.$$

Letting $\beta \uparrow 1$ then gives the result, even though $\mathrm{Var}(P, h)$ may be infinite.
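On a finite statespace all of these quantities are exactly computable. In the sketch below (same hypothetical kernel), $I - P$ is singular on the full space, so it is inverted on $L_0^2(\pi)$ via the standard fundamental-matrix device $(I - P + 1\pi^\top)^{-1}$ (my choice, not from the document); the result is checked against the spectral form and the bound (3).

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
pi = np.array([0.25, 0.5, 0.25])

inner = lambda g, k: pi @ (g * k)
h = np.array([1.0, 0.0, -1.0])          # already has pi-mean zero

# Var(P, h) = 2 <h, (I - P)^{-1} h> - <h, h>; the matrix I - P + 1 pi^T is
# invertible and agrees with I - P on mean-zero functions.
g = np.linalg.solve(np.eye(3) - P + np.outer(np.ones(3), pi), h)
var_P_h = 2 * inner(h, g) - inner(h, h)

# Spectral check: Var(P, h) = sum_i a_i^2 (1 + lam_i)/(1 - lam_i) on L2_0(pi).
S = np.diag(np.sqrt(pi)) @ P @ np.diag(1.0 / np.sqrt(pi))
lam, V = np.linalg.eigh(S)
a = (np.diag(1.0 / np.sqrt(pi)) @ V).T @ (pi * h)
keep = lam < 1 - 1e-10                  # h places no mass at lambda = 1
assert np.isclose(var_P_h,
                  np.sum(a[keep]**2 * (1 + lam[keep]) / (1 - lam[keep])))

# Bound (3): Var(P, h) <= (1 + lam_max)/(1 - lam_max) * Var_pi[h].
lam_max = np.max(lam[keep])
assert var_P_h <= (1 + lam_max) / (1 - lam_max) * inner(h, h) + 1e-12
```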

Variational representation of $\langle f, Q^{-1} f \rangle$, and a key ordering

Firstly, notice that for real numbers $f$, $g$ and $q > 0$, $\sup_g \{2fg - g^2 q\}$ is $f^2/q$, which is achieved when $g = f/q$. The same holds for the operator $Q = I - P$ when $f$ and $g$ are functions (see Appendix A for a proof):

$$\langle f, (I - P)^{-1} f \rangle = \sup_{g \in L_0^2(\pi)} \left\{ 2\langle f, g \rangle - \langle g, (I - P)g \rangle \right\} = \sup_{g \in L_0^2(\pi)} \left\{ 2\langle f, g \rangle - \mathcal{E}_P(g) \right\}.$$

To see this, think of a reversible matrix, $P$, and diagonalise it; the result for such matrices is just equivalent to the scalar result applied to each eigenvalue. This leads to the alternative representation:

$$\mathrm{Var}(P, h) = \sup_{g \in L_0^2(\pi)} \left\{ 4\langle h, g \rangle - 2\mathcal{E}_P(g) \right\} - \langle h, h \rangle.$$

Clearly,

$$\mathcal{E}_{P_1}(g) \ge \mathcal{E}_{P_2}(g)\ \ \forall g \in L_0^2(\pi) \implies \mathrm{Var}(P_1, h) \le \mathrm{Var}(P_2, h),$$

giving a much simpler proof of a result which was originally proved for finite-statespace Markov chains in Peskun (1973) and then generalised to general statespaces by Tierney (1998). But we can go further: suppose that $\mathcal{E}_{P_1}(g) \ge \gamma\, \mathcal{E}_{P_2}(g)$ for all $g \in L_0^2(\pi)$ and some $\gamma > 0$; then, for any $g \in L_0^2(\pi)$,

$$4\langle h, g \rangle - 2\mathcal{E}_{P_1}(g) \le 4\langle h, g \rangle - 2\gamma\, \mathcal{E}_{P_2}(g) = \frac{1}{\gamma} \left\{ 4\langle h, g' \rangle - 2\mathcal{E}_{P_2}(g') \right\},$$

where $g' := \gamma g \in L_0^2(\pi)$. So

$$\sup_{g \in L_0^2(\pi)} \left\{ 4\langle h, g \rangle - 2\mathcal{E}_{P_1}(g) \right\} \le \frac{1}{\gamma} \sup_{g \in L_0^2(\pi)} \left\{ 4\langle h, g \rangle - 2\mathcal{E}_{P_2}(g) \right\},$$

and hence

$$\mathcal{E}_{P_1}(f) \ge \gamma\, \mathcal{E}_{P_2}(f)\ \ \forall f \in L_0^2(\pi) \implies \mathrm{Var}(P_1, h) + \langle h, h \rangle \le \frac{1}{\gamma} \left\{ \mathrm{Var}(P_2, h) + \langle h, h \rangle \right\}. \tag{4}$$

This result appears as Lemma 3 in Andrieu et al. (2016) (see also Caracciolo et al., 1990).
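Both the variational representation of $\mathrm{Var}(P, h)$ and the ordering (4) can be checked on the finite example. In the sketch below the comparison kernel is the "lazy" version $P_2 = (I + P_1)/2$, for which $\mathcal{E}_{P_1}(g) = 2\mathcal{E}_{P_2}(g)$ exactly, so (4) applies with $\gamma = 2$; this pairing is hypothetical, chosen purely for convenience.

```python
import numpy as np

P1 = np.array([[0.5, 0.5, 0.0],
               [0.25, 0.5, 0.25],
               [0.0, 0.5, 0.5]])
pi = np.array([0.25, 0.5, 0.25])
P2 = 0.5 * (np.eye(3) + P1)             # lazy kernel: E_{P1}(g) = 2 E_{P2}(g)

inner = lambda g, k: pi @ (g * k)
dirichlet = lambda P, g: inner(g, g) - inner(g, P @ g)

def asymptotic_variance(P, h):
    # Var(P, h) = 2 <h, (I - P)^{-1} h> - <h, h> on L2_0(pi).
    g = np.linalg.solve(np.eye(len(h)) - P + np.outer(np.ones(len(h)), pi), h)
    return 2 * inner(h, g) - inner(h, h)

h = np.array([1.0, 0.0, -1.0])          # pi-mean zero

# Variational representation: 4 <h, g> - 2 E_P(g) - <h, h> is maximised at
# g* = (I - P)^{-1} h, where it equals Var(P, h); random g never beat it.
g_star = np.linalg.solve(np.eye(3) - P1 + np.outer(np.ones(3), pi), h)
best = 4 * inner(h, g_star) - 2 * dirichlet(P1, g_star) - inner(h, h)
assert np.isclose(best, asymptotic_variance(P1, h))
rng = np.random.default_rng(3)
for _ in range(200):
    g = rng.normal(size=3)
    g = g - pi @ g
    assert 4 * inner(h, g) - 2 * dirichlet(P1, g) - inner(h, h) <= best + 1e-10

# Ordering (4) with gamma = 2 (for this lazy pair it holds with equality).
gamma = 2.0
lhs = asymptotic_variance(P1, h) + inner(h, h)
rhs = (asymptotic_variance(P2, h) + inner(h, h)) / gamma
assert lhs <= rhs + 1e-12
```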

Propose-accept-reject kernels

Many kernels consist of making a proposal, which is then either accepted or rejected:

$$P(x, dy) := q(x, dy)\, \alpha(x, y) + (1 - \bar{\alpha}(x))\, \delta_x(dy),$$

where $\bar{\alpha}(x) := \int q(x, dy)\, \alpha(x, y)$ is the average acceptance probability from $x$. Then, from (2),

$$\mathcal{E}_P(f) = \frac{1}{2} \int \pi(dx)\, q(x, dy)\, \alpha(x, y)\, \{f(y) - f(x)\}^2.$$

The right spectral gap is, therefore,

$$\rho_{\mathrm{right}} = \frac{1}{2} \inf_{f \in L_0^2(\pi):\, \langle f, f \rangle = 1} \int \pi(dx)\, q(x, dy)\, \alpha(x, y)\, \{f(y) - f(x)\}^2.$$

There is a similar formula for the left spectral gap. Results (1) and (4) then lead directly to the following: if two propose-accept-reject kernels, both reversible with respect to $\pi$, satisfy

$$q_1(x, dy)\, \alpha_1(x, y) \ge \gamma\, q_2(x, dy)\, \alpha_2(x, y) \quad \forall x, y,$$

then

$$\rho_{\mathrm{right}}^{P_1} \ge \gamma\, \rho_{\mathrm{right}}^{P_2} \quad \text{and} \quad \mathrm{Var}(P_1, h) + \mathrm{Var}_\pi[h] \le \frac{1}{\gamma} \left\{ \mathrm{Var}(P_2, h) + \mathrm{Var}_\pi[h] \right\}.$$

From (4), if for some $\gamma > 0$, $\mathcal{E}_{P_1}(f) \ge \gamma\, \mathcal{E}_{P_2}(f)$ for all $f \in L_0^2(\pi)$ and $P_2$ is variance bounding, then so is $P_1$. Unfortunately it is rare that we can be sure of the ordering of the Dirichlet forms for all $f$; indeed we might be sure that there is no fixed $\gamma$ for which the ordering does hold. When we cannot simply resort to a uniform ordering of Dirichlet forms, the elegant concept of conductance can come to our aid.

Conductance

Consider any measurable set, $A \subseteq \mathcal{X}$. The conductance of $A$ is defined as

$$\kappa_P(A) := \frac{1}{\pi(A)} \int_{x \in A,\, y \in A^c} \pi(dx) P(x, dy),$$

where $\pi(A) := \int_A \pi(dx)$. Loosely speaking, $\kappa_P(A)$ is the probability of moving to $A^c$ conditional on the chain currently following the stationary distribution truncated to $A$: $\pi(dx) 1_{x \in A}/\pi(A)$. Our analysis of the properties of $\kappa_P(A)$ will be via the symmetric quantity

$$\tilde{\kappa}_P(A) = \tilde{\kappa}_P(A^c) := \frac{1}{\pi(A)\pi(A^c)} \int_{x \in A,\, y \in A^c} \pi(dx) P(x, dy) = \frac{\kappa_P(A)}{\pi(A^c)}.$$

The first equality follows because the kernel is reversible with respect to $\pi$, and this also implies that $\pi(A)\, \kappa_P(A) = \pi(A^c)\, \kappa_P(A^c)$. Since $\kappa_P(A^c) \le 1$, $\kappa_P(A) \le \pi(A^c)/\pi(A)$, which can be arbitrarily small.
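These definitions are easy to explore numerically. The sketch below (same hypothetical kernel) evaluates $\kappa_P(A)$ and $\tilde{\kappa}_P(A)$ for every non-trivial subset $A$, and confirms the symmetry $\tilde{\kappa}_P(A) = \tilde{\kappa}_P(A^c)$.

```python
import numpy as np
from itertools import combinations

P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
pi = np.array([0.25, 0.5, 0.25])
n = len(pi)

def kappa(A):
    # kappa_P(A) = (1/pi(A)) sum_{x in A, y in A^c} pi(x) P(x, y).
    mask = np.zeros(n, dtype=bool)
    mask[list(A)] = True
    flow_out = np.sum(pi[mask, None] * P[mask][:, ~mask])
    return flow_out / pi[mask].sum()

for r in range(1, n):
    for A in combinations(range(n), r):
        piA = pi[list(A)].sum()
        kappa_tilde = kappa(A) / (1 - piA)   # = kappa_P(A) / pi(A^c)
        Ac = tuple(x for x in range(n) if x not in A)
        assert np.isclose(kappa_tilde, kappa(Ac) / piA)   # symmetry in A, A^c
        print(A, kappa(A), kappa_tilde)
```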

To define the conductance of the kernel $P$ we therefore only consider sets $A$ with $\pi(A) \le 1/2$:

$$\kappa_P := \inf_{A:\, \pi(A) \le 1/2} \kappa_P(A).$$

No such restriction is required for

$$\tilde{\kappa}_P := \inf_A \tilde{\kappa}_P(A).$$

Further, since $\pi(A) \le 1/2 \implies 1/2 \le \pi(A^c) \le 1$, we have $\kappa_P \le \tilde{\kappa}_P \le 2\kappa_P$.

Setting $f_A(x) = [1_{x \in A} - \pi(A)]/\sqrt{\pi(A)\pi(A^c)}$ (so that $f_A \in L_0^2(\pi)$ and $\langle f_A, f_A \rangle = 1$) in the expression for the Dirichlet form (2) gives

$$\mathcal{E}_P(f_A) = \frac{1}{2\pi(A)\pi(A^c)} \left\{ \int_{x \in A,\, y \in A^c} \pi(dx) P(x, dy) + \int_{x \in A^c,\, y \in A} \pi(dx) P(x, dy) \right\} = \tilde{\kappa}_P(A) = \frac{\kappa_P(A)}{\pi(A^c)}.$$

So

$$\rho_{\mathrm{right}} = \inf_{f \in L_0^2(\pi):\, \langle f, f \rangle = 1} \mathcal{E}_P(f) \le \inf_A \mathcal{E}_P(f_A) = \tilde{\kappa}_P \le 2\kappa_P. \tag{5}$$

Hence, if $P$ has a right spectral gap (so it is variance bounding) then its conductance is non-zero. Amazingly, the converse is also true: if the conductance of $P$ is non-zero then $P$ has a right spectral gap:

$$\rho_{\mathrm{right}} \ge \frac{\kappa_P^2}{2} \ge \frac{\tilde{\kappa}_P^2}{8}. \tag{6}$$

Thus, non-zero conductance is equivalent to a non-zero right spectral gap, which is equivalent to variance bounding. Equations (5) and (6) together are sometimes called Cheeger bounds. A proof of (6) for finite Markov chains is given in Diaconis and Stroock (1991). A more accessible and, as far as I can see, more general proof of the looser inequality, that $\rho_{\mathrm{right}} \ge \tilde{\kappa}_P^2/8$, is given in Lawler and Sokal (1988). This has a "standard bit", which uses the Cauchy-Schwarz inequality to obtain an inequality for $\rho_{\mathrm{right}}$; a "beautiful bit", which relates the expression from the standard bit to conductance; and then an "ugly bit", which proves that the final expression is always greater than $0$ if $\tilde{\kappa}_P > 0$. I will follow Lawler and Sokal (1988) for the first two parts and then provide a neater solution to the final part.
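Before the proof, the bounds (5) and (6) can be checked exhaustively on the small hypothetical example by minimising over all subsets:

```python
import numpy as np
from itertools import combinations

P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
pi = np.array([0.25, 0.5, 0.25])
n = len(pi)

S = np.diag(np.sqrt(pi)) @ P @ np.diag(1.0 / np.sqrt(pi))
rho_right = 1.0 - np.sort(np.linalg.eigvalsh(S))[-2]

def kappa(A):
    mask = np.zeros(n, dtype=bool)
    mask[list(A)] = True
    return np.sum(pi[mask, None] * P[mask][:, ~mask]) / pi[mask].sum()

subsets = [A for r in range(1, n) for A in combinations(range(n), r)]
kappa_P = min(kappa(A) for A in subsets if pi[list(A)].sum() <= 0.5)
kappa_tilde_P = min(kappa(A) / (1 - pi[list(A)].sum()) for A in subsets)

assert rho_right <= kappa_tilde_P <= 2 * kappa_P + 1e-12    # (5)
assert rho_right >= kappa_P**2 / 2 >= kappa_tilde_P**2 / 8  # (6)
print(rho_right, kappa_P, kappa_tilde_P)                    # 0.5, 0.5, 2/3
```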

Denote the symmetric measure $\pi(dx) P(x, dy)$ by $\mu(dx, dy)$. We also set $g(x) = f(x) + c$ for some (currently) arbitrary real constant, $c$.

The standard bit. Cauchy-Schwarz (twice) and the symmetry of $\mu$ give:

$$\mathbb{E}_\mu\left[\left|g(X)^2 - g(Y)^2\right|\right] = \mathbb{E}_\mu\left[|g(X) - g(Y)|\, |g(X) + g(Y)|\right] \le \sqrt{\mathbb{E}_\mu\left[\{g(X) - g(Y)\}^2\right]} \sqrt{\mathbb{E}_\mu\left[\{g(X) + g(Y)\}^2\right]}$$
$$\le \sqrt{\mathbb{E}_\mu\left[\{f(X) - f(Y)\}^2\right]} \sqrt{4\, \mathbb{E}_\pi\left[g(X)^2\right]},$$

since $g(X) - g(Y) = f(X) - f(Y)$ and both marginals of $\mu$ are $\pi$. Squaring, and recalling that $\mathbb{E}_\mu[\{f(X) - f(Y)\}^2] = 2\mathcal{E}_P(f)$, gives

$$\mathcal{E}_P(f) \ge \frac{1}{8\, \mathbb{E}_\pi[g(X)^2]} \left( \mathbb{E}_\mu\left[\left|g(X)^2 - g(Y)^2\right|\right] \right)^2.$$

The beautiful bit. We now relate the numerator of the above expression to the conductance. The proof is symmetrical: the first half manipulates the expectation so that conductance may be used, with the second half reversing the route of the first. Set $A_t := \{x : g(x)^2 \le t\}$. Then, by the symmetry of $\mu$,

$$\int_{\mathcal{X}} \int_{\mathcal{X}} \mu(dx, dy)\, \left|g(y)^2 - g(x)^2\right| = 2 \int_{\mathcal{X}} \int_{\mathcal{X}} \mu(dx, dy)\, 1_{g(x)^2 < g(y)^2} \left\{ g(y)^2 - g(x)^2 \right\}$$
$$= 2 \int_{\mathcal{X}} \int_{\mathcal{X}} \mu(dx, dy) \int_{t=0}^{\infty} dt\, 1_{g(x)^2 \le t < g(y)^2} = 2 \int_{t=0}^{\infty} dt \int_{\mathcal{X}} \int_{\mathcal{X}} \mu(dx, dy)\, 1_{g(x)^2 \le t < g(y)^2} = 2 \int_{t=0}^{\infty} dt \int_{A_t \times A_t^c} \mu(dx, dy).$$

By the definition of $\tilde{\kappa}_P$, $\int_{A_t \times A_t^c} \mu(dx, dy) \ge \tilde{\kappa}_P\, \pi(A_t) \pi(A_t^c) = \tilde{\kappa}_P \int_{A_t \times A_t^c} \pi(dx)\pi(dy)$, so, reversing the route of the first half,

$$\int_{\mathcal{X}} \int_{\mathcal{X}} \mu(dx, dy)\, \left|g(y)^2 - g(x)^2\right| \ge 2\tilde{\kappa}_P \int_{t=0}^{\infty} dt \int_{A_t \times A_t^c} \pi(dx)\pi(dy) = \tilde{\kappa}_P\, \mathbb{E}_{\pi \otimes \pi}\left[\left|g(Y)^2 - g(X)^2\right|\right].$$

So

$$\mathcal{E}_P(f) \ge \frac{\tilde{\kappa}_P^2}{8\, \mathbb{E}_\pi[g(X)^2]} \left( \mathbb{E}_{\pi \otimes \pi}\left[\left|g(X)^2 - g(Y)^2\right|\right] \right)^2.$$

But since this is true for all $c$, we have

$$\rho_{\mathrm{right}} \ge \frac{\tilde{\kappa}_P^2}{8} \inf_{f \in L_0^2(\pi):\, \langle f, f \rangle = 1}\ \sup_c\ \frac{\left( \mathbb{E}_{\pi \otimes \pi}\left[\left|\{f(Y) + c\}^2 - \{f(X) + c\}^2\right|\right] \right)^2}{\mathbb{E}_\pi\left[\{f(X) + c\}^2\right]}.$$

A less ugly bit. Since $\mathbb{E}_\pi[\{f(X) + c\}^2] = 1 + c^2$, we need to show that

$$\inf_{f \in L_0^2(\pi):\, \langle f, f \rangle = 1}\ \sup_c\ \frac{\mathbb{E}_{\pi \otimes \pi}\left[\left|\{f(Y) + c\}^2 - \{f(X) + c\}^2\right|\right]}{\sqrt{1 + c^2}} \ge 1.$$

Let $A = f(X)$ and $B = f(Y)$ be independent and identically distributed with an expectation of $0$ and a variance of $1$. We are interested in

$$\frac{\mathbb{E}\left[\left|\{A + c\}^2 - \{B + c\}^2\right|\right]}{\sqrt{1 + c^2}}. \tag{7}$$

Below, I will show that

$$2 \le \mathbb{E}\left[\left|A^2 - B^2\right|\right] + 4\, \mathbb{E}\left[|A - B|\right]^2. \tag{8}$$

Then, as in Lawler and Sokal (1988), consider letting $c \to \infty$ or setting $c = 0$. With the former,

$$\lim_{c \to \infty} \frac{\mathbb{E}\left[\left|\{A + c\}^2 - \{B + c\}^2\right|\right]}{\sqrt{1 + c^2}} = 2\, \mathbb{E}\left[|A - B|\right],$$

so if $\mathbb{E}[|A - B|] > 1/2$ we are done. If not, setting $c = 0$ in (7) and using (8) leaves us with

$$\mathbb{E}\left[\left|A^2 - B^2\right|\right] \ge 2 - 4\, \mathbb{E}\left[|A - B|\right]^2 \ge 1.$$

To prove (8):

$$\mathbb{E}\left[\left|A^2 - B^2\right|\right] = \int_{0}^{\infty} \mathbb{P}\left(\left|A^2 - B^2\right| > t\right) dt = \int_{0}^{\infty} \left\{ \mathbb{P}\left(A^2 > t + B^2\right) + \mathbb{P}\left(B^2 > t + A^2\right) \right\} dt,$$

and

$$\int_{0}^{\infty} \mathbb{P}\left(B^2 > t + A^2\right) dt = \mathbb{E}_A\left[\int_{0}^{\infty} \mathbb{P}\left(B^2 > t + A^2 \mid A\right) dt\right].$$

But, for fixed $a$, substituting first $v = t + a^2$ and then $v = u^2$,

$$\int_{0}^{\infty} \mathbb{P}\left(B^2 > t + a^2\right) dt = \int_{a^2}^{\infty} \mathbb{P}\left(B^2 > v\right) dv = \mathbb{E}\left[B^2\right] - \int_{0}^{a^2} \mathbb{P}\left(B^2 > v\right) dv = \mathbb{E}\left[B^2\right] - \int_{0}^{|a|} 2u\, \mathbb{P}\left(|B| > u\right) du \ge \mathbb{E}\left[B^2\right] - 2|a|\, \mathbb{E}\left[|B|\right].$$

Combining the two end results (and using the corresponding bound with $A$ and $B$ interchanged) gives

$$\mathbb{E}\left[\left|A^2 - B^2\right|\right] \ge 2\, \mathbb{E}\left[B^2\right] - 4\, \mathbb{E}\left[|A|\right] \mathbb{E}\left[|B|\right] = 2 - 4\, \mathbb{E}\left[|A|\right] \mathbb{E}\left[|B|\right];$$

i.e.

$$2 \le \mathbb{E}\left[\left|A^2 - B^2\right|\right] + 4\, \mathbb{E}\left[|A|\right] \mathbb{E}\left[|B|\right].$$

However, Jensen's inequality provides

$$\mathbb{E}\left[|A|\right] = \mathbb{E}\left[|A - \mathbb{E}[B]|\right] \le \mathbb{E}\left[\mathbb{E}\left[|A - B| \mid A\right]\right] = \mathbb{E}\left[|A - B|\right],$$

and (8) follows.
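Inequality (8) can also be sanity-checked by Monte Carlo for a few mean-zero, variance-one distributions (a quick illustration of my own, not part of the proof):

```python
import numpy as np

# Monte Carlo check of (8): 2 <= E[|A^2 - B^2|] + 4 E[|A - B|]^2
# for A, B iid with mean 0 and variance 1.
rng = np.random.default_rng(1)
N = 10**6

draws = {
    "normal": lambda: rng.standard_normal(N),
    "uniform": lambda: rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), N),
    "rademacher": lambda: rng.choice([-1.0, 1.0], N),
}
for name, draw in draws.items():
    A, B = draw(), draw()
    rhs = np.mean(np.abs(A**2 - B**2)) + 4 * np.mean(np.abs(A - B))**2
    print(name, rhs)
    assert rhs >= 2 - 0.01              # allow a little Monte Carlo error
```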

Acknowledgements

I am grateful to Dr. Daniel Elton for providing the proof in Appendix A and to Dr. Dootika Vats for spotting an error in an earlier version of this document.

A Variational representation of $\langle f, Q^{-1} f \rangle$

Let $H$ be a Hilbert space and let $Q$ be a positive operator on $H$ (i.e., an operator that has a square root). Then for $f \in H$,

$$\langle f, Q^{-1} f \rangle = \sup_{g \in H} \left\{ 2\, \mathrm{Re}(\langle f, g \rangle) - \langle g, Qg \rangle \right\}.$$

(Our Hilbert space is real, so we do not need the $\mathrm{Re}(\cdot)$ function.)

Proof. Let $\phi = Q^{-1} f$, so $f = Q\phi$. The left-hand side is then

$$\langle Q\phi, \phi \rangle = \langle Q^{1/2}\phi, Q^{1/2}\phi \rangle = \left\|Q^{1/2}\phi\right\|^2 \ge \left\|Q^{1/2}\phi\right\|^2 - \left\|Q^{1/2}(\phi - g)\right\|^2 = 2\, \mathrm{Re}\left(\langle Q^{1/2}\phi, Q^{1/2}g \rangle\right) - \left\|Q^{1/2}g\right\|^2 = 2\, \mathrm{Re}(\langle Q\phi, g \rangle) - \langle g, Qg \rangle = 2\, \mathrm{Re}(\langle f, g \rangle) - \langle g, Qg \rangle.$$

The construction of the inequality shows that the supremum is achieved at $g = Q^{-1} f$.
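The appendix identity is easy to check in finite dimensions, where any positive definite matrix can play the role of $Q$ (a sketch of my own):

```python
import numpy as np

# Check <f, Q^{-1} f> = sup_g {2 <f, g> - <g, Q g>}, attained at g = Q^{-1} f.
rng = np.random.default_rng(2)
M = rng.normal(size=(4, 4))
Q = M @ M.T + 0.1 * np.eye(4)           # positive definite, so Q^{1/2} exists
f = rng.normal(size=4)

g_star = np.linalg.solve(Q, f)          # the maximiser g = Q^{-1} f
lhs = f @ g_star                        # <f, Q^{-1} f>
assert np.isclose(lhs, 2 * (f @ g_star) - g_star @ Q @ g_star)

# No other g does better: the objective drops by <(g - g*), Q (g - g*)> >= 0.
for _ in range(100):
    g = g_star + rng.normal(size=4)
    assert 2 * (f @ g) - g @ Q @ g <= lhs + 1e-12
```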

References

Andrieu, C., Lee, A., and Vihola, M. (2016). Uniform ergodicity of the iterated conditional SMC and geometric ergodicity of particle Gibbs samplers. Bernoulli. To appear.

Caracciolo, S., Pelissetto, A., and Sokal, A. D. (1990). Nonlocal Monte Carlo algorithm for self-avoiding walks with fixed endpoints. Journal of Statistical Physics, 60(1):1-53.

Diaconis, P. and Stroock, D. (1991). Geometric bounds for eigenvalues of Markov chains. Ann. Appl. Probab., 1(1):36-61.

Lawler, G. F. and Sokal, A. D. (1988). Bounds on the L² spectrum for Markov chains and Markov processes: a generalization of Cheeger's inequality. Trans. Amer. Math. Soc., 309(2):557-580.

Peskun, P. H. (1973). Optimum Monte-Carlo sampling using Markov chains. Biometrika, 60(3):607-612.

Roberts, G. O. and Rosenthal, J. S. (2008). Variance bounding Markov chains. Ann. Appl. Probab., 18(3):1201-1214.

Sherlock, C. and Lee, A. (2017). Variance bounding of delayed-acceptance kernels. ArXiv e-prints.

Sherlock, C., Thiery, A., and Lee, A. (2017). Pseudo-marginal Metropolis-Hastings using averages of unbiased estimators. Biometrika. Accepted subject to minor revisions.

Tierney, L. (1998). A note on Metropolis-Hastings kernels for general state spaces. Ann. Appl. Probab., 8(1):1-9.
