Metropolis Sampler and Markov Chains

Ezra Waters
6 years ago
1 Lecture 9 Metropolis Sampler and Markov Chains MCMC: Markov Chain Monte Carlo

2 Last time: Simulated Annealing. Minimize $f(x)$ by identifying it with the energy of an imaginary physical system undergoing an annealing process. Move from $x_i$ to $x_j$ via a proposal. If the new state has lower energy, accept $x_j$. If the new state has higher energy, accept $x_j$ with probability $A = \exp(-\Delta E / T)$, where $\Delta E = f(x_j) - f(x_i)$ and $T$ is the temperature.
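
The recap above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the objective `f`, the Gaussian proposal, and the geometric cooling schedule are all assumptions chosen for the example.

```python
import numpy as np

def simulated_annealing(f, propose, x_init, temps, steps_per_temp, rng):
    """Minimize f by annealing; returns the best x seen and its value."""
    x, fx = x_init, f(x_init)
    best_x, best_f = x, fx
    for T in temps:
        for _ in range(steps_per_temp):
            x_new = propose(x, rng)
            f_new = f(x_new)
            delta = f_new - fx
            # Downhill moves are always accepted; uphill moves are
            # accepted with probability exp(-delta / T).
            if delta <= 0 or rng.uniform() < np.exp(-delta / T):
                x, fx = x_new, f_new
                if fx < best_f:
                    best_x, best_f = x, fx
    return best_x, best_f

rng = np.random.default_rng(0)
f = lambda x: x**2 + 4 * np.sin(5 * x)              # hypothetical multimodal objective
propose = lambda x, rng: x + rng.normal(scale=0.5)  # hypothetical symmetric proposal
temps = 10.0 * 0.9 ** np.arange(60)                 # hypothetical cooling schedule
best_x, best_f = simulated_annealing(f, propose, 3.0, temps, 100, rng)
print(best_x, best_f)
```

The function tracks the best state seen so far, which is a common convenience rather than part of the annealing rule itself.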

3 Today: from annealing to Metropolis; Markov chains and MCMC; Metropolis and an introduction to Metropolis-Hastings

4 Annealing Recap: stochastic acceptance of higher energy states allows our process to escape local minima. At high T, uphill moves are accepted readily, so getting stuck in local minima is discouraged; at low T, only a few uphill moves are accepted. Thus, if we get our temperature decrease schedule right, we can hope that we will converge to a global minimum.

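5 If the lowering of the temperature is sufficiently slow, the system reaches "thermal equilibrium" at each temperature. Then Boltzmann's distribution applies: $p(X = i) = \frac{1}{Z(T)} \exp(-E_i / T)$, where $Z(T) = \sum_j \exp(-E_j / T)$ (Boltzmann's constant is absorbed into $T$).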

6 Proposal: it proposes a new position x from a neighborhood at which to evaluate the function. All the positions x in the domain we wish to minimize a function over ought to be able to communicate. Detailed balance: a symmetric proposal ensures that the sequence $\{x_t\}$ generated by simulated annealing at a fixed temperature is a stationary Markov chain with the Boltzmann distribution as its target: equilibrium.

7 Example:

8 If you identify and Then:

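9 Normalized Boltzmann distribution, with M global minima in the set $\mathcal{M}$ and function minimum value $f_{min}$: $p(x_i) = \frac{e^{-(f(x_i) - f_{min})/T}}{M + \sum_{j \notin \mathcal{M}} e^{-(f(x_j) - f_{min})/T}}$. As $T \to 0$ from above, this becomes $1/M$ if $x_i \in \mathcal{M}$ and 0 otherwise.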
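
A small numerical check of this limit, as a sketch: the energy values below are made up for illustration, with two tied global minima, so the low-temperature distribution should put probability 1/2 on each of them.

```python
import numpy as np

def boltzmann(energies, T):
    """Normalized Boltzmann probabilities over a finite set of states."""
    e = np.asarray(energies, dtype=float)
    # Subtracting the minimum energy avoids overflow and does not change
    # the normalized result.
    w = np.exp(-(e - e.min()) / T)
    return w / w.sum()

energies = [3.0, 1.0, 2.0, 1.0, 5.0]  # hypothetical energies, two global minima
for T in (10.0, 1.0, 0.01):
    print(T, np.round(boltzmann(energies, T), 3))
```

At high temperature the probabilities are close to uniform; as the temperature drops, the mass moves onto the two minimum-energy states.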

10 Sampling a Distribution: Turn the question on its head. Suppose we wanted to sample from a distribution $p(x)$ (corresponding to a minimization of energy $-\log p(x)$). Keep our symmetric proposal (reversibility!). Need irreducibility to sample from the full distribution. Set T=1 and use our simulated annealing method: this is Metropolis.

11
    import numpy as np

    def metropolis(p, qdraw, nsamp, xinit):
        # p: target density (may be unnormalized); qdraw: symmetric proposal draw
        samples = np.empty(nsamp)
        x_prev = xinit
        for i in range(nsamp):
            x_star = qdraw(x_prev)
            p_star = p(x_star)
            p_prev = p(x_prev)
            pdfratio = p_star / p_prev
            # accept with probability min(1, p(x_star) / p(x_prev))
            if np.random.uniform() < min(1, pdfratio):
                samples[i] = x_star
                x_prev = x_star
            else:  # we always get a sample
                samples[i] = x_prev
        return samples

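12 Uniform Proposal to sample the standard gaussian
    from scipy.stats import uniform

    def propmaker(delta):
        # scipy's uniform(loc, scale) is uniform on [loc, loc + scale],
        # so this is uniform on [-delta, delta]
        rv = uniform(-delta, 2*delta)
        return rv

    uni = propmaker(0.5)

    def uniprop(xprev):
        # symmetric random-walk proposal centred on the current state
        return xprev + uni.rvs()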
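
To see the sampler and the uniform proposal working together, here is a self-contained sketch. It restates the Metropolis loop compactly with a seeded NumPy generator instead of `scipy.stats` so that the run is reproducible; the seed, sample count, and burn-in length are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded for reproducibility (assumption)

def metropolis(p, qdraw, nsamp, xinit):
    """Metropolis sampler with a symmetric proposal."""
    samples = np.empty(nsamp)
    x_prev = xinit
    for i in range(nsamp):
        x_star = qdraw(x_prev)
        # Accept with probability min(1, p(x_star) / p(x_prev)).
        if rng.uniform() < min(1.0, p(x_star) / p(x_prev)):
            x_prev = x_star
        samples[i] = x_prev  # a rejected step repeats the old state
    return samples

target = lambda x: np.exp(-0.5 * x * x)         # unnormalized standard gaussian
uniprop = lambda x: x + rng.uniform(-0.5, 0.5)  # uniform step on [-0.5, 0.5]

samples = metropolis(target, uniprop, 50_000, 0.0)
kept = samples[5_000:]  # discard an assumed burn-in period
print(kept.mean(), kept.std())
```

The target only needs to be known up to a constant, because the normalization cancels in the ratio. The sample mean and standard deviation should land near 0 and 1.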

13 Why do this? Why not rejection sampling? It is wasteful, and more wasteful in higher dimensions (curse of dimensionality): in higher dimensions the volume around the mode gets smaller, so what matters is the interplay of density and volume.

14 Curse of dimensionality: as dimensionality increases, the center holds less of the volume and the outside holds more.
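
One way to put numbers on this is the fraction of the cube $[-1, 1]^d$ that lies inside the inscribed unit ball. This is an illustrative choice of geometry, not taken from the slides. It is also the acceptance rate of a rejection sampler that proposes uniformly in the cube and keeps points inside the ball.

```python
import math

def ball_fraction(d):
    """Volume of the unit d-ball divided by the volume of the cube [-1, 1]^d."""
    ball_volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    cube_volume = 2.0 ** d
    return ball_volume / cube_volume

for d in (1, 2, 3, 5, 10, 20):
    print(d, ball_fraction(d))
```

The fraction is $\pi/4$ in two dimensions and falls below one percent by ten dimensions, so almost all of the cube's volume sits in its corners.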

15 Sampling from gaussian with uniform proposal

16 Markov Chain: a non-IID stochastic process, but with one-step memory only. Widely applicable: first order equations.

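18 Stationarity: $sT = s$, or $\sum_i s_i T_{ij} = s_j$. Continuous case: define the transition kernel $T(y \mid x)$ so that $\int T(y \mid x)\,dy = 1$; then stationarity reads $\int s(x)\,T(y \mid x)\,dx = s(y)$.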

19 Jargon. Irreducible: can go from anywhere to everywhere. Aperiodic: no fixed-length loops. Recurrent: states are visited repeatedly. Harris recurrent if all states are visited infinitely often as $t \to \infty$.

21 Stationarity, again: An irreducible (goes everywhere) and aperiodic (no cycles) Markov chain will eventually converge to a stationary Markov chain. It is the marginal distribution of this chain that we want to sample from, and which we do in Metropolis (and for that matter, in simulated annealing). The samples drawn before convergence are discarded: burn-in.

22 Ergodicity (stronger statement): Aperiodic, irreducible, positive Harris recurrent Markov chains are ergodic, that is, in the limit of infinitely many steps, the marginal distribution of the chain is the same whatever the starting state.

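23 Detailed balance is enough for stationarity. Detailed balance: $s(x)\,T(y \mid x) = s(y)\,T(x \mid y)$. If one sums both sides over $x$: $\int s(x)\,T(y \mid x)\,dx = s(y) \int T(x \mid y)\,dx = s(y)$, which gives us back the stationarity condition from above.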
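
The same argument can be checked numerically on a small discrete chain. The sketch below builds a reversible chain from a symmetric weight matrix, a standard construction used here purely as an assumption for illustration. It then checks detailed balance first and stationarity second.

```python
import numpy as np

# Hypothetical symmetric weights between three states.
W = np.array([[2.0, 1.0, 0.5],
              [1.0, 3.0, 1.5],
              [0.5, 1.5, 1.0]])

row = W.sum(axis=1)
T = W / row[:, None]   # transition matrix: T[i, j] = W[i, j] / sum_k W[i, k]
s = row / row.sum()    # candidate stationary distribution

flow = s[:, None] * T  # flow[i, j] = s_i * T_ij
print("detailed balance holds:", np.allclose(flow, flow.T))
print("stationary:", np.allclose(s @ T, s))
```

Summing `flow` over its first index gives `s @ T`, and summing the transposed matrix over the same index gives `s`, which is the discrete form of the integral argument.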

24 aperiodic and irreducible Rainy Sunny Markov chain

25 Transition matrix, applied again and again: array([[ , ], [ 0.5, 0.5 ]]), followed by five successive powers of the matrix (numeric entries lost in extraction).

26 Stationary distribution can be solved for: Assume that it is $s = [p, 1-p]$. Then $sT = s$ gives us a linear equation in $p$, and thus $p$. np.dot([0.9,0.1], tm_before): array([ , ])
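
Both routes to the stationary distribution can be sketched with NumPy. The slide's matrix entries are not fully recoverable, so the first row below is an assumption chosen for illustration, not the lecture's matrix. The names `T`, `s_power`, and `s_eig` are likewise hypothetical.

```python
import numpy as np

# Hypothetical two-state transition matrix; only the second row
# [0.5, 0.5] appears in the slide, the first row is assumed.
T = np.array([[0.3, 0.7],
              [0.5, 0.5]])

# Route 1: apply the matrix again and again to any starting distribution.
start = np.array([0.9, 0.1])
s_power = start @ np.linalg.matrix_power(T, 50)

# Route 2: solve s T = s directly as a left-eigenvector problem.
vals, vecs = np.linalg.eig(T.T)
k = np.argmin(np.abs(vals - 1.0))  # eigenvalue closest to 1
s_eig = np.real(vecs[:, k])
s_eig = s_eig / s_eig.sum()        # normalize to a probability vector

print(s_power, s_eig)
```

For this assumed matrix, writing $s = [p, 1-p]$ and solving $0.3p + 0.5(1-p) = p$ gives $p = 5/12$, which both routes reproduce.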

27 MCMC: Markov Chain Monte Carlo. Footing for Metropolis: find a Markov chain whose stationary distribution is the distribution we need to sample from. As long as detailed balance holds we are ok: $s(x)\,T(y \mid x) = s(y)\,T(x \mid y)$.

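28 Transition matrix for Metropolis: $T(y \mid x) = q(y \mid x)\,A(x, y) + \delta(y - x)\,r(x)$, where $A(x, y) = \min\left(1, \frac{s(y)}{s(x)}\right)$ is the Metropolis acceptance probability and $r(x) = \int q(y' \mid x)\,(1 - A(x, y'))\,dy'$ is the rejection term.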
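
For a finite state space this transition matrix can be written down explicitly and checked. The sketch below assumes a made-up four-state target `s` and a uniform symmetric proposal `Q`. Off-diagonal entries are proposal times acceptance, and the diagonal collects the self-proposal plus the rejection term.

```python
import numpy as np

def metropolis_matrix(s, Q):
    """Metropolis transition matrix for target s and symmetric proposal Q."""
    n = len(s)
    T = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                # propose j from i, then accept with min(1, s_j / s_i)
                T[i, j] = Q[i, j] * min(1.0, s[j] / s[i])
        # remaining probability: proposing i itself, plus all rejections
        T[i, i] = 1.0 - T[i].sum()
    return T

s = np.array([0.1, 0.2, 0.3, 0.4])  # hypothetical target distribution
Q = np.full((4, 4), 0.25)           # hypothetical symmetric proposal
T = metropolis_matrix(s, Q)

flow = s[:, None] * T
print("detailed balance:", np.allclose(flow, flow.T))
print("stationary:", np.allclose(s @ T, s))
```

Detailed balance holds because $s_i Q_{ij} \min(1, s_j / s_i) = Q_{ij} \min(s_i, s_j)$, which is symmetric in $i$ and $j$ when $Q$ is symmetric.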

29 Intuition: approaches the typical set. Instead of sampling p we sample q, yielding a new state, and a new proposal distribution from which to sample.

30 The possibility of rejection in the Metropolis algorithm, based on the throw of a random uniform, makes the chain aperiodic. And if we want it to be irreducible, we need to make sure q can go everywhere that p can, that is, the support of q includes the support of p. Thus our Metropolis algorithm converges.

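31 Metropolis-Hastings. Notice the tails: a symmetric proposal can land on negative values even when the target has positive support. This still works in Metropolis, because we compare the uniform draw to the density ratio at the negative value and the move is rejected. We could instead reject negative values when drawing the proposal, but this is wrong: it leads to an asymmetric proposal. We might want to use a positive or 0-1 distribution like the beta as the proposal anyway. But that is asymmetric.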

32 Metropolis-Hastings
    import numpy as np

    def metropolis_hastings(p, q, qdraw, nsamp, xinit):
        # q(a, b): proposal density of moving to a given the current state b
        samples = np.empty(nsamp)
        x_prev = xinit
        for i in range(nsamp):
            x_star = qdraw(x_prev)
            p_star = p(x_star)
            p_prev = p(x_prev)
            pdfratio = p_star / p_prev
            # Hastings correction for an asymmetric proposal
            proposalratio = q(x_prev, x_star) / q(x_star, x_prev)
            if np.random.uniform() < min(1, pdfratio * proposalratio):
                samples[i] = x_star
                x_prev = x_star
            else:  # we always get a sample
                samples[i] = x_prev
        return samples
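
As a usage sketch, the same loop can sample a positive-support target with an asymmetric proposal that never leaves the support. Everything specific here is an assumption for illustration: the unnormalized Gamma(3, 1) target, the log-normal random-walk proposal, its width `sigma`, and the seeded generator that replaces `np.random.uniform`.

```python
import numpy as np

rng = np.random.default_rng(7)  # seeded for reproducibility (assumption)
sigma = 0.5                     # hypothetical proposal width

def p(x):
    # Unnormalized Gamma(shape=3, rate=1) density on x > 0.
    return x**2 * np.exp(-x) if x > 0 else 0.0

def q(x_to, x_from):
    # Log-normal density of x_to, centred (in log space) on x_from.
    z = (np.log(x_to) - np.log(x_from)) / sigma
    return np.exp(-0.5 * z * z) / (x_to * sigma * np.sqrt(2 * np.pi))

def qdraw(x_from):
    # Multiplicative random walk: always positive, but asymmetric.
    return x_from * np.exp(sigma * rng.normal())

def metropolis_hastings(p, q, qdraw, nsamp, xinit):
    samples = np.empty(nsamp)
    x_prev = xinit
    for i in range(nsamp):
        x_star = qdraw(x_prev)
        pdfratio = p(x_star) / p(x_prev)
        proposalratio = q(x_prev, x_star) / q(x_star, x_prev)
        if rng.uniform() < min(1.0, pdfratio * proposalratio):
            x_prev = x_star
        samples[i] = x_prev
    return samples

samples = metropolis_hastings(p, q, qdraw, 50_000, 1.0)
kept = samples[5_000:]  # discard an assumed burn-in period
print(kept.mean(), kept.var())
```

For this proposal the ratio `q(x_prev, x_star) / q(x_star, x_prev)` simplifies to `x_star / x_prev`, which is the correction that plain Metropolis would miss. A Gamma(3, 1) distribution has mean 3 and variance 3, so the printed values should be close to those.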

6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution