Markov Chain Monte Carlo Method Macoto Kikuchi Cybermedia Center, Osaka University 6th July 2017
Thermal Simulations 1 Why temperature 2 Statistical mechanics in a nutshell 3 Temperature in computers 4 Introduction to Markov Chain Monte Carlo method 5 Remarks 6 New methodology
Why temperature is important Thermal motion (fluctuation) is important for: Liquid state Ordering phenomena (crystal growth) Soft matter Macromolecules, polymers, gels Biomolecules (in vivo, in vitro) DNA, proteins, membranes
Thermal fluctuation of Kinesin
cont. Phase transitions liquid-solid, liquid-gas, paramagnet-ferromagnet Superconductivity, Superfluidity Melting of metallic materials Electric conduction Resistance by lattice vibration
Ferromagnetic transition [Figure: magnetization m and magnetic susceptibility chi plotted against T/Tc] magnetization and magnetic susceptibility
computer simulations To treat thermal effects in molecular-level simulations: Molecular dynamics (MD) Markov Chain Monte Carlo method (MCMC or Metropolis method) Both methods simulate thermal equilibrium (and nonequilibrium, with special care)
Statistical mechanics in a nutshell Consider a many-particle system: N ≈ 10^23 in real systems; in computer simulations we treat as many particles as we can. We would like to know the properties of matter in the thermal equilibrium state.
thermal equilibrium Macroscopic state of matter reached after a long time under a certain external condition Stable as long as the external condition is kept unchanged Distinguished by only a few thermodynamic (macroscopic) quantities: temperature, pressure, volume, total energy etc.
Two levels of state of matter Microscopic (molecular level) states particle configurations distinguishable microscopically Macroscopic states distinguishable only by macroscopic (thermodynamic) quantities such as total energy Thermal equilibrium is macroscopically still. But from the microscopic point of view, the matter changes its microscopic state rapidly.
Basic formula Boltzmann's formula for entropy: S(E) = k_B log W(E), where W(E) is the number of microscopic states having energy E and k_B is Boltzmann's constant. Thermodynamic definition of temperature: 1/T = ∂S(E)/∂E
Simple model system Collection of N elements, each of which can take one of two states (two levels of energy). Energy of the i-th element: e_i = 0 or ϵ. Energy of the total system: E = Σ_i e_i
Microscopic states One assignment of energy to all the elements defines one microscopic state; there are 2^N distinguishable microscopic states in total. Macroscopic states Distinguishable by total energy E = nϵ (we assume N ≫ n). The number of corresponding microscopic states is the binomial coefficient W(E) = W(nϵ) = C(N, n) = N!/(n!(N−n)!)
Consider a closed system No interaction with the external environment; the total energy E is kept unchanged (conservation of energy). Principle of equal weight The microscopic states having the same total energy occur with the same probability: all microscopic states of the same energy are equally probable. While the total energy is kept constant, the system constantly wanders from one microstate to another to another... This is the basic assumption for thermal equilibrium.
(math.) Stirling's formula: log n! ≈ n log n − n. Boltzmann's entropy: S(nϵ) = k_B log C(N, n) ≈ k_B [N log N − n log n − (N − n) log(N − n)]
Temperature 1/T = ∂S/∂E = (k_B/ϵ) ∂(log W)/∂n = (k_B/ϵ) log((N − n)/n) ≈ (k_B/ϵ) log(N/n)
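The Stirling approximation used above is easy to check numerically. The sketch below (illustrative code, not from the original lecture; function names are my own) compares the exact entropy log C(N, n) with the Stirling estimate:

```python
import math

def entropy_exact(N, n):
    # S / kB = log C(N, n), computed from the exact binomial coefficient
    return math.log(math.comb(N, n))

def entropy_stirling(N, n):
    # Stirling (log n! ~ n log n - n) gives
    # S / kB ~ N log N - n log n - (N - n) log(N - n)
    return N * math.log(N) - n * math.log(n) - (N - n) * math.log(N - n)

N, n = 10000, 1000
print(entropy_exact(N, n), entropy_stirling(N, n))
```

For N = 10000 and n = 1000 the two values agree to well within a percent; the neglected sqrt-prefactor of Stirling's formula only contributes a term of order log N.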
Consider a part of the system (called a subsystem) consisting of m elements (N ≫ m), and estimate the probability that the subsystem has the energy lϵ. Number of microscopic states of the total system in which the subsystem has the energy lϵ: ω(lϵ) = C(m, l) C(N − m, n − l) Probability that such microscopic states realize: P(lϵ) = ω(lϵ)/W(nϵ)
From Stirling's formula (via a tedious calculation): log[C(N − m, n − l)/C(N, n)] = (N − m) log(N − m) + (N − n) log(N − n) + n log n − (n − l) log(n − l) − (N − m − n + l) log(N − m − n + l) − N log N ≈ l log(n/N), and log(n/N) = −ϵ/(k_B T)
Finally, we have an important result: P(lϵ) = C(m, l) exp(−lϵ/(k_B T)) The probability is (number of microscopic states of the subsystem having the energy lϵ) × exp(−energy/(k_B T))
General result for the thermal equilibrium state of any system contacting a very large system (heat bath) of temperature T Boltzmann distribution The appearance probability of a microscopic state having the energy ε is P(ε) ∝ exp(−ε/(k_B T))
Thermal averages e.g. the average energy in the thermal equilibrium state at temperature T can be calculated as ⟨E⟩ = (1/Z) Σ_i ε_i exp(−ε_i/(k_B T)) with Z = Σ_i exp(−ε_i/(k_B T)), where i is the index for microscopic states.
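For the two-level model introduced above, this sum over microscopic states can be evaluated directly for small N and compared with the closed-form answer for independent elements. A minimal sketch (names and parameter values are illustrative):

```python
import math
from itertools import product

def thermal_average_energy(N, eps, T, kB=1.0):
    # Brute-force thermal average over all 2^N microstates:
    # <E> = (1/Z) * sum_i E_i exp(-E_i / kB T),  Z = sum_i exp(-E_i / kB T)
    beta = 1.0 / (kB * T)
    Z = 0.0
    num = 0.0
    for config in product((0.0, eps), repeat=N):
        E = sum(config)
        w = math.exp(-beta * E)
        Z += w
        num += E * w
    return num / Z

def analytic_energy(N, eps, T, kB=1.0):
    # Independent two-level elements: <E> = N * eps / (exp(eps / kB T) + 1)
    return N * eps / (math.exp(eps / (kB * T)) + 1.0)

print(thermal_average_energy(10, 1.0, 1.0), analytic_energy(10, 1.0, 1.0))
```

The brute-force sum is only feasible for N of order 20 or so; its exponential cost is exactly the problem MCMC is designed to avoid.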
Markov Chain Monte Carlo Sample microscopic states at a fixed temperature using computer simulations of a Markov chain specially designed to realize the thermal equilibrium state. The same method can also be used for Bayesian inference. Similar to the Boltzmann machine in the field of AI. Also used in simulated annealing for optimization. Basis of Metropolis light transport in the field of computer graphics.
Construct a Markov process such that average quantities (e.g. energy) in its steady state coincide with the thermal averages at temperature T. Goal: lim_{N→∞} (1/N) Σ_{i=1}^{N} A_i = ⟨A⟩ l.h.s.: average in the steady state of the Markov process; r.h.s.: thermal average
A Markov process is defined by a set of transition probabilities w_ij from the j-th microscopic state to the i-th. Requirements: 1. 0 ≤ w_ij ≤ 1, 2. Σ_i w_ij = 1
Consider the probability distribution P_i(t) of microscopic state i at step t; then Σ_i P_i(t) = 1. One step of evolution of the state according to the transition probabilities is P_i(t + 1) = Σ_j w_ij P_j(t)
In vector and matrix notation: P(t + 1) = W P(t), where W is the Markov matrix. The large-step limit: P(∞) = lim_{n→∞} W^n P(t_0). Since the largest eigenvalue of the Markov matrix is 1, P(∞) is a steady state that satisfies W P(∞) = P(∞)
requirement for W Ergodicity The system, starting from an arbitrary state, can reach all the states in a finite number of steps. The state space should be singly connected; otherwise the steady state is not uniquely determined. Since the number of states is finite, the steady state is reached in a finite number of steps.
We require that the steady state coincides with the thermal equilibrium state requirement P_i(∞) ∝ exp(−ε_i/(k_B T)) The following is a sufficient condition Detailed balance w_ij exp(−ε_j/(k_B T)) = w_ji exp(−ε_i/(k_B T))
or Detailed balance 2 w_ij / w_ji = exp(−Δε_ij/(k_B T)), where Δε_ij ≡ ε_i − ε_j. The most widely used transition probability is the Metropolis transition probability: w_ij = min[1, exp(−Δε_ij/(k_B T))]
Problem The state space is usually astronomically huge. In the case of the two-state system with 1000 elements (very small considering today's computing power), the number of microscopic states is 2^1000 ≈ 10^300. Thus the Markov matrix is 10^300 × 10^300. Solution Instead of holding the distribution vector P, we carry a single microscopic state and follow its trajectory in the state space by simulating the stochastic process.
Problem We cannot obtain the absolute probability P_i, because we do not know the number of microstates ω(ε). Instead, we sample microstates with a relative probability proportional to P_i. Solution Compute only the thermal averages; forget about computing P_i itself.
Procedure 1 Prepare any initial state i 2 Make a candidate state j for the next step 3 Generate a random number R in [0, 1] and compare to the transition probability w ji 4 If R w ji, change the state to j. Otherwise, keep the state i. 5 Repeat many times After sufficiently long steps, the system reaches the thermal equilibrium state. After that, the states obtained by the simulation are samples from the thermal equilibrium.
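The procedure above can be sketched in Python for the two-level system used earlier. This is an illustrative implementation, not code from the lecture; the function name, seed, and step counts are my own choices:

```python
import math
import random

def metropolis_two_level(N, eps, T, steps, burn_in, kB=1.0, seed=1):
    # Follow the procedure: start from an arbitrary state, propose a
    # single-element change, accept with the Metropolis probability.
    rng = random.Random(seed)
    beta = 1.0 / (kB * T)
    state = [0] * N       # initial state: every element in the e = 0 level
    E = 0.0
    total, kept = 0.0, 0
    for step in range(steps):
        i = rng.randrange(N)                      # candidate: flip element i
        dE = eps if state[i] == 0 else -eps       # energy difference
        if dE <= 0 or rng.random() < math.exp(-beta * dE):
            state[i] ^= 1                         # accept the candidate
            E += dE
        if step >= burn_in:                       # discard initial relaxation
            total += E
            kept += 1
    return total / kept

estimate = metropolis_two_level(50, 1.0, 1.0, 200000, 20000)
exact = 50 * 1.0 / (math.exp(1.0) + 1.0)   # exact result for this model
print(estimate, exact)
```

Note that only the energy difference of the proposed move is needed, never the partition function Z, which is what makes the method practical.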
A simple example Ising model A model for ferromagnets, binary alloys, neural networks etc. Defined on a lattice with N lattice points. Two-state elements S (called spins) are located on the lattice points; each element can take one of two states, S = ±1 (called up and down). The total number of microscopic states is 2^N
Ising model (cont'd) If two spins located on neighbouring lattice points have the same value, they have the energy −J; otherwise they have the energy +J. The energy of the two spins is defined as ε_ij = −J S_i S_j (i, j indicate the lattice points). The total energy of the system is ε = −J Σ_⟨ij⟩ S_i S_j (the sum is taken over all neighbouring pairs of lattice points)
Procedure (Metropolis method) Preparation: assign +1 or −1 to all the spins. Flip: choose a spin to flip and calculate the energy difference Δε due to the flip; note that the energy difference can be calculated locally. If Δε < 0 then flip the spin. Otherwise generate a random number R; if R < exp(−Δε/(k_B T)) then flip the spin. Repeat many times.
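A compact sketch of this procedure for the two-dimensional Ising model on an L × L periodic lattice follows. The code is illustrative (my own function names, lattice size, seed, and sweep counts), not the lecture's reference implementation:

```python
import math
import random

def metropolis_sweep(spins, J, T, rng, kB=1.0):
    # One sweep = L*L single-spin-flip attempts on an L x L periodic lattice.
    L = len(spins)
    beta = 1.0 / (kB * T)
    for _ in range(L * L):
        i, j = rng.randrange(L), rng.randrange(L)
        # Local energy difference of flipping S_ij: dE = 2 J S_ij * (neighbour sum)
        nn = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
              + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
        dE = 2.0 * J * spins[i][j] * nn
        if dE <= 0 or rng.random() < math.exp(-beta * dE):
            spins[i][j] = -spins[i][j]

def magnetization(spins):
    L = len(spins)
    return abs(sum(sum(row) for row in spins)) / (L * L)

rng = random.Random(7)
L, J, T = 16, 1.0, 1.5            # T well below Tc ~ 2.27 J/kB
spins = [[1] * L for _ in range(L)]
for _ in range(200):              # thermalization sweeps (discarded)
    metropolis_sweep(spins, J, T, rng)
samples = []
for _ in range(200):              # measurement sweeps
    metropolis_sweep(spins, J, T, rng)
    samples.append(magnetization(spins))
print(sum(samples) / len(samples))
```

At T = 1.5, deep in the ferromagnetic phase, the measured magnetization per spin should stay close to 1, consistent with the transition shown earlier.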
Remarks Initial relaxation Microscopic states obtained in the earlier stages of the simulation are not samples from the equilibrium state, because the effect of the initial (arbitrary) state remains. Therefore, samples from the earlier steps should be discarded (thermalization process).
Sampling interval Microscopic states obtained from nearby steps are close to each other, so they are not statistically independent samples. Samples should be collected at sufficient intervals.
Statistical analysis Since we sample only a small fraction of all the possible microstates, thermal averages suffer from statistical error. So the standard error analyses used in experimental science should be employed; in that sense, MCMC is a kind of computer experiment. Many sophisticated statistical analysis methods have been proposed.
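One standard error analysis that also copes with the correlated samples mentioned above is the binning (blocking) method: group consecutive samples into bins, treat the bin means as (nearly) independent measurements, and apply the usual standard error to them. A minimal sketch (the function name is my own):

```python
import math

def binning_error(samples, bin_size):
    # Group correlated samples into bins of length bin_size; bin means are
    # (nearly) independent, so the usual standard error applies to them.
    nbins = len(samples) // bin_size
    means = [sum(samples[k * bin_size:(k + 1) * bin_size]) / bin_size
             for k in range(nbins)]
    mean = sum(means) / nbins
    var = sum((m - mean) ** 2 for m in means) / (nbins - 1)
    return mean, math.sqrt(var / nbins)

print(binning_error([1.0, 2.0, 3.0, 4.0], 2))   # -> (2.5, 1.0)
```

In practice one increases bin_size until the error estimate stops growing; at that point the bins are longer than the correlation time of the chain.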
Random number Random numbers are pseudorandom Random numbers generated by any algorithmic random number generator (RNG) are not truly random. Therefore, the quality of the RNG is important; e.g. Mersenne Twister (MT) is a good candidate RNG. A physical RNG using noise in electric circuits is attractive because it is truly random, but such RNGs are not always available, and they also lack repeatability.
Caution There are very bad RNGs, sometimes even in standard libraries. Thus selecting a good RNG is really important in MCMC. There is no RNG that is good for every purpose, so RNGs should be tested for each application. For example, MT is good for most MCMC simulations, but is not appropriate for use in cryptography.
New methods and rare event sampling
Histogram reweighting If we get the histogram of the energy H(ε, T) by MCMC at temperature T: H(ε, T) ∝ ω(ε) exp(−ε/(k_B T)), where ω(ε) is the number of microstates with energy ε. Then the energy distribution for a different temperature T′ can be estimated as P(ε, T′) ∝ H(ε, T) exp{(1/(k_B T) − 1/(k_B T′)) ε}
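This formula translates directly into code. The sketch below (illustrative names and parameters) applies it to the two-level system, where ω(ε) = C(N, n) is known exactly, so the reweighted distribution can be checked against the exact Boltzmann distribution at the target temperature:

```python
import math

def reweight(histogram, T_from, T_to, kB=1.0):
    # P(E, T') ∝ H(E, T) * exp[(1/(kB*T) - 1/(kB*T')) * E]; normalize at the end.
    shift = 1.0 / (kB * T_from) - 1.0 / (kB * T_to)
    w = {E: h * math.exp(shift * E) for E, h in histogram.items()}
    Z = sum(w.values())
    return {E: v / Z for E, v in w.items()}

# Ideal (noise-free) histogram of the two-level model at T1, reweighted to T2:
N, eps, T1, T2 = 20, 1.0, 2.0, 1.0
H = {n * eps: math.comb(N, n) * math.exp(-n * eps / T1) for n in range(N + 1)}
P2 = reweight(H, T1, T2)
Z2 = sum(math.comb(N, n) * math.exp(-n * eps / T2) for n in range(N + 1))
exact = {n * eps: math.comb(N, n) * math.exp(-n * eps / T2) / Z2
         for n in range(N + 1)}
print(max(abs(P2[E] - exact[E]) for E in exact))
```

With a real sampled histogram, statistical noise in the tails limits how far T′ can be shifted from T, which is exactly the jaggedness noted in the figure.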
[Figures: original energy histogram P(E; β) and reweighted histograms P(E; β′).] Reweighted histograms become jagged on the shifted side. But reweighting is the basis of the extended ensemble methods.
Extended ensemble methods Accelerate the relaxation, especially at low temperatures or when crossing energy barriers. Calculate thermal quantities for a wide range of temperatures in a single simulation. Count the number of microstates and compute the entropy. Rare event sampling.
Exchange method Simulate many identical systems at different temperatures simultaneously, and exchange their temperatures from time to time according to a transition rate that satisfies the following condition: W(1,2 → 2,1)/W(2,1 → 1,2) = exp{−(1/(k_B T_1) − 1/(k_B T_2))(ε_2 − ε_1)} Then simultaneous equilibrium at all the temperatures is reached.
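A two-replica sketch of this exchange (replica exchange / parallel tempering) idea for the two-level system is shown below. All names, seeds, and step counts are my own illustrative choices; the exchange acceptance implements the detailed-balance condition above as a Metropolis rule:

```python
import math
import random

def local_step(state, E, eps, T, rng, kB=1.0):
    # Single Metropolis update of one replica of the two-level system.
    i = rng.randrange(len(state))
    dE = eps if state[i] == 0 else -eps
    if dE <= 0 or rng.random() < math.exp(-dE / (kB * T)):
        state[i] ^= 1
        E += dE
    return E

def swap_accepted(E1, E2, T1, T2, rng, kB=1.0):
    # Metropolis rule for the exchange move, from
    # W(1,2->2,1)/W(2,1->1,2) = exp[-(1/(kB*T1) - 1/(kB*T2))(E2 - E1)]
    delta = -(1.0 / (kB * T1) - 1.0 / (kB * T2)) * (E2 - E1)
    return delta >= 0 or rng.random() < math.exp(delta)

rng = random.Random(3)
N, eps, T1, T2 = 40, 1.0, 0.5, 1.0
s1, s2 = [0] * N, [0] * N
E1, E2 = 0.0, 0.0
kept1, kept2 = [], []
for step in range(200000):
    E1 = local_step(s1, E1, eps, T1, rng)
    E2 = local_step(s2, E2, eps, T2, rng)
    if step % 10 == 0 and swap_accepted(E1, E2, T1, T2, rng):
        (s1, E1), (s2, E2) = (s2, E2), (s1, E1)   # exchange configurations
    if step >= 20000:                              # discard thermalization
        kept1.append(E1)
        kept2.append(E2)

exact = lambda T: N * eps / (math.exp(eps / T) + 1.0)
print(sum(kept1) / len(kept1), exact(T1))
print(sum(kept2) / len(kept2), exact(T2))
```

Each temperature still samples its own Boltzmann distribution, but the cold replica can escape local traps by borrowing configurations equilibrated at the hot temperature.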
Multicanonical method The equilibrium distribution is made inversely proportional to ω(ε): P(ε) ∝ 1/ω(ε) A very broad histogram is obtained. The transition rate is determined through a learning process (machine learning); the most frequently used method is the Wang-Landau method. Thermal equilibrium averages can then be obtained by the histogram reweighting method.
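A simplified Wang-Landau sketch for the two-level system is given below. It estimates log ω(n) = log C(N, n) by a flat-histogram random walk. Assumptions to note: the usual histogram-flatness check is replaced by a fixed number of steps per refinement level, and all names and parameters are my own illustrative choices:

```python
import math
import random

def wang_landau(N, levels=18, steps_per_level=50000, seed=5):
    # Estimate log_g[n] ~ log C(N, n) + const for the two-level system,
    # where n is the number of excited elements.
    rng = random.Random(seed)
    log_g = [0.0] * (N + 1)
    state = [0] * N
    n = 0
    lnf = 1.0                                  # modification factor (log)
    for _ in range(levels):
        for _ in range(steps_per_level):
            i = rng.randrange(N)               # propose flipping one element
            m = n + 1 if state[i] == 0 else n - 1
            # Accept with min(1, g(n)/g(m)): this flattens the n-histogram
            if (log_g[n] - log_g[m] >= 0
                    or rng.random() < math.exp(log_g[n] - log_g[m])):
                state[i] ^= 1
                n = m
            log_g[n] += lnf                    # penalize the visited level
        lnf /= 2.0                             # refine (usual halving rule)
    base = log_g[0]
    return [lg - base for lg in log_g]         # fix the arbitrary constant

log_g = wang_landau(10)
print(log_g[5], math.log(math.comb(10, 5)))    # estimate vs exact log C(10,5)
```

Because each element-flip proposal is symmetric over microstates, flattening the visit histogram in n forces g(n) to converge to the number of microstates ω(n), up to an overall constant.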
Conceptual difference between the conventional MCMC and the multicanonical MC
Multicanonical method in an unphysical direction In order to bypass a physical constraint, we can use a multicanonical method that relaxes the constraint. Example: multi-self-overlap ensemble for lattice polymers.
Relaxing the self-avoidance condition, the polymer can readily transit among these three configurations.
Rare event sampling using the multicanonical method We can generate very rare configurations using multicanonical MC and estimate their appearance probability. This method can be applied even to non-physical systems by defining an appropriate energy function.
example Count the number of magic squares. The number of 30 × 30 magic squares was estimated to be 6.56(29) × 10^2056
Report: Make a MCMC program for Ising model.