Numerical methods for lattice field theory


Numerical methods for lattice field theory
Mike Peardon
Trinity College Dublin
August 9, 2007

Numerical methods - references

- Good introduction to the concepts: Simulation, Sheldon M. Ross, Academic Press. ISBN 0-12-598053-1
- Detail on the theory of Markov chains: Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues, Pierre Brémaud, Springer. ISBN 0-387-98509-3
- A classic: The Art of Computer Programming, Volume 2, Donald E. Knuth, Addison-Wesley. ISBN 0-201-48541-9
- And another: Numerical Recipes: The Art of Scientific Computing (3rd edition), Press, Teukolsky, Vetterling and Flannery, CUP. ISBN 0-521-88068-8
- Applications to field theory: The Monte Carlo method in quantum field theory, Colin Morningstar, arXiv:hep-lat/0702020

An opening comment...

Monte Carlo is the worst method for computing non-perturbative properties of quantum field theories - except for all the other methods that have been tried from time to time.

Churchill said: "Democracy is the worst form of government, except for all those other forms that have been tried from time to time" (in a speech to the House of Commons in 1947, after losing the 1945 general election).

Probability (1)

Outcomes, events and sample spaces

Assume we have a precise description of the outcome, $\omega$, of the observation of a random process. The set of all possible outcomes is called the sample space, $\Omega$. Any subset $A \subseteq \Omega$ can be regarded as representing an event occurring. The notation of set theory is helpful.

A particularly useful collection of events, $A_1, A_2, \dots, A_N$, are those that are mutually incompatible and exhaustive, which implies
$$A_i \cap A_j = \emptyset \ \text{when}\ i \neq j \qquad\text{and}\qquad \bigcup_{k=1}^N A_k = \Omega$$

Probability (2)

A random variable is a function $X : \Omega \to \mathbb{R}$ such that for all $a \in \mathbb{R}$, the event $\{\omega : X(\omega) \leq a\}$ can be assigned a probability. A discrete random variable is a random variable for which $X : \Omega \to E$, where $E$ is a denumerable set.

Axioms of probability

A probability (measure) on $(\Omega, \mathcal{F})$ is a mapping $P : \mathcal{F} \to \mathbb{R}$ such that
1. $0 \leq P(A) \leq 1$
2. $P(\Omega) = 1$
3. $P\bigl(\bigcup_{k=1}^N A_k\bigr) = \sum_{k=1}^N P(A_k)$ if the $A_k$ are mutually incompatible.

($\mathcal{F}$ is the collection of events that are assigned a probability; for technical reasons it is a collection of subsets of $\Omega$ - a $\sigma$-algebra - and not the same thing as $\Omega$ itself.)

Probability (3)

Two events are called independent if $P(A \cap B) = P(A)P(B)$. Similarly, two random variables $X$ and $Y$ are called independent if for all $a, b \in \mathbb{R}$,
$$P(X \leq a, Y \leq b) = P(X \leq a)\,P(Y \leq b)$$

The conditional probability of event $A$ given $B$ is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

The cumulative distribution function $F_X$ describes the properties of a random variable $X$ and is
$$F_X(x) = P(X \leq x)$$

The probability density function $f_X$ is defined when $F_X$ can be written
$$F_X(x) = \int_{-\infty}^x f_X(z)\,dz$$

Statistics (1)

Expectation

For an absolutely continuous c.d.f. and a function $g$, the expectation $E[g(X)]$ is defined as
$$E[g(X)] = \int_{-\infty}^{\infty} g(z)\, f_X(z)\,dz$$

Mean and variance

The mean $\mu_X$ of $X$ is
$$\mu_X = E[X] = \int_{-\infty}^{\infty} z\, f_X(z)\,dz$$
and the variance $\sigma_X^2$ is
$$\sigma_X^2 = E[(X - \mu_X)^2] = E[X^2] - \mu_X^2$$

Statistics (2)

The strong law of large numbers (Kolmogorov)

If $\bar{X}(n)$ is the sample mean of $n$ independent, identically distributed random variables $\{X_1, X_2, \dots, X_n\}$ with well-defined expected value $\mu_X$ and variance $\sigma_X^2$, then $\bar{X}(n)$ converges almost surely to $\mu_X$:
$$P\Bigl(\lim_{n\to\infty} \bar{X}(n) = \mu_X\Bigr) = 1$$

Bernoulli: "...so simple that even the stupidest man instinctively knows it is true."

The central limit theorem (de Moivre, Laplace, Lyapunov, ...)

$$\lim_{n\to\infty} P\left(\frac{\bar{X}(n) - \mu_X}{\sigma_X/\sqrt{n}} \leq z\right) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^z e^{-y^2/2}\,dy$$

As more statistics are gathered, the sample mean becomes normally distributed.

The central limit theorem

[Figure: probability densities of the sample mean $\bar{X}(n)$ for $n = 2$, $5$ and $50$, narrowing around $\mu_X$ as $n$ grows.]
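The narrowing shown in the figure is easy to check numerically. A minimal sketch, assuming uniform $(0,1)$ variates (for which $\mu = 1/2$ and $\sigma^2 = 1/12$): the spread of the sample means should shrink like $\sigma/\sqrt{n}$.

```python
import random
import statistics

def sample_mean(n):
    """Mean of n i.i.d. uniform(0,1) variates: mu = 1/2, sigma^2 = 1/12."""
    return sum(random.random() for _ in range(n)) / n

random.seed(1)
# Draw many sample means at two different n and compare their spreads:
# the CLT predicts the spread shrinks like sigma/sqrt(n).
small = [sample_mean(10) for _ in range(2000)]
large = [sample_mean(1000) for _ in range(2000)]
spread_small = statistics.stdev(small)   # ~ sqrt(1/12)/sqrt(10)   ~ 0.091
spread_large = statistics.stdev(large)   # ~ sqrt(1/12)/sqrt(1000) ~ 0.0091
```

Increasing $n$ by a factor of 100 shrinks the spread of the sample means by a factor of about 10, as the theorem predicts.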

Generating (pseudo-)random numbers on a computer

A deterministic machine like a computer is incapable of generating truly random numbers. Randomness is faked; algorithms can be devised that generate sequences of numbers that appear random. To appear random implies statistical identities hold for large sets of these numbers.

Testing randomness is done both by looking at statistical correlations (for example, checking that using $d$ adjacent entries in a sequence as the co-ordinates of a $d$-dimensional point does not result in points that preferentially lie on certain planes) and by using a simulation to compute something known analytically. The best-known battery of such tests is Marsaglia's Diehard suite.

The elements of sequences that have passed all the appropriate statistical tests are called pseudo-random. The original example is the linear congruential sequence. For details see e.g. Knuth.
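The "points on planes" failure can be seen algebraically in a historical example (not from these slides): IBM's RANDU generator, $X_{n+1} = 65539\,X_n \bmod 2^{31}$. Because $65539^2 \equiv 6 \cdot 65539 - 9 \pmod{2^{31}}$, every triple of successive outputs satisfies an exact linear relation, so all 3-dimensional points fall on a small family of planes.

```python
def randu(seed):
    """IBM's RANDU: X_{n+1} = 65539 * X_n mod 2^31 (a notoriously bad LCG)."""
    x = seed
    while True:
        x = (65539 * x) % 2**31
        yield x

m = 2**31
g = randu(1)
xs = [next(g) for _ in range(1000)]

# Since 65539^2 = 6*65539 - 9 (mod 2^31), successive triples obey
# x[n+2] - 6*x[n+1] + 9*x[n] = 0 (mod 2^31), so the 3D points all lie
# on a handful of parallel planes.
flat = all((xs[i + 2] - 6 * xs[i + 1] + 9 * xs[i]) % m == 0
           for i in range(len(xs) - 2))
print(flat)   # True
```

A generator passing the spectral test would show no such low-order linear relation among adjacent outputs.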

Generating (pseudo-)random numbers on a computer (2)

Suppose we generate a sequence by applying some deterministic algorithm $A$:
$$X_0 \xrightarrow{A} X_1 \xrightarrow{A} X_2 \xrightarrow{A} \dots$$

If any entry in the sequence can take one of $m$ values, then the sequence must repeat itself in at most $m$ iterations. The number of iterations before a sequence repeats itself is called the cycle, which may depend on the initial state $X_0$. This repetition of the sequence over and over again is definitely not random!

A useful generator must thus have a large cycle - at least as large as the number of random numbers needed in the computer program, preferably much larger. The maximum possible cycle is $m$, independent of the starting point $X_0$.

Generating (pseudo-)random numbers on a computer (3)

The best-known algorithm is:

The linear congruential sequence

The sequence $X_0, X_1, X_2, \dots$ with $0 \leq X_i < m$ is generated by the recursion
$$X_{n+1} = (a X_n + c) \bmod m$$
with
- $X_0$ the seed,
- $a$ the multiplier, $0 \leq a < m$,
- $c$ the increment, $0 \leq c < m$,
- $m$ the modulus, $m > 0$.

The period is maximised by ensuring $c$ is relatively prime to $m$, $b = a - 1$ is a multiple of every prime $p$ dividing $m$, and $b$ is a multiple of 4 if $m$ is a multiple of 4. See Knuth for details.

An example linear congruential sequence

To demonstrate this in action, let's pick parameters according to the prescription above: $m = 90 = 2 \cdot 3^2 \cdot 5$, then $c = 13$ (relatively prime to 90), $b = 2 \cdot 3 \cdot 5 = 30$ so $a = 31$, and seed with $X_0 = 0$.

The sequence is below. No number appears more than once until the full period of 90 is exhausted.

0, 13, 56, 39, 52, 5, 78, 1, 44, 27, 40, 83, 66, 79, 32, 15, 28, 71, 54, 67, 20, 3, 16, 59, 42, 55, 8, 81, 4, 47, 30, 43, 86, 69, 82, 35, 18, 31, 74, 57, 70, 23, 6, 19, 62, 45, 58, 11, 84, 7, 50, 33, 46, 89, 72, 85, 38, 21, 34, 77, 60, 73, 26, 9, 22, 65, 48, 61, 14, 87, 10, 53, 36, 49, 2, 75, 88, 41, 24, 37, 80, 63, 76, 29, 12, 25, 68, 51, 64, 17
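The recursion and the example above take only a few lines to reproduce; a minimal sketch with the slide's parameters:

```python
from itertools import islice

def lcg(m, a, c, seed):
    """Linear congruential generator: X_{n+1} = (a*X_n + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

# The slide's parameters: m = 90, a = 31, c = 13, seed X_0 = 0.
# They satisfy the full-period conditions, so all 90 residues appear
# exactly once before the sequence repeats.
seq = list(islice(lcg(90, 31, 13, 0), 90))
print(seq[:4])         # [13, 56, 39, 52], matching the slide
print(len(set(seq)))   # 90
```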

Generating (pseudo-)random numbers on a computer (4)

Many more algorithms exist now, and are used in real lattice QCD codes:

Lüscher's ranlux
This generator, based on Marsaglia and Zaman's RCARRY algorithm, has a very long period (about $10^{171}$) when set up properly. Its quality is established by considering the dynamics of chaotic systems. (M. Lüscher, A portable high-quality random number generator for lattice field theory simulations, Comput. Phys. Commun. 79 (1994) 100-110, hep-lat/9309020)

Matsumoto and Nishimura's Mersenne Twister
The period length is a Mersenne prime; the most commonly used variant has a period of $2^{19937} - 1 \approx 4 \times 10^{6001}$. (M. Matsumoto and T. Nishimura, Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator, ACM Trans. on Modeling and Computer Simulation, Vol. 8, No. 1, January 1998, 3-30)

Generating more general distributions

We will soon find it useful to generate random numbers with more general distributions. Consider taking a uniform variate $U$ and applying some function $v$ (defined on $[0,1]$; let us also assume it is increasing on that interval). We get a new random variable, $V = v(U)$. What is the probability distribution of $V$?

A simple analysis, shown in the figure (right), shows that equating the probabilities $f_V(v)\,dv = f_U(u)\,du$ gives
$$f_V(v) = \frac{du}{dv} = \left(\frac{dv}{du}\right)^{-1}$$

[Figure: the map $u \mapsto v(u)$ taking $u \in [0,1]$ to $v \in [v_{\min}, v_{\max}]$.]

The transformation method

This observation suggests an algorithm, provided the function $v$ can be found: $v$ is the solution to
$$u = \int_{-\infty}^{v(u)} f_X(z)\,dz$$
Unfortunately, there are few p.d.f.s for which this method is efficient!

Perhaps the most important example for QCD is the Box-Muller algorithm, which generates pairs of normally distributed random numbers.

The Box-Muller algorithm
1. Generate two independent uniform variates $U_1, U_2 \in (0, 1]$.
2. Set $\Theta = 2\pi U_1$ and $R = \sqrt{-2 \ln U_2}$.
3. Return $R \cos\Theta$ or $R \sin\Theta$ (or both; they are independent).
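The three steps above translate directly into code. A minimal sketch, checking that the output has the mean and variance of a standard normal:

```python
import math
import random

def box_muller(u1, u2):
    """Map two independent uniforms in (0,1] to two independent N(0,1) variates."""
    theta = 2.0 * math.pi * u1
    r = math.sqrt(-2.0 * math.log(u2))
    return r * math.cos(theta), r * math.sin(theta)

random.seed(2)
zs = []
for _ in range(50_000):
    # 1 - random.random() lies in (0, 1], keeping log() safe at the endpoint.
    z1, z2 = box_muller(1 - random.random(), 1 - random.random())
    zs.extend([z1, z2])

mean = sum(zs) / len(zs)
var = sum(z * z for z in zs) / len(zs)
# mean ~ 0 and var ~ 1, as expected for standard normal variates
```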

The rejection method

Consider the following experiment, for regions $Q \subseteq F$:
1. Generate a random point $P$ with co-ordinates $(X, Y)$ uniformly inside $F$.
2. If $P$ is also inside $Q$, return $X$.
3. Otherwise, go back to step 1.

[Figure: region $Q$, the area under the curve $q$, enclosed in region $F$ in the $(x, y)$ plane.]

What is the p.d.f. of $X$? Answer: $q(x) / \int q(z)\,dz$, where $q$ bounds the region $Q$. (There is no need to know the normalisation of $q$.)

The rejection method
1. Generate $X$ from p.d.f. $f$.
2. Generate $Y = c\,f(X)\,U$, where $U$ is a uniform variate and $c$ is a constant such that $c\,f(x) \geq q(x)$ for all $x$.
3. If $Y < q(X)$, return $X$; otherwise, go to step 1.
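A minimal sketch of the accept/reject loop. The target $q(x) = 1 - x^2$ on $[-1, 1]$, the uniform proposal $f(x) = 1/2$ and the bound $c = 2$ (so that $c f(x) = 1 \geq q(x)$) are hypothetical choices for illustration, not from the slides.

```python
import random

def rejection_sample(q, f_sample, f_pdf, c):
    """One draw with density proportional to q, using proposal p.d.f. f
    and a constant c with c*f(x) >= q(x) everywhere."""
    while True:
        x = f_sample()
        y = c * f_pdf(x) * random.random()
        if y < q(x):
            return x

# Hypothetical unnormalised target q(x) = 1 - x^2 on [-1, 1]:
# the normalised density is (3/4)(1 - x^2), with mean 0 and variance 1/5.
random.seed(3)
q = lambda x: 1.0 - x * x
xs = [rejection_sample(q, lambda: random.uniform(-1, 1), lambda x: 0.5, 2.0)
      for _ in range(20_000)]
mean = sum(xs) / len(xs)
var = sum(x * x for x in xs) / len(xs)
```

Note that only the ratio $q(X) / (c f(X))$ matters, which is why the normalisation of $q$ is never needed.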

Monte Carlo integration

Consider $X$, a random point in $V$ with probability density function $p_X(x)$. A new stochastic variable $F$ is defined by applying a function $f$ to $X$, so $F = f(X)$. The expected value of $F$ is then
$$E(F) = \int_V f(x)\, p_X(x)\,dx$$

If $N$ independent instances of $X$ are generated, and then $N$ instances of $F$ are derived, the mean of this ensemble is
$$\bar{F}_N = \frac{1}{N}\sum_{i=1}^N F_i$$

The basic result of Monte Carlo:
$$\lim_{N\to\infty} \bar{F}_N = E(F) = \int_V f(x)\, p_X(x)\,dx$$

Monte Carlo integration - an example

Estimate
$$I = \int_0^1 \frac{\sin 10\pi(1-x)}{\sqrt{x}}\,dx$$
using Monte Carlo. Generate an ensemble of uniform variate points $\{U_1, U_2, \dots\}$, then evaluate $F_i = f(U_i)$ and compute the mean of this ensemble.

[Figure: the Monte Carlo estimate $\bar{I}(n)$ against the number of samples $n$ from $10^2$ to $10^8$, settling down as $n$ grows.]
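A minimal sketch of this estimate, taking the integrand to be $f(x) = \sin 10\pi(1-x)/\sqrt{x}$ (one plausible reading of the slide's formula; the $1/\sqrt{x}$ spike is integrable but makes the estimator noisy):

```python
import math
import random

def mc_estimate(f, n):
    """Plain Monte Carlo estimate of the integral of f over [0, 1]."""
    # 1 - random.random() lies in (0, 1], avoiding the singular endpoint x = 0.
    return sum(f(1 - random.random()) for _ in range(n)) / n

# Assumed integrand: an oscillatory function with an integrable spike at x = 0.
f = lambda x: math.sin(10 * math.pi * (1 - x)) / math.sqrt(x)

random.seed(4)
estimates = [mc_estimate(f, 10_000) for _ in range(10)]
spread = max(estimates) - min(estimates)
# Independent runs agree to a few parts in 10^2, tightening like 1/sqrt(n).
```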

Importance sampling

For almost all the physics problems we want to solve, we find just a tiny fraction of configuration space contributes to the integral; variance reduction will be crucial. The most useful variance reduction technique is importance sampling: instead of a flat sampling over the integration volume, the regions that dominate the integral are sampled preferentially.

For any $p(x) > 0$ in $V$, we have
$$I = \int_V f(z)\,Dz = \int_V \frac{f(z)}{p(z)}\, p(z)\,Dz$$

Suppose we generate an ensemble of points $\{X_1, X_2, \dots\}$ in $V$ with p.d.f. $p$ and compute the random variable
$$J_i = \frac{f(X_i)}{p(X_i)}$$

Importance sampling (2)

Then (as seen above) $E[J] = I$, so the sample mean converges to the answer independently of $p$, but... the variance is
$$\sigma_J^2 = E[J^2] - E[J]^2 = \int_V \frac{f(z)^2}{p(z)^2}\, p(z)\,Dz - I^2 = \int_V \frac{f(z)^2}{p(z)}\,Dz - I^2$$
and so this does depend on the choice of $p$. The optimal choice of $p$ would minimise $\sigma_J^2$; a functional variation calculation yields the optimal choice
$$p(x) = \frac{|f(x)|}{\int_V |f(z)|\,Dz}$$

In practice it is usually difficult to draw from this distribution (since if we could, it is likely we could compute $I$ directly!), but it suggests $p$ should be peaked where $f$ is peaked. For QFTs, importance sampling is usually just a synonym for sampling with the Boltzmann weight in the Euclidean path integral (though it doesn't have to be!).

Importance sampling (3)

Another example. Using Monte Carlo, estimate
$$I(a) = 2\int_0^a\!\!\int_0^a e^{-(x^2+y^2)/2}\,dx\,dy$$
Note that $\lim_{a\to\infty} I(a) = \pi$.

Simple Monte Carlo
1. Generate independent uniform variates $X, Y \in [0, a]$.
2. Compute $F = e^{-(X^2+Y^2)/2}$.
3. Repeat $N$ times and average the results, $\bar{F}(N)$; return $2a^2\,\bar{F}(N)$.

Importance sampling
1. Generate $X, Y$ from the normal distribution restricted to $[0, \infty)$.
2. Set $G = 1$ if $X \leq a$ and $Y \leq a$, and $G = 0$ otherwise.
3. Repeat $N$ times and average the results, $\bar{G}(N)$; return $\pi\,\bar{G}(N)$.

As $a \to \infty$, the region contributing almost all of the integral (near the origin) becomes a smaller and smaller fraction of the sampled volume - the variance of the simple MC estimate will diverge.
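Both estimators above can be sketched in a few lines, using $|N(0,1)|$ draws for the half-normal sampling:

```python
import math
import random

def simple_mc(a, n):
    """Uniform sampling of [0,a]^2 for I(a) = 2 * iint exp(-(x^2+y^2)/2) dx dy."""
    total = 0.0
    for _ in range(n):
        x, y = random.uniform(0, a), random.uniform(0, a)
        total += math.exp(-(x * x + y * y) / 2)
    return 2.0 * a * a * total / n

def importance_mc(a, n):
    """Half-normal sampling: count how often both coordinates land in [0, a]."""
    hits = sum(1 for _ in range(n)
               if abs(random.gauss(0, 1)) <= a and abs(random.gauss(0, 1)) <= a)
    return math.pi * hits / n

random.seed(5)
a, n = 10.0, 100_000
est_u = simple_mc(a, n)        # correct on average, but noisy for large a
est_i = importance_mc(a, n)    # essentially exact here: almost every pair hits
```

For $a = 10$ the half-normal estimator is nearly noise-free, while the uniform estimator wastes almost all of its samples far from the origin, where the integrand is negligible.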

Importance sampling (4)

[Figure: the estimated integral $I(a)$ against the box length $a$, from 0 to 20; the importance sampling estimate converges to $\pi$ while the simple estimate's fluctuations grow.]

The variance of the importance sampling estimator goes to zero as $a \to \infty$. Unfortunately, so far we have only developed algorithms to generate random numbers with a general probability distribution function in very low dimensions - for field theory applications, we want to calculate integrals over many thousands of dimensions...

Markov processes

A more general method for generating points in configuration space is provided by considering Markov processes. In 1906, Markov was interested in demonstrating that independence was not necessary to prove the (weak) law of large numbers. He analysed the alternating patterns of vowels and consonants in Pushkin's novel Eugene Onegin.

In a Markov process, a system makes stochastic transitions such that the probability of a transition occurring depends only on the start and end states. The system retains no memory of how it came to be in the current state. The resulting sequence of states of the system is called a Markov chain.

Markov processes (2)

A Markov chain

Let $\{\psi_i\}$ for $i = 0, \dots, n+1$ be a sequence of states generated from a finite state space $\Omega$ by a stochastic process. Let the $\chi_k \in \Omega$ be the distinct states, so that $\chi_i \neq \chi_j$ if $i \neq j$ and $\Omega = \{\chi_1, \chi_2, \chi_3, \dots, \chi_m\}$. If the conditional probability obeys
$$P(\psi_{n+1} = \chi_i \mid \psi_n = \chi_j, \psi_{n-1} = \chi_{j_{n-1}}, \dots, \psi_0 = \chi_{j_0}) = P(\psi_{n+1} = \chi_i \mid \psi_n = \chi_j),$$
then the sequence is called a Markov chain. Moreover, if $P(\psi_{n+1} = \chi_i \mid \psi_n = \chi_j)$ is independent of $n$, the sequence is a homogeneous Markov chain.

From now on, most of the Markov chains we consider will be homogeneous, so we'll drop the label.

Markov processes (3)

The transition probabilities fully describe the system. They can be written in matrix form (usually called the Markov matrix):
$$M_{ij} = P(\psi_{n+1} = \chi_i \mid \psi_n = \chi_j)$$

The probability that the system is in a given state after one application of the process is then
$$P(\psi_{n+1} = \chi_i) = \sum_{j=1}^m P(\psi_{n+1} = \chi_i \mid \psi_n = \chi_j)\, P(\psi_n = \chi_j)$$

Writing the probabilistic state of the system as a vector, application of the process looks like linear algebra:
$$p_i(n+1) = P(\psi_{n+1} = \chi_i) = \sum_j M_{ij}\, p_j(n)$$

Markov processes (4)

An example: Seattle's weather. A resident notices that on a rainy day in Seattle, the probability that tomorrow is rainy is 80%. Similarly, on a sunny day the probability that tomorrow is sunny is 40%. This suggests Seattle's weather can be described by a (homogeneous) Markov process. From this data, can we compute the probability that any given day is sunny or rainy?

For this system, with the columns labelling today's weather (sunny, rainy) and the rows tomorrow's, the Markov matrix is
$$M = \begin{pmatrix} 0.4 & 0.2 \\ 0.6 & 0.8 \end{pmatrix}$$

Markov processes (5)

If today is sunny, then $\psi_0 = (1, 0)^T$, and the state vector for tomorrow is $\psi_1 = (0.4, 0.6)^T$, then $\psi_2 = (0.28, 0.72)^T$, $\psi_3 = (0.256, 0.744)^T$, ...

If today is rainy, then $\psi_0 = (0, 1)^T$, and the state vector for tomorrow is $\psi_1 = (0.2, 0.8)^T$, then $\psi_2 = (0.24, 0.76)^T$, $\psi_3 = (0.248, 0.752)^T$, ...

The vector $\psi$ quickly collapses to a fixed point, which must be $\pi$, the eigenvector of $M$ with eigenvalue 1, normalised such that $\sum_{i=1}^2 \pi_i = 1$. We find $\pi = (0.25, 0.75)^T$. This is the invariant probability distribution of the process; with no prior information, these are the probabilities that any given day is sunny (25%) or rainy (75%).
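The collapse to the fixed point can be sketched by iterating the matrix directly:

```python
# Iterating the Seattle weather matrix (columns = today's state, rows =
# tomorrow's): p_i(n+1) = sum_j M_ij p_j(n).
M = [[0.4, 0.2],
     [0.6, 0.8]]

def step(M, p):
    """One application of the Markov matrix to a probability vector."""
    return [sum(M[i][j] * p[j] for j in range(len(p))) for i in range(len(M))]

p = [1.0, 0.0]              # start from a sunny day
history = [p]
for _ in range(50):
    p = step(M, p)
    history.append(p)
# history[1] is [0.4, 0.6] and history[2] is [0.28, 0.72] (up to rounding),
# as on the slide; p converges to the invariant distribution pi = (0.25, 0.75).
```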

The Markov matrix (1)

The Markov matrix has some elementary properties:
1. Since all its elements are probabilities, $0 \leq M_{ij} \leq 1$.
2. Since the system always ends up somewhere in $\Omega$, $\sum_{i=1}^N M_{ij} = 1$ for every $j$.

From these properties alone, the eigenvalues of $M$ must lie in the unit disk, $|\lambda| \leq 1$, since if $v$ is an eigenvector,
$$\sum_j M_{ij} v_j = \lambda v_i \implies |\lambda|\,|v_i| = \Bigl|\sum_j M_{ij} v_j\Bigr| \leq \sum_j M_{ij}\, |v_j|$$
and summing over $i$,
$$|\lambda| \sum_i |v_i| \leq \sum_j |v_j| \sum_i M_{ij} = \sum_j |v_j| \implies |\lambda| \leq 1$$

The Markov matrix (2)

Also, a Markov matrix must have at least one eigenvalue equal to unity. Considering the vector $v_i = 1$ for all $i$, we see
$$\sum_i v_i M_{ij} = \sum_i M_{ij} = 1 = v_j,$$
and thus $v$ is a left-eigenvector with eigenvalue 1. Similarly, for the right-eigenvectors,
$$\sum_j M_{ij} v_j = \lambda v_i \implies \sum_j v_j = \sum_i \sum_j M_{ij} v_j = \lambda \sum_i v_i$$
and so either $\lambda = 1$ or, if $\lambda \neq 1$, then $\sum_i v_i = 0$.
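Both properties can be checked on the Seattle matrix. For a $2\times 2$ matrix, the eigenvalues come straight from the characteristic polynomial $\lambda^2 - (\operatorname{tr} M)\lambda + \det M = 0$:

```python
import math

# Eigenvalues of the 2x2 Seattle weather matrix via its characteristic polynomial.
M = [[0.4, 0.2],
     [0.6, 0.8]]
tr = M[0][0] + M[1][1]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
disc = math.sqrt(tr * tr - 4.0 * det)
eigs = sorted([(tr - disc) / 2.0, (tr + disc) / 2.0])
# eigs is approximately [0.2, 1.0]: one eigenvalue is unity, and the other
# lies strictly inside the unit disk, as the general argument requires.
```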