Guiding Monte Carlo Simulations with Machine Learning
Yang Qi
Department of Physics, Massachusetts Institute of Technology (joining Fudan University in 2017)
KITS, June 29, 2017
References
Works led by Liang Fu, Zi-Yang Meng, and Lei Wang, with Jun-Wei Liu, Xiao-Yan Xu, Hui-Tao Shen, Yuki Nagai, Li Huang, Yi-Feng Yang, et al.
References:
1. J. Liu, Y. Qi, Z.-Y. Meng, and L. Fu, PRB 95, 041101(R) (2017).
2. J. Liu, H. Shen, Y. Qi, Z.-Y. Meng, and L. Fu, PRB 95, 241104(R) (2017).
3. X.-Y. Xu, Y. Qi, J. Liu, L. Fu, and Z.-Y. Meng, arXiv:1612.03804 [cond-mat.str-el].
4. Y. Nagai, H. Shen, Y. Qi, and L. Fu, arXiv:1705.06724 [cond-mat.str-el].
5. L. Huang and L. Wang, PRB 95, 035105(R) (2017).
6. L. Huang, Y.-F. Yang, and L. Wang, PRE 95, 031301(R) (2017).
7. L. Wang, arXiv:1702.08586 [cond-mat.str-el].
Outline
1. Introduction to Markov Chain Monte Carlo
2. The Importance of the Update
3. Example: Ising Model and Cluster Update
4. Example: Bosonic Model and Cumulative Update
Monte Carlo simulation: an unbiased method
A widely used numerical method in statistical physics and quantum many-body physics.
Unbiased: reliable statistical error bars.
Fast: the error decreases as 1/\sqrt{N}.
Universal: applies to any model without the sign problem.
Introduction to MCMC
Consider a statistical-mechanics model: Z = \sum_C e^{-\beta H[C]} = \sum_C W(C).
Markov-chain Monte Carlo (MCMC) is a way to do importance sampling: a Markov chain of configurations is constructed,
\cdots \to C_{i-1} \to C_i \to C_{i+1} \to \cdots
Markov chain: p(C_i \to C_{i+1}) depends only on C_i (no memory).
Goal: the distribution of C converges to the Boltzmann distribution W(C).
Any observable can be measured from the Markov chain,
\langle O \rangle = \frac{\sum_C O(C) W(C)}{\sum_C W(C)} \approx \frac{1}{N} \sum_i O(C_i).
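As a minimal illustration (my sketch, not from the slides), a generic MCMC driver might look like the following; `step` and `observable` are placeholders for a model-specific update and measurement.

```python
import numpy as np

def run_chain(init, step, observable, n_sweeps, n_burn=1000):
    """Generic MCMC driver: evolve the Markov chain, then estimate
    <O> as the average of O(C_i) along the chain."""
    config = init
    for _ in range(n_burn):              # discard the equilibration transient
        config = step(config)
    samples = []
    for _ in range(n_sweeps):
        config = step(config)
        samples.append(observable(config))
    # the naive error bar below assumes (nearly) independent samples;
    # see the autocorrelation discussion below
    return np.mean(samples), np.std(samples) / np.sqrt(len(samples))
```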
Detailed balance
Detailed balance:
\frac{p(C \to D)}{p(D \to C)} = \frac{W(D)}{W(C)}.
Detailed balance (together with ergodicity) guarantees that IF the MC converges, it converges to the desired distribution W(C).
Metropolis-Hastings algorithm: propose, then accept/reject:
p(C \to D) = q(C \to D)\,\alpha(C \to D),
\alpha(C \to D) = \min\left\{1, \frac{W(D)}{W(C)}\frac{q(D \to C)}{q(C \to D)}\right\}.
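A sketch of the accept/reject step (my illustration; `weight` and the proposal density `q(x, y) = q(x -> y)` are assumed to be supplied by the model):

```python
import numpy as np

def metropolis_hastings_step(config, propose, q, weight, rng):
    """One Metropolis-Hastings step: propose D from C and accept with
    alpha = min(1, W(D) q(D->C) / (W(C) q(C->D)))."""
    trial = propose(config)
    ratio = (weight(trial) * q(trial, config)) / (weight(config) * q(config, trial))
    if rng.random() < min(1.0, ratio):
        return trial   # accept: the chain moves to D
    return config      # reject: the chain stays at C
```

This step satisfies detailed balance by construction, whatever the proposal q is.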
Autocorrelation time
The autocorrelation time measures the efficiency of the update algorithm.
Time sequence: \cdots \to O(t-1) \to O(t) \to O(t+1) \to \cdots, with O(t) = O[C(t)].
Autocorrelation function:
A_O(\Delta t) = \langle O(t) O(t+\Delta t)\rangle - \langle O(t)\rangle^2 \sim e^{-\Delta t/\tau}.
Complexity: generating two independent configurations costs \sim \tau MC steps.
Statistical error: for \bar{O} = \frac{1}{N}\sum_i O(C_i), \delta O \sim 1/\sqrt{N} holds only if the O(C_i) are independent.
[Figure: \langle M(t)M(s+t)\rangle - \langle M\rangle^2 vs. t for the local, self-learning, and naive-Wolff updates, on linear and log scales.]
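A minimal estimator for the autocorrelation function and an integrated autocorrelation time (my sketch, not from the talk):

```python
import numpy as np

def autocorrelation(series, max_lag):
    """A_O(dt) = <O(t) O(t+dt)> - <O>^2, normalized so that A_O(0) = 1."""
    o = np.asarray(series, dtype=float) - np.mean(series)
    var = np.mean(o * o)
    return np.array([np.mean(o[:len(o) - dt] * o[dt:])
                     for dt in range(max_lag)]) / var

def integrated_autocorrelation_time(series, max_lag=1000):
    """tau_int = 1/2 + sum_dt A_O(dt); truncate when A_O first turns negative."""
    a = autocorrelation(series, max_lag)
    cut = np.argmax(a < 0) if np.any(a < 0) else len(a)
    return 0.5 + np.sum(a[1:cut])
```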
Metropolis algorithm
Local update: randomly select a site and flip its spin.
q(C \to D) = q(D \to C) = \frac{1}{N}, \qquad \alpha(C \to D) = \min\left\{1, \frac{W(D)}{W(C)}\right\}.
N trials are counted as one MC step.
Very general: applies to any model.
N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, J. Chem. Phys. 21, 1087 (1953).
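As a concrete instance (my sketch, assuming the 2D ferromagnetic Ising model with J = 1 and periodic boundaries):

```python
import numpy as np

def local_sweep(spins, beta, rng):
    """One MC step = N single-spin-flip trials on an L x L Ising lattice;
    each flip is accepted with probability min(1, exp(-beta * dE))."""
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(L, size=2)
        # energy change of flipping spin (i, j): dE = 2 s_ij * (sum of neighbors)
        nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
              + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * spins[i, j] * nn
        if rng.random() < np.exp(-beta * dE):
            spins[i, j] *= -1
    return spins
```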
Critical slowing down
The physical dynamical relaxation time diverges at the critical point: a critical system is slow to equilibrate.
The local update mimics the physical relaxation process, so it also exhibits critical slowing down:
\tau \sim L^z, with z = 2.125 for the 2D Ising model.
[Figure: \tau vs. L on linear and log-log scales, showing the power-law divergence \tau \sim L^z.]
There is a way around this: an MCMC simulation does not have to mimic the physical dynamics...
Wolff algorithm: a cluster update
A cluster is built one bond at a time. The probability of activating a bond is cleverly designed, such that
\frac{q(C \to D)}{q(D \to C)} = \frac{W(D)}{W(C)},
so we have the ideal acceptance ratio \alpha = 1.
Very efficient update: \tau \sim L^{0.35}.
Only works for two-body interactions: H = -\sum_{ij} J_{ij} s_i s_j.
U. Wolff, PRL 62, 361 (1989).
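A sketch of the Wolff update for the ferromagnetic Ising model (my illustration; the bond probability p_add = 1 - e^{-2 beta J} is the standard textbook choice that makes alpha = 1):

```python
import numpy as np

def wolff_update(spins, beta, J=1.0, rng=np.random.default_rng()):
    """Grow a cluster from a random seed, adding aligned neighbors with
    probability p_add = 1 - exp(-2 beta J), then flip the whole cluster."""
    L = spins.shape[0]
    p_add = 1.0 - np.exp(-2.0 * beta * J)
    seed = (rng.integers(L), rng.integers(L))
    cluster, stack, s0 = {seed}, [seed], spins[seed]
    while stack:
        i, j = stack.pop()
        for ni, nj in ((i + 1) % L, j), ((i - 1) % L, j), (i, (j + 1) % L), (i, (j - 1) % L):
            if (ni, nj) not in cluster and spins[ni, nj] == s0 and rng.random() < p_add:
                cluster.add((ni, nj))
                stack.append((ni, nj))
    for i, j in cluster:
        spins[i, j] *= -1
    return spins
```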
Challenge
The local update is too slow for many models: critical slowing down, glassy behavior, separation of energy scales, etc.
Global updates are only available for certain models, like the Wolff algorithm for two-body interactions.
Can we find a good update algorithm for generic models?
A good update samples low-energy states
A dilemma of local updates:
If the step is too small: high acceptance, but small difference between configurations.
If the step is too big: big difference, but low acceptance.
Global updates: explore the low-energy configurations directly.
Modeling the update process
Model the update as a stochastic function f: C \to D.
Parameterized: f = f(p_1, p_2, \ldots, p_n), with an analytically known proposal probability q(C \xrightarrow{f} D).
Optimize the p_i such that
\frac{q(C \xrightarrow{f} D)}{q(D \xrightarrow{f} C)} \approx \frac{W(D)}{W(C)}.
Choose the acceptance ratio
\alpha(C \to D) = \min\left\{1, \frac{W(D)}{W(C)}\frac{q(D \xrightarrow{f} C)}{q(C \xrightarrow{f} D)}\right\} \approx 1.
The approximation does not introduce any error into the MC simulation, as long as we know q(C \xrightarrow{f} D) accurately.
Design
(i) Trial simulation of H with the local update.
(ii) Learning: fit an effective model H_eff by machine learning.
(iii) Simulate H_eff to propose trial configurations.
(iv) Accept/reject the proposal so that detailed balance holds for the original H.
Example
The model:
H = -J_1 \sum_{\langle ij \rangle} s_i s_j - J_2 \sum_{\langle ijkl \rangle} s_i s_j s_k s_l.
The effective model:
H_{eff} = -J_1^{eff} \sum_{\langle ij \rangle} s_i s_j.
We consider J_1 = 1, J_2 = 0.2: an Ising transition at T_c = 2.493.
(The Ising model without J_2 has T_c = 2.269.)
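A sketch of the two energies (my illustration; \langle ijkl \rangle is taken to run over elementary plaquettes of a periodic square lattice):

```python
import numpy as np

def energy(spins, J1=1.0, J2=0.2):
    """H = -J1 sum_<ij> s_i s_j - J2 sum_<ijkl> s_i s_j s_k s_l
    on a periodic square lattice; <ijkl> are elementary plaquettes."""
    right = np.roll(spins, -1, axis=1)
    down = np.roll(spins, -1, axis=0)
    bonds = np.sum(spins * right + spins * down)
    plaquettes = np.sum(spins * right * down * np.roll(right, -1, axis=0))
    return -J1 * bonds - J2 * plaquettes

def energy_eff(spins, J1_eff, E0=0.0):
    """Effective nearest-neighbor model H_eff = E0 - J1_eff sum_<ij> s_i s_j."""
    bonds = np.sum(spins * np.roll(spins, -1, axis=1)
                   + spins * np.roll(spins, -1, axis=0))
    return E0 - J1_eff * bonds
```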
The self-learning update
The update: the cluster is constructed using the effective model, so that
\frac{q(C \to D)}{q(D \to C)} = \frac{W_{eff}(D)}{W_{eff}(C)}.
The acceptance ratio:
\alpha(C \to D) = \min\left\{1, \frac{W(D)}{W(C)}\frac{W_{eff}(C)}{W_{eff}(D)}\right\}.
The algorithm still satisfies detailed balance exactly, although the effective model is approximate.
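A sketch of one self-learning step, reusing `wolff_update`, `energy`, and `energy_eff` from the earlier sketches (my illustration, not the authors' code):

```python
import numpy as np

def self_learning_step(spins, beta, J1_eff, E0, rng=np.random.default_rng()):
    """Propose by a Wolff cluster flip of H_eff, then accept with
    alpha = min(1, [W(D)/W(C)] * [W_eff(C)/W_eff(D)]), so detailed
    balance holds exactly for the original H."""
    trial = wolff_update(spins.copy(), beta, J=J1_eff, rng=rng)  # cluster built on H_eff
    dE = energy(trial) - energy(spins)                           # original model
    dE_eff = energy_eff(trial, J1_eff, E0) - energy_eff(spins, J1_eff, E0)
    if rng.random() < min(1.0, np.exp(-beta * (dE - dE_eff))):
        return trial
    return spins
```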
Learning the effective model
1. Generate a sample using the local update.
2. Perform a linear regression:
H_{eff} = -J_1^{eff} \sum_{\langle ij \rangle} S_i S_j + E_0.
Unsupervised machine learning
The linear regression we just did can be viewed as unsupervised machine learning.
Unsupervised ML: learning the underlying distribution from a sample of configurations.
We pretend that we do not know H, and learn a simpler model H_eff.
The learned model describes the low-energy effective theory, with the high-energy fluctuations treated as noise in the sample.
One can incorporate more advanced ML models and algorithms.
Learning the effective model
1. Generate a sample using the local update, at T = 5 > T_c.
2. Perform a linear regression:
H_{eff} = -J_1^{eff} \sum_{\langle ij \rangle} S_i S_j + E_0.
3. Result: J_1^{eff} = 1.0726.
4. Generate another sample using the self-learning update, at T = T_c.
5. Refitting gives J_1^{eff} = 1.1064.
(A sketch of the fit follows below.)
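A sketch of the fit (my illustration; it assumes the energies E(C) of the sampled configurations were measured during the trial simulation). Each configuration's energy is regressed against its nearest-neighbor bond sum, and the slope gives -J_1^{eff}:

```python
import numpy as np

def fit_effective_coupling(configs, energies):
    """Least-squares fit of E(C) = E0 - J1_eff * B(C), where
    B(C) = sum_<ij> s_i s_j is the nearest-neighbor bond sum."""
    B = np.array([np.sum(s * np.roll(s, -1, axis=0) + s * np.roll(s, -1, axis=1))
                  for s in configs], dtype=float)
    E = np.asarray(energies, dtype=float)
    A = np.stack([B, np.ones_like(B)], axis=1)   # design matrix [B, 1]
    coef, *_ = np.linalg.lstsq(A, E, rcond=None)  # solves [-J1_eff, E0]
    return -coef[0], coef[1]   # (J1_eff, E0)
```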
Autocorrelation time
[Figure: autocorrelation function of the magnetization, \langle M(t)M(s+t)\rangle - \langle M\rangle^2, vs. t for the local, self-learning, and naive-Wolff updates, on linear and log scales.]
System size 40 \times 40.
SLMC works well at moderate sizes
[Figure: \tau vs. L for the local and self-learned updates, on linear and log scales.]
The results look good at moderate sizes L = 20-80, but the scaling behavior is exponential...
Eventually, the self-learning update becomes slower than the local update.
Limiting the cluster size
\delta H = H - H_{eff} scales with the perimeter of the cluster, \sim L.
Hence \tau \sim e^{L/L_0}, because \alpha \sim e^{-\beta\,\delta H} \sim e^{-L/L_0}.
A simple effective model is not accurate enough: one needs a more accurate model, with a hierarchy of energy scales.
Quick fix: limit the cluster size during cluster growth (see the sketch below).
This still satisfies detailed balance, if \alpha is corrected correspondingly.
The result is a block update: the same scaling as the local update, but with a smaller prefactor.
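A minimal sketch of the size cap itself (my illustration): growth simply stops once the cluster reaches `max_size` sites. The exact correction to \alpha mentioned on the slide is model-dependent and is not spelled out here; the cap below only modifies the proposal.

```python
import numpy as np

def restricted_wolff_update(spins, beta, J_eff, max_size, rng=np.random.default_rng()):
    """Wolff cluster growth on H_eff, stopped once the cluster holds
    max_size sites; the SLMC accept/reject against the original H
    (with a correspondingly corrected alpha) is applied afterwards."""
    L = spins.shape[0]
    p_add = 1.0 - np.exp(-2.0 * beta * J_eff)
    seed = (rng.integers(L), rng.integers(L))
    cluster, stack, s0 = {seed}, [seed], spins[seed]
    while stack and len(cluster) < max_size:
        i, j = stack.pop()
        for ni, nj in ((i + 1) % L, j), ((i - 1) % L, j), (i, (j + 1) % L), (i, (j - 1) % L):
            if len(cluster) >= max_size:
                break
            if (ni, nj) not in cluster and spins[ni, nj] == s0 and rng.random() < p_add:
                cluster.add((ni, nj))
                stack.append((ni, nj))
    for i, j in cluster:
        spins[i, j] *= -1
    return spins
```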
Restricted-cluster-size update
[Figure: \tau vs. L for the local, self-learned, and restricted updates; fits give \tau_L \sim L^{2.2} for the local update and \tau_R \sim L^{2.1} for the restricted update, with a much smaller prefactor.]
We restrict the cluster size to r = 40.
Summary
[Figure: the same comparison of \tau vs. L for the local, self-learning, and restricted updates.]
Determinant Monte Carlo: computational cost
Fermions coupled to an (auxiliary) bosonic field:
H = \sum_{ij} t_{ij} c_i^\dagger c_j + \sum_i \sigma_i c_i^\dagger c_i + H[\sigma].
Simulation: use the bosonic field as the configuration,
C = \{\sigma_i(t)\}, \quad i = 1, \ldots, N; \quad 0 < t < \beta.
Weight of each configuration W(C): integrate out the fermions.
Computational cost: \beta N^3. Fast update: \beta N^3 for \beta N steps of local updates.
Generating two independent configurations: \beta N^3 \tau_L.
Cumulative update
Idea: replace W(C) by \tilde{W}(C), which is much faster to evaluate.
Local updates guided by \tilde{W}(C):
C \to D: \quad C \to C_1 \to \cdots \to C_{i-1} \to C_i \to C_{i+1} \to \cdots \to C_M \to D.
Detailed balance at each step:
\frac{p(C_i \to C_{i+1})}{p(C_{i+1} \to C_i)} = \frac{\tilde{W}(C_{i+1})}{\tilde{W}(C_i)}.
For each path:
\frac{p(C \to C_1) \cdots p(C_i \to C_{i+1}) \cdots p(C_M \to D)}{p(C_1 \to C) \cdots p(C_{i+1} \to C_i) \cdots p(D \to C_M)} = \frac{\tilde{W}(C_1)}{\tilde{W}(C)} \frac{\tilde{W}(C_2)}{\tilde{W}(C_1)} \cdots \frac{\tilde{W}(D)}{\tilde{W}(C_M)} = \frac{\tilde{W}(D)}{\tilde{W}(C)}.
Summing over all paths:
\frac{q(C \to D)}{q(D \to C)} = \frac{\sum_{\{C_i\}} p(C \to C_1) \cdots p(C_i \to C_{i+1}) \cdots p(C_M \to D)}{\sum_{\{C_i\}} p(C_1 \to C) \cdots p(C_{i+1} \to C_i) \cdots p(D \to C_M)} = \frac{\tilde{W}(D)}{\tilde{W}(C)}.
Cumulative update
Approximate weight: \tilde{W}(C) \approx W(C).
Update scheme:
\frac{q(C \to D)}{q(D \to C)} = \frac{\tilde{W}(D)}{\tilde{W}(C)} \approx \frac{W(D)}{W(C)}.
Acceptance ratio:
\alpha(C \to D) = \min\left\{1, \frac{W(D)}{W(C)}\frac{\tilde{W}(C)}{\tilde{W}(D)}\right\} \approx 1.
Choose a good \tilde{W} and M \sim \tau_L: then \tau_{CU} \sim 1.
Computational cost: \beta N^3 \tau \to \beta N^3.
Again, \tilde{W}(C) \approx W(C) does not introduce any approximation.
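A schematic of the cumulative update (my sketch; `local_step_wtilde` is assumed to perform one local update satisfying detailed balance with respect to \tilde{W}, and `log_w_exact`/`log_w_tilde` are placeholder log-weight functions):

```python
import numpy as np

def cumulative_update(config, M, local_step_wtilde, log_w_exact, log_w_tilde,
                      rng=np.random.default_rng()):
    """Run M cheap local updates guided by W~, then accept the whole move
    with alpha = min(1, [W(D)/W(C)] * [W~(C)/W~(D)]).
    Log-weights are used for numerical stability."""
    trial = config
    for _ in range(M):
        trial = local_step_wtilde(trial)      # detailed balance w.r.t. W~
    log_alpha = (log_w_exact(trial) - log_w_exact(config)
                 + log_w_tilde(config) - log_w_tilde(trial))
    if np.log(rng.random()) < min(0.0, log_alpha):
        return trial   # only one expensive evaluation of W per proposed move
    return config
```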
Constructing effective weights
1. Physical intuition: a bosonic effective model W_{eff}[\sigma], fitted by linear regression. Works best for gapped fermions.
H = \sum_{ij} t_{ij} c_{i\alpha}^\dagger c_{j\alpha} + \sum_i \sigma_i c_{i\alpha}^\dagger \sigma^z_{\alpha\beta} c_{i\beta} + H[\sigma], \qquad W_{eff}[\sigma] = \sum_{ij} J_{ij} \sigma_i \sigma_j.
2. Machine-learning models: restricted Boltzmann machines, neural networks, etc.
Self-Learning MC in action
Zi-Hong Liu, Xiao-Yan Xu, Yang Qi, Kai Sun, and Zi-Yang Meng, to appear.
Fermion + boson model, new critical universality.
Simulated at the critical point; SLMC with L = 20-30.
[Figures: lattice of fermions coupled to Ising spins; phase diagram with SDW, clock, BKT, and PM phases; bare Fermi surface with hot spots and an induced shallow band; log-log fit of \chi^{-1}(q,0) - \chi^{-1}(0,0) vs. |q|, giving a_q = 1.997 \pm 0.028.]