Guiding Monte Carlo Simulations with Machine Learning
Yang Qi
Department of Physics, Massachusetts Institute of Technology (joining Fudan University in 2017)
KITS, June 29, 2017
References
Works led by Liang Fu, Zi-Yang Meng, and Lei Wang, with Jun-Wei Liu, Xiao-Yan Xu, Hui-Tao Shen, Yuki Nagai, Li Huang, Yi-Feng Yang, et al.
References:
1. J. Liu, Y. Qi, Z.-Y. Meng, and L. Fu, PRB 95, 041101(R) (2017).
2. J. Liu, H. Shen, Y. Qi, Z.-Y. Meng, and L. Fu, PRB 95, 241104(R) (2017).
3. X.-Y. Xu, Y. Qi, J. Liu, L. Fu, and Z.-Y. Meng, arXiv:1612.03804 [cond-mat.str-el].
4. Y. Nagai, H. Shen, Y. Qi, and L. Fu, arXiv:1705.06724 [cond-mat.str-el].
5. L. Huang and L. Wang, PRB 95, 035105(R) (2017).
6. L. Huang, Y.-F. Yang, and L. Wang, PRE 95, 031301(R) (2017).
7. L. Wang, arXiv:1702.08586 [cond-mat.str-el].
Outline
1. Introduction to Markov Chain Monte Carlo
2. The Importance of the Update
3. Example: Ising Model and Cluster Update
4. Example: Bosonic Model and Cumulative Update
Monte Carlo simulation: an unbiased method
A widely used numerical method in statistical physics and quantum many-body physics.
Unbiased: reliable statistical error bars.
Fast: the error decreases as 1/\sqrt{N}.
Universal: applies to any model without the sign problem.
Introduction to MCMC
Consider a statistical-mechanics model: Z = \sum_C e^{-\beta H[C]} = \sum_C W(C).
Markov-chain Monte Carlo (MCMC) is a way to do importance sampling: a Markov chain of configurations is constructed,
\cdots \to C_{i-1} \to C_i \to C_{i+1} \to \cdots
Markov chain: p(C_i \to C_{i+1}) depends only on C_i (no memory).
Goal: the distribution of C converges to the Boltzmann distribution W(C).
Any observable can be measured from the Markov chain,
\langle O \rangle = \frac{\sum_C O(C) W(C)}{\sum_C W(C)} \approx \frac{1}{N} \sum_i O(C_i).
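As a minimal illustration (my sketch, not from the slides), a generic MCMC driver might look like the following; `step` and `observable` are placeholders for a model-specific update and measurement.

```python
import numpy as np

def run_chain(init, step, observable, n_sweeps, n_burn=1000):
    """Generic MCMC driver: evolve the Markov chain, then estimate
    <O> as the average of O(C_i) along the chain."""
    config = init
    for _ in range(n_burn):              # discard the equilibration transient
        config = step(config)
    samples = []
    for _ in range(n_sweeps):
        config = step(config)
        samples.append(observable(config))
    # the naive error bar below assumes (nearly) independent samples;
    # see the autocorrelation discussion below
    return np.mean(samples), np.std(samples) / np.sqrt(len(samples))
```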
Detailed balance
Detailed balance:
\frac{p(C \to D)}{p(D \to C)} = \frac{W(D)}{W(C)}.
Detailed balance (together with ergodicity) guarantees that IF the MC converges, it converges to the desired distribution W(C).
Metropolis-Hastings algorithm: propose, then accept/reject:
p(C \to D) = q(C \to D)\,\alpha(C \to D),
\alpha(C \to D) = \min\left\{1, \frac{W(D)}{W(C)}\frac{q(D \to C)}{q(C \to D)}\right\}.
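A sketch of the accept/reject step (my illustration; `weight` and the proposal density `q(x, y) = q(x -> y)` are assumed to be supplied by the model):

```python
import numpy as np

def metropolis_hastings_step(config, propose, q, weight, rng):
    """One Metropolis-Hastings step: propose D from C and accept with
    alpha = min(1, W(D) q(D->C) / (W(C) q(C->D)))."""
    trial = propose(config)
    ratio = (weight(trial) * q(trial, config)) / (weight(config) * q(config, trial))
    if rng.random() < min(1.0, ratio):
        return trial   # accept: the chain moves to D
    return config      # reject: the chain stays at C
```

This step satisfies detailed balance by construction, whatever the proposal q is.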
Autocorrelation time
The autocorrelation time measures the efficiency of the update algorithm.
Time sequence: \cdots \to O(t-1) \to O(t) \to O(t+1) \to \cdots, with O(t) = O[C(t)].
Autocorrelation function:
A_O(\Delta t) = \langle O(t) O(t+\Delta t)\rangle - \langle O(t)\rangle^2 \sim e^{-\Delta t/\tau}.
Complexity: generating two independent configurations costs \sim \tau MC steps.
Statistical error: for \bar{O} = \frac{1}{N}\sum_i O(C_i), \delta O \sim 1/\sqrt{N} holds only if the O(C_i) are independent.
[Figure: \langle M(t)M(s+t)\rangle - \langle M\rangle^2 vs. t for the local, self-learning, and naive-Wolff updates, on linear and log scales.]
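A minimal estimator for the autocorrelation function and an integrated autocorrelation time (my sketch, not from the talk):

```python
import numpy as np

def autocorrelation(series, max_lag):
    """A_O(dt) = <O(t) O(t+dt)> - <O>^2, normalized so that A_O(0) = 1."""
    o = np.asarray(series, dtype=float) - np.mean(series)
    var = np.mean(o * o)
    return np.array([np.mean(o[:len(o) - dt] * o[dt:])
                     for dt in range(max_lag)]) / var

def integrated_autocorrelation_time(series, max_lag=1000):
    """tau_int = 1/2 + sum_dt A_O(dt); truncate when A_O first turns negative."""
    a = autocorrelation(series, max_lag)
    cut = np.argmax(a < 0) if np.any(a < 0) else len(a)
    return 0.5 + np.sum(a[1:cut])
```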
Metropolis algorithm
Local update: randomly select a site and flip its spin.
q(C \to D) = q(D \to C) = \frac{1}{N}, \qquad \alpha(C \to D) = \min\left\{1, \frac{W(D)}{W(C)}\right\}.
N trials are counted as one MC step.
Very general: applies to any model.
N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, J. Chem. Phys. 21, 1087 (1953).
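As a concrete instance (my sketch, assuming the 2D ferromagnetic Ising model with J = 1 and periodic boundaries):

```python
import numpy as np

def local_sweep(spins, beta, rng):
    """One MC step = N single-spin-flip trials on an L x L Ising lattice;
    each flip is accepted with probability min(1, exp(-beta * dE))."""
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(L, size=2)
        # energy change of flipping spin (i, j): dE = 2 s_ij * (sum of neighbors)
        nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
              + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * spins[i, j] * nn
        if rng.random() < np.exp(-beta * dE):
            spins[i, j] *= -1
    return spins
```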
Critical slowing down
The physical dynamical relaxation time diverges at the critical point: a critical system is slow to equilibrate.
The local update mimics the physical relaxation process, so it also exhibits critical slowing down:
\tau \sim L^z, with z = 2.125 for the 2D Ising model.
[Figure: \tau vs. L on linear and log-log scales, showing the power-law divergence \tau \sim L^z.]
There is a way around this: an MCMC simulation does not have to mimic the physical dynamics...
Wolff algorithm: a cluster update
A cluster is built one bond at a time. The probability of activating a bond is cleverly designed, such that
\frac{q(C \to D)}{q(D \to C)} = \frac{W(D)}{W(C)},
so we have the ideal acceptance ratio \alpha = 1.
Very efficient update: \tau \sim L^{0.35}.
Only works for two-body interactions: H = -\sum_{ij} J_{ij} s_i s_j.
U. Wolff, PRL 62, 361 (1989).
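A sketch of the Wolff update for the ferromagnetic Ising model (my illustration; the bond probability p_add = 1 - e^{-2 beta J} is the standard textbook choice that makes alpha = 1):

```python
import numpy as np

def wolff_update(spins, beta, J=1.0, rng=np.random.default_rng()):
    """Grow a cluster from a random seed, adding aligned neighbors with
    probability p_add = 1 - exp(-2 beta J), then flip the whole cluster."""
    L = spins.shape[0]
    p_add = 1.0 - np.exp(-2.0 * beta * J)
    seed = (rng.integers(L), rng.integers(L))
    cluster, stack, s0 = {seed}, [seed], spins[seed]
    while stack:
        i, j = stack.pop()
        for ni, nj in ((i + 1) % L, j), ((i - 1) % L, j), (i, (j + 1) % L), (i, (j - 1) % L):
            if (ni, nj) not in cluster and spins[ni, nj] == s0 and rng.random() < p_add:
                cluster.add((ni, nj))
                stack.append((ni, nj))
    for i, j in cluster:
        spins[i, j] *= -1
    return spins
```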
Challenge
The local update is too slow for many models: critical slowing down, glassy behavior, separation of energy scales, etc.
Global updates are only available for certain models, like the Wolff algorithm for two-body interactions.
Can we find a good update algorithm for generic models?
A good update samples low-energy states
A dilemma of local updates:
If the step is too small: high acceptance, but small difference between configurations.
If the step is too big: big difference, but low acceptance.
Global updates: explore the low-energy configurations directly.
Modeling the update process
Model the update as a stochastic function f: C \to D.
Parameterized: f = f(p_1, p_2, \ldots, p_n), with an analytically known proposal probability q(C \xrightarrow{f} D).
Optimize the p_i such that
\frac{q(C \xrightarrow{f} D)}{q(D \xrightarrow{f} C)} \approx \frac{W(D)}{W(C)}.
Choose the acceptance ratio
\alpha(C \to D) = \min\left\{1, \frac{W(D)}{W(C)}\frac{q(D \xrightarrow{f} C)}{q(C \xrightarrow{f} D)}\right\} \approx 1.
The approximation does not introduce any error into the MC simulation, as long as we know q(C \xrightarrow{f} D) accurately.
Design
(i) Trial simulation of H with the local update.
(ii) Learning: fit an effective model H_eff by machine learning.
(iii) Simulate H_eff to propose trial configurations.
(iv) Accept/reject the proposal so that detailed balance holds for the original H.
Example
The model:
H = -J_1 \sum_{\langle ij \rangle} s_i s_j - J_2 \sum_{\langle ijkl \rangle} s_i s_j s_k s_l.
The effective model:
H_{eff} = -J_1^{eff} \sum_{\langle ij \rangle} s_i s_j.
We consider J_1 = 1, J_2 = 0.2: an Ising transition at T_c = 2.493.
(The Ising model without J_2 has T_c = 2.269.)
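A sketch of the two energies (my illustration; \langle ijkl \rangle is taken to run over elementary plaquettes of a periodic square lattice):

```python
import numpy as np

def energy(spins, J1=1.0, J2=0.2):
    """H = -J1 sum_<ij> s_i s_j - J2 sum_<ijkl> s_i s_j s_k s_l
    on a periodic square lattice; <ijkl> are elementary plaquettes."""
    right = np.roll(spins, -1, axis=1)
    down = np.roll(spins, -1, axis=0)
    bonds = np.sum(spins * right + spins * down)
    plaquettes = np.sum(spins * right * down * np.roll(right, -1, axis=0))
    return -J1 * bonds - J2 * plaquettes

def energy_eff(spins, J1_eff, E0=0.0):
    """Effective nearest-neighbor model H_eff = E0 - J1_eff sum_<ij> s_i s_j."""
    bonds = np.sum(spins * np.roll(spins, -1, axis=1)
                   + spins * np.roll(spins, -1, axis=0))
    return E0 - J1_eff * bonds
```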
The self-learning update
The update: the cluster is constructed using the effective model, so that
\frac{q(C \to D)}{q(D \to C)} = \frac{W_{eff}(D)}{W_{eff}(C)}.
The acceptance ratio:
\alpha(C \to D) = \min\left\{1, \frac{W(D)}{W(C)}\frac{W_{eff}(C)}{W_{eff}(D)}\right\}.
The algorithm still satisfies detailed balance exactly, although the effective model is approximate.
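A sketch of one self-learning step, reusing `wolff_update`, `energy`, and `energy_eff` from the earlier sketches (my illustration, not the authors' code):

```python
import numpy as np

def self_learning_step(spins, beta, J1_eff, E0, rng=np.random.default_rng()):
    """Propose by a Wolff cluster flip of H_eff, then accept with
    alpha = min(1, [W(D)/W(C)] * [W_eff(C)/W_eff(D)]), so detailed
    balance holds exactly for the original H."""
    trial = wolff_update(spins.copy(), beta, J=J1_eff, rng=rng)  # cluster built on H_eff
    dE = energy(trial) - energy(spins)                           # original model
    dE_eff = energy_eff(trial, J1_eff, E0) - energy_eff(spins, J1_eff, E0)
    if rng.random() < min(1.0, np.exp(-beta * (dE - dE_eff))):
        return trial
    return spins
```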
Learning the effective model
1. Generate a sample using the local update.
2. Perform a linear regression:
H_{eff} = -J_1^{eff} \sum_{\langle ij \rangle} S_i S_j + E_0.
Unsupervised machine learning
The linear regression we just did can be viewed as unsupervised machine learning.
Unsupervised ML: learning the underlying distribution from a sample of configurations.
We pretend that we do not know H, and learn a simpler model H_eff.
The learned model describes the low-energy effective theory, with the high-energy fluctuations treated as noise in the sample.
One can incorporate more advanced ML models and algorithms.
Learning the effective model
1. Generate a sample using the local update, at T = 5 > T_c.
2. Perform a linear regression:
H_{eff} = -J_1^{eff} \sum_{\langle ij \rangle} S_i S_j + E_0.
3. Result: J_1^{eff} = 1.0726.
4. Generate another sample using the self-learning update, at T = T_c.
5. Refitting gives J_1^{eff} = 1.1064.
(A sketch of the fit follows below.)
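A sketch of the fit (my illustration; it assumes the energies E(C) of the sampled configurations were measured during the trial simulation). Each configuration's energy is regressed against its nearest-neighbor bond sum, and the slope gives -J_1^{eff}:

```python
import numpy as np

def fit_effective_coupling(configs, energies):
    """Least-squares fit of E(C) = E0 - J1_eff * B(C), where
    B(C) = sum_<ij> s_i s_j is the nearest-neighbor bond sum."""
    B = np.array([np.sum(s * np.roll(s, -1, axis=0) + s * np.roll(s, -1, axis=1))
                  for s in configs], dtype=float)
    E = np.asarray(energies, dtype=float)
    A = np.stack([B, np.ones_like(B)], axis=1)   # design matrix [B, 1]
    coef, *_ = np.linalg.lstsq(A, E, rcond=None)  # solves [-J1_eff, E0]
    return -coef[0], coef[1]   # (J1_eff, E0)
```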
Autocorrelation time
[Figure: autocorrelation function of the magnetization, \langle M(t)M(s+t)\rangle - \langle M\rangle^2, vs. t for the local, self-learning, and naive-Wolff updates, on linear and log scales.]
System size 40 \times 40.
SLMC works well at moderate sizes
[Figure: \tau vs. L for the local and self-learned updates, on linear and log scales.]
The results look good at moderate sizes L = 20-80, but the scaling behavior is exponential...
Eventually, the self-learning update becomes slower than the local update.
Limiting the cluster size
\delta H = H - H_{eff} scales with the perimeter of the cluster, \sim L.
Hence \tau \sim e^{L/L_0}, because \alpha \sim e^{-\beta\,\delta H} \sim e^{-L/L_0}.
A simple effective model is not accurate enough: one needs a more accurate model, with a hierarchy of energy scales.
Quick fix: limit the cluster size during cluster growth (see the sketch below).
This still satisfies detailed balance, if \alpha is corrected correspondingly.
The result is a block update: the same scaling as the local update, but with a smaller prefactor.
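A minimal sketch of the size cap itself (my illustration): growth simply stops once the cluster reaches `max_size` sites. The exact correction to \alpha mentioned on the slide is model-dependent and is not spelled out here; the cap below only modifies the proposal.

```python
import numpy as np

def restricted_wolff_update(spins, beta, J_eff, max_size, rng=np.random.default_rng()):
    """Wolff cluster growth on H_eff, stopped once the cluster holds
    max_size sites; the SLMC accept/reject against the original H
    (with a correspondingly corrected alpha) is applied afterwards."""
    L = spins.shape[0]
    p_add = 1.0 - np.exp(-2.0 * beta * J_eff)
    seed = (rng.integers(L), rng.integers(L))
    cluster, stack, s0 = {seed}, [seed], spins[seed]
    while stack and len(cluster) < max_size:
        i, j = stack.pop()
        for ni, nj in ((i + 1) % L, j), ((i - 1) % L, j), (i, (j + 1) % L), (i, (j - 1) % L):
            if len(cluster) >= max_size:
                break
            if (ni, nj) not in cluster and spins[ni, nj] == s0 and rng.random() < p_add:
                cluster.add((ni, nj))
                stack.append((ni, nj))
    for i, j in cluster:
        spins[i, j] *= -1
    return spins
```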
Restricted-cluster-size update
[Figure: \tau vs. L for the local, self-learned, and restricted updates; fits give \tau_L \sim L^{2.2} for the local update and \tau_R \sim L^{2.1} for the restricted update, with a much smaller prefactor.]
We restrict the cluster size to r = 40.
Summary
[Figure: the same comparison of \tau vs. L for the local, self-learning, and restricted updates.]
Determinant Monte Carlo: computational cost
Fermions coupled to an (auxiliary) bosonic field:
H = \sum_{ij} t_{ij} c_i^\dagger c_j + \sum_i \sigma_i c_i^\dagger c_i + H[\sigma].
Simulation: use the bosonic field as the configuration,
C = \{\sigma_i(t)\}, \quad i = 1, \ldots, N; \quad 0 < t < \beta.
Weight of each configuration W(C): integrate out the fermions.
Computational cost: \beta N^3. Fast update: \beta N^3 for \beta N steps of local updates.
Generating two independent configurations: \beta N^3 \tau_L.
Cumulative update
Idea: replace W(C) by \tilde{W}(C), which is much faster to evaluate.
Local updates guided by \tilde{W}(C):
C \to D: \quad C \to C_1 \to \cdots \to C_{i-1} \to C_i \to C_{i+1} \to \cdots \to C_M \to D.
Detailed balance at each step:
\frac{p(C_i \to C_{i+1})}{p(C_{i+1} \to C_i)} = \frac{\tilde{W}(C_{i+1})}{\tilde{W}(C_i)}.
For each path:
\frac{p(C \to C_1) \cdots p(C_i \to C_{i+1}) \cdots p(C_M \to D)}{p(C_1 \to C) \cdots p(C_{i+1} \to C_i) \cdots p(D \to C_M)} = \frac{\tilde{W}(C_1)}{\tilde{W}(C)} \frac{\tilde{W}(C_2)}{\tilde{W}(C_1)} \cdots \frac{\tilde{W}(D)}{\tilde{W}(C_M)} = \frac{\tilde{W}(D)}{\tilde{W}(C)}.
Summing over all paths:
\frac{q(C \to D)}{q(D \to C)} = \frac{\sum_{\{C_i\}} p(C \to C_1) \cdots p(C_i \to C_{i+1}) \cdots p(C_M \to D)}{\sum_{\{C_i\}} p(C_1 \to C) \cdots p(C_{i+1} \to C_i) \cdots p(D \to C_M)} = \frac{\tilde{W}(D)}{\tilde{W}(C)}.
Cumulative update
Approximate weight: \tilde{W}(C) \approx W(C).
Update scheme:
\frac{q(C \to D)}{q(D \to C)} = \frac{\tilde{W}(D)}{\tilde{W}(C)} \approx \frac{W(D)}{W(C)}.
Acceptance ratio:
\alpha(C \to D) = \min\left\{1, \frac{W(D)}{W(C)}\frac{\tilde{W}(C)}{\tilde{W}(D)}\right\} \approx 1.
Choose a good \tilde{W} and M \sim \tau_L: then \tau_{CU} \sim 1.
Computational cost: \beta N^3 \tau \to \beta N^3.
Again, \tilde{W}(C) \approx W(C) does not introduce any approximation.
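A schematic of the cumulative update (my sketch; `local_step_wtilde` is assumed to perform one local update satisfying detailed balance with respect to \tilde{W}, and `log_w_exact`/`log_w_tilde` are placeholder log-weight functions):

```python
import numpy as np

def cumulative_update(config, M, local_step_wtilde, log_w_exact, log_w_tilde,
                      rng=np.random.default_rng()):
    """Run M cheap local updates guided by W~, then accept the whole move
    with alpha = min(1, [W(D)/W(C)] * [W~(C)/W~(D)]).
    Log-weights are used for numerical stability."""
    trial = config
    for _ in range(M):
        trial = local_step_wtilde(trial)      # detailed balance w.r.t. W~
    log_alpha = (log_w_exact(trial) - log_w_exact(config)
                 + log_w_tilde(config) - log_w_tilde(trial))
    if np.log(rng.random()) < min(0.0, log_alpha):
        return trial   # only one expensive evaluation of W per proposed move
    return config
```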
Constructing effective weights
1. Physical intuition: a bosonic effective model W_{eff}[\sigma], fitted by linear regression. Works best for gapped fermions.
H = \sum_{ij} t_{ij} c_{i\alpha}^\dagger c_{j\alpha} + \sum_i \sigma_i c_{i\alpha}^\dagger \sigma^z_{\alpha\beta} c_{i\beta} + H[\sigma], \qquad W_{eff}[\sigma] = \sum_{ij} J_{ij} \sigma_i \sigma_j.
2. Machine-learning models: restricted Boltzmann machines, neural networks, etc.
Self-Learning MC in action
Zi-Hong Liu, Xiao-Yan Xu, Yang Qi, Kai Sun, and Zi-Yang Meng, to appear.
Fermion + boson model, new critical universality.
Simulated at the critical point; SLMC with L = 20-30.
[Figures: lattice of fermions coupled to Ising spins; phase diagram with SDW, clock, BKT, and PM phases; bare Fermi surface with hot spots and an induced shallow band; log-log fit of \chi^{-1}(q,0) - \chi^{-1}(0,0) vs. |q|, giving a_q = 1.997 \pm 0.028.]