Guiding Monte Carlo Simulations with Machine Learning

Guiding Monte Carlo Simulations with Machine Learning. Yang Qi, Department of Physics, Massachusetts Institute of Technology (joining Fudan University in 2017). KITS, June 29, 2017.

References
Works led by Liang Fu, Zi-Yang Meng, and Lei Wang, with Jun-Wei Liu, Xiao-Yan Xu, Hui-Tao Shen, Yuki Nagai, Li Huang, Yi-Feng Yang, and others.
1. J. Liu, Y. Qi, Z.-Y. Meng, and L. Fu, Phys. Rev. B 95, 041101(R) (2017).
2. J. Liu, H. Shen, Y. Qi, Z.-Y. Meng, and L. Fu, Phys. Rev. B 95, 241104(R) (2017).
3. X.-Y. Xu, Y. Qi, J. Liu, L. Fu, and Z.-Y. Meng, arXiv:1612.03804 [cond-mat.str-el].
4. Y. Nagai, H. Shen, Y. Qi, and L. Fu, arXiv:1705.06724 [cond-mat.str-el].
5. L. Huang and L. Wang, Phys. Rev. B 95, 035105(R) (2017).
6. L. Huang, Y.-F. Yang, and L. Wang, Phys. Rev. E 95, 031301(R) (2017).
7. L. Wang, arXiv:1702.08586 [cond-mat.str-el].

Outline
1. Introduction to Markov Chain Monte Carlo
2. The Importance of the Update
3. Example: Ising Model and Cluster Update
4. Example: Bosonic Model and Cumulative Update

Part 1: Introduction to Markov Chain Monte Carlo

Monte Carlo simulation: an unbiased method
A widely used numerical method in statistical physics and quantum many-body physics.
- Unbiased: reliable statistical error bars.
- Fast: the error decreases as 1/√N.
- Universal: applies to any model without the sign problem.

Introduction to MCMC
Consider a statistical-mechanics model: Z = Σ_C e^{-βH[C]} = Σ_C W(C).
Markov-chain Monte Carlo (MCMC) is a way to do importance sampling. A Markov chain of configurations is constructed, ... → C_{i-1} → C_i → C_{i+1} → ...
Markov property: p(C_i → C_{i+1}) depends only on C_i (no memory).
Goal: the distribution of C converges to the Boltzmann distribution W(C).
Any observable can be measured from the Markov chain,
⟨O⟩ = Σ_C O(C) W(C) / Σ_C W(C) ≈ (1/N) Σ_i O(C_i).

Detailed balance
Detailed balance: p(C → D) / p(D → C) = W(D) / W(C).
Detailed balance (together with ergodicity) guarantees that IF the Markov chain converges, it converges to the desired distribution W(C).
Metropolis-Hastings algorithm: propose, then accept/reject.
p(C → D) = q(C → D) α(C → D),
α(C → D) = min{1, [W(D) q(D → C)] / [W(C) q(C → D)]}.
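
As an illustration (not part of the original slides), here is a minimal sketch of the Metropolis-Hastings accept/reject step. The callables log_weight and propose are placeholders to be supplied by the model; working with log weights avoids overflow.

```python
import math
import random

def metropolis_hastings_step(C, log_weight, propose):
    """One Metropolis-Hastings step.

    log_weight(C) returns log W(C) = -beta * H(C).
    propose(C) returns (D, log_q_CD, log_q_DC): a trial configuration and
    the log proposal probabilities q(C->D) and q(D->C).
    """
    D, log_q_CD, log_q_DC = propose(C)
    # log of the acceptance ratio  W(D) q(D->C) / (W(C) q(C->D))
    log_alpha = (log_weight(D) - log_weight(C)) + (log_q_DC - log_q_CD)
    if math.log(random.random()) < min(0.0, log_alpha):
        return D   # accept the proposal
    return C       # reject: keep the old configuration
```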

Autocorrelation time
The autocorrelation time measures the efficiency of an update algorithm.
Time sequence: O(t-1), O(t), O(t+1), ..., with O(t) = O[C(t)].
Autocorrelation function: A_O(Δt) = ⟨O(t) O(t+Δt)⟩ - ⟨O(t)⟩² ∝ e^{-Δt/τ}.
Complexity ∝ τ: the estimator ⟨O⟩ = (1/N) Σ_i O(C_i) has statistical error δO ∝ 1/√N only if the samples O(C_i) are independent.
[Figure: magnetization autocorrelation ⟨M(t)M(s+t)⟩ - ⟨M⟩² versus t for local, self-learning, and naive-Wolff updates.]
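
A short sketch (not from the slides) of how the normalized autocorrelation function and an integrated autocorrelation time could be estimated from a measured time series; the fixed summation window max_lag is a simplifying assumption.

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Estimate A(dt) = <O(t)O(t+dt)> - <O>^2, normalized by the variance."""
    x = np.asarray(series, dtype=float) - np.mean(series)
    var = np.mean(x**2)
    return np.array([np.mean(x[:len(x) - dt] * x[dt:]) for dt in range(max_lag)]) / var

def integrated_autocorrelation_time(series, max_lag=1000):
    """tau_int = 1/2 + sum_{dt>=1} A(dt), with a crude fixed-window cutoff."""
    A = autocorrelation(series, max_lag)
    return 0.5 + np.sum(A[1:])
```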

Part 2: The Importance of the Update

Metropolis algorithm
Local update: randomly select a site and flip its spin.
q(C → D) = q(D → C) = 1/N.
α(C → D) = min{1, W(D)/W(C)}.
N trials are counted as one MC step.
Very general: applies to any model.
N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, J. Chem. Phys. 21, 1087 (1953).
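
As a concrete illustration (not the speaker's code), a minimal local-update Metropolis sweep for the 2D Ising model H = -J Σ_<ij> s_i s_j on an L x L lattice with periodic boundaries; the lattice size and temperature are placeholder parameters.

```python
import numpy as np

def metropolis_sweep(spins, beta, J=1.0, rng=np.random.default_rng()):
    """One MC step = N single-spin-flip trials on an L x L Ising lattice."""
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(0, L, size=2)
        # Energy change of flipping spin (i, j), periodic boundary conditions.
        nn_sum = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
                  + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * J * spins[i, j] * nn_sum
        # Metropolis acceptance: alpha = min(1, W(D)/W(C)) = min(1, exp(-beta*dE)).
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            spins[i, j] *= -1
    return spins
```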

Critical slowing down
The real dynamical relaxation time diverges at the critical point: a critical system is slow to equilibrate.
The local update mimics the real relaxation process, so it also exhibits critical slowing down: τ ∝ L^z, with z ≈ 2.125 for the 2D Ising model.
[Figure: τ versus linear system size L for the local update, on linear and log-log scales, showing the power law τ ∝ L^z.]
There is a way around this: an MCMC simulation does not have to mimic the real dynamics.
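
A small sketch (not from the slides) of how the dynamical exponent z could be extracted from measured autocorrelation times by a log-log fit; the example numbers in the comment are made up purely for illustration.

```python
import numpy as np

def fit_dynamical_exponent(L_values, tau_values):
    """Fit tau ~ a * L^z by linear regression in log-log space; returns (z, a)."""
    logL, logtau = np.log(L_values), np.log(tau_values)
    z, log_a = np.polyfit(logL, logtau, 1)
    return z, np.exp(log_a)

# Illustrative call with made-up data, roughly consistent with z ~ 2.1:
# z, a = fit_dynamical_exponent([10, 20, 40, 80], [35, 150, 640, 2700])
```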

Wolff algorithm: a cluster update
A cluster is built one bond at a time. The probability of activating a bond is cleverly designed such that q(C → D) / q(D → C) = W(D) / W(C), so we get the ideal acceptance ratio α = 1.
Very efficient update: τ ∝ L^0.35.
Only works for two-body interactions: H = -Σ_{ij} J_ij s_i s_j.
U. Wolff, PRL 62, 361 (1989).
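
A compact sketch (illustrative, not the speaker's code) of the Wolff single-cluster update for the ferromagnetic 2D Ising model. The bond-activation probability p_add = 1 - exp(-2*beta*J) is what makes the proposal ratio cancel the weight ratio, so the grown cluster is always flipped.

```python
import numpy as np

def wolff_update(spins, beta, J=1.0, rng=np.random.default_rng()):
    """One Wolff cluster flip on an L x L Ising lattice with periodic boundaries."""
    L = spins.shape[0]
    p_add = 1.0 - np.exp(-2.0 * beta * J)   # bond-activation probability
    seed = (int(rng.integers(L)), int(rng.integers(L)))
    seed_spin = spins[seed]
    cluster = {seed}
    stack = [seed]
    while stack:
        i, j = stack.pop()
        for ni, nj in [((i + 1) % L, j), ((i - 1) % L, j),
                       (i, (j + 1) % L), (i, (j - 1) % L)]:
            # Try to add parallel neighbors that are not yet in the cluster.
            if (ni, nj) not in cluster and spins[ni, nj] == seed_spin:
                if rng.random() < p_add:
                    cluster.add((ni, nj))
                    stack.append((ni, nj))
    for i, j in cluster:                     # flip the whole cluster, alpha = 1
        spins[i, j] *= -1
    return spins
```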

Challenge
The local update is too slow for many models: critical slowing down, glassy behavior, separation of energy scales, etc.
Global updates are only available for certain models, like the Wolff algorithm for two-body interactions.
Can we find a good update algorithm for generic models?

A good update samples low-energy states
A dilemma of local updates:
- Step is too small: high acceptance, but only a small difference between configurations.
- Step is too big: a big difference, but low acceptance.
Global updates: explore the low-energy configurations.

Modeling the update process
The update is a stochastic function f: C → D.
Parameterized: f = f(p_1, p_2, ..., p_n), with analytically known proposal probability q_f(C → D).
Optimize the p_i such that q_f(C → D) / q_f(D → C) ≈ W(D) / W(C).
Choose the acceptance ratio α(C → D) = min{1, [W(D) q_f(D → C)] / [W(C) q_f(C → D)]} ≈ 1.
The approximation does not introduce any error into the MC simulation, as long as we know q_f(C → D) accurately.

Design
(i) Trial simulation with the local update on the original model H.
(ii) Learning: machine-learn an effective model H_eff from the trial data.
(iii) Simulate H_eff to propose trial configurations.
(iv) Accept/reject the proposals against the original H, so detailed balance is maintained.

Part 3: Example: Ising Model and Cluster Update

Example
The model: H = -J_1 Σ_{<ij>} s_i s_j - J_2 Σ_{ijkl} s_i s_j s_k s_l.
The effective model: H_eff = -J_1^eff Σ_{<ij>} s_i s_j.
We consider J_1 = 1, J_2 = 0.2: an Ising transition at T_c = 2.493. The Ising model without J_2 has T_c = 2.269.

The self-learning update
The update: a cluster is constructed using the effective model, so that q(C → D) / q(D → C) = W_eff(D) / W_eff(C).
The acceptance ratio: α(C → D) = min{1, [W(D)/W(C)] · [W_eff(C)/W_eff(D)]}.
The algorithm still satisfies detailed balance exactly, although the effective model is approximate.
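
A schematic sketch (not from the slides) of the self-learning acceptance step. It assumes callables energy(C) and energy_eff(C) for the original and learned models, and a proposal wolff_on_eff(C, beta) that grows a Wolff cluster on H_eff and returns a new configuration; all names are placeholders.

```python
import math
import random

def self_learning_step(C, beta, energy, energy_eff, wolff_on_eff):
    """Propose D with a Wolff move on H_eff, then accept/reject against H.

    alpha = min{1, [W(D)/W(C)] * [W_eff(C)/W_eff(D)]}
          = min{1, exp(-beta * [(E(D)-E(C)) - (E_eff(D)-E_eff(C))])}.
    wolff_on_eff must return a new configuration (not modify C in place),
    so the old state can be kept on rejection.
    """
    D = wolff_on_eff(C, beta)                  # proposal built from the effective model
    dE = energy(D) - energy(C)                 # original model H
    dE_eff = energy_eff(D) - energy_eff(C)     # effective model H_eff
    if random.random() < min(1.0, math.exp(-beta * (dE - dE_eff))):
        return D
    return C
```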

Learning the effective model
1. Generate a sample of configurations using the local update.
2. Perform a linear regression: H_eff = E_0 - J_1^eff Σ_{<ij>} S_i S_j.
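
A sketch of the regression step, under the assumption that for each sampled configuration we record its true energy E = H[C] and the nearest-neighbor correlator sum C_1 = Σ_<ij> S_i S_j; fitting E ≈ E_0 - J_1^eff C_1 by least squares then yields the effective coupling. Function and variable names are illustrative.

```python
import numpy as np

def fit_effective_coupling(nn_sums, energies):
    """Least-squares fit of E = E_0 - J1_eff * sum_<ij> S_i S_j over the sample.

    nn_sums:  array of C_1 = sum over nearest-neighbor bonds of S_i*S_j, one per configuration.
    energies: array of the true energies H[C] of the same configurations.
    Returns (J1_eff, E_0).
    """
    A = np.column_stack([-np.asarray(nn_sums), np.ones(len(nn_sums))])
    (J1_eff, E0), *_ = np.linalg.lstsq(A, np.asarray(energies), rcond=None)
    return J1_eff, E0
```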

Unsupervised machine learning
The linear regression we just did can be viewed as a form of unsupervised machine learning.
Unsupervised ML: learning the underlying distribution from a sample of configurations.
We pretend that we didn't know H, and learn a simpler model H_eff.
The learned model describes the low-energy effective theory, where the high-energy fluctuations are treated as noise in the sample.
We can incorporate more advanced ML models and algorithms.

Learning the effective model
1. Generate a sample using the local update, at T = 5 > T_c.
2. Perform a linear regression: H_eff = E_0 - J_1^eff Σ_{<ij>} S_i S_j.
3. Result: J_1^eff = 1.0726.
4. Generate another sample using the self-learning update, at T = T_c.
5. Result: J_1^eff = 1.1064.

Autocorrelation time
[Figure: magnetization autocorrelation ⟨M(t)M(s+t)⟩ - ⟨M⟩² versus t for the local, self-learning, and naive-Wolff updates; system size 40 × 40.]

SLMC works well at moderate sizes
[Figure: autocorrelation time τ versus system size L for the local and self-learned updates, on linear and log scales.]
The results look good at moderate sizes L = 20-80, but the scaling behavior is exponential: eventually the self-learning update becomes slower than the local update.

Limiting the cluster size
ΔH = H - H_eff scales with the perimeter of the cluster, ∝ L.
Hence τ ∝ e^{L/L_0}, because ⟨α⟩ ∝ e^{-βΔH} ∝ e^{-L/L_0}.
A simple effective model is not accurate enough: one needs a more accurate model, with a hierarchy of energy scales.
Quick fix: limit the cluster size during cluster growth. Detailed balance is still satisfied if α is corrected correspondingly.
This is a block update: it has the same scaling as the local update, but with a smaller prefactor.

Restricted-cluster-size update
[Figure: τ versus L for the local, self-learned, and restricted updates; the local update scales as τ_L ∝ L^2.2 and the restricted update as τ_R ∝ L^2.1.]
We restrict the cluster size to r = 40.

Summary
[Figure: the same comparison of τ versus L for the local, self-learning, and restricted updates, with τ_L ∝ L^2.2 and τ_R ∝ L^2.1.]

Part 4: Example: Bosonic Model and Cumulative Update

Determinant Monte Carlo: computational cost
Fermions coupled to an (auxiliary) bosonic field: H = Σ_{ij} t_ij c_i† c_j + Σ_i σ_i c_i† c_i + H[σ].
Simulation: use the bosonic field as the configuration, C = {σ_i(t)}, i = 1, ..., N; 0 < t < β.
The weight of each configuration, W(C), is obtained by integrating out the fermions.
Computational cost: βN³ per evaluation. Fast update: βN³ for βN steps of local updates.
Generating two independent configurations costs βN³ τ_L.

Cumulative update
Idea: replace W(C) with an approximate weight W̃(C) that is much faster to evaluate.
Local updates guided by W̃(C): C → D via C → C_1 → ... → C_{i-1} → C_i → C_{i+1} → ... → C_M → D.
Detailed balance for each guided step: p(C_i → C_{i+1}) / p(C_{i+1} → C_i) = W̃(C_{i+1}) / W̃(C_i).
For each path the product telescopes:
[p(C → C_1) ··· p(C_i → C_{i+1}) ··· p(C_M → D)] / [p(C_1 → C) ··· p(C_{i+1} → C_i) ··· p(D → C_M)]
= [W̃(C_1)/W̃(C)] [W̃(C_2)/W̃(C_1)] ··· [W̃(D)/W̃(C_M)] = W̃(D) / W̃(C).
Since every path gives the same ratio, summing over all paths yields
q(C → D) / q(D → C) = Σ_paths p(C → C_1) ··· p(C_M → D) / Σ_paths p(C_1 → C) ··· p(D → C_M) = W̃(D) / W̃(C).

Cumulative update
Approximated weight: W̃(C) ≈ W(C).
Update scheme: q(C → D) / q(D → C) = W̃(D) / W̃(C) ≈ W(D) / W(C).
Acceptance ratio: α(C → D) = min{1, [W(D) W̃(C)] / [W(C) W̃(D)]} ≈ 1.
Choose a good W̃ and M ~ τ_L: then τ_CU ~ 1.
Computational cost: βN³ τ → βN³.
Again, using W̃(C) ≈ W(C) does not introduce any approximation; the final acceptance step with the exact W corrects it.
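
A schematic sketch (not from the slides) of one cumulative update. It assumes log_weight and log_weight_approx return log W(C) and log W̃(C), and that local_step_approx performs one local update satisfying detailed balance with respect to W̃ and returns a new configuration; all names are placeholders. The M guided steps are bundled into a single proposal that is accepted or rejected with the exact weight.

```python
import math
import random

def cumulative_update(C, M, log_weight, log_weight_approx, local_step_approx):
    """M local steps guided by the cheap approximate weight W~, then one
    accept/reject with the expensive exact weight W:
        alpha = min{1, [W(D) W~(C)] / [W(C) W~(D)]}.
    """
    D = C
    for _ in range(M):
        D = local_step_approx(D)   # detailed balance w.r.t. W~, so the bundled
                                   # proposal satisfies q(C->D)/q(D->C) = W~(D)/W~(C)
    log_alpha = ((log_weight(D) - log_weight(C))
                 - (log_weight_approx(D) - log_weight_approx(C)))
    if math.log(random.random()) < min(0.0, log_alpha):
        return D                   # accept the cumulative move
    return C                       # reject; the exact W was evaluated only twice
```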

Constructing effective weights
1. Physical intuition: a bosonic effective model W_eff[σ] plus linear regression. Works best for gapped fermions.
H = Σ_{ij} t_ij c†_{iα} c_{jα} + Σ_i σ_i c†_{iα} σ^z_{αβ} c_{iβ} + H[σ],  W_eff[σ] = Σ_{ij} J_ij σ_i σ_j.
2. Machine-learning models: restricted Boltzmann machines, neural networks, etc.

Self-Learning MC in action
Zi-Hong Liu, Xiao-Yan Xu, Yang Qi, Kai Sun, and Zi-Yang Meng, to appear.
A fermion + boson model with a new critical universality, simulated at the critical point. SLMC reaches L = 20-30.
[Figures: lattice of Ising spins coupled to fermion sites; phase diagram with BKT, clock, SDW, and PM phases; bare Fermi surface with Fermi pockets and hot spots; scaling of ln(χ⁻¹(q,0) − χ⁻¹(0,0)) versus ln|q| at L = 24-30, β = 8, with fitted exponent a_q = 1.997 ± 0.028.]