Domain Wall Fermion Simulations with the Exact One-Flavor Algorithm

Domain Wall Fermion Simulations with the Exact One-Flavor Algorithm David Murphy Columbia University The 34th International Symposium on Lattice Field Theory (Southampton, UK) July 27th, 2016 D. Murphy (Lattice 2016) EOFA Domain Wall Fermion Simulations July 27th, 2016 1 / 17

Motivation Recent advances in algorithms used by RBC/UKQCD have driven down the cost of simulating degenerate light quark pairs: 1 Extensive tuning of forces via Hasenbusch mass splitting [Phys. Lett. B519 177-182] 2 Shamir DWF Möbius DWF zmöbius DWF [Izubuchi et al., Lattice 2015] 3 Reduced L s approximation during molecular dynamics [proposed by Brower et al., arxiv:1206.5214; implemented by G. McGlynn, Lattice 2015] 4 Mixed precision CG: single precision inner solves, double precision defect correction RHMC strange and charm quark determinants are now the most expensive part of our evolution strategy Gauge 5800 s 11.6% Light quarks (L s = 14) 18000 s 36.1% L s = 14/L s = 32 correction det. 1600 s 3.3% Strange and charm quarks (L s = 32) 24200 s 48.5% DED 170 s 0.3% Total 50000 s Table: Timings for one HMC trajectory of 80 2 96 192 N f = 2 + 1 + 1 physical mass a 1 3 GeV ensemble on a 12,288-node BG/Q partition [G. McGlynn, Ph.D. thesis] D. Murphy (Lattice 2016) EOFA Domain Wall Fermion Simulations July 27th, 2016 2 / 17

Motivation (cont d.) These techniques are of limited use for RHMC: Unclear how to restart inner single-precision solver in the context of multishift CG Cost of multishift D D inversions at light quark masses largely negates potential gain from Hasenbusch splitting For I = 0 K ππ calculations with G-parity boundary conditions D D describes 4 flavors, and RHMC is needed for light quarks as well [C. Kelly, Thur. @ 15:50] Goal: Explore the exact one-flavor algorithm as an alternative to RHMC for simulating single quark flavors or degenerate G-parity quark pairs D. Murphy (Lattice 2016) EOFA Domain Wall Fermion Simulations July 27th, 2016 3 / 17

The Exact One-Flavor Algorithm Introduced by TWQCD for efficient one-flavor simulations on GPU clusters History: 2009: Introduction [TWQCD, arxiv:0911.5532] 2014: Detailed derivation [Chen and Chiu, Phys. Lett. B738 55-60] 2014: Demonstration of 20% overall speed-up in 16 3 32 16 N f = 2 + 1 simulation with EOFA vs. RHMC for heavy quark [Chen and Chiu, arxiv:1412.0819] Main idea: Use block manipulations in spin space to write [ ] D(m1) 1 det = D(m 2) det (M 1 L) det (M R) with M L and M R Hermitian and positive-definite Contrast with RHMC, where we instead compute [ ] { [ D(m1) D D(m 1) det = det D(m 2) D D(m 2) using a rational approximation to the square root ]} 1/2 Exact in the sense that one does not need to take a fractional power of the fermion determinant to remove unwanted flavors D. Murphy (Lattice 2016) EOFA Domain Wall Fermion Simulations July 27th, 2016 4 / 17

Derivation Sketch I [Chen and Chiu, Phys. Lett. B738 55-60] We begin by factoring the DWF Dirac operator: D w: Wilson Dirac operator c,d: Möbius parameters L ss : 5D hopping term (D DWF ) xx,ss = ( (c + d) D w + 1 ) xx δ ss + ( (c d) D w 1 ) xx L ss { ( = (D w) xx δ ss + δ ) } { xx d + c (1 + L) (1 L) 1 1 ss }{{ } d (1 L) + c (1 } + L) D EOFA (D w) xx δ ss + δ xx (P +M + + P M ) ss M ± does not involve the gauge field and can be worked out explicitly Shamir kernel: (c, d) = (1/2, 1/2) D DWF and D EOFA are the same operator Möbius kernel: (c, d) = (α/2, 1/2) D EOFA is dense but has a somewhat better condition number than D DWF H γ 5R 5D EOFA is Hermitian, but not positive-definite D. Murphy (Lattice 2016) EOFA Domain Wall Fermion Simulations July 27th, 2016 5 / 17

Derivation Sketch II [Chen and Chiu, Phys. Lett. B738 55-60] Consider a slight generalization of D EOFA, in the chiral representation of γ µ : ( ) W M 5 + M +(m 1) (σ t) D(m 1, m 2) = (σ t) W M 5 + M (m 2) Applying the Schur det. identity to D EOFA (m) = D(m, m), and taking a ratio: det (D EOFA (m 1)) det (W M5 + M+(m1)) det (H (m1)) = Schur complements det (D EOFA (m 2)) det (W M 5 + M (m 2)) det (H +(m 2)) Defining ± R 5 (M ±(m 2) M ±(m 1)) = kω ±Ω ± ( m2 m1), also have: det ( D(m1, m ) 2) det ( det (W M5 + M+(m1)) det (H+(m2) +) ) = 1 = = D(m1, m 2) det (W M 5 + M (m 2)) det (H (m 1) + ) Finally, using Sylvester s determinant identity (det(1 + AB) = det(1 + BA)): [ ( )] 1 [ ( det (D EOFA(m 1)) det (D = det 1 + kω 1 EOFA(m 2)) H (m 1) Ω det 1 + kω 1 + Ω + H +(m 2) +P + }{{}}{{} M L Hermitian and pos.-def.! M R )] 1 D. Murphy (Lattice 2016) EOFA Domain Wall Fermion Simulations July 27th, 2016 6 / 17

Derivation Sketch III [Chen and Chiu, Phys. Lett. B738 55-60] To simulate, we express the determinant as a path integral: 1 det (M 1 L) det (M = Dφ LDφ L R) DφRDφ R e φ L M Lφ L φ R M Rφ R φ L and φ R are bosonic pseudofermion fields with two spin components M L and M R contain nested matrix inverses expensive to compute with CG Easier to work with block form: M EOFA = 1 kp Ω 1 H(m 1 ) Ω P + kp + Ω 1 + Ω + P + H(m 2 ) + P + for which we have, in terms of an ordinary four-component spinor: det (D EOFA (m 1)) det (D EOFA (m = DφDφ e φ M EOFA φ 2)) EOFA evolution requires inversions of the general form ( H(m) + α ±P ± ) ψ = φ D. Murphy (Lattice 2016) EOFA Domain Wall Fermion Simulations July 27th, 2016 7 / 17

The Hybrid Monte Carlo Algorithm We perform dynamical QCD simulations using the HMC algorithm Idea: Generate a Markov chain of gauge field configurations by evolving a Hamiltonian system in (unphysical) molecular dynamics time Generalized coordinate: Gauge field (U x,µ) Conjugate momentum: π x,µ su(3) Hamiltonian: H = 1 2 π2 + S g[u] + S f [U] Equations of motion: { τ U x,µ = π x,µu x,µ ( ) τ π x,µ = T a x,µ a Sg[U] + S f [U] Metropolis accept/reject step corrects for finite precision integration Accept new gauge field U x,µ with probability Paccept = min ( 1, e H) Implementing this for EOFA requires: 1 Action S f [U] 2 Pseudofermion heatbath 3 Pseudofermion contribution to momentum update π x,µ π x,µ T a ( a x,µs f [U] ) τ Tests performed on two N f = 2 + 1 Shamir DWF ensembles: 16I: 16 3 32 16, β = 2.13, (am l, am h ) = (0.01, 0.032) [Phys. Rev. D76, 014504] 32I: 32 3 64 16, β = 2.25, (am l, am h ) = (0.004, 0.03) [Phys. Rev. D93, 054502] D. Murphy (Lattice 2016) EOFA Domain Wall Fermion Simulations July 27th, 2016 8 / 17

I. Action: Equivalence of EOFA and RHMC for Möbius DWF Relation between RHMC determinant ratios and EOFA determinant ratios: { [ ]} D 1/2 ( D DWF (m 1) (c + d) L s ) + m 1 (c d) Ls 12L 3 T [ ] DEOFA (m 1) det = det D D DWF (m 2) (c + d) Ls + m 2 (c d) Ls D EOFA (m 2) ] [ D(mq) D(mq+ mq) det 0.15 0.14 0.13 0.12 0.11 0.10 0.09 Exact RHMC EOFA 0.08 0.5 0.6 0.7 0.8 0.9 1.0 m q log(rw) 72 71 70 69 68 67 66 RHMC EOFA 65 0 2 4 6 8 10 N rw (a) Free 4 5 lattice, m q = 0.01, α = 4.0 (Möbius scale), 20 stochastic hits. (b) 16I. Reweight am h = 0.032 0.042 in N rw steps with 10 hits per step. Observe significantly smaller systematic and statistical errors for EOFA Cheaper to perform quark mass reweighting with EOFA! D. Murphy (Lattice 2016) EOFA Domain Wall Fermion Simulations July 27th, 2016 9 / 17

II. Heatbath At start of HMC trajectory: draw η with P (η) e η2 /2, compute φ = M 1/2 EOFA η Still requires rational approximation x 1/2 α 0 + N p α l=1 l/(β l + x) Defining γ l = (1 + β l ) 1, can show Np ] M 1/2 EOFA α01 + α lγ l [1 + kγ lp Ω 1 Ω P kγ lp +Ω 1 + Ω +P + H(m 1) γ l P H(m 2) γ lβ l +P + l=1 ±P ± has large number of zero modes not invertible! (no multishift) λ(meofa) 10 3 λ max 10 2 λ min 10 1 10 0 10 1 10 2 10 1 10 0 am 1 (am 2 1.0) Figure: Spectral range of M EOFA. η η φ M φ / η η 10 2 10 4 10 6 10 8 10 10 10 12 10 14 10 16 0 2 4 6 8 10 12 14 16 18 N p Figure: Heatbath relative error using N p poles. D. Murphy (Lattice 2016) EOFA Domain Wall Fermion Simulations July 27th, 2016 10 / 17

III. Momentum Update EOFA pseudofermion force is { a x,µs f [U] = kχ L γ5r5 ( a x,µ D w ) χl kχ R γ5r5 ( a x,µ D w ) χr χ L = [H(m 1)] 1 Ω P φ, χ R = [H(m 2) +P +] 1 Ω +P +φ Compared to RHMC: Cheaper to evaluate Somewhat smaller total force, with F L F R Percentage 10 0 10 1 10 2 10 3 EOFA RHMC F EOFA = 0.0239 F RHMC = 0.0364 Percentage 10 0 10 1 10 2 10 3 EOFA (L) EOFA (R) F L = 0.0041 F R = 0.0239 10 4 0.00 0.04 0.08 0.12 F Figure: Lattice-wide distribution of EOFA and RHMC total forces by link 10 4 0.00 0.02 0.04 0.06 0.08 F Figure: Lattice-wide distribution of EOFA force contributions from L and R terms by link D. Murphy (Lattice 2016) EOFA Domain Wall Fermion Simulations July 27th, 2016 11 / 17

Test: Reproduce 16I Ensemble [Phys. Rev. D76, 014504] Parallel N f = 2 + 1 evolutions: compare RHMC heavy quark to EOFA heavy quark 0.590 P vs. MD traj. 1 H vs. MD traj. 0.2 Q (%) 0.588 0 0.1 EOFA 0.586-1 0.0 0.590 1 0.2 0.588 0 0.1 RHMC 0.586 0 500 1000 1500-1 0 500 1000 1500 0.0-10 0 10 β 2.13 am l 0.01 am h 0.032 Table: Parameters EOFA RHMC EOFA RHMC am π 0.243(2) 0.244(1) af π 0.0887(8) 0.0883(6) am K 0.326(2) 0.326(1) af K 0.0966(6) 0.0963(4) am Ω 0.990(9) 0.995(10) am res(m l) 0.00305(4) 0.00306(4) Table: Spectrum D. Murphy (Lattice 2016) EOFA Domain Wall Fermion Simulations July 27th, 2016 12 / 17

Optimizations I: Accelerating EOFA Inversions 10 0 10 4 10 8 Ref.: RHMC multishift EOFA (unpreconditioned) EOFA (preconditioned) EOFA (preconditioned + mixed CG) 10 12 0 20 40 60 80 100 120 140 10 0 10 4 10 8 am phys s = 0.02424 am phys l = 0.000244 10 12 0 500 1000 1500 2000 2500 Time (s) EOFA solves accelerated with: Even-odd preconditioning Optimized BG/Q assembly generated by BAGEL [P. Boyle] Mixed-precision CG Test on 32I ensemble at physical am l and am s determined by chiral fits [Phys. Rev. D93, 054502] Reference (dashed line): even-odd preconditioned multishift solve of D D with same quark mass am phys s am phys l EOFA 7.6 s 156.5 s RHMC 37.7 s 422.5 s Ratio 5.0 2.7 Table: Comparison of total CG inversion time between optimized EOFA and RHMC D. Murphy (Lattice 2016) EOFA Domain Wall Fermion Simulations July 27th, 2016 13 / 17

Optimizations II: Accelerating EOFA Heatbath Following TWQCD, we use the chronological inversion method introduced by Brower et al. to forecast CG guesses [Nucl. Phys. B484, 353-374] Forecast solution to (H + α l ±P ±) ψ = φ by minimizing Ψ[x] = x (H + α l ±P ±) x φ x x φ over the space of accumulated solutions for previous shifts α l 3000 2500 CG Iterations 2000 1500 1000 500 0 L Zero Last Solution Forecasted R 0 4 8 12 16 20 Pole Test: 16I, 10 poles in rational approximation Find 30% reduction in total iteration count No additional gain from sharing solutions between L and R terms D. Murphy (Lattice 2016) EOFA Domain Wall Fermion Simulations July 27th, 2016 14 / 17

Conclusions and Next Steps We have: Independently implemented TWQCD s EOFA, and performed basic algorithmic tests Reproduced 16I ensemble using EOFA to evolve the heavy flavor Partially implemented and tested a number of optimizations: 1 Even-odd preconditioning 2 Optimized BG/Q assembly for sparse matrix applications (BAGEL) 3 Mixed-precision CG 4 Forecasted CG guesses for heatbath 5 Hasenbusch mass splitting Next steps: Better ideas to ameliorate the cost of the EOFA heatbath? Tuning Sexton-Weingarten integration and/or Hasenbusch mass splitting? G-parity simulations with EOFA light quarks Port to P. Boyle s Grid library: is a further speed-up possible if we perform deflated sparse matrix inversions with HDCR? [Poster by A. Yamaguchi and P. Boyle] We expect a gain over RHMC with a fully optimized and tuned implementation of EOFA, but the exact gain is difficult to predict! Thank you! D. Murphy (Lattice 2016) EOFA Domain Wall Fermion Simulations July 27th, 2016 15 / 17

Extra Slides D. Murphy (Lattice 2016) EOFA Domain Wall Fermion Simulations July 27th, 2016 16 / 17

Spectrum of the EOFA and RHMC actions Can think of EOFA as a preconditioning step applied to RHMC action Mapping of eigenvalue spectrum: (λ min, λ max) RHMC (1, λ max) EOFA Similar range, but M EOFA spectrum more densely concentrated near low end Computes same determinant ratio, but easier to stochastically estimate M EOFA! Notation: action is S = φ Mφ MEOFA = 1 kp Ω 1 H(m1) Ω P + 1 kp+ω + Ω+P+ H(m2) +P+ MRHMC = ( D D(m2) ) 1/4 ( D D(m1) ) 1/2 ( D D(m2) ) 1/4 10 5 10 5 10 4 10 4 10 3 10 3 10 2 10 2 10 1 10 1 10 0 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 λ(m EOFA ) 10 0 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 λ(m RHMC ) Figure: 4 5 free lattice, am 1 = 0.1, am 2 = 1.0, am 5 = 1.8. Lines mark 95% of density. D. Murphy (Lattice 2016) EOFA Domain Wall Fermion Simulations July 27th, 2016 17 / 17