Acceleration of Feynman loop integrals in highenergy physics on many core GPUs

Similar documents
Numerical Computation of a Non-Planar Two-Loop Vertex Diagram

Multi-threaded adaptive extrapolation procedure for Feynman loop integrals in the physical region

Parallel adaptive methods for Feynman loop integrals. Conference on Computational Physics (CCP 2011)

The GRACE project - QCD, SUSY, Multi-loop -

Scalable Software for Multivariate Integration on Hybrid Platforms

Distributed and multi-core computation of 2-loop integrals

arxiv: v1 [hep-ph] 30 Dec 2015

On Iterated Numerical Integration

Numerical multi-loop calculations: tools and applications

Numerical Evaluation of Multi-loop Integrals

Rational Chebyshev pseudospectral method for long-short wave equations

TVID: Three-loop Vacuum Integrals from Dispersion relations

Numerical Evaluation of Multi-loop Integrals

Triangle diagrams in the Standard Model

ISR effects on loop corrections of a top pairproduction

MHV Diagrams and Tree Amplitudes of Gluons

arxiv: v1 [hep-lat] 7 Sep 2007

Improving many flavor QCD simulations using multiple GPUs

arxiv:hep-ph/ v1 18 Nov 1996

Note on homological modeling of the electric circuits

Numerical evaluation of multi-loop integrals

CS 450 Numerical Analysis. Chapter 8: Numerical Integration and Differentiation

GPU acceleration of Newton s method for large systems of polynomial equations in double double and quad double arithmetic

Precision neutron flux measurement with a neutron beam monitor

PHY 396 L. Solutions for homework set #20.

Molecular Dynamics Simulations

Functions associated to scattering amplitudes. Stefan Weinzierl

arxiv: v1 [math.cv] 18 Aug 2015

Automation of NLO computations using the FKS subtraction method

Lightlike solitons with spin

Azimuthal anisotropy of the identified charged hadrons in Au+Au collisions at S NN. = GeV at RHIC

arxiv:hep-ph/ v1 28 Mar 2006

NNLO antenna subtraction with two hadronic initial states

Monte Carlo Calculation of Phase Shift in Four Dimensional O(4) φ 4 Theory

An Empirical Study of the ɛ-algorithm for Accelerating Numerical Sequences

Beyond Wiener Askey Expansions: Handling Arbitrary PDFs

P-ADIC STRINGS AT FINITE TEMPERATURE

REDUCTION OF FEYNMAN GRAPH AMPLITUDES TO A MINIMAL SET OF BASIC INTEGRALS

Mathematical analysis of the computational complexity of integer sub-decomposition algorithm

Hypergeometric representation of the two-loop equal mass sunrise diagram

High-velocity collision of particles around a rapidly rotating black hole

INP MSU 96-34/441 hep-ph/ October 1996 Some techniques for calculating two-loop diagrams 1 Andrei I. Davydychev Institute for Nuclear Physics,

arxiv:astro-ph/ v1 10 Dec 1996

1.E Neutron Energy (MeV)

Measurement of D-meson production in p-pb collisions with the ALICE detector

The Non-commutative S matrix

arxiv: v1 [hep-lat] 10 Nov 2018

arxiv:hep-th/ v1 2 Jul 1998

Exploring finite density QCD phase transition with canonical approach Power of multiple precision computation

Summary of free theory: one particle state: vacuum state is annihilated by all a s: then, one particle state has normalization:

Automating Feynman-diagrammatic calculations

Treatment of Overlapping Divergences in the Photon Self-Energy Function*) Introduction

Universality check of the overlap fermions in the Schrödinger functional

Structural Aspects of Numerical Loop Calculus

A perturbative analysis of tachyon condensation

Finite Temperature Field Theory

Correction for PMT temperature dependence of the LHCf calorimeters

Schematic Project of PhD Thesis: Two-Loop QCD Corrections with the Differential Equations Method

QFT. Chapter 14: Loop Corrections to the Propagator

MHD free convective flow past a vertical plate

Sector Decomposition

On the Landau gauge three-gluon vertex

A model of the measurement process in quantum theory

NLM introduction and wishlist

A construction of the Schrödinger Functional for Möbius Domain Wall Fermions

Analytic approximation for the modified Bessel function I 2/3

GPU-based computation of the Monte Carlo simulation of classical spin systems

Interpolation via symmetric exponential functions

NLO-QCD calculation in GRACE. - GRACE status - Y. Kurihara (KEK) GRACE Group LoopFest IV

When Is It Possible to Use Perturbation Technique in Field Theory? arxiv:hep-ph/ v1 27 Jun 2000

Andrej Liptaj. Study of the Convergence of Perturbative Power Series

NLO event generator for LHC

Chapter 8 Indeterminate Forms and Improper Integrals Math Class Notes

Laboratory experiments on the formation and recoil jet transport of aerosol by laser ablation

First order sensitivity analysis of electron acceleration in dual grating type dielectric laser accelerator structures

The integration of response surface method in microsoft excel with visual basic application

Tectonics of the terrestrial litosphere in spherical harmonics

AC-DC Transfer Difference Measurements using a Frequency Output Thermal Converter

The role of symmetry in nuclear physics

& Λ Production in ALICE

7 Veltman-Passarino Reduction

Department of Physics, Washington University, St. Louis, MO 63130, USA (Dated: November 27, 2005)

PHENIX measurements of bottom and charm quark production

Least action principle and stochastic motion : a generic derivation of path probability

Fiducial cross sections for Higgs boson production in association with a jet at next-to-next-to-leading order in QCD. Abstract

LOOP-TREE DUALITY AND QUANTUM FIELD THEORY IN 4D

Evaluating multiloop Feynman integrals by Mellin-Barnes representation

Quark tensor and axial charges within the Schwinger-Dyson formalism

Claude Tadonki. MINES ParisTech PSL Research University Centre de Recherche Informatique

REVIEW REVIEW. Quantum Field Theory II

Quantum Field Theory II

RG scaling at chiral phase transition in two-flavor QCD

The mechanism of nuclear spontaneous time reversal symmetry breaking transfered to digital logic system

arxiv: v1 [hep-lat] 31 Oct 2014

Accurate numerical orbit propagation using Polynomial Algebra Computational Engine PACE. ISSFD 2015 congress, Germany. Dated: September 14, 2015

Quantum Field Theory II

arxiv: v1 [physics.comp-ph] 22 Jul 2010

The solution of 4-dimensional Schrodinger equation for Scarf potential and its partner potential constructed By SUSY QM

Gravitational collapse and the vacuum energy

Determination of Failure Point of Asphalt-Mixture Fatigue-Test Results Using the Flow Number Method

Transcription:

Journal of Physics: Conference Series OPEN ACCESS Acceleration of Feynman loop integrals in highenergy physics on many core GPUs To cite this article: F Yuasa et al 2013 J. Phys.: Conf. Ser. 454 012081 View the article online for updates and enhancements. Related content - Multi-threaded adaptive extrapolation procedure for Feynman loop integrals in the physical region E de Doncker, F Yuasa and R Assaf - Distributed and multi-core computation of 2-loop integrals E de Doncker and F Yuasa - Regularization of IR divergent loop integrals E de Doncker, F Yuasa and Y Kurihara This content was downloaded from IP address 46.3.194.69 on 01/12/2017 at 22:14

Acceleration of Feynman loop integrals in high-energy physics on many core GPUs F Yuasa 1, T Ishikawa 1, N Hamaguchi 1, T Koike 2 and N Nakasato 3 1 High Energy Accelerator Research Organization (KEK), 1-1 Oho, Tsukuba, Ibaraki 305-0801, Japan 2 Seikei University, Musashino, Tokyo 180-8633, Japan 3 University of Aizu, Aizu-Wakamatsu, Fukushima 965-8580, Japan E-mail: fukuko.yuasa@kek.jp Abstract. The current and future colliders in high-energy physics require theorists to carry out a large scale computation for a precise comparison between experimental results and theoretical ones. In a perturbative approach several methods to evaluate Feynman loop integrals which appear in the theoretical calculation of cross-sections are well established in the one-loop level, however, more studies are necessary for higher-order levels. Direct Computation Method (DCM) is developed to evaluate multi-loop integrals. DCM is based on a combination of multidimensional numerical integration and extrapolation on a sequence of integrals. It is a fully numerical method and is applicable to a wide class of integrals with various physics parameters. The computation time depends on physics parameters and the topology of loop diagrams and it becomes longer for the two-loop integrals. In this paper we present our approach to the acceleration of the two-loop integrals by DCM on multiple GPU boards. 1. Introduction Much of the success of the Standard Model in high-energy physics rests on the result of the various precision measurements[1, 2]. These precision measurements require the knowledge of higher order quantum corrections. In Feynman diagrammatic approach followed by most of the automated systems, the cross sections can be obtained by computing directly the squared of matrix elements or amplitudes. In order to include higher order corrections in the Standard model or beyond it is indispensable to handle evaluations of Feynman loop integrals. The most general form of the scalar Feynman loop integral with L loops and n internal lines is given by L I = lim ǫ 0+ j=1 d 4 l j i(2π) 4 n r=1 1 D r (1) where l j is the j-th loop momentum and we put the space-time dimension 4. Here r=1 D r = q 2 r m 2 r +iǫ, (2) is the inverse of Feynman propagator with the momentum q r and m r the mass of the r-th particle. Using the generalized Feynman s identity, n 1 1 δ(1 x r ) = Γ(n) dxr D r ( x r D r ) n, (3) 0 Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. Published under licence by Ltd 1

we find where I = ( ) 1 4L/2 Γ( n 4L 4π 2 ) I, (4) 1 n I = ( 1) n dx i δ(1 C n 4(L+1)/2 x i ) 0 (D iǫc) n 4L/2. (5) i=1 Here D function is a polynomial in Feynman parameters {x i } and it contains physics parameters such as external momenta and particle masses and C function is a polynomial in {x i }. Both functions are determined by the topology of Feynman diagram. The details on how to get them are summarized in [3]. In the case of modern theoretical calculations in high-energy physics, loop integrals are performed with new mathematical methods and are based on an intense use of computer power to get theoretical predictions of cross-sections for particle reactions. Depending on the number of loops, legs, and physics parameters, it may take long computation time. In fact it is reported that the two-loop box (four legs) with arbitrary masses requires long computation time[3]. To reduce this practical problem, current hardware trends such as many-core/multi-core computation can be available. Recently GPU boards have been used in scientific applications which require a huge computation time. Also in high-energy physics successful computations on GPU have been reported in [4, 5]. In this paper, we show our approach to reduce the computation time of loop integrals on multiple GPUs and show its performance. The plan of this paper is as follows; we explain numerical techniques of our program for loop integrals in Section 2. The environment for the acceleration of the computation and the results of the performance test are shown in Section 3. Section 4 gives summary of this paper. 2. Numerical Techniques One of the difficult points for the numerical computation of loop integrals is vanishing denominators in (5). Our program code for loop integrals is a combination of numerical multidimensional integration and extrapolation in order to reduce the difficulty. It consists of the following three steps: (i) Let ǫ in (5) be finite as ǫ l = ǫ 0 /(const.) l, with l = 0,1,2,, where a starting ǫ, ǫ 0, and constant are positive numbers. (ii) Evaluatethemulti-dimensionalintegralI in(5)numerically. Thankstoǫ l wegetfinitevalue corresponding to each l. Repetition of integration gives a sequence of I(ǫ l ), l = 0,1,2,. (iii) Extrapolate the sequence I(ǫ l ) to the limit and determine I. We call above prescriptions Direct Computation Method (DCM). Please note that in the unphysical region extrapolation is not necessary and we can put ǫ = 0 in (5) because singularities do not appear and the integration is not difficult in this region. On the other hand, singularities appear in the physical region and we should take the parameters for extrapolation, ǫ 0 and constant, appropriately corresponding to physics parameters such as masses. In current DCM, the way to define them automatically is not included and we take them empirically. In DCM, the integration routine plays a central role. The numerical result of the integral should be as precise as possible for each ǫ l, so that the extrapolation of I(ǫ l ) converges well. For multi-dimensional integration we can make use of any method when once it gives a good approximation. We have been using two different integration methods. One is an adaptive algorithm routine, DQAGE, in QUADPACK[6] and the other is Double Exponential Formulas[7], shortly DE. We call the combination of DQAGE and and the extrapolation technique DQ-DCM and call the combination of DE and the extrapolation technique DE-DCM. 2

We use Wynn s ε algorithm[8, 9, 10] for the extrapolation and we usually repeat the integration 15 times. Since computation time for the extrapolation of the sequence is negligible small compared to that for the integration, we do not go further about the extrapolation. Throughout this paper, we concentrate exclusively on the acceleration of DE-DCM. Let us explain about DE formulas briefly. An integral with the interval (a,b) is given as I = b a f(x)dx (6) and with x = φ(t), a = φ( ) and b = φ( ). I = g(t)dt, (7) where g(t) = f(φ(t))φ (t). (8) Application of the trapezoidal rule with mesh size h gives I h = h f(φ(kh))φ (kh). (9) k= Here note the infinite sum should be truncated appropriately as N + Ih N = h f(φ(kh))φ (kh), (10) k= N N = N +N + +1, (11) where N is the number of function evaluation. In DE-DCM we take the following transformation I = 1 0 f(x)dx, (12) φ(t) = 1 2 (1+tanh(π sinh(t))). (13) 2 Through our experiences in DE-DCM[11], we point out following things: DE is effective for singularities appearing in the boundary of the integration domain, When singularities appear in the integration domain, we may split the integration domain, DE has been developed for the one-dimensional integration, we apply it in an iterative way, The parallelization of DE is straightforward. 2.1. Two-loop self-energy integral In this paper, we take two-loop self-energy integrals (two legs) as an example because there is a rich literature on the analytic methods or semi-numerical methods such as [12, 13, 14] and we can compare numerical results by DE-DCM to analytic ones. 3

D and C functions corresponding to the diagram shown in Fig.1 are given as: D = s(x 5 (x 1 +x 3 )(x 2 +x 4 )+(x 1 +x 2 )x 3 x 4 +(x 3 +x 4 )x 1 x 2 )+C M 2, (14) C = (x 1 +x 2 +x 3 +x 4 )x 5 +(x 1 +x 2 )(x 3 +x 4 ), (15) 5 M 2 = x i m 2 i. (16) i=1 Here, m i is a mass of an internal particle indicated Feynman parameter x i and s is a physics parameter. x 1 x 3 p x 5 p x 2 x 4 Figure 1. Two-loop self-energy diagram with five internal lines. x i is Feynman parameter related to each line and p a physics parameter which determines s in (14) as p 2 is s. 0.32 two-loop selfenergy (real part) 0.3 0.28 0.26 I(s) 0.24 0.22 0.2 0.18 0.16 0.14 0 10 20 30 40 50 60 70 80 90 100 s Figure 2. Numerical results of real part of two-loop self-energy integrals with n = 1024 and h = 8.36181640625 10 3. All filled circles are results after the extrapolation. The starting ǫ for the extrapolation, ǫ 0, is 1.2 15 and constant is 1.2. In this computation we did not split the integration domain. The numerical results of the real part of the integrals for 4.0 s 10.0 GeV 2 in the physical region with mass parameters as m 1 = m 2 = m 5 = 10.0 GeV and m 3 = m 4 = 0.0 GeV are plotted in Fig.2. For positive s, we repeated the integration to obtain I(ǫ l )for the extrapolation. For above physics parameter setting we don t have to split the integration domain because the values of masses given are not so small compared to s. All the computation shown here has been done in the double precision arithmetic because there is no infrared divergence in this case. For example, for the case s = 10.0 GeV 2, a numerical result is 0.26673229 10 1 ±0.191 10 8. The result using analytic formulas in [12] is 0.266732 10 1 and the agreement is excellent. 3. Acceleration of Computation In this section firstly we explain the software environment and the modification of DE-DCM program code for the acceleration. Then we show the hardware environment for the computation time measurement and results. 4

3.1. Software Environment We use LSUMP compiler[15] which has been developed by Nakasato et al. [15]. In addition to the LSUMP, we use Goose[16] software package which enables us parallel programming with Goose directives on the SIMD hardware accelerator. Both are originally developed for astrophysical N-body simulations and several very effective N-body simulation codes are available on GPU or special-purpose computers. With the combination of LSUMP and Goose, we only have to define the inner-most computation loop in the program. In order to use LSUMP and Goose, we transform the variables so as to make integration domain as [0,1] and then divide multiple computation loops for the multi-dimensional integration to double loops. Though the double precision and quadruple precision arithmetic are available with LSUMP[17] and Goose[18], all computation shown in this paper has been done in double precision arithmetic for the physics parameters given in Section 2. 3.2. Hardware Environment We setup two kinds of hardware environment. One is a host computer (AMD Phenon II x6 1090T 3.2GHz) with two AMD/ATI HD5870 GPU boards and the other is a host computer (AMD FX-8150 3.6GHz) with four AMD/ATI HD6970 GPU boards. Parameters of each GPU board are shown in Table 1. Table 1. Specification of each GPU board. SP FP stands for Single Precision floating-point arithmetic and DP FP Double Precision floating point, respectively. HD5870 HD6970 Clock [MHz] 850 880 number of cores 1600 1536 SP FP [GFLOP] 2740 2732 DP FP [GFLOP] 544 683 Table 2. Computation time for s = 5.0 GeV 2 with N = 1024 and s = 1.0 GeV 2 with N = 2048. s GPU # of time [sec] boards 5.0 HD5870 1 242.313 HD5870 2 122.251 5.0 HD6970 1 221.059 HD6970 2 110.715 HD6970 3 74.519 HD6970 4 55.638-1.0 HD6970 4 882.288 3.3. Computation Time Measurement We have measured the computation time using one to four GPU boards taking two-loop selfenergy integral for s = 5.0 GeV 2 with N = 1024 and h = 8.362 10 3 as examples. The measured results are shown in Table 2. For positive s, in the physical region, we should repeat the integration to obtain the sequence I(ǫ l ) and extrapolate it to accomplish the loop integral. However, the values of computation time shown in Table 2 are time required for only one repetition for simplicity. Therefore the amount of time in total becomes 15 times longer in the practical computation. We also run the same program code compiled by Goose on a host computer using one core with same integration parameters for s = 5.0 GeV 2 and roughly speaking it takes about one day. For example on AMD Phenon II x6 1090T 3.2GHz, it took 79903.860 seconds. In DE-DCM, the expected computation time for one repetition in both the unphysical region andthephysicalregionismoreorlessthesameanditdependsonthevalueofn. Forcomparison 5

we also show the computation time for s = 1.0 GeV 2 with N = 2048 and h = 4.181 10 3 in Table 2 even though it is in the unphysical region. Please note that the computation time with N = 2048 is about 16 times longer compared to one with N = 1024, because the integration is 4-dimensional one. 4. Summary We have been developing a fully numerical method for Feynman loop integrals, Direct Computation Method (DCM). DCM is based on the numerical integration and extrapolation method and it can handle the singularities due to the vanishing denominator in the expression of loop integrals. The numerical integration plays a central role in DCM and in this paper we used Double Exponential Formulas in an iterated way for the multi-dimensional integration. Since it is fully numerical, it can be applicable to multi-loop integrals with multi-leg and arbitrary physics parameters in principle. However, the computation time becomes longer for higher-order loops and reducing it is a crucial problem. In this paper we demonstrated the acceleration of the loop integrals taking two-loop self-energy integral as an example. In the performance test, it is clearly shown that the reduction of the computation time is excellent on multiple GPU boards. Concerning the further development we need to test various two-loop integrals with more legs and various physics parameters different from those in this paper. Acknowledgments We acknowledge the support from Grant-in-Aid for Scientific Research (No.24540292 and No. 23540328) of JSPS and the Large Scale Simulation Program No.12-07 of KEK. This work is supported in part by Strategic Programs for Innovative Research (SPIRE) Field 5 The origin of matter and the universe. One of authors (F.Y.) wishes to thank Prof. E de Doncker for her continuous encouragement. References [1] D. Perret-Gallix, Computational Particle Physics for Event Simulation and Analysis, a talk in CCP2012. [2] J. Blümlein, Why precision, arxiv:1205.4991v2 [hep-ph]. [3] F. Yuasa, E. de Doncker, N. Hamaguchi, T. Ishikawa, K. Kato, Y. Kurihara, J. Fujimoto, Y. Shimizu, Numerical computation of two-loop box diagrams with masses, Comput. Phys. Commun. 183 (2012) 2136. [4] J. Kanzaki, Monte Carlo integration on GPU, DOI:10.1140/epjc/s10052-011-1559-8. [5] M. Hayakawa, K.-I. Ishikawa, Y. Osaki, S. Takeda, S. Uno, N. Yamada, Improving many flavor QCD simulations using multiple GPUs, PoS Lattice2010:325,2010. [6] R. Piessens, E. de Doncker, C. W. Ubelhuber and D. K. Kahaner, QUADPACK, A Subroutine Package for Automatic Integration, Springer Series in Computational Mathematics. Springer-Verlag, 1983. [7] H. Takahasi and M. Mori, Double Exponential Formulas for Numerical Integration, RIMS Kyoto Univ. 9 (1974) 721. [8] D. Shanks, Non-linear transformations of divergent and slowly convergent sequences, J. Math. Phys. 34 (1955) 1. [9] P. Wynn, On a device for calculating the e m(s m) transformations, Mathematical Tables Aids to Computing 10 (1956) 91. [10] P. Wynn, On the convergence and stability of the epsilon algorithm, SIAM J. Numer. Anal. 3 (1966) 91. [11] F. Yuasa and N. Hamaguchi, Double Exponential Integration Method with algorithm for the Feynman Loop Integration Calculation [in Japanese], 2008-HPC-114 No.6 pp. 31-36. [12] D.J. Broadhurst, J. Fleischer and O.V. Tarasov, Two-loop two-point functions with masses: asymptotic expansions and Taylor series, in any dimension, Z.Phys. C60 (1993) 287. [13] J. Fujimoto, Y. Shimizu, K. Kato and Y. Oyanagi, Numerical Approach to Loop Integrals, in Proc. of the Second International Workshop on Software Engineering, Artificial Intelligence and Expert Systems in High Energy and Nuclear Physics, January 13-18, 1992, l Agelonde. [14] G. Passarino and S. Uccirati, Algebraic-numerical evaluation of Feynman diagrams: two-loop self-energies, Nucl. Phys. B629 (2002) 97. [15] N. Nakasato and J. Makino, A Compiler for High Performance Computing With Many-Core Accelerators, in: IEEE International Conference on Cluster Computing and Workshops, 2009, pp. 1-9. 6

[16] A. Kawai, T. Fukushige, N. Nakasato and T. Narumi, Goose and DS-CUDA [in Japanese], ENSEMBLE, The Bulletin of The Molecular Simulation Society of Japan, Vol.14, No. 2, (2012.04.30) 81-84. [17] N. Nakasato, T. Ishikasa, J. Makino and F. Yuasa, Fast Quad-Precision Operations On Many-Core Accelerators [in Japanese], Vol2009-HPC-121 No.39. [18] T. Koike, Speed-up of numerical Feynman loop integration by graphics processing unit [in Japanese], unpublished Masters thesis, Seikei University, Musashino, Tokyo, Japan, 2012. 7