arXiv:1703.03284v1 [physics.data-an] 19 Feb 2017

Model-independent partial wave analysis using a massively-parallel fitting framework

L Sun 1, R Aoude 2, A C dos Reis 2, M Sokoloff 3

1 School of Physics and Technology, Wuhan University, Wuhan, Hubei Province, 430072 China
2 Centro Brasileiro de Pesquisas Físicas (CBPF), Rio de Janeiro, 22290-180 Brazil
3 University of Cincinnati, Physics Department, ML0011, Cincinnati OH 45221-0011, USA

E-mail: sunl@whu.edu.cn

Abstract. The functionality of GooFit, a GPU-friendly framework for doing maximum-likelihood fits, has been extended to extract model-independent S-wave amplitudes in three-body decays such as $D^+ \to h^+ h^+ h^-$. A full amplitude analysis is done where the magnitudes and phases of the S-wave amplitudes are anchored at a finite number of $m^2(h^+ h^-)$ control points, and a cubic spline is used to interpolate between these points. The amplitudes for P-wave and D-wave intermediate states are modeled as spin-dependent Breit-Wigner resonances. GooFit uses the Thrust library, with a CUDA backend for NVIDIA GPUs and an OpenMP backend for threads on conventional CPUs. Performance on a variety of platforms is compared. Executing on systems with GPUs is typically a few hundred times faster than executing the same algorithm on a single CPU.

1. Introduction

The physics of scalar mesons below 2 GeV is a long-standing puzzle in light meson spectroscopy [1]. This is partly because these mesons often have large widths and overlap with each other, which makes them hard to model. Contributions from light scalars can be extracted using Dalitz plot analyses [2] of three-body hadronic decays of charm mesons. First developed by the E791 Collaboration [3], the Model Independent Partial Wave Analysis (MIPWA) technique extracts S-wave amplitudes in the Dalitz plot analysis with no assumption about the nature of the S-wave.
To pin down its large number of free parameters, the MIPWA technique requires the large samples of three-body decay events that have become increasingly available at the B-factories and the LHC. Notably, the LHCb experiment has recorded charm decays with sample sizes exceeding those of any previous experiment by more than an order of magnitude, offering a unique opportunity to study S-wave structures with unprecedented precision.

Analysing very large samples requires disproportionately more computing power. Running all the calculations in a single Central Processing Unit (CPU) thread would take a prohibitively long time. Originally, a Graphics Processing Unit (GPU) was a specialised electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images for output to a video display. Its highly parallel structure makes a GPU more effective than a CPU for algorithms in which large blocks of data are processed in parallel. Today, the functionality of GPUs has been extended to enable highly parallel computing for scientific and other more general applications. An open-source framework called GooFit [4] has been developed to exploit the processing power of these GPUs for parallel function evaluation, particularly in the context of maximum likelihood fits. This paper reports an extension of the GooFit framework to support MIPWA of a three-body decay with vastly reduced processing time.

2. MIPWA method

Essentially all studies of three-body hadronic $D_{(s)}$ decays employ the same technique: an unbinned maximum likelihood fit of the Dalitz plot, in which the quantum mechanical matrix element governing the decay is represented by a coherent sum of amplitudes [1]. These amplitudes correspond to the possible intermediate states in the decay chain $D_{(s)} \to R h_3$, $R \to h_1 h_2$ ($h = K, \pi$). The amplitudes are grouped according to the orbital angular momentum $L$ of $R$ and $h_3$ in the rest frame of the $D_{(s)}$, and the total amplitude is

$$A(s_{12}, s_{13}) = \sum_{L} \sum_{k} c^{L}_{k} A^{L}_{k}(s_{12}, s_{13}), \qquad (1)$$

where $s_{12(13)} \equiv m^2(h_1 h_{2(3)})$. The amplitudes $A^L_k$ are weighted by constant complex coefficients $c^L_k$, and the series is truncated at $L = 2$. For a resonance with non-zero spin, the amplitude $A^L_k$ is most often described using a relativistic Breit-Wigner function multiplied by a real spin-dependent angular factor [5].

For the S-wave, two qualitatively different approaches exist. In the Isobar model, the S-wave is treated as a sum of a constant non-resonant term and Breit-Wigner functions for the scalar resonances. While the Isobar model describes narrow resonances reasonably well, it fails to describe the overlap of broad resonances. Additionally, the physical interpretation of the constant non-resonant term is problematic. To address these issues, the MIPWA describes the ensemble of scalar components using a purely phenomenological set of parameters derived from the data. The $s_{12(13)}$ mass spectrum is divided into $N - 1$ slices, with $N$ boundary points separating the slices and closing the two ends of the spectrum. The S-wave is represented by a generic complex function $A_0(s) = a_0(s) e^{i\phi_0(s)}$.
At each boundary point $s_k$, $A_0(s = s_k) = a_k e^{i\phi_k}$, where $a_k$ and $\phi_k$ are real parameters. Between the $N$ boundary points, the S-wave is parametrised by a cubic spline [6] in the complex plane. The set $\{a_k, \phi_k\}$ are free parameters that, along with the coefficients $c^L_k$ of the higher-spin terms, are determined in the MIPWA fit. The large data sets studied, along with the large number of parameters ($2N$) required to describe the S-wave, make maximising the likelihood a computationally intensive problem. Interpolating between the boundary points leads to correlations between the parameters, and hence to non-linear behavior.

3. MIPWA method with GooFit

GooFit provides an interface that allows probability density functions (PDFs) to be evaluated in parallel, using either GPUs or multicore CPUs as back-ends. While the original intention of GooFit was to utilise the massive computational power of NVIDIA GPUs through the proprietary Compute Unified Device Architecture (CUDA) [7], the Thrust parallel algorithms library [8] also supports an OpenMP backend for conventional CPUs. GooFit has been used to perform a number of amplitude analyses, for example that of Ref. [9].

For a time-integrated Dalitz plot analysis, GooFit provides the DalitzPlotPdf class to model the Dalitz plot PDF. The DalitzPlotPdf object contains a list of ResonancePdf objects to describe resonant amplitudes as well as a constant nonresonant amplitude. The original GooFit package provided only the Isobar model to parameterise the S-wave. The ResonancePdf class has been extended to add support for the MIPWA method. The entire S-wave is treated as a single resonance (ResonancePdf object), with a set of free parameters $\{a^k_0, \phi^k_0\}$ to be determined in the fit.
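The spline-based S-wave parametrisation described above can be illustrated with a minimal, CPU-only Python sketch. This is not GooFit code. The function names (`natural_spline_second_derivs`, `spline_eval`, `make_swave`), the natural boundary conditions, the choice to interpolate the complex values $a_k e^{i\phi_k}$ directly (which is equivalent to splining the real and imaginary parts separately), and the five knot values are all assumptions made purely for illustration.

```python
import cmath

def natural_spline_second_derivs(x, y):
    """Second derivatives at the knots of the natural cubic spline through (x[i], y[i]).

    y may be complex: the spline is linear in y, so this is the same as
    splining the real and imaginary parts separately.
    """
    n = len(x)
    y2 = [0.0] * n   # natural boundary: zero second derivative at both ends
    u = [0.0] * n
    # Forward sweep of the tridiagonal solve.
    for i in range(1, n - 1):
        sig = (x[i] - x[i - 1]) / (x[i + 1] - x[i - 1])
        p = sig * y2[i - 1] + 2.0
        y2[i] = (sig - 1.0) / p
        d = (y[i + 1] - y[i]) / (x[i + 1] - x[i]) - (y[i] - y[i - 1]) / (x[i] - x[i - 1])
        u[i] = (6.0 * d / (x[i + 1] - x[i - 1]) - sig * u[i - 1]) / p
    # Back-substitution.
    for k in range(n - 2, -1, -1):
        y2[k] = y2[k] * y2[k + 1] + u[k]
    return y2

def spline_eval(x, y, y2, s):
    """Evaluate the cubic spline at s, with x[0] <= s <= x[-1]."""
    lo, hi = 0, len(x) - 1
    while hi - lo > 1:               # bisection for the slice containing s
        mid = (lo + hi) // 2
        if x[mid] > s:
            hi = mid
        else:
            lo = mid
    h = x[hi] - x[lo]
    a = (x[hi] - s) / h
    b = (s - x[lo]) / h
    return a * y[lo] + b * y[hi] + ((a**3 - a) * y2[lo] + (b**3 - b) * y2[hi]) * h * h / 6.0

def make_swave(s_knots, mags, phases):
    """Build A_0(s) from the fit parameters {a_k, phi_k} at the boundary points s_k."""
    values = [cmath.rect(a, phi) for a, phi in zip(mags, phases)]
    y2 = natural_spline_second_derivs(s_knots, values)
    return lambda s: spline_eval(s_knots, values, y2, s)

# Hypothetical boundary points (GeV^2), magnitudes and phases (rad); not fit results.
S_KNOTS = [1.0, 1.2, 1.4, 1.6, 1.8]
MAGS = [1.0, 1.4, 1.1, 0.7, 0.5]
PHASES = [0.0, 0.6, 1.3, 1.9, 2.2]

if __name__ == "__main__":
    swave = make_swave(S_KNOTS, MAGS, PHASES)
    magnitude, phase = cmath.polar(swave(1.3))
    print("S-wave at s = 1.3 GeV^2: magnitude", magnitude, "phase", phase)
```

In a fit, the entries of `MAGS` and `PHASES` would be the floating parameters, so every change of one $a_k$ or $\phi_k$ alters the interpolated amplitude in the neighbouring slices as well. This is one way to see why the parameters become correlated.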

Figure 1. The projections of $m^2(K^+K^-)$ (left) and $m^2(K^+K^+)$ (right) from $D^+ \to K^+K^+K^-$. In each plot, the toy MC signal events (points with error bars) are shown together with the total fit (blue line), the φ resonance (red line), and the S-wave (magenta line) determined from the MIPWA.

4. MIPWA in $D^+ \to K^+K^+K^-$ decay$^{1}$

The $D^+ \to K^+K^+K^-$ decay offers a good opportunity to study the $K^+K^-$ S-wave amplitude directly. LHCb collected a large $D^+ \to K^+K^+K^-$ data sample from 2.1 fb$^{-1}$ of $\sqrt{s} = 7$ TeV pp collisions recorded by the experiment during 2012. About 100K candidates were selected with a signal purity of 90%. The resonant structure of the $K^+K^-$ S-wave amplitude was studied for the first time using an Isobar model, and the results were first presented during the CHARM 2016 workshop [10]. This analysis indicates that the S-wave component accounts for about 90% of the decay rate, while the P-wave (φ(1020) resonance) makes up the rest.

A specific goal of the work presented here is to provide a tool to extract the $K^+K^-$ S-wave by performing an MIPWA on the same LHCb data sample. In this case, the Dalitz plot is symmetrised along the two $m^2(K^+K^-)$ axes. The $K^+K^-$ mass-squared range is divided into 39 slices using 40 boundary points. Because the final state contains two indistinguishable $K^+$ mesons, the $K^+K^-$ amplitudes interfere with themselves, and this interference allows the S-wave phase to be determined.

To test the GooFit extension to support MIPWA fits, samples of 100K signal events are generated from a Dalitz plot PDF ("toy MC"). The mass projections from the fit to a test sample can be seen in Fig. 1. First, the fit quality of the MIPWA method is tested by comparing the fitted $\{a_k, \phi_k\}$ values with the inputs. In each fit iteration, the Dalitz plot normalisation method (DalitzPlotPdf::normalise()) is called so that the integral of the total PDF is equal to one.
The normalisation integral is calculated numerically on a grid of evenly spaced points in the Dalitz plot plane. Figure 2 shows the improvement in the fit quality as the normalisation grid spacing is reduced. Although the finer granularity increases the number of calculations, the high speed of the GPUs makes this problem tractable.

Table 1. GPU performance of the MIPWA fit.

Platform            GPU model          Chip                        CUDA cores   Run time (s)
Workstation         Tesla K40c         GK110BGL                    2880         76
Desktop PC          GeForce GTX 980    2nd gen. Maxwell (GM204)    2048         67
Laptop ASUS N56V    GeForce GT 650M    GK107                       384          179

$^{1}$ Charge conjugation is implied throughout.
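The grid-based normalisation described above can be sketched in a few lines of plain Python. This is a serial illustration under stated assumptions, not the DalitzPlotPdf::normalise() implementation. The helper names (`s13_limits`, `grid_points`, `normalisation`, `toy_density`), the rounded particle masses, and the toy density (a symmetrised Breit-Wigner-like φ peak with spin factors omitted) are all hypothetical. Because the three kaon masses are equal, the standard kinematic limits are the same for any pair of invariant masses.

```python
import math

# Rounded masses in GeV (approximate values, for illustration only).
M_D, M_K = 1.8696, 0.4937
# Rounded phi(1020) mass and width in GeV (approximate).
M_PHI, W_PHI = 1.0195, 0.0043

def s13_limits(s12, M=M_D, m=M_K):
    """Kinematic limits of s13 at fixed s12 for three equal-mass daughters."""
    m12 = math.sqrt(s12)
    e2 = 0.5 * m12                               # daughter energy in the (12) rest frame
    e3 = (M * M - s12 - m * m) / (2.0 * m12)     # bachelor energy in the same frame
    p2 = math.sqrt(max(e2 * e2 - m * m, 0.0))
    p3 = math.sqrt(max(e3 * e3 - m * m, 0.0))
    return (e2 + e3) ** 2 - (p2 + p3) ** 2, (e2 + e3) ** 2 - (p2 - p3) ** 2

def grid_points(spacing, M=M_D, m=M_K):
    """Yield centres of evenly spaced grid cells that lie inside the Dalitz plot."""
    lo, hi = (2.0 * m) ** 2, (M - m) ** 2
    n = int(math.ceil((hi - lo) / spacing))
    for i in range(n):
        s12 = lo + (i + 0.5) * spacing
        s13_min, s13_max = s13_limits(s12, M, m)
        for j in range(n):
            s13 = lo + (j + 0.5) * spacing
            if s13_min <= s13 <= s13_max:
                yield s12, s13

def normalisation(density, spacing):
    """Approximate the integral of density over the Dalitz plot by a grid sum."""
    return sum(density(s12, s13) for s12, s13 in grid_points(spacing)) * spacing ** 2

def toy_density(s12, s13):
    """Hypothetical |amplitude|^2: a phi-like peak symmetrised over both K+K- pairs."""
    def bw(s):
        return 1.0 / complex(M_PHI ** 2 - s, -M_PHI * W_PHI)
    return abs(bw(s12) + bw(s13)) ** 2

if __name__ == "__main__":
    for spacing in (0.01, 0.004):
        print("spacing", spacing, "GeV^2 -> normalisation", normalisation(toy_density, spacing))
```

In this toy, the peak has a width of roughly $m_\phi \Gamma_\phi \approx 0.004$ GeV$^2$ in $s$, comparable to the coarser grid spacings, so the grid sum is sensitive to where the cell centres fall relative to the peak. That is offered only as a plausible illustration of why finer grids can help. The cost grows as the inverse square of the spacing, and every grid point is independent, which is what makes the sum well suited to parallel evaluation.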

Figure 2. Differences between the fitted values and the input ones for each boundary point. Shown are sets of points with different grid spacings for the normalisation: 0.01 GeV$^2$ (purple), 0.004 GeV$^2$ (red), and 0.001 GeV$^2$ (blue).

As shown in Table 1, the GooFit performance on three different GPU platforms has been measured by running the MIPWA fit over the same toy MC sample of 100K events (see Fig. 1 for the fit projections). For comparison, the MIPWA fit using an older code with the same functionality that runs on one CPU core takes about eight hours to complete, roughly 160 to 430 times longer than the GPU run times in Table 1. Perhaps surprisingly, an older-generation mobile GPU (a GeForce GT 650M with 384 cores) provides excellent performance; a newer HPC GPU board (a Tesla K40c with 2880 cores) provides better performance, but not in proportion to the number of cores; a high-end gamer board (a GeForce GTX 980) provides the best performance, albeit by a small margin. Based on other studies with GooFit, significantly better performance is anticipated on the new P100 boards, which use NVIDIA's new Pascal GPU architecture.

Table 2. Specifications of the testing CPU platforms. Asterisks next to the number of cores indicate hyperthreading: two virtual processors per physical core.

Name     Chip type                    # of cores   Clock [GHz]   RAM [GiB]
         Intel Xeon E5520             8*           2.27          24
Goofy    Intel Xeon CPU E5-2680 v3    24*          2.50          120

In addition to timing GooFit's new MIPWA performance on GPUs using Thrust's CUDA backend, its performance on two different CPU platforms is timed using Thrust's OpenMP backend, as shown in Fig. 3. With two Intel Xeon E5-2680 v3 CPUs, the fit takes 791 seconds with one OpenMP thread and 50 seconds with 24 threads, a speedup of about 16. The speedup is almost linear in the number of OpenMP threads up to the number of physical cores.

5. Summary

This paper describes an extension of GooFit to support MIPWA fits for three-body decays, which achieves speedups of a few hundred by using GPUs.
The main branch of GooFit's source code is in a GooFit repository at https://github.com/goofit/goofit, while the updated code with MIPWA support is in a personal GooFit branch at https://github.com/liang-sun/GooFit.

Figure 3. Timing results and speedups for the test MIPWA fit. The panels show the wall-clock time (s), the speedup, and the speedup per thread. Tested on two different CPU platforms as shown in Table 2: Goofy (Intel Xeon CPU E5-2680 v3 x 2, 24 physical cores) and the Intel Xeon CPU E5520 x 2 platform.

5.1. Acknowledgments

This work was performed with support from NSF Award PHY-1414736. NVIDIA provided K40 GPUs for our use through its University Partnership program. The Ohio Supercomputer Center made their Oakley computer farm available for development, for testing, and for GooFit outreach workshops.

References

[1] C. Patrignani et al. (Particle Data Group), Chin. Phys. C 40, 100001 (2016).
[2] R. H. Dalitz, Phil. Mag. 44, 1068 (1953).
[3] E. M. Aitala et al. (E791 Collaboration), Phys. Rev. D 73, 032004 (2006).
[4] R. Andreassen et al., IEEE Access 2, 160 (2014).
[5] S. Kopp et al. (CLEO Collaboration), Phys. Rev. D 63, 092001 (2001).
[6] W. H. Press et al., Numerical Recipes: The Art of Scientific Computing, 3rd Edition, Cambridge University Press, 2007.
[7] NVIDIA Corporation, NVIDIA CUDA C programming guide, Version 7 (2015).
[8] N. Bell and J. Hoberock, Thrust: A Productivity-Oriented Library for CUDA, available from the Documentation link at https://thrust.github.io/, accessed 29 January 2017.
[9] J. P. Lees et al. (BABAR Collaboration), Phys. Rev. D 93, 112014 (2016).
[10] LHCb Collaboration, LHCb-CONF-2016-008 (2016).