Efficient implementation of the overlap operator on multi-gpus


1 Efficient implementation of the overlap operator on multi-GPUs. Andrei Alexandru, with Mike Lujan, Craig Pelissier, Ben Gamari, and Frank Lee. SAAHPC, University of Tennessee.

2 Outline: Motivation; Overlap operator; Multi-GPU Wilson-Dirac kernel; Eigensolver and inverter; Conclusions.

3 Building blocks of matter. Quarks are the constituents of matter; they interact strongly by exchanging gluons. Peculiar properties: confinement, and asymptotic freedom (Nobel Prize 2004). The theory of strong interactions is Quantum Chromodynamics (QCD).

4 Lattice QCD. Replace space-time with a four-dimensional lattice; differential operators are replaced with finite-difference operators. Typical lattices have a few tens of sites per spatial dimension, with the 4th (time) dimension a few times longer; for example, 24³ x 48 = 663,552 sites. Typical project size: ~1 Petaflop.

5 Why overlap fermions on multi-GPUs? We want to study QCD dynamics in the chiral regime, and overlap fermions preserve chiral symmetry at finite lattice spacing. However, overlap fermions are computationally demanding. We use GPUs since they have good memory bandwidth; the memory requirements of the overlap operator force us to use multiple GPUs.

6 Lattice QCD. QCD is a field theory; lattice QCD is defined on a 4D grid. Quarks are the fields $\Psi$ living on the sites; gluons are the link variables $U$ connecting neighboring sites. The links are randomly generated according to the dynamics. [Figure: a 2D slice of the lattice showing spinors $\Psi$ on sites and link matrices $U$ on the links between them.]

7 Wilson-Dirac operator. Wilson fermions are one of the simplest discretizations of $m + \not{D}$: the operator is numerically fast and very sparse, but it breaks chiral symmetry. $D_w = (ma + 4)\,\mathbf{1} - \tfrac{1}{2}\sum_\mu T_\mu$, where for $\mu > 0$: $(T_\mu \psi)_n = U_\mu(n)\,\psi_{n+\hat\mu}\,(1 - \gamma_\mu)$, and for $\mu < 0$ (writing $\nu = -\mu$): $(T_\mu \psi)_n = U_\nu^\dagger(n - \hat\nu)\,\psi_{n-\hat\nu}\,(1 + \gamma_\nu)$. It serves as the kernel for the overlap operator. The quark propagator is $\langle 0|\,\psi(x)\,\bar\psi(y)\,|0\rangle = (D_w^{-1})_{x,y}$.

8 Wilson-Dirac operator. $Y(n) = (MX)(n) = X(n) - \kappa \sum_\mu \left[ V_\mu(n)\,X(n+\hat\mu) + V_\mu^\dagger(n-\hat\mu)\,X(n-\hat\mu) \right]$. The Wilson operator multiplies Wilson fields, 4x3 matrices living at every site of the lattice. The value of $Y$ at a site depends on the value of $X$ at the same site and at the 8 neighboring sites. Each of the fields at the neighboring sites needs to be transported to the final site; this involves a multiplication with a color matrix (3x3) and a spinor matrix (4x4). The color matrices differ from link to link, whereas the spinor matrices depend only on the direction. The matrices and the vectors are all complex. [Diagram: $T_\mu\Psi$ written out as the 3x3 color matrix $U_{n,\mu}$ acting on the 3x4 field $\Psi_n$ together with the 4x4 spin matrix for direction $\mu$.]
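
A minimal single-site sketch of the transport just described, in plain C++ (illustrative only: the array layout and the passing of the spin projector $P = 1 - \gamma_\mu$ as an input are assumptions, since the production code runs on the GPU and the talk does not specify a gamma-matrix basis):

    // Single-site "transport" sketch: chi = U * psi * P, where U is the 3x3
    // color matrix on the link, psi the 3x4 (color x spin) field at the
    // neighboring site, and P the 4x4 spin matrix (1 -+ gamma_mu), which
    // depends only on the direction and can be precomputed once per direction.
    #include <complex>
    using cplx = std::complex<double>;

    void transport(const cplx U[3][3],   // color matrix for this link
                   const cplx P[4][4],   // spin projector, basis-dependent
                   const cplx psi[3][4], // neighbor field, color x spin
                   cplx chi[3][4]) {     // output: transported field
        for (int a = 0; a < 3; ++a)         // color row of the result
            for (int s = 0; s < 4; ++s) {   // spin column of the result
                cplx acc = 0;
                for (int b = 0; b < 3; ++b)     // contract the color index
                    for (int t = 0; t < 4; ++t) // contract the spin index
                        acc += U[a][b] * psi[b][t] * P[t][s];
                chi[a][s] = acc;
            }
    }

Since $P$ depends only on the direction, only the 3x3 color multiplication brings in new data for each link, which is what makes the kernel bandwidth-bound.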

9 Overlap operator. The overlap operator is dense, about 100 times more expensive than the Wilson kernel. $D = 1 + \gamma_5\,\mathrm{sign}(H_w)$ with $H_w = \gamma_5 D_w$. The sign function is approximated as $\mathrm{sign}(H_w) \approx Q\,P(Q^2)$ with $Q = H_w/\|H_w\|$, and the approximation error is $\delta = \max_{x\in[\epsilon,1]} \left|1 - \sqrt{x}\,P(x)\right|$. The cost is proportional to the condition number of $H_w^2$ and to $\log\delta$.

10 Requirements. Overlap operator: Wilson kernel + vector routines, and an Hwilson eigensolver. Propagator calculation: overlap inverter and overlap eigensolver.

11 System architecture. [Diagram: each GPU has 1-6 GB of memory at ~140 GB/s; the GPU is connected to the CPU over PCIe at ~5 GB/s; each node has 12-48 GB of CPU memory; nodes are linked by Infiniband at ~2 x 2.5 GB/s.]

12 Computational strategy. We use one process per GPU and MPI for communication. All data resides in GPU memory. Lattice sites are split evenly between the nodes, and all data belonging to a particular site resides on the node that owns that site. The communication is implemented mainly via shifts and is overlapped with computation where possible.

13 Vector routines. Expression templates + the Thrust library auto-generate optimized kernels for expressions such as $\phi \leftarrow \alpha\psi_1 + \beta\psi_2 + \gamma\psi_3$. Non-reduction kernels scale perfectly; the maximum bandwidth on an M2070 with ECC on is about 85 GB/s. Reduction kernels scale poorly because their computational fraction is small; most of the poor scaling is due to poor single-node kernel performance on small vectors. [Figure: bandwidth per GPU (GB/s) vs. GPU count for vector addition and scalar product.]
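
A minimal sketch of the kind of fused kernel such expression templates generate, written directly against the Thrust API (the functor and function names are illustrative, not the talk's actual interface):

    // Fused phi = alpha*psi1 + beta*psi2 + gamma*psi3 in a single pass over
    // memory: each input is read once and the output written once, which is
    // why non-reduction kernels run at the memory-bandwidth limit.
    #include <thrust/device_vector.h>
    #include <thrust/transform.h>
    #include <thrust/iterator/zip_iterator.h>
    #include <thrust/tuple.h>

    struct axpbypgz {
        double a, b, g;
        __host__ __device__
        double operator()(const thrust::tuple<double, double, double>& t) const {
            return a * thrust::get<0>(t) + b * thrust::get<1>(t) + g * thrust::get<2>(t);
        }
    };

    void fused_combine(double a, double b, double g,
                       const thrust::device_vector<double>& psi1,
                       const thrust::device_vector<double>& psi2,
                       const thrust::device_vector<double>& psi3,
                       thrust::device_vector<double>& phi) {
        auto first = thrust::make_zip_iterator(
            thrust::make_tuple(psi1.begin(), psi2.begin(), psi3.begin()));
        auto last = thrust::make_zip_iterator(
            thrust::make_tuple(psi1.end(), psi2.end(), psi3.end()));
        thrust::transform(first, last, phi.begin(), axpbypgz{a, b, g});
    }

Evaluating the same expression with separate AXPY-style calls would traverse the vectors three times; fusing the whole right-hand side into one transform is the point of the expression-template approach.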

14 Wilson-Dirac kernel

15 Wilson-Dirac kernel. The cost of the Wilson-Dirac operator is 1368 flops/site: 600 multiplications (44%) and 768 additions (56%), a balanced load. The data for one site's computation is: in, 8 spinors + 8 links (neighbors) + 1 spinor; out, 1 spinor. In double precision this is 3072 bytes/site, so the computational density is 1368 flops / 3072 bytes ≈ 0.45 flop/byte (double that in single precision). For 85 GB/s maximum bandwidth, the maximum kernel performance is about 38 GFlops (double) and 76.5 GFlops (single). The kernel has a fair amount of parallelism: the 8 transports can be executed in parallel, and each transport can be split into 2 parallel tasks.
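
As a cross-check, the quoted roofline limits are simply arithmetic intensity times bandwidth:

$$0.45\ \tfrac{\text{flop}}{\text{byte}} \times 85\ \tfrac{\text{GB}}{\text{s}} \approx 38\ \text{GFlops (double)}, \qquad 0.9\ \tfrac{\text{flop}}{\text{byte}} \times 85\ \tfrac{\text{GB}}{\text{s}} \approx 76.5\ \text{GFlops (single)}.$$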

16 Calculation steps. The communication time is overlapped with the computation time to hide latency: 1. Gather: compute compressed fields and fill the communication buffers. 2. Comm: initiate non-blocking communication. 3. Bulk: compute the dslash on the interior points. 4. Scatter: finish communication and add the results.
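
A minimal sketch of this overlap pattern with CUDA streams and non-blocking MPI (kernel names, launch configurations, and buffer handling are illustrative assumptions, not the talk's actual code):

    // Sketch of the gather/comm/bulk/scatter overlap (CUDA + MPI).
    #include <mpi.h>
    #include <cuda_runtime.h>

    // Hypothetical kernels: gather_boundary packs the compressed boundary
    // fields, dslash_bulk updates the interior sites, scatter_add adds the
    // received halo contributions.
    __global__ void gather_boundary();
    __global__ void dslash_bulk();
    __global__ void scatter_add();

    void dslash_step(double* d_send, double* d_recv, // device halo buffers
                     double* h_send, double* h_recv, // pinned host buffers
                     int halo_bytes, int neighbor,
                     cudaStream_t comm_stream, cudaStream_t bulk_stream) {
        MPI_Request reqs[2];

        // 1. Gather: pack boundary data and start the PCIe download.
        gather_boundary<<<64, 256, 0, comm_stream>>>();
        cudaMemcpyAsync(h_send, d_send, halo_bytes,
                        cudaMemcpyDeviceToHost, comm_stream);

        // 3. Bulk: the interior dslash runs concurrently on its own stream.
        dslash_bulk<<<256, 256, 0, bulk_stream>>>();

        // 2. Comm: once the halo reaches the host, exchange it over Infiniband.
        cudaStreamSynchronize(comm_stream);
        MPI_Irecv(h_recv, halo_bytes, MPI_BYTE, neighbor, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(h_send, halo_bytes, MPI_BYTE, neighbor, 0, MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

        // 4. Scatter: upload the received halo and add its contribution.
        cudaMemcpyAsync(d_recv, h_recv, halo_bytes,
                        cudaMemcpyHostToDevice, comm_stream);
        scatter_add<<<64, 256, 0, comm_stream>>>();
        cudaDeviceSynchronize();
    }

While the host blocks in MPI_Waitall, the bulk kernel keeps the GPU busy on its own stream, which is how the communication cost is hidden when the interior work is large enough.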

17 Minimal surface. Cut the lattice into hypercubes with the same dimensions. The longest dimension is always cut first, and an already-cut dimension is preferred. As the lattice is cut, the boundary-to-interior ratio increases. [Table: interior sites N_int, boundary sites N_boun, and sub-lattice dimensions for 1 to 32 GPUs; the per-GPU sub-lattice shrinks from the full 24 x 24 x 24 x 64 on 1 GPU down to 12 x 12 x 12 x 16 on 32 GPUs.]
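
A short worked example of the boundary-to-interior growth, assuming a nearest-neighbor stencil so that only the first and last slices of each cut dimension need off-node data (the 12 x 12 x 12 x 16 sub-lattice is the 32-GPU case inferred above):

    // Count interior and boundary sites of a per-GPU sub-lattice: a site is
    // on the boundary if it sits in the first or last slice of any dimension
    // that was split across GPUs; uncut dimensions keep all neighbors on-node.
    #include <cstdio>

    int main() {
        const int dims[4] = {12, 12, 12, 16};          // assumed 32-GPU sub-lattice
        const bool cut[4] = {true, true, true, true};  // which dimensions were split
        long total = 1, interior = 1;
        for (int i = 0; i < 4; ++i) {
            total *= dims[i];
            interior *= cut[i] ? dims[i] - 2 : dims[i];
        }
        std::printf("interior %ld, boundary %ld\n", interior, total - interior);
        return 0;
    }

For these dimensions roughly half of the sites end up on the boundary, which is why strong scaling degrades as the cuts multiply.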

18 Dslash anatomy. [Diagram: two GPU streams. Stream 1: (1) gather, (2) GPU→CPU copy over PCIe, (3) CPU→CPU exchange over Infiniband, (4) CPU→GPU copy over PCIe, (5) scatter. Stream 2: the bulk dslash, running concurrently.]

19 Dslash timing. [Table: timing breakdown vs. GPU count for gather, scatter, GPU→CPU, CPU→CPU, CPU→GPU, comm, dslash, and bulk.]

20 Strong scaling for 24³ x 64. [Figure: performance per GPU (GFlops) vs. GPU count, in double and single precision, compared against a performance model.]

21 Comparison with other codes. [Table: our code vs. QUDA on 32³ x ... lattices; our code at 32 GPUs in double and single precision, QUDA at 16 and 32 GPUs in single precision.]

22 Overlap operator

23 Sign approximation. Polynomial approximation: $P(Q^2)\,\psi = \sum_{i=1}^{n} c_i\,T_i(Q^2)\,\psi$. Rational approximation: $P(Q^2)\,\psi = \sum_{i=1}^{n} \frac{b_i}{Q^2 + c_i}\,\psi$. [Figure: time (s) vs. GPU count for the double-pass and polynomial methods.]
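
A minimal sketch of evaluating the polynomial approximation with the Chebyshev three-term recurrence $T_{i+1}(x) = 2x\,T_i(x) - T_{i-1}(x)$ (plain scalar C++ for clarity; the production code applies this to GPU spinor fields, and the spectrum of $Q^2$ is assumed already mapped into the approximation interval):

    // Forward three-term-recurrence evaluation of sum_i c_i T_i(Q^2) psi;
    // apply_Q2 stands in for the Wilson-kernel-based product y = Q^2 x.
    #include <vector>
    #include <functional>

    using Vec = std::vector<double>;
    using MatVec = std::function<void(const Vec&, Vec&)>;

    Vec chebyshev_apply(const MatVec& apply_Q2,
                        const std::vector<double>& c, // coefficients c[0..n-1]
                        const Vec& psi) {
        const size_t N = psi.size();
        Vec t_prev = psi;       // T_0(Q^2) psi = psi
        Vec t_curr(N), tmp(N);
        apply_Q2(psi, t_curr);  // T_1(Q^2) psi = Q^2 psi

        Vec result(N, 0.0);
        for (size_t k = 0; k < N; ++k)
            result[k] = c[0] * t_prev[k] + (c.size() > 1 ? c[1] * t_curr[k] : 0.0);

        for (size_t i = 2; i < c.size(); ++i) {
            apply_Q2(t_curr, tmp);  // tmp = Q^2 T_{i-1}(Q^2) psi
            for (size_t k = 0; k < N; ++k) {
                double t_next = 2.0 * tmp[k] - t_prev[k]; // three-term recurrence
                result[k] += c[i] * t_next;
                t_prev[k] = t_curr[k];
                t_curr[k] = t_next;
            }
        }
        return result;
    }

One $Q^2$ application per polynomial order and a fixed handful of stored vectors is what makes the polynomial approach memory-friendly compared to storing a Krylov basis.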

24 Wilson-Dirac kernel performance comparison. [Figure: performance (GFlops) vs. GPU / equivalent-CPU count for the CPU and GPU codes.]

25 Performance comparison. The GPU cluster uses 1 GPU per node and QDR Infiniband interconnects. The CPU machine is a Cray XT5 with dual hex-core AMD processors. We compare the performance of 32 GPUs (the target cluster dimension) vs. 256 CPU cores (the optimal performance point for the CPU code).

26 Overlap performance. For a 24³ x 64 lattice, one overlap matrix-vector multiplication takes 1.1 s on 32 GPUs; on 256 cores of the Cray XT5 it takes 3.3 s. This translates into a ratio of 1 GPU = 24 CPU cores.

27 Hwilson eigensolver

28 Small eigenspace dimension. Deflating the low-lying $H_w$ eigenmodes shrinks the interval over which the sign function must be approximated, $\epsilon = \lambda/\lambda_{\max}$, and the approximation error falls off exponentially with the polynomial order $n$: $\delta = A\,e^{-bn}$. [Figures: lowest eigenvalue $\lambda$ and required polynomial order vs. the number of deflated eigenvectors.]

29 Eigensolvers. We use an implicitly restarted Arnoldi factorization: $A V_k = V_k H_k + f_k e_k^\dagger$ with $(e_k)_n = \delta_{k,n}$; the implicit restart applies shifted QR steps, $H_k - \mu = QR$, $V_k \leftarrow V_k Q$, $H_k \leftarrow RQ + \mu$. The method requires storage for temporary vectors: for optimal convergence we need 2.5 times more vectors than eigenvectors requested, $k = 2.5\,l$. For efficiency we also need to code a matrix-matrix multiplication routine (for the $V_k Q$ update). Each iteration needs $k$ matrix-vector multiplications and $k^2$ vector orthogonalizations. We use locking of the converged eigenvectors to accelerate convergence.
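
A minimal sketch of building the underlying Arnoldi factorization with modified Gram-Schmidt (plain real-valued C++ for clarity; restarting, shifted QR, and locking are omitted, and the production code works on GPU spinor fields):

    // One pass of the Arnoldi factorization A V_k = V_k H_k + f_k e_k^T; each
    // step costs one matvec plus j+1 inner products and updates, which is
    // where the k^2 orthogonalizations per iteration come from.
    #include <vector>
    #include <functional>
    #include <cmath>

    using Vec = std::vector<double>;
    using MatVec = std::function<void(const Vec&, Vec&)>;

    double dot(const Vec& a, const Vec& b) {
        double s = 0.0;
        for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
        return s;
    }

    // V: k orthonormal basis vectors; H: k x k upper Hessenberg, row-major;
    // f: the residual vector f_k whose norm drives convergence.
    void arnoldi(const MatVec& A, const Vec& start, int k,
                 std::vector<Vec>& V, std::vector<double>& H, Vec& f) {
        const size_t N = start.size();
        V.assign(1, start);
        double nrm = std::sqrt(dot(start, start));
        for (auto& x : V[0]) x /= nrm;    // normalize the starting vector
        H.assign(k * k, 0.0);

        for (int j = 0; j < k; ++j) {
            Vec w(N);
            A(V[j], w);                    // one matvec per step
            for (int i = 0; i <= j; ++i) { // modified Gram-Schmidt
                double h = dot(V[i], w);
                H[i * k + j] = h;
                for (size_t n = 0; n < N; ++n) w[n] -= h * V[i][n];
            }
            double beta = std::sqrt(dot(w, w));
            if (j + 1 < k) {
                H[(j + 1) * k + j] = beta; // subdiagonal of H
                for (auto& x : w) x /= beta;
                V.push_back(w);
            } else {
                f = w;                     // residual f_k
            }
        }
    }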

30 Hwilson eigensolver. We use Chebyshev acceleration of order 100; the Arnoldi eigensolver then converges in one iteration. We compute 200 eigenvectors, which requires storage for 500 vectors = 85 GB. Total time: 0.27 hours on the GPU cluster vs. 0.60 hours on the Cray XT5; this corresponds to 1 GPU = 18 CPU cores. In situations with reduced GPU memory we use a mixed mode, where the eigensystem is stored in CPU memory; this is feasible due to the Chebyshev acceleration. In this mode the GPU code takes 0.43 hours.

31 Overlap eigensolver

32 Overlap eigensystem. Deflation speeds up the inversions considerably. One propagator = 12 inversions. At $m_\pi$ = 200 MeV: without deflation, 12 x 2,000 = 24,000 operator applications; with deflation, 6,600 + 12 x 200 = 9,000. At one propagator per configuration this is a 2.5x speed-up. We compute the hermitian-overlap eigenvectors and then rebuild the overlap eigenvectors from them.

33 Overlap eigensystem. We compute 100 eigenvector pairs to the target precision. On the GPU cluster this takes 2.7 hours; on the Cray machine it takes 10.6 hours. This translates into 1 GPU = 26 CPU cores. Where memory is limited we can use a mixed mode, storing the overlap Krylov space in CPU memory; the code then takes 4 hours to converge.

34 Overlap inverter

35 Overlap inverter. We use $m_\pi$ = 200 MeV and a precision of $10^{-8}$. We use an adaptive CG method, which is 60% faster than regular CG, and a multi-shifted inverter that handles all the shifts at once. We store the overlap eigensystem in CPU memory, and the Hwilson eigensystem and the solutions in GPU memory. The GPU cluster takes 0.52 hours vs. 2.3 hours for the Cray machine; this translates to 1 GPU = 35 CPU cores.

36 Summary. [Table, CPU vs. GPU: Hwilson eigensolver, expensive orthogonalization on the CPU vs. Chebyshev acceleration on the GPU, storing 2.5 x 200 vectors; overlap eigensolver, storing 200 Hwilson vectors + 2.5 x 100 eigenpairs; overlap inverter, storing 100 overlap eigenpairs + 200 Hwilson vectors + solutions (100).] Total: pure GPU (32 GPUs) 3.5 hours vs. Cray XT5 (256 cores) 13.5 hours.

37 Conclusions. We showed how to efficiently implement the overlap operator on GPUs. For efficiency we need to store the data in GPU memory, which forces us to use GPUs in parallel. For the 24³ x 64 lattices of interest, the Wilson kernel's scaling efficiency is 50% on 32 GPUs; this scaling efficiency is better than that of CPU codes of equivalent performance. For the sign function needed by the overlap operator, the polynomial approximation is better both in memory use and in performance. Most of the time is spent in the eigensolvers; we use implicitly restarted Arnoldi eigensolvers. On systems with reduced memory, a mixed strategy can be used with only a 50-60% performance penalty. Overall, the GPU/CPU performance ratio of our codes is comparable to the ratio measured for the dslash routine. This is not surprising, since the most time-consuming part of these codes is the dslash routine, but it takes careful planning to work around all possible bottlenecks.

38 Outlook. Most of the time is spent in the overlap eigensolver. Chebyshev acceleration: a preliminary 20-30% boost. Mixed precision -- a different eigensolver method. Use a different inversion/deflation strategy.
