Graphics Card Computing for Materials Modelling

1 Graphics Card Computing for Materials Modelling
Case study: Analytic Bond Order Potentials
B. Seiser, T. Hammerschmidt, R. Drautz, D. Pettifor
Funded by EPSRC within the collaborative multi-scale project Alloys By Design: Nickel-base superalloys

2 Alloys by Design
Materials for gas turbine blades: titanium, steel, nickel, aluminium
Challenge: the alloy has to be
  CREEP RESISTANT (dislocation creep)
  STABLE (precipitation of detrimental phases)
  COATABLE (reaction with coatings)
  CASTABLE (freckling instabilities)
[Figure: micrographs and turbine blade, scale bars 0.5 μm, 2.5 μm, 25 μm, 25 cm]
Ni-based superalloys: Cr, Co, Mo, W, Al, Ti, Ta, Re, Ru, Hf, C, B (<10 wt%)
Alloy design is still empirical rather than theoretical: expensive, time-consuming, non-optimized alloys
Need multi-scale modelling for alloy design

3 Materials Modelling with GPUs
Hierarchy in Materials Modelling
Molecular dynamics GPU codes:
  AceMD (the biomolecular MD package used by GPUGRID)
  Ascalaph (molecular modelling suite)
  HOOMD (Highly Optimized Object Oriented Molecular Dynamics)
  VMD & NAMD (Visual Molecular Dynamics)
Density functional theory codes:
  TeraChem (GTO, J. Chem. Theory Comput., 2008, 4 (2), pp ) Single precision: x speed up
  BIGDFT (WL, see Journal of Chemical Physics 131, , 2009)
Dwarfs are essential for most electronic structure calculation methods

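4 Tight-binding method
Total energy: $E = E_{\mathrm{rep}} + E_{\mathrm{bond}}$
Repulsive energy $E_{\mathrm{rep}}$: summation of pair-wise interactions
Bond energy: $E_{\mathrm{bond}} = \int^{E_F} E \, n(E) \, dE$, with $n(E)$ the density of states and $E_F$ the Fermi level
Bond integral (p orbitals, $R$ rotates the bond axis into the reference frame):
$$H_{ij} = \langle i | H | j \rangle = R^{T} \begin{pmatrix} pp\sigma(r_{ij}) & 0 & 0 \\ 0 & pp\pi(r_{ij}) & 0 \\ 0 & 0 & pp\pi(r_{ij}) \end{pmatrix} R$$
Hamiltonian matrix for four atoms i, j, k, l with bonds i-j, i-k, j-l, k-l:
$$H = \begin{pmatrix} H_{ii} & H_{ij} & H_{ik} & 0 \\ H_{ji} & H_{jj} & 0 & H_{jl} \\ H_{ki} & 0 & H_{kk} & H_{kl} \\ 0 & H_{lj} & H_{lk} & H_{ll} \end{pmatrix}$$
Matrix dimension depends on the number of orbitals
Eigenvalue problem $Hv = Ev$ (periodic crystal): Lapack, Scalapack, Jacket
[Figure: density of states n(E) versus energy E, filled up to E_F]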
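
The bond integral above can be illustrated with a small sketch. It assumes p orbitals only and uses the direction-cosine form of the rotation, which is equivalent to rotating diag(ppσ, ppπ, ppπ) from the bond axis into the global frame. The function name pp_bond_integral and the numerical values are hypothetical and not taken from BOPfox.

```python
import math

def pp_bond_integral(r_i, r_j, pp_sigma, pp_pi):
    """Return the 3x3 p-orbital block H_ij for the bond from atom i to atom j.

    pp_sigma and pp_pi are the bond integrals already evaluated at the
    distance r_ij (their distance dependence is not modelled here).
    """
    d = [b - a for a, b in zip(r_i, r_j)]
    length = math.sqrt(sum(c * c for c in d))
    l = [c / length for c in d]  # direction cosines of the bond
    block = []
    for a in range(3):
        row = []
        for b in range(3):
            delta = 1.0 if a == b else 0.0
            # sigma acts along the bond, pi acts perpendicular to it
            row.append(l[a] * l[b] * pp_sigma + (delta - l[a] * l[b]) * pp_pi)
        block.append(row)
    return block

# Bond along x: the block is diagonal with ppσ first, then ppπ twice.
H_x = pp_bond_integral((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), 1.5, -0.5)
```

For any bond direction the trace of the block stays at ppσ + 2 ppπ, which is a quick sanity check on the rotation.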

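5 Analytic Bond Order potentials
Moments of density of states: $\mu_i^{(n)} = \int E^n \, n_i(E) \, dE$
  $\mu^{(0)}$ = 1
  $\mu^{(1)}$ = centre of gravity
  $\mu^{(2)}$ = RMS width
  $\mu^{(3)}$ = skewness
  $\mu^{(4)}$ = bimodality
Moment theorem, Cyrot-Lackmann (1967): $\mu_i^{(n)} = \langle i | H^n | i \rangle = \sum_{j,k,\dots} H_{ij} H_{jk} \cdots H_{li}$
  Each factor is a bond integral; each product is an interference path that starts and ends on atom i
Bond order potential (BOP) bond energy, Drautz and Pettifor (2006): the expression involves $g_n$ and $\mu^{(n)}$, the n-th moment
[Figure: panels for n = 3, n = 4 and n = 5, with E_F marked]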
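
As a hedged illustration of the moment theorem, the sketch below computes the local moments as diagonal elements of powers of H for a ring of atoms with one orbital each and a single nearest-neighbour hopping. The ring geometry, the hopping value and the function names are assumptions chosen for the example; they do not come from the slides.

```python
def matmul(a, b):
    """Plain dense matrix product for small square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def ring_hamiltonian(n_atoms, hopping):
    """One orbital per atom, nearest-neighbour hopping on a closed ring."""
    h = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i in range(n_atoms):
        j = (i + 1) % n_atoms
        h[i][j] = hopping
        h[j][i] = hopping
    return h

def local_moments(h, atom, n_max):
    """Return [mu_0, ..., mu_n_max] with mu_n = (H^n)[atom][atom]."""
    n = len(h)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    moments = []
    for _ in range(n_max + 1):
        moments.append(power[atom][atom])
        power = matmul(power, h)
    return moments

moments = local_moments(ring_hamiltonian(8, -1.0), atom=0, n_max=4)
```

For this ring the odd moments vanish because no closed path of odd length exists, while the second and fourth moments count the two and six closed paths of length 2 and 4.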

6 BOPfox
BOPfox tool (Fortran 90): Tight-binding, EAM, BOP -> Molecular dynamics, kMC
Benchmark for fcc with 864 W atoms, 12 moments
Timing table with columns [s] and [%] (values missing from this transcription); rows:
  initialization
  neighbour lists
  bond matrix
  evaluate moments
  evaluate ainf,binf
  forces EAM
  Fermi level search
  self-consistency
  total
% matrix multiplications, rest is spent on path finding

7 Interference paths
Calculation of interference paths:
Length (n) = 2: $(H^2)_{li} = H_{lj} H_{ji} + H_{lk} H_{ki}$
2nd moment of atom i = sum of paths (n=2) that start and end on atom i
4th moment of atom i = sum of paths (n=4) that start and end on atom i:
$$(H^4)_{ii} = \sum_{l \in EP} (H^2)_{li}^{T} \, (H^2)_{li}$$
EP: set of end points
[Figure: atoms i, j, k, l with the paths i-j-l and i-k-l]
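
The use of half-length paths can be sketched as follows, assuming one orbital per atom so that every bond integral is a number and the transpose is trivial. The 4-atom matrix H_EXAMPLE is made up for the example, and the function names are hypothetical. The fourth moment obtained from the length-2 paths is compared with a brute-force evaluation of the fourth power.

```python
def half_paths(h, start):
    """Return {l: (H^2)[l][start]} by summing all two-hop paths start -> j -> l."""
    n = len(h)
    amplitudes = {}
    for j in range(n):
        if h[j][start] == 0:
            continue  # no bond between start and j
        for l in range(n):
            if h[l][j] == 0:
                continue  # no bond between j and l
            amplitudes[l] = amplitudes.get(l, 0) + h[l][j] * h[j][start]
    return amplitudes

def second_moment(h, start):
    """Length-2 paths that end where they started."""
    return half_paths(h, start).get(start, 0)

def fourth_moment(h, start):
    """Sum over end points l of (H^2)[l][start] squared (H is symmetric)."""
    return sum(a * a for a in half_paths(h, start).values())

def brute_force_moment(h, start, order):
    """Reference value: (H^order)[start][start] by repeated matrix-vector products."""
    n = len(h)
    column = [1 if l == start else 0 for l in range(n)]
    for _ in range(order):
        column = [sum(h[l][j] * column[j] for j in range(n)) for l in range(n)]
    return column[start]

# Hypothetical symmetric bond integrals for four atoms.
H_EXAMPLE = [[0, 1, 2, 0],
             [1, 0, 3, 0],
             [2, 3, 0, 1],
             [0, 0, 1, 0]]
```

Only paths of length 2 are ever built, yet the fourth moment is available, which is why the end-point set EP halves the required path length.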

8 Interference paths
Calculation of interference paths:
Length = 3: $(H^3)_{li} = (H^2)_{lj} H_{ji} + (H^2)_{lk} H_{ki} + \dots$
[Figure: atoms i, j, k]

9 Density of states
[Figure: density of states versus energy; number of matrix multiplications per atom versus number of moments, with accuracy ranging from EAM/PP to TB]
Number of matrix multiplications scales linearly with number of atoms!
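
A rough way to see the linear scaling is to count the products needed when paths are extended hop by hop through a local neighbour list. The sketch below assumes a ring in which every atom has exactly two neighbours; the function names and the counting convention (one product per end point and neighbour) are assumptions made for the illustration, not the BOPfox bookkeeping.

```python
def ring_neighbours(n_atoms):
    """Neighbour list of a closed ring: every atom has two neighbours."""
    return {i: [(i - 1) % n_atoms, (i + 1) % n_atoms] for i in range(n_atoms)}

def count_products(neighbours, half_length):
    """Count products needed to build all paths up to half_length for every atom."""
    total = 0
    for start in neighbours:
        end_points = {start}
        for _ in range(half_length):
            new_end_points = set()
            for l in end_points:
                for m in neighbours[l]:
                    total += 1  # one product: path amplitude at l times H_ml
                    new_end_points.add(m)
            end_points = new_end_points
    return total

per_atom_small = count_products(ring_neighbours(8), 2) / 8
per_atom_large = count_products(ring_neighbours(16), 2) / 16
```

The work per atom depends only on the local environment and the path length, so doubling the number of atoms doubles the total count.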

10 BOPfox goes GPU
BOPfox tool (Fortran 90): Tight-binding, EAM, BOP -> Molecular dynamics, kMC
Benchmark for fcc with 864 W atoms, 12 moments
Timing table with columns [s] and [%] (values missing from this transcription); rows: initialization, neighbour lists, bond matrix, evaluate moments, evaluate ainf,binf, forces EAM, Fermi level search, self-consistency, total
GPU version of the moment evaluation:

    hosttogpu_uploadatomicpositions();
    hosttogpu_uploadneighbourlist();
    gpu_gettodolist();              // Get list of matrix calculations
    gpu_calculatebondintegrals();   // r_ik -> H_ik
    for (i = 2; i <= ninterferencemax; i++) {
        gpu_matrixmultiplication();
        gpu_matrixaddition();
        gpu_momentcalculation();
        gputohost_moments();
    }
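
The structure of this loop can be mimicked on the CPU with a minimal sketch. It assumes a one-dimensional chain, one orbital per atom and a made-up exponential distance dependence for the bond integrals; none of this reflects the actual BOPC kernels, and all function names are hypothetical. The three steps inside the loop correspond loosely to the multiplication, addition and moment-calculation calls above, and the loop here starts at path length 1 rather than 2 for simplicity.

```python
import math

def build_neighbour_list(positions, cutoff):
    """Neighbours of each atom within the cutoff distance (1D positions)."""
    n = len(positions)
    return {i: [k for k in range(n)
                if k != i and abs(positions[k] - positions[i]) <= cutoff]
            for i in range(n)}

def bond_integrals(positions, neighbours):
    """r_ik -> H_ik, using an assumed exponential decay with distance."""
    return {(i, k): -math.exp(-abs(positions[k] - positions[i]))
            for i in neighbours for k in neighbours[i]}

def moments_for_atom(start, neighbours, h, n_max):
    """Return [mu_0, ..., mu_n_max] for one atom by extending paths hop by hop."""
    paths = {start: 1.0}  # amplitude of all paths of length 0, keyed by end point
    moments = [1.0]
    for _ in range(1, n_max + 1):
        extended = {}
        for l, amplitude in paths.items():
            for m in neighbours[l]:
                product = amplitude * h[(l, m)]                # multiplication step
                extended[m] = extended.get(m, 0.0) + product   # addition step
        paths = extended
        moments.append(paths.get(start, 0.0))                  # moment calculation
    return moments

positions = [0.0, 1.0, 2.0]
neighbours = build_neighbour_list(positions, cutoff=1.5)
h = bond_integrals(positions, neighbours)
centre_moments = moments_for_atom(1, neighbours, h, n_max=4)
```

Every atom is processed independently of the others, which is the property that makes the moment evaluation a natural candidate for one GPU thread block per atom or per path.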

11 Graphics Card Computing for Materials Modelling
BOPfox and BOPC
BOPfox (CPU)
  Hardware: Intel Core2 Duo CPU E GHz, 4 GB memory
  Compiler options: gfortran, release mode (-O3)
BOPC (GPU)
  Hardware: nvidia GeForce GTX, multiprocessors, 216 cores, 1.5 GHz
  Compiler options: nvcc, release mode (-O3), CUDA 2.0
Benchmark of BOPfox vs BOPC (timings in ms missing from this transcription)
  Task                      BOPfox (CPU) [ms]   BOPC (GPU) [ms]   Factor (speed up)
  Calculation of matrices                                         ~22
  Path finding                                                    ~44
  Matrix multiplication                                           ~19
24x overall speed up

12 Conclusions
Materials modelling can benefit significantly from GPU parallelization
Linear algebra and FFT are essential for most electronic structure calculation methods
Models like analytic bond order potentials try to avoid expensive LA/FFT routines
  -> significant speed up possible


More information

The Augmented Spherical Wave Method

The Augmented Spherical Wave Method Introduction Institut für Physik, Universität Augsburg Electronic Structure in a Nutshell Outline 1 Fundamentals Generations 2 Outline 1 Fundamentals Generations 2 Outline Fundamentals Generations 1 Fundamentals

More information

Part III: Theoretical Surface Science Adsorption at Surfaces

Part III: Theoretical Surface Science Adsorption at Surfaces Technische Universität München Part III: Theoretical Surface Science Adsorption at Surfaces Karsten Reuter Lecture course: Solid State Theory Adsorption at surfaces (T,p) Phase II Phase I Corrosion Growth

More information

NIH Center for Macromolecular Modeling and Bioinformatics Developer of VMD and NAMD. Beckman Institute

NIH Center for Macromolecular Modeling and Bioinformatics Developer of VMD and NAMD. Beckman Institute NIH Center for Macromolecular Modeling and Bioinformatics Developer of VMD and NAMD 5 faculty members (2 physics, 1 chemistry, 1 biochemistry, 1 computer science); 8 developers; 1 system admin; 15 post

More information

S Subdivide, Preprocess and Conquer: Micromagnetism FEM/BEM-Simulations on Single-Node/Multi-GPU Systems

S Subdivide, Preprocess and Conquer: Micromagnetism FEM/BEM-Simulations on Single-Node/Multi-GPU Systems S4283 - Subdivide, : Micromagnetism FEM/BEM-Simulations on Single-Node/Multi-GPU Systems Elmar Westphal - Forschungszentrum Jülich GmbH 1 Contents Micromagnetism TetraMag, a FEM/BEM Micromagnetism Simulator

More information

GPU accelerated Monte Carlo simulations of lattice spin models

GPU accelerated Monte Carlo simulations of lattice spin models Available online at www.sciencedirect.com Physics Procedia 15 (2011) 92 96 GPU accelerated Monte Carlo simulations of lattice spin models M. Weigel, T. Yavors kii Institut für Physik, Johannes Gutenberg-Universität

More information

CHEM1902/ N-2 November 2014

CHEM1902/ N-2 November 2014 CHEM1902/4 2014-N-2 November 2014 The cubic form of boron nitride (borazon) is the second-hardest material after diamond and it crystallizes with the structure shown below. The large spheres represent

More information

EA = I 3 = E = i=1, i k

EA = I 3 = E = i=1, i k MTH5 Spring 7 HW Assignment : Sec.., # (a) and (c), 5,, 8; Sec.., #, 5; Sec.., #7 (a), 8; Sec.., # (a), 5 The due date for this assignment is //7. Sec.., # (a) and (c). Use the proof of Theorem. to obtain

More information

Algorithmic Challenges in Photodynamics Simulations

Algorithmic Challenges in Photodynamics Simulations Algorithmic Challenges in Photodynamics Simulations Felix Plasser González Research Group Institute for Theoretical Chemistry, University of Vienna, Austria Grundlsee, 24 th February 2016 Photodynamics

More information

Porting a sphere optimization program from LAPACK to ScaLAPACK

Porting a sphere optimization program from LAPACK to ScaLAPACK Porting a sphere optimization program from LAPACK to ScaLAPACK Mathematical Sciences Institute, Australian National University. For presentation at Computational Techniques and Applications Conference

More information

WRF performance tuning for the Intel Woodcrest Processor

WRF performance tuning for the Intel Woodcrest Processor WRF performance tuning for the Intel Woodcrest Processor A. Semenov, T. Kashevarova, P. Mankevich, D. Shkurko, K. Arturov, N. Panov Intel Corp., pr. ak. Lavrentieva 6/1, Novosibirsk, Russia, 630090 {alexander.l.semenov,tamara.p.kashevarova,pavel.v.mankevich,

More information

Making electronic structure methods scale: Large systems and (massively) parallel computing

Making electronic structure methods scale: Large systems and (massively) parallel computing AB Making electronic structure methods scale: Large systems and (massively) parallel computing Ville Havu Department of Applied Physics Helsinki University of Technology - TKK Ville.Havu@tkk.fi 1 Outline

More information

Vector Analysis HOMEWORK IX Solution. 1. If T Λ k (V ), v 1,..., v k is a set of k linearly dependent vectors on V, prove

Vector Analysis HOMEWORK IX Solution. 1. If T Λ k (V ), v 1,..., v k is a set of k linearly dependent vectors on V, prove 1. If T Λ k (V ), v 1,..., v k is a set of k linearly dependent vectors on V, prove T ( v 1,..., v k ) = 0 Since v 1,..., v k is a set of k linearly dependent vectors, there exists a 1,..., a k F such

More information

Experiences with Self-Consistent Tight Binding and ELSI

Experiences with Self-Consistent Tight Binding and ELSI Experiences with Self-Consistent Tight Binding and ELSI Ben Hourahine benjamin.hourahine@strath.ac.uk Motivation via computational cost Method Eform. (ev) Time Orthogonal TB 3.2 1 Self-Consistent Orthogonal

More information

Sparse BLAS-3 Reduction

Sparse BLAS-3 Reduction Sparse BLAS-3 Reduction to Banded Upper Triangular (Spar3Bnd) Gary Howell, HPC/OIT NC State University gary howell@ncsu.edu Sparse BLAS-3 Reduction p.1/27 Acknowledgements James Demmel, Gene Golub, Franc

More information

Suggested Reading. Pages in Engler and Randle

Suggested Reading. Pages in Engler and Randle The Structure Factor Suggested Reading Pages 303-312312 in DeGraef & McHenry Pages 59-61 in Engler and Randle 1 Structure Factor (F ) N i1 1 2 i( hu kv lw ) F fe i i j i Describes how atomic arrangement

More information

ab initio Electronic Structure Calculations

ab initio Electronic Structure Calculations ab initio Electronic Structure Calculations New scalability frontiers using the BG/L Supercomputer C. Bekas, A. Curioni and W. Andreoni IBM, Zurich Research Laboratory Rueschlikon 8803, Switzerland ab

More information

Light curve modeling of eclipsing binary stars

Light curve modeling of eclipsing binary stars Light curve modeling of eclipsing binary stars Gábor Marschalkó Baja Observatory of University of Szeged Wigner Research Centre for Physics Binary stars physical variables pulsating stars mass, radius,

More information

A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters

A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters ANTONINO TUMEO, ORESTE VILLA Collaborators: Karol Kowalski, Sriram Krishnamoorthy, Wenjing Ma, Simone Secchi May 15, 2012 1 Outline!

More information

Evaluation and Benchmarking of Highly Scalable Parallel Numerical Libraries

Evaluation and Benchmarking of Highly Scalable Parallel Numerical Libraries Evaluation and Benchmarking of Highly Scalable Parallel Numerical Libraries Christos Theodosiou (ctheodos@grid.auth.gr) User and Application Support Scientific Computing Centre @ AUTH Presentation Outline

More information

Solving the Inverse Toeplitz Eigenproblem Using ScaLAPACK and MPI *

Solving the Inverse Toeplitz Eigenproblem Using ScaLAPACK and MPI * Solving the Inverse Toeplitz Eigenproblem Using ScaLAPACK and MPI * J.M. Badía and A.M. Vidal Dpto. Informática., Univ Jaume I. 07, Castellón, Spain. badia@inf.uji.es Dpto. Sistemas Informáticos y Computación.

More information