Introduction to Benchmark Test for Multi-scale Computational Materials Software

Similar documents
RWTH Aachen University

Molecular Dynamics Simulations

Perm State University Research-Education Center Parallel and Distributed Computing

Large-scale Electronic Structure Simulations with MVAPICH2 on Intel Knights Landing Manycore Processors

LAMMPS Performance Benchmark on VSC-1 and VSC-2

The Fast Multipole Method in molecular dynamics

Scalable and Power-Efficient Data Mining Kernels

A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters

A Massively Parallel Eigenvalue Solver for Small Matrices on Multicore and Manycore Architectures

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems

Antti-Pekka Hynninen, 5/10/2017, GTC2017, San Jose CA

Massively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling

Massively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling

Parallel Transposition of Sparse Data Structures

Piz Daint & Piz Kesch : from general purpose supercomputing to an appliance for weather forecasting. Thomas C. Schulthess

Density Functional Theory

Efficient Parallelization of Molecular Dynamics Simulations on Hybrid CPU/GPU Supercoputers

Parallelization of Molecular Dynamics (with focus on Gromacs) SeSE 2014 p.1/29

Quantum Chemical Calculations by Parallel Computer from Commodity PC Components

HYCOM and Navy ESPC Future High Performance Computing Needs. Alan J. Wallcraft. COAPS Short Seminar November 6, 2017

Introduction to numerical computations on the GPU

Investigation of an Unusual Phase Transition Freezing on heating of liquid solution

Software optimization for petaflops/s scale Quantum Monte Carlo simulations

Verbundprojekt ELPA-AEO. Eigenwert-Löser für Petaflop-Anwendungen Algorithmische Erweiterungen und Optimierungen

Targeting Extreme Scale Computational Challenges with Heterogeneous Systems

Accelerating Rosetta with OpenMM. Peter Eastman. RosettaCon, August 5, 2010

Mesoscopic Simulation of Aggregate Structure and Stability of Heavy Crude Oil by GPU Accelerated DPD

A Reconfigurable Quantum Computer

Some thoughts about energy efficient application execution on NEC LX Series compute clusters

Modeling and visualization of molecular dynamic processes

Basic introduction of NWChem software

Massively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem

MagmaDNN High-Performance Data Analytics for Manycore GPUs and CPUs

On Portability, Performance and Scalability of a MPI OpenCL Lattice Boltzmann Code

Lattice Quantum Chromodynamics on the MIC architectures

Hydra. A library for data analysis in massively parallel platforms. A. Augusto Alves Jr and Michael D. Sokoloff

INITIAL INTEGRATION AND EVALUATION

Multiphase Flow Simulations in Inclined Tubes with Lattice Boltzmann Method on GPU

Information Sciences Institute 22 June 2012 Bob Lucas, Gene Wagenbreth, Dan Davis, Roger Grimes and

Advanced Molecular Dynamics

arxiv: v1 [hep-lat] 7 Oct 2010

Direct Self-Consistent Field Computations on GPU Clusters

Tips Geared Towards R. Adam J. Suarez. Arpil 10, 2015

HPC APPLICATION SUPPORT FOR GPU COMPUTING

Welcome to MCS 572. content and organization expectations of the course. definition and classification

Particle-Simulation Methods for Fluid Dynamics

Claude Tadonki. MINES ParisTech PSL Research University Centre de Recherche Informatique

Using Web-Based Computations in Organic Chemistry

Lazy matrices for contraction-based algorithms

Computational Physics. J. M. Thijssen

Optimizing the Performance of Reactive Molecular Dynamics Simulations for Multi-core Architectures

High Performance Computing

Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters

JASS Modeling and visualization of molecular dynamic processes

Due: since the calculation takes longer than before, we ll make it due on 02/05/2016, Friday

2.3 Modeling Interatomic Interactions Pairwise Potentials Many-Body Potentials Studying Biomolecules: The Force

ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers

Everyday Multithreading

Basic introduction of NWChem software

Example questions for Molecular modelling (Level 4) Dr. Adrian Mulholland

Panorama des modèles et outils de programmation parallèle

Molecular Dynamics Simulation of Nanometric Machining Under Realistic Cutting Conditions Using LAMMPS

Molecular dynamics simulation. CS/CME/BioE/Biophys/BMI 279 Oct. 5 and 10, 2017 Ron Dror

CECAM Tutorial Stuttgart, Oct. 2012

Optimization strategy for MASNUM surface wave model

Performance Evaluation of MPI on Weather and Hydrological Models

MO Calculation for a Diatomic Molecule. /4 0 ) i=1 j>i (1/r ij )

IFM Chemistry Computational Chemistry 2010, 7.5 hp LAB2. Computer laboratory exercise 1 (LAB2): Quantum chemical calculations

Jack Smith. Center for Environmental, Geotechnical and Applied Science. Marshall University

Hydra. A. Augusto Alves Jr and M.D. Sokoloff. University of Cincinnati

HPMPC - A new software package with efficient solvers for Model Predictive Control

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers

Efficient multigrid solvers for mixed finite element discretisations in NWP models

Simulation Methods II

Parallel Sparse Tensor Decompositions using HiCOO Format

CP2K. New Frontiers. ab initio Molecular Dynamics

Multiscale simulations of complex fluid rheology

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics

Benchmarking program performance evaluation of Parallel programming language XcalableMP on Many core processor

Grading Homework (30%), Midterm project (30%), Final project (30%), Participation (10%)

SIMCON - Computer Simulation of Condensed Matter

An FPGA Implementation of Reciprocal Sums for SPME

Cosmological N-Body Simulations and Galaxy Surveys

The Panel: What does the future look like for NPW application development? 17 th ECMWF Workshop on High Performance Computing in Meteorology

Advancing Weather Prediction at NOAA. 18 November 2015 Tom Henderson NOAA / ESRL / GSD

Weather and Climate Modeling on GPU and Xeon Phi Accelerated Systems

WRF performance tuning for the Intel Woodcrest Processor

Efficient Molecular Dynamics on Heterogeneous Architectures in GROMACS

One Optimized I/O Configuration per HPC Application

Julian Merten. GPU Computing and Alternative Architecture

COMBINED EXPLICIT-IMPLICIT TAYLOR SERIES METHODS

Cactus Tools for Petascale Computing

APPLICATION OF CUDA TECHNOLOGY FOR CALCULATION OF GROUND STATES OF FEW-BODY NUCLEI BY FEYNMAN'S CONTINUAL INTEGRALS METHOD

RESEARCH ON THE DISTRIBUTED PARALLEL SPATIAL INDEXING SCHEMA BASED ON R-TREE

MPI at MPI. Jens Saak. Max Planck Institute for Dynamics of Complex Technical Systems Computational Methods in Systems and Control Theory

LAMMPS for Molecular Dynamics Simulation

Stochastic Modelling of Electron Transport on different HPC architectures

Establishing a CUDA Research Center at Penn State: Perspectives on GPU-Enabled Teaching and Research

Computational Modeling Software and their applications

CRYSTAL in parallel: replicated and distributed (MPP) data. Why parallel?

Transcription:

Introduction to Benchmark Test for Multi-scale Computational Materials Software Shun Xu*, Jian Zhang, Zhong Jin xushun@sccas.cn Computer Network Information Center Chinese Academy of Sciences (IPCC member) 2019.1.14

Outline Introduction to computational materials software Benchmark testing routine proposed Current progress of benchmark testing system Conclusions

The Computational Materials Science Material modelling Theoretical Research Computational materials Science Experimental study Material design Multiscale mechanics Statistical Mechanics Fluid Mechanics Molecular Mechanics Quantum Mechanics Difficulties in Computational Materials Science Multidimensional system Nonlinear system Various of characteristic space scale These are great challenges of improving computational performance

The temporal and spatial scales of Computational materials Hydrodynamic finite element method Bodies Molecules Atoms Molecules Liquid and solid (crystals) Polymers Thermodynamics Electrostatics Nanodevices Self-Assembly Aggregates Nanomaterials Electrons Force fields

Methods used in Computational Materials Science The commonly used methods include First-principles/ab initio method Molecular dynamics method Monte Carlo method Cellular automata method Phase field method Geometric topological model method Finite element analysis

Software developed for Computational Materials Science First-principles/ab initio method NWChem QUANTUM ESPRESSO Abinit Gaussian VASP Molecular dynamics method LAMMPS GROMACS CHARMM Material Studio Phase field method MICRESS (incomplete list)

To harness High-Performance Computing facilities Two typical open source HPC software of computational materials NWChem: Open Source High-Performance Computational Chemistry nanostructures from quantum to classical LAMMPS: Large-scale Atomic/Molecular Massively Parallel Simulator classical molecular dynamics on materials modeling

For better use, questions we need answer to are... Where is the bottleneck in these software running? And how to evaluate it? How to estimate the best computational performance of these software over current computing resource? And how to achieve the best performance? And more Resort to benchmark testing!

We propose a routine to benchmark testing 1. To prepare for benchmark Collect hardware information Collect operation system information Choose compiler environments Do basic benchmarks for system 2. In benchmark testing of application software Locate common problems in target software Write & profile a prototype program Prepare typical use cases Run & compare with typical use cases 3. Over benchmark testing Do a way of batch processing Setup a platform of performance analysis

NWChem: benchmark testing Locate common problems in target software o SCF: self-consistent field o Density functional theory o Ab initio molecular dynamics o Perturbation theory o Coupled cluster o MCSCF: multiconfiguration self-consistent field o CI: configuration interaction o Molecular mechanics o Molecular dynamics o Free energy simulation

NWChem: benchmark testing The NWChem architecture representing general functionality within NWChem (citation from: sourcebook of parallel computing)

NWChem: Features considered in Parallel performance The Global Array (GA) Toolkit o for one-sided access to shared data structures o for data locality sensitive o for high-level operations on distributed arrays for out-of-core array based algorithms o for simulations driven by dynamic load balancing Peigs: a parallel eigen-solver MA: a portable memory allocator RTDB:The run-time database (a storage of key/value pairs)

Write & profile a prototype program of HF-SCF Although a broad range of computational chemistry methods, Hartree-Fock self-consistent field (HF-SCF) is usual a common module as foundation. The complexity of HF-SCF The cost of solving Fock eigen matrix scales with O(N 3 ), where N is for basis functions. But the cost of building Fock eigen matrix scales with the number of integrals, which is formally O(N 4 ) for N basis functions. HILQC: A prototype Program of HF-SCF

About HILQC HILQC is a Heterogeneous Integral Library for Quantum Chemistry Optimized in prescreening and presort stage on Intel Platform (MIC/KNL) Optimizing Two-Electron Repulsion Integral Calculation on Knights Landing Architecture (Presented at IXPUG workshop 2018 by Dr. Yingqi Tian from our group.) Integrated with Gaussian function based BDF (Beijing Density Functional) program package (finished)

Presort and J K matrix testing Haswell Broadwell KNL GAMESS caffeine 0.14 0.12 0.13 0.1 cocaine 0.44 0.33 0.46 0.4 taxol 2.78 1.62 1.55 3.8 valinomycin 4.48 2.71 2.12 10.8

AVX-512 Intrinsic of incomplete gamma function The calculation of all double electron integrals based on (GTOs) and (STOs) basis functions can be transformed into a series of incomplete Gamma functions

LAMMPS: benchmark testing Locate common problems in target software o classical molecular dynamics of atoms/particles o Interatomic potentials (force fields) pair style bond style angle style dihedral style improper style kspace style o Ensembles, constraints, and boundary conditions o Integrators velocity-verlet integrator Brownian dynamics rigid body integration energy minimization

LAMMPS: Features considered in Parallel performance GPU Package USER-INTEL Package KOKKOS Package USER-OMP Package OPT Package for NVIDIA GPUs as well as OpenCL support for Intel CPUs and Intel Xeon Phi for Nvidia GPUs, Intel Xeon Phi, and OpenMP threading for OpenMP threading and generic CPU optimizations generic CPU optimizations Many-core CPUs NVIDIA GPUs Intel Phi USER-INTEL, KOKKOS, USER- OMP, OPT packages GPU, KOKKOS packages USER-INTEL, KOKKOS packages

Write & profile a prototype program of MD The calculation of interatomic potentials (force fields) is a typical bottleneck. The complexity of Molecular Dynamics (MD) The cost of building neighbor list of many-body short-range potentials The cost of calculating long-range interactions for charge. emd: A prototype Program of MD

About emd An easy-to-use MD software for biomacromolecules Standard MD simulation process, developed by CNIC, CAS Main Features o Based on Moly HPC Framework o Custom Force Field o Advanced Modeling o Heterogeneous Computing

The emd software architecture for high performance Molecular Dynamics simulations Atom Info Particle properties Group constraint Define groups and constraints Neighbor list build Construct neighbor list Force field mapping Force style MD Integrator MD main loop Initiate system Initiate atom and box Accelerated computing GPU and MIC Modify ensemble Construct various ensembles Python Module API of classes and functions Logical flow Logical flow control Visualization Data visualization Trajectory output Trajectory frames Thermo logout Thermodynamics statistics

emd & Moly UI frontend Computation backend

About Benchmark testing in NWChem and LAMMPS Locate calculation hotspot and bottleneck In theoretical way by profiling tools (Intel Vtune Amplifier) Write prototype program highlight it Computation Network Memory and Disk Run & compare typical use cases Compare results between prototype program and application software Tell us how it works efficiently

Benchmark testing in batch processing Some slides on this content are not posted online for privacy reason

A benchmark platform for performance analysis (under work) To publish performance data in a uniform format To give a way to compare performance in testing systems To create a machine learning method for performance estimation To publish any information to help us well understand the software and hardware performance

Output figures from benchmark testing LAMMPS benchmark of rhodo case by only one command number of particles fixed (Amdahl law) number of particles scaled (Gustafson law)

Conclusions To build an efficient way to well learn the performance of software and hardware, by benchmark testing system It is feasible for the benchmark testing system used from computational materials to a broad range of scientific applications A part of this work is summited as a proposal of IPCC project

Thank you!