Adaptive Heterogeneous Computing with OpenCL: Harnessing hundreds of GPUs and CPUs

Similar documents
Adaptive Heterogeneous Computing with OpenCL: Harnessing hundreds of GPUs and CPUs

Mechanisms for exploiting heterogeneous computing: Harnessing hundreds of GPUs and CPUs

BUDE. A General Purpose Molecular Docking Program Using OpenCL. Richard B Sessions

Docking. GBCB 5874: Problem Solving in GBCB

Parallelization of Molecular Dynamics (with focus on Gromacs) SeSE 2014 p.1/29

Using AutoDock for Virtual Screening

Petascale Quantum Simulations of Nano Systems and Biomolecules

Klaus Schulten Department of Physics and Theoretical and Computational Biophysics Group University of Illinois at Urbana-Champaign

Introduction to numerical computations on the GPU

Predictive Molecular Simulation for Drug Discovery

Homology modeling. Dinesh Gupta ICGEB, New Delhi 1/27/2010 5:59 PM

Molecular dynamics simulation. CS/CME/BioE/Biophys/BMI 279 Oct. 5 and 10, 2017 Ron Dror

est Drive K20 GPUs! Experience The Acceleration Run Computational Chemistry Codes on Tesla K20 GPU today

User Guide for LeDock

MURCIA: Fast parallel solvent accessible surface area calculation on GPUs and application to drug discovery and molecular visualization

A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters

Julian Merten. GPU Computing and Alternative Architecture

5.1. Hardwares, Softwares and Web server used in Molecular modeling

Molecular Mechanics, Dynamics & Docking

Cheminformatics platform for drug discovery application

Using a CUDA-Accelerated PGAS Model on a GPU Cluster for Bioinformatics

Perm State University Research-Education Center Parallel and Distributed Computing

Chemical properties that affect binding of enzyme-inhibiting drugs to enzymes

Alchemical free energy calculations in OpenMM

Real-time signal detection for pulsars and radio transients using GPUs

Introduction to Benchmark Test for Multi-scale Computational Materials Software

INTENSIVE COMPUTATION. Annalisa Massini

11 Parallel programming models

PROTEIN-PROTEIN DOCKING REFINEMENT USING RESTRAINT MOLECULAR DYNAMICS SIMULATIONS

Claude Tadonki. MINES ParisTech PSL Research University Centre de Recherche Informatique

Computational chemical biology to address non-traditional drug targets. John Karanicolas

Cross Discipline Analysis made possible with Data Pipelining. J.R. Tozer SciTegic

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

De Novo molecular design with Deep Reinforcement Learning

Direct Self-Consistent Field Computations on GPU Clusters

Welcome to MCS 572. content and organization expectations of the course. definition and classification

Progress of Compound Library Design Using In-silico Approach for Collaborative Drug Discovery

Quantum Chemical Calculations by Parallel Computer from Commodity PC Components

How is molecular dynamics being used in life sciences? Davide Branduardi

The PhilOEsophy. There are only two fundamental molecular descriptors

Nuclear Physics and Computing: Exascale Partnerships. Juan Meza Senior Scientist Lawrence Berkeley National Laboratory

Softwares for Molecular Docking. Lokesh P. Tripathi NCBS 17 December 2007

High-Performance Computing and Groundbreaking Applications

Graphics Card Computing for Materials Modelling

GPU Computing Activities in KISTI

Simulation Laboratories at JSC

Chemical properties that affect binding of enzyme-inhibiting drugs to enzymes

Calculation of ground states of few-body nuclei using NVIDIA CUDA technology

Biologically Relevant Molecular Comparisons. Mark Mackey

Open-Source Parallel FE Software : FrontISTR -- Performance Considerations about B/F (Byte per Flop) of SpMV on K-Supercomputer and GPU-Clusters --

Bioengineering & Bioinformatics Summer Institute, Dept. Computational Biology, University of Pittsburgh, PGH, PA

Performance Analysis of Lattice QCD Application with APGAS Programming Model

Scalable and Power-Efficient Data Mining Kernels

Panorama des modèles et outils de programmation parallèle

APPLICATION OF CUDA TECHNOLOGY FOR CALCULATION OF GROUND STATES OF FEW-BODY NUCLEI BY FEYNMAN'S CONTINUAL INTEGRALS METHOD

Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters

Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS

Kd = koff/kon = [R][L]/[RL]

Piz Daint & Piz Kesch : from general purpose supercomputing to an appliance for weather forecasting. Thomas C. Schulthess

MD Simulation in Pose Refinement and Scoring Using AMBER Workflows

Ligand Scout Tutorials

GPU acceleration of Newton s method for large systems of polynomial equations in double double and quad double arithmetic

Hit Finding and Optimization Using BLAZE & FORGE

Efficient Molecular Dynamics on Heterogeneous Architectures in GROMACS

Random Sampling for Short Lattice Vectors on Graphics Cards

The Fast Multipole Method in molecular dynamics

Population annealing study of the frustrated Ising antiferromagnet on the stacked triangular lattice

High Performance Computing

Tips Geared Towards R. Adam J. Suarez. Arpil 10, 2015

VMD: Visual Molecular Dynamics. Our Microscope is Made of...

RAID+: Deterministic and Balanced Data Distribution for Large Disk Enclosures

Supporting Online Material for

Dr. Sander B. Nabuurs. Computational Drug Discovery group Center for Molecular and Biomolecular Informatics Radboud University Medical Centre

Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method

N-body Simulations. On GPU Clusters

Introduction to Chemoinformatics and Drug Discovery

NIH Center for Macromolecular Modeling and Bioinformatics Developer of VMD and NAMD. Beckman Institute

Virtual Libraries and Virtual Screening in Drug Discovery Processes using KNIME

Building 3D models of proteins

Programme Last week s quiz results + Summary Fold recognition Break Exercise: Modelling remote homologues

The QMC Petascale Project

arxiv: v1 [hep-lat] 7 Oct 2010

Molecular Modelling for Medicinal Chemistry (F13MMM) Room A36

Early Stages of Drug Discovery in the Pharmaceutical Industry

Life Science Webinar Series

Listening for thunder beyond the clouds

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

XXL-BIOMD. Large Scale Biomolecular Dynamics Simulations. onsdag, 2009 maj 13

CRYPTOGRAPHIC COMPUTING

arxiv: v1 [physics.data-an] 19 Feb 2017

MM-GBSA for Calculating Binding Affinity A rank-ordering study for the lead optimization of Fxa and COX-2 inhibitors

Advanced Computing Systems for Scientific Research

Schrodinger ebootcamp #3, Summer EXPLORING METHODS FOR CONFORMER SEARCHING Jas Bhachoo, Senior Applications Scientist

COMBINATORIAL CHEMISTRY: CURRENT APPROACH

SP-CNN: A Scalable and Programmable CNN-based Accelerator. Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay

Virtual Screening: How Are We Doing?

A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery

STRUCTURAL BIOINFORMATICS II. Spring 2018

Delayed and Higher-Order Transfer Entropy

Different conformations of the drugs within the virtual library of FDA approved drugs will be generated.

Transcription:

Adaptive Heterogeneous Computing with OpenCL: Harnessing hundreds of GPUs and CPUs Simon McIntosh-Smith simonm@cs.bris.ac.uk Head of Microelectronics Research University of Bristol, UK 1

! A brief biography Graduated as Valedictorian in Computer Science from Cardiff University in 1991 Joined Inmos to work for David May as a microprocessor architect Moved to Pixelfusion in 1999 a high- tech start- up designing the first GPGPU, a many- core general purpose graphics processor Co- founded ClearSpeed in 2002 as Director of Architecture and ApplicaMons Joined the CS department at the University of Bristol in April 2009 to focus on High Performance CompuMng and architectures 2

! Recent HPC activities European HPC: PRACE Mont Blanc ARM-based supercomputer EESI exascale software initiative UK HPC: ARCHER project working group Next UK national supercomputer Several PFLOPS, goes live summer 13 3

! Collaborators Richard B. Sessions, Amaurys Avila Ibarra University of Bristol, Biochemistry Developers of the docking code BUDE James Price (port to OpenCL) University of Bristol, Computer Science Tsuyoshi Hamada, Felipe Cruz (GPUs) University of Nagasaki, Japan Winners of the 2009 Gordon Bell price/ performance prize 4

Molecular docking 5

! Molecular docking Proteins typically O(1,000) atoms Ligands typically O(100) atoms 6

! BUDE: Bristol University Docking Engine Accuracy Speed Typical docking scoring functions Empirical Free Energy Forcefield BUDE Free Energy calculations MM 1,2 QM/MM 3 Entropy: solvation configurational Electrostatics All atom Explicit solvent No Yes Yes Approx Approx Yes? Approx Yes No Yes Yes No No Yes 1. MD Tyka, AR Clarke, RB Sessions, J. Phys. Chem. B 110 17212-20 (2006) 2. MD Tyka, RB Sessions, AR Clarke, J. Phys. Chem. B 111 9571-80 (2007) 3. CJ Woods, FR Manby, AJ Mulholland, J. Chem. Phys. 128 014109 (2008) 7

! Empirical Free Energy Function (atom-atom) ΔG ligand binding = i=1 N proteinj=1 N ligand f(xi,x j ) Parameterised using experimental data N. Gibbs, A.R. Clarke & R.B. Sessions, "Ab-initio Protein Folding using Physicochemical Potentials and a Simplified Off-Lattice Model", Proteins 43:186-202,2001 8

! BUDE Acceleration with OpenCL START (input) Copy protein and ligand coordinates (once) GA like, energy minimisation (output) END ( ) R x R xy R xz T x R yx R y R yz T y R zx R zy R z T z Energy of pose Geometry (transform ligand) Energy i=1 N prot j=1 N lig f(x i,x j ) "Benchmarking energy efficiency, power costs and carbon emissions on heterogeneous systems", Simon McIntosh-Smith, Terry Wilson, Amaurys Avila Ibarra, Jon Crisp and Richard B.Sessions, The Computer Journal, September 12th 2011. DOI: 10.1093/comjnl/bxr091 9

! Why OpenCL? Open standard with fast growing support Evolving rapidly: C++, HSA, OpenCL 2.0, Platform portability Have used dozens of different GPUs & CPUs Naturally supports heterogeneous systems Harness CPUs and GPUs simultaneously 10

! Multiple levels of parallelism O(10 8 ) conformers from O(10 7 ) ligands O(10 5 ) poses per conformer (ligand) O(10 3 ) atoms per protein O(10 2 ) atoms per ligand (drug molecule) Conformers all independent Poses all independent, but there are benefits in grouping all poses of one conformer to one OpenCL device 11

! Parallelism strategy Distribute ligands across nodes 10 7 -way parallelism All the poses of one conformer distributed across all the OpenCL devices in a node 10 3 -way parallelism Each Work-Item (thread) performs an entire conformer-protein docking 10 5 atom-atom force calculations 12

! How BUDE s EMC works 13

Experimental results 14

! Redocking into Xray Structure 1CIL (Human carbonic anhydrase II) 15

! Another example 1EZQ (Human Factor XA) 16

! The science is working well 53 Experimental results from the Ligand Binding Database 17

Heterogeneous Systems 18

! OpenCL for heterogeneous computing A modern computer includes: One or more CPUs One or more GPUs CPU GPU CPU GPU OpenCL (Open Compute Language) lets programmers write a single portable program that uses ALL resources in the heterogeneous platform 19

! Kinds of Heterogeneous Systems SuperMicro 4 GPU workstation 20

! Kinds of Heterogeneous Systems SuperMicro GPU servers 2 GPUs in 1U 21

! Kinds of Heterogeneous Systems HP SL390 2 CPUs, 8 GPUs in half width 4U 22

! Kinds of Heterogeneous Systems Dell PowerEdge C410x 16 GPUs in 4U 23

! Kinds of Heterogeneous Systems AMD Llano Fusion APUs 24

! Kinds of Heterogeneous Systems Intel Core2 Duo CPU P8600 @ 2.40GHz, NVIDIA GeForce 9400M integrated GPU, NVIDIA GeForce 9600M GT discrete GPU 25

Benchmark results 26

! BUDE s heterogeneous approach Within a node: 1. Discover all OpenCL platforms/devices, including CPUs and GPUs 2. Run a micro benchmark on each device, actually a short piece of real work 3. Load balance using micro benchmark results 4. Re-run micro benchmark at regular intervals to adapt to load changes 27

! BUDE s heterogeneous approach Between nodes: 1. Partition entire ligand database (~160M) into subsets that will take a few minutes to process on one node 2. Distribute initial subsets across nodes 3. Work stealing scheduler to load balance across nodes of different performance 4. Time-outs trigger reallocation of MIA subsets to working nodes 28

! Benchmarking methodology Use the same power measurement equipment for all the systems under test Watts Up? Pro meter Measures complete system power at the wall Run as fast as possible on all available resources (i.e. all cores or all GPUs simultaneously) 29

! MacBook Pro 2009 results Time in seconds, lower is better 30

! Benchmark results 31

! Performance results Speedup (higher is better) 16 14 12 10 8 6 4 13.4 11.7 8.1 GPUs 7.3 6.5 5.3 2 0 M2050 (x2) GTX-580 M2070 M2050 GTX-285 HD5870 A8-3850 0.8 32

! Performance results Speedup (higher is better) 1.2 1.0 0.8 0.6 0.4 2.8GHz 1.0 2.4GHz 3.4GHz 0.9 0.9 2.3GHz 0.7 CPUs 2.9GHz 0.3 0.2 0.0 8 cores E5462 (Fortran) x2 8 cores 4 cores 4 cores 4 cores E5620 x2 i7-2600k i5-2500t A8-3850 CPU 33

! Performance results Speedup (higher is better) 16 14 12 10 8 6 13.6 Heterogeneous (CPUs+GPUs) 11.7 5.9 4 2 1.2 0 M2050 (x2) & E5620 (x2) GTX-580 & E8500 HD5870 & i5-2500t A8-3850 GPU & CPU 34

! Performance results 16 Speedup (higher is better) 14 12 10 8 6 13.4 11.7 Where we finished Where we could go Where we started 4 2 1.0 0 M2050 x2 GTX-580 E5462 (Fortran) x2 ~$2,000 x2 ~$400 35

! Relative energy and run-time 18 16 88% reduction in energy 93% reduction in time 16.2 15.4 Relative Performance 15.2 14 13.2 Relative Performance Per Watt 12 10.6 11.4 10 9.3 9.1 8 6 6.6 5.9 4.5 4 2 1.0 1.0 0.8 0.7 2.4 0 M2050 + E5620 x2 M2050 x2 GTX-580 HD5870 + i5-2500t HD5870 E5620 x2 A8-3850 GPU i5-2500t Measurements are for a constant amount of work. Energy measurements are at the wall and include any idle components. 36

What does this let us do? 37

! Potentially save lives New Delhi Metallo-betalactamase-1 (NDM-1) is an enzyme that makes bacteria resistant to antibiotics, giving rise to superbugs http://news.discovery.com/ human/superbug-found-injapan.html 38

! NDM-1 as a docking target NDM-1 protein made up of 939 atoms 39

40

! GPU-system DEGIMA Used 222 GPUs in parallel for drug docking simulations ATI Radeon HD5870 (2.72 TFLOPS) & Intel i5-2500t ~600 TFLOPS single precision peak performance Courtesy of Tsuyoshi Hamada and Felipe Cruz, Nagasaki 41

! NDM-1 experiment 7.65 million candidate drug molecules, 21.8 conformers each à 166.7x10 6 dockings 4.168 x 10 12 poses calculated ~98 hours actual wall-time Top 300 hits being analysed, down selecting to 10 compounds for further investigation in the lab 42

Future work 43

! Future work Run on new, 372 GPU Emerald cluster HP SL390 system, Nvidia M2090s GPUs Also Gnodal fast Ethernet interconnect Further improve forcefield (underway) RMSD already under 2Å, down from >4Å Port to emerging, OpenCL systems E.g. ARM-based, Imagination, Benchmark on Kepler, Graphics Core Next Explore fault tolerance within a node 44

! Conclusions OpenCL enables truly heterogeneous computing, harnessing all hardware resources in a system GPUs can yield significant savings in energy costs (and equipment costs) OpenCL can work well for multi-core CPUs as well as for GPUs It s possible to screen libraries of millions of molecules against complex targets using highly accurate, computationally-expensive methods in hours using equipment costing O( 100K) 45

! References S. McIntosh-Smith, T. Wilson, A.A. Ibarra, J. Crisp and R.B. Sessions, "Benchmarking energy efficiency, power costs and carbon emissions on heterogeneous systems", The Computer Journal, September 12th 2011. DOI: 10.1093/comjnl/bxr091 N. Gibbs, A.R. Clarke & R.B. Sessions, "Abinitio Protein Folding using Physicochemical Potentials and a Simplified Off-Lattice Model", Proteins 43:186-202,200 46

! References Kumarasamy, K.K., et al., Emergence of a new antibiotic resistance mechanism in India, Pakistan, and the UK: a molecular, biological, and epidemiological study. Lancet Infectious Diseases, 2010. 10(9): p. 597-602. Zhang, H.M. and Q. Hao, Crystal structure of NDM-1 reveals a common beta-lactam hydrolysis mechanism. Faseb Journal, 2011. 25(8): p. 2574-2582. Irwin, J.J. and B.K. Shoichet, ZINC - A free database of commercially available compounds for virtual screening. Journal of Chemical Information and Modeling, 2005. 45(1): p. 177-182. 47

! References Wolfenden, R., et al., Affinities of Amino- Acid Side-Chains for Solvent Water. Biochemistry, 1981. 20(4): p. 849-855. Gibbs, N., A.R. Clarke, and R.B. Sessions, Ab initio protein structure prediction using physicochemical potentials and a simplified off-lattice model. Proteins, 2001. 43(2): p. 186-202. 48