Molecular dynamics simulations and drug discovery

Size: px
Start display at page:

Download "Molecular dynamics simulations and drug discovery"

Transcription

1

2 olecular dynamics simulations and drug discovery Jacob D. Durrant, J. Andrew ccammon BC Biology :71 DOI: / With constant improvements in both computer power and algorithm design, the future of computer-aided drug design is promising; molecular dynamics simulations are likely to play an increasingly important role. First principles computational materials design for energy storage materials in lithium ion batteries Ying Shirley eng,. Elena Arroyo-de Dompablo Energy Environ. Sci. 2009, 2, DOI: /b901825e Specifically, we show how each relevant property can be related to the structural component in the material and can be computed from first principles.

3 SAPLING TIE ap for Atomistic Simulations ms μs OLECULAR DYNAICS ns ps AB-INITIO ETHODS fs SYSTE SIZE

4 SAPLING TIE ap for Atomistic Simulations ms μs OLECULAR DYNAICS ns ps AB-INITIO ETHODS Q/ ETHODS fs SYSTE SIZE

5 Nobel Prize 2013 in Chemistry: Development of ultiscale odels Further details: Electron dynamics in complex environments with real-time time dependent density functional theory in a Q- framework U.N. orzan et al. The Journal of Chemical Physics 140, (2014). DOI: /

6 THE LIO PROJECT Implementing novel methods of atomistic quantum simulation that efficiently use the current state-of-the-art hardware tools Computer Science LIO Theoretical Chemistry github.com/nanolebrero/lio

7 THE LIO PROJECT Quantum Calculations (Q/ with Amber) DFT + Gaussian Basis Single Point Calculation Electronic Propagation Born-Oppenheimer TD-DFT Electron Dynamics Dynamics Ehrenfest Dynamics github.com/nanolebrero/lio

8 DFT: The way of the density Hohenberg - Khon Theorem ρ( x, y, z) Ψ (r 1,, r n ) otions are governed not by deterministic laws, but by probability functions; chemical bonds are formed not mechanically, but by shifting clouds of electrons that are simultaneously waves and particles. olecular dynamics simulations and drug discovery

9 DFT: The way of the density Hohenberg - Khon Theorem ρ( x, y, z) Ψ (r 1,, r n ) ρ( x, y, z) = Pij χ i ( x, y, z ) χ j ( x, y, z) i=1

10 E [ ρ ] = T [ ρ ] + V ne [ ρ ] + V ee [ ρ ] + E XC [ ρ ]

11 E [ ρ ] = T [ ρ ] + V ne [ ρ ] + V ee [ ρ ] + E XC [ ρ ] Analytical Resolution: ρ(r ) = Pij χi (r ) χ j (r ) i=1 gets distributed = T [ P ] +V ne [ P ] +V ee [ P ]

12 E [ ρ ] = T [ ρ ] + V ne [ ρ ] + V ee [ ρ ] + E XC [ ρ ] Single Point ( 1 Step )

13 E XC [ ρ ] ϵ XC [ ρ(x, y, z ), ρ(x, y, z) ] dv E XC [ ρ ] ϵ XC [ ρ(r k ), ρ(r k ) ] V k k ρ(r k ) = Pij χ i (r k ) χ j (r k ) i=1 χi ρ = x i=1 x j =1 i=1 Pij χ j + χ i χj Pij x

14 SINGLE POINT CALCULATION INPUT Nuclear Geometry Starting Guess Pij E XC [ ρ ] (1) Compute Basis Values. (2) Compute Density Values. OUTPUT Ground State Density (3) Numeric Integral

15 SINGLE POINT CALCULATION INPUT Nuclear Geometry Starting Guess Pij χ1 (r 1) χ 1 (r K ) χ (r 1 ) χ (r K ) E XC [ ρ ] (1) Compute Basis Values. (2) Compute Density Values. OUTPUT Ground State Density (3) Numeric Integral

16 SINGLE POINT CALCULATION INPUT Nuclear Geometry Starting Guess ρ(r k ) = Pij χ i (r k ) χ j (r k ) i=1 j =1 Pij E XC [ ρ ] (1) Compute Basis Values. (2) Compute Density Values. OUTPUT Ground State Density (3) Numeric Integral

17 SINGLE POINT CALCULATION INPUT Nuclear Geometry Starting Guess ϵxc [ ρ(r k ), ρ( r k ) ] V k k Pij E XC [ ρ ] (1) Compute Basis Values. (2) Compute Density Values. OUTPUT Ground State Density (3) Numeric Integral

18 SINGLE POINT CALCULATION INPUT Nuclear Geometry Starting Guess ρ(r k ) = Pij χ i (r k ) χ j (r k ) i=1 j =1 Pij E XC [ ρ ] (1) Compute Basis Values. (2) Compute Density Values. OUTPUT Ground State Density (3) Numeric Integral

19 ρ(r k ) = Pij χ i (r k ) χ j (r k ) i=1 j =1 1 Thread = 1 Point + Double Sum

20 ρ(r k ) = Pij χ i (r k ) χ j (r k ) i=1 j =1 i=1 ρ(r k ) = χ i (r k ) P ij χ j (r k ) 1 Thread = 1 Point + Only j-sum (+ other subroutine to i-accumulate)

21 ρ(r k ) = Pij χ i (r k ) χ j (r k ) i=1 j =1 i=1 ρ(r k ) = χ i (r k ) P ij χ j (r k ) SAE PATTERN SAE NUBER χi ρ = x i=1 x i=1 j =1 Pij χ j + χi χj P ij x

22 P ij χ j (r k ) POINTS OF GRID FUNCTIONS OF BASIS k=1 i=1 i=2 i=3 i=4 i=5 i=6 i=7 i=8 k=2 k=3 k=4 k=5 k=6 k=7 k=8

23 BLOCKS OF SIZE B POINTS OF GRID FUNCTIONS OF BASIS k=1 i=1 i=2 i=3 i=4 i=5 i=6 i=7 i=8 k=2 k=3 k=4 k=5 k=6 k=7 k=8

24 Thread 1 ( function i = 1 ) Thread 2 ( function i = 2 ) P1 j χ j (r k ) P2 j χ j (r k ) Point k fixed Thread B ( function i = B ) BLOCKS OF SIZE B P Bj χ j (r k )

25 Thread 1 ( function i = 1 ) Thread 2 ( function i = 2 ) P1 j χ j (r k ) P2 j χ j (r k ) Point k fixed Thread B ( function i = B ) BLOCKS OF SIZE B P Bj χ j (r k ) Different sets of j = 1 to for all threads The same set of j = 1 to for all threads Pi 1, P i 2,, Pi χ 1 (r k ), χ2 (r k ),, χ (r k )

26 B 2B P 1 j χ j (r k ) + j= B +1 B 2B P 2 j χ j (r k ) + j =B +1 P1 j χ j (r k ) + + j= B+1 P1 j χ j (r k ) P2 j χ j ( r k ) + + j = B +1 P2 j χ j ( r k ) B 2B P Bj χ j (r k ) + j= B+1 P Bj χ j (r k ) + + j = B +1 P Bj χ j (r k )

27 B 2B P 1 j χ j (r k ) + j= B +1 B 2B P 2 j χ j (r k ) + j =B +1 P1 j χ j (r k ) + + j= B+1 P1 j χ j (r k ) P2 j χ j ( r k ) + + j = B +1 P2 j χ j ( r k ) B 2B P Bj χ j (r k ) + j= B+1 P Bj χ j (r k ) + + j = B +1 P Bj χ j (r k ) (1) Load functions j = 1 to B (one each thread) to shared mem. (2) syncthreads() (3) Each thread performs sums from 1 to B. Repeat (1) to (3) for j = B+1 to 2B, 2B+1 to 3B, etc.

28 B 2B P 1 j χ j (r k ) + j= B +1 B 2B P 2 j χ j (r k ) + j =B +1 P1 j χ j (r k ) + + j= B+1 P1 j χ j (r k ) P2 j χ j ( r k ) + + j = B +1 P2 j χ j ( r k ) B 2B P Bj χ j (r k ) + { P11 PB 1 j= B+1 P1 2 P1 B PB 2 P B B } P Bj χ j (r k ) + + j = B +1 P Bj χ j (r k )

29 B 2B P 1 j χ j (r k ) + j= B +1 B 2B P 2 j χ j (r k ) + j =B +1 P1 j χ j (r k ) + + j= B+1 P1 j χ j (r k ) P2 j χ j ( r k ) + + j = B +1 P2 j χ j ( r k ) B 2B P Bj χ j (r k ) + { P11 PB 1 j= B+1 P1 2 P1 B PB 2 P B B } P Bj χ j (r k ) + + j = B +1 P Bj χ j (r k ) TEXTURE EORY 2D access Local cache

30 GET GLOBAL VARS =+ P11 GET χ 1 (r k ) GLOBAL VARS From a memory intensive kernel =+ P12 χ 2(r k )

31 GET GLOBAL VARS =+ P11 GET χ 1 (r k ) GLOBAL VARS From a memory intensive kernel GET GLOBAL VARS =+ P11 χ 1 (r k ) =+ P12 χ 2(r k ) to a more arithmetic intensive kernel. =+ P1 B χ B (r k )

32 ( Estimated time for the 4 core CPU ) x GPU: GeForce GTX x HARDWARE CPU: Intel i Exc Time (s) 8 6 x7.3 Test Systems 4 Speedup (1 core) Heme Taxol Valinomycin 2 x 29 x 26 x 23 0 CPU Heme GPU CPU Taxol GPU CPU GPU Valinomycin ( no openp ) Details:. A. Nitsche,. Ferreria, E. E. ocskos,. C. Gonzalez Lebrero, J. Chem Theory and Comput. 2014, 10, DOI: /ct400308n

33 Gaussian function centered in atom Function value is relevant for this point Function value is not relevant for this point

34 GROUPS OF POINTS Groups solved separately: reduced version of problem. Inhomogeneous workload: few groups with a lot of points/functions VS a lot of groups with very few. Small Group: bad for GPU. Hybrid Partitioning: Small spheres (lots of points) and big cubes (fewer points)

35 HYBRID SCHEE Compute the smallest groups in the CPU. Hardware: Tesla K80 (1 device) N x Intel Xeon E v3 synergistic behavior

36 RESULTS FOR THE HYBRID XC TER KERNEL HYBRID COPUTING SCHEE Limitations Coulomb Q/ Hardware GeForce GTX 980 vs Intel i Tesla K80 vs Xeon E Hybrid (K80+8c) vs Xeon E Speedup x 8.0 x 1.9 x 2.8

37 CURRENT STATE HYBRID COPUTING SCHEE Q/ Coulomb cublas Current Limitations Exc (again!) atrix Diag/Dcmp? Single Point Time (sec) XC TER KERNEL 76 s 20 s 3.3 s Before XC Kernel After XC Kernel After: Hybrid, Q, Coulomb, cublas

38 CURRENT STATE HYBRID COPUTING SCHEE Q/ Coulomb cublas Current Limitations Exc (again!) atrix Diag/Dcmp? Single Point Time (sec) XC TER KERNEL 76 s 20 s 3.3 s Before XC Kernel After XC Kernel After: Hybrid, Q, Coulomb, cublas Free Energy Calculation days 30 days

39 PROBLE ath expression specific to the physical model. Very inhomogeneous workload. emory intensive calculations. Direct manipulation of registers and memory influences algorithmic structure. ultiple blocks of threads can run in the same execution unit Re-organizing work to make it more homogeneous. Distribution of work that takes into account the different capabilities of CPUs / GPUs Customized solution with performance comparable to those of problems more naturally suited for GPU.

40 Thank you for your attention! And thanks to Everyone at the group of computer simulation of the DQIAyQF, FCEN, UBA. Collaborators: - Jorge Kohanoff - Cristián G. Sánchez - Ivan Girotto Institutional Support: - CONICET - UBA - ICTP THE LIO GROUP - ariano González Lebrero - Dario Estrin - Damian Scherlis - Esteban osckos - Uriel orzan - William Agudelo - Nicolás Foglia - Fernando Boubetta - Federico Pedron - anuel Ferreria - atías Nitsche - Juan Pablo Darago - Chad Hopkins visit our code! github.com/nanolebrero/lio

Introduction to numerical computations on the GPU

Introduction to numerical computations on the GPU Introduction to numerical computations on the GPU Lucian Covaci http://lucian.covaci.org/cuda.pdf Tuesday 1 November 11 1 2 Outline: NVIDIA Tesla and Geforce video cards: architecture CUDA - C: programming

More information

Computational Methods. Chem 561

Computational Methods. Chem 561 Computational Methods Chem 561 Lecture Outline 1. Ab initio methods a) HF SCF b) Post-HF methods 2. Density Functional Theory 3. Semiempirical methods 4. Molecular Mechanics Computational Chemistry " Computational

More information

Introduction to Density Functional Theory

Introduction to Density Functional Theory Introduction to Density Functional Theory S. Sharma Institut für Physik Karl-Franzens-Universität Graz, Austria 19th October 2005 Synopsis Motivation 1 Motivation : where can one use DFT 2 : 1 Elementary

More information

Parallelization of Molecular Dynamics (with focus on Gromacs) SeSE 2014 p.1/29

Parallelization of Molecular Dynamics (with focus on Gromacs) SeSE 2014 p.1/29 Parallelization of Molecular Dynamics (with focus on Gromacs) SeSE 2014 p.1/29 Outline A few words on MD applications and the GROMACS package The main work in an MD simulation Parallelization Stream computing

More information

Antti-Pekka Hynninen, 5/10/2017, GTC2017, San Jose CA

Antti-Pekka Hynninen, 5/10/2017, GTC2017, San Jose CA S7255: CUTT: A HIGH- PERFORMANCE TENSOR TRANSPOSE LIBRARY FOR GPUS Antti-Pekka Hynninen, 5/10/2017, GTC2017, San Jose CA MOTIVATION Tensor contractions are the most computationally intensive part of quantum

More information

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers UT College of Engineering Tutorial Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers Stan Tomov 1, George Bosilca 1, and Cédric

More information

arxiv: v1 [hep-lat] 7 Oct 2010

arxiv: v1 [hep-lat] 7 Oct 2010 arxiv:.486v [hep-lat] 7 Oct 2 Nuno Cardoso CFTP, Instituto Superior Técnico E-mail: nunocardoso@cftp.ist.utl.pt Pedro Bicudo CFTP, Instituto Superior Técnico E-mail: bicudo@ist.utl.pt We discuss the CUDA

More information

Practical Combustion Kinetics with CUDA

Practical Combustion Kinetics with CUDA Funded by: U.S. Department of Energy Vehicle Technologies Program Program Manager: Gurpreet Singh & Leo Breton Practical Combustion Kinetics with CUDA GPU Technology Conference March 20, 2015 Russell Whitesides

More information

Accelerating linear algebra computations with hybrid GPU-multicore systems.

Accelerating linear algebra computations with hybrid GPU-multicore systems. Accelerating linear algebra computations with hybrid GPU-multicore systems. Marc Baboulin INRIA/Université Paris-Sud joint work with Jack Dongarra (University of Tennessee and Oak Ridge National Laboratory)

More information

Using a CUDA-Accelerated PGAS Model on a GPU Cluster for Bioinformatics

Using a CUDA-Accelerated PGAS Model on a GPU Cluster for Bioinformatics Using a CUDA-Accelerated PGAS Model on a GPU Cluster for Bioinformatics Jorge González-Domínguez Parallel and Distributed Architectures Group Johannes Gutenberg University of Mainz, Germany j.gonzalez@uni-mainz.de

More information

Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS

Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS Jorge González-Domínguez*, Bertil Schmidt*, Jan C. Kässens**, Lars Wienbrandt** *Parallel and Distributed Architectures

More information

Lecture 8: Introduction to Density Functional Theory

Lecture 8: Introduction to Density Functional Theory Lecture 8: Introduction to Density Functional Theory Marie Curie Tutorial Series: Modeling Biomolecules December 6-11, 2004 Mark Tuckerman Dept. of Chemistry and Courant Institute of Mathematical Science

More information

Direct Self-Consistent Field Computations on GPU Clusters

Direct Self-Consistent Field Computations on GPU Clusters Direct Self-Consistent Field Computations on GPU Clusters Guochun Shi, Volodymyr Kindratenko National Center for Supercomputing Applications University of Illinois at UrbanaChampaign Ivan Ufimtsev, Todd

More information

Efficient Molecular Dynamics on Heterogeneous Architectures in GROMACS

Efficient Molecular Dynamics on Heterogeneous Architectures in GROMACS Efficient Molecular Dynamics on Heterogeneous Architectures in GROMACS Berk Hess, Szilárd Páll KTH Royal Institute of Technology GTC 2012 GROMACS: fast, scalable, free Classical molecular dynamics package

More information

A microsecond a day keeps the doctor away: Efficient GPU Molecular Dynamics with GROMACS

A microsecond a day keeps the doctor away: Efficient GPU Molecular Dynamics with GROMACS GTC 20130319 A microsecond a day keeps the doctor away: Efficient GPU Molecular Dynamics with GROMACS Erik Lindahl erik.lindahl@scilifelab.se Molecular Dynamics Understand biology We re comfortably on

More information

HECToR CSE technical meeting, Oxford Parallel Algorithms for the Materials Modelling code CRYSTAL

HECToR CSE technical meeting, Oxford Parallel Algorithms for the Materials Modelling code CRYSTAL HECToR CSE technical meeting, Oxford 2009 Parallel Algorithms for the Materials Modelling code CRYSTAL Dr Stanko Tomi Computational Science & Engineering Department, STFC Daresbury Laboratory, UK Acknowledgements

More information

Population annealing study of the frustrated Ising antiferromagnet on the stacked triangular lattice

Population annealing study of the frustrated Ising antiferromagnet on the stacked triangular lattice Population annealing study of the frustrated Ising antiferromagnet on the stacked triangular lattice Michal Borovský Department of Theoretical Physics and Astrophysics, University of P. J. Šafárik in Košice,

More information

Q-Chem 4.0: Expanding the Frontiers. Jing Kong Q-Chem Inc. Pittsburgh, PA

Q-Chem 4.0: Expanding the Frontiers. Jing Kong Q-Chem Inc. Pittsburgh, PA Q-Chem 4.0: Expanding the Frontiers Jing Kong Q-Chem Inc. Pittsburgh, PA Q-Chem: Profile Q-Chem is a high performance quantum chemistry program; Contributed by best quantum chemists from 40 universities

More information

Density Functional Theory

Density Functional Theory Density Functional Theory Iain Bethune EPCC ibethune@epcc.ed.ac.uk Overview Background Classical Atomistic Simulation Essential Quantum Mechanics DFT: Approximations and Theory DFT: Implementation using

More information

A CUDA Solver for Helmholtz Equation

A CUDA Solver for Helmholtz Equation Journal of Computational Information Systems 11: 24 (2015) 7805 7812 Available at http://www.jofcis.com A CUDA Solver for Helmholtz Equation Mingming REN 1,2,, Xiaoguang LIU 1,2, Gang WANG 1,2 1 College

More information

Electron Correlation

Electron Correlation Electron Correlation Levels of QM Theory HΨ=EΨ Born-Oppenheimer approximation Nuclear equation: H n Ψ n =E n Ψ n Electronic equation: H e Ψ e =E e Ψ e Single determinant SCF Semi-empirical methods Correlation

More information

Real-time signal detection for pulsars and radio transients using GPUs

Real-time signal detection for pulsars and radio transients using GPUs Real-time signal detection for pulsars and radio transients using GPUs W. Armour, M. Giles, A. Karastergiou and C. Williams. University of Oxford. 15 th July 2013 1 Background of GPUs Why use GPUs? Influence

More information

Explore Computational Power of GPU in Electromagnetics and Micromagnetics

Explore Computational Power of GPU in Electromagnetics and Micromagnetics Explore Computational Power of GPU in Electromagnetics and Micromagnetics Presenter: Sidi Fu, PhD candidate, UC San Diego Advisor: Prof. Vitaliy Lomakin Center of Magnetic Recording Research, Department

More information

Uncertainty Propagation and Nonlinear Filtering for Space Navigation using Differential Algebra

Uncertainty Propagation and Nonlinear Filtering for Space Navigation using Differential Algebra Uncertainty Propagation and Nonlinear Filtering for Space Navigation using Differential Algebra M. Valli, R. Arellin, P. Di Lizia and M. R. Lavagna Departent of Aerospace Engineering, Politecnico di Milano

More information

André Schleife Department of Materials Science and Engineering

André Schleife Department of Materials Science and Engineering André Schleife Department of Materials Science and Engineering Yesterday you (should have) learned this: http://upload.wikimedia.org/wikipedia/commons/e/ea/ Simple_Harmonic_Motion_Orbit.gif 1. deterministic

More information

Exchange Correlation Functional Investigation of RT-TDDFT on a Sodium Chloride. Dimer. Philip Straughn

Exchange Correlation Functional Investigation of RT-TDDFT on a Sodium Chloride. Dimer. Philip Straughn Exchange Correlation Functional Investigation of RT-TDDFT on a Sodium Chloride Dimer Philip Straughn Abstract Charge transfer between Na and Cl ions is an important problem in physical chemistry. However,

More information

Solving PDEs with CUDA Jonathan Cohen

Solving PDEs with CUDA Jonathan Cohen Solving PDEs with CUDA Jonathan Cohen jocohen@nvidia.com NVIDIA Research PDEs (Partial Differential Equations) Big topic Some common strategies Focus on one type of PDE in this talk Poisson Equation Linear

More information

APPLICATION OF CUDA TECHNOLOGY FOR CALCULATION OF GROUND STATES OF FEW-BODY NUCLEI BY FEYNMAN'S CONTINUAL INTEGRALS METHOD

APPLICATION OF CUDA TECHNOLOGY FOR CALCULATION OF GROUND STATES OF FEW-BODY NUCLEI BY FEYNMAN'S CONTINUAL INTEGRALS METHOD APPLICATION OF CUDA TECHNOLOGY FOR CALCULATION OF GROUND STATES OF FEW-BODY NUCLEI BY FEYNMAN'S CONTINUAL INTEGRALS METHOD M.A. Naumenko, V.V. Samarin Joint Institute for Nuclear Research, Dubna, Russia

More information

Stochastic Modelling of Electron Transport on different HPC architectures

Stochastic Modelling of Electron Transport on different HPC architectures Stochastic Modelling of Electron Transport on different HPC architectures www.hp-see.eu E. Atanassov, T. Gurov, A. Karaivan ova Institute of Information and Communication Technologies Bulgarian Academy

More information

Tracking using CONDENSATION: Conditional Density Propagation

Tracking using CONDENSATION: Conditional Density Propagation Tracking using CONDENSATION: Conditional Density Propagation Goal Model-based visual tracking in dense clutter at near video frae rates M. Isard and A. Blake, CONDENSATION Conditional density propagation

More information

Klaus Schulten Department of Physics and Theoretical and Computational Biophysics Group University of Illinois at Urbana-Champaign

Klaus Schulten Department of Physics and Theoretical and Computational Biophysics Group University of Illinois at Urbana-Champaign Klaus Schulten Department of Physics and Theoretical and Computational Biophysics Group University of Illinois at Urbana-Champaign GTC, San Jose Convention Center, CA Sept. 20 23, 2010 GPU and the Computational

More information

POD-DEIM MODEL ORDER REDUCTION FOR THE MONODOMAIN REACTION-DIFFUSION EQUATION IN NEURO-MUSCULAR SYSTEM

POD-DEIM MODEL ORDER REDUCTION FOR THE MONODOMAIN REACTION-DIFFUSION EQUATION IN NEURO-MUSCULAR SYSTEM 6th European Conference on Coputational Mechanics (ECCM 6) 7th European Conference on Coputational Fluid Dynaics (ECFD 7) 1115 June 2018, Glasgow, UK POD-DEIM MODEL ORDER REDUCTION FOR THE MONODOMAIN REACTION-DIFFUSION

More information

A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters

A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters ANTONINO TUMEO, ORESTE VILLA Collaborators: Karol Kowalski, Sriram Krishnamoorthy, Wenjing Ma, Simone Secchi May 15, 2012 1 Outline!

More information

arxiv: v1 [physics.comp-ph] 30 Oct 2017

arxiv: v1 [physics.comp-ph] 30 Oct 2017 An efficient GPU algorithm for tetrahedron-based Brillouin-zone integration Daniel Guterding 1, and Harald O. Jeschke 1 Lucht Probst Associates, Große Gallusstraße 9, 011 Frankfurt am Main, Germany, European

More information

Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota

Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota SIAM CSE Boston - March 1, 2013 First: Joint work with Ruipeng Li Work

More information

CS Lecture 13. More Maximum Likelihood

CS Lecture 13. More Maximum Likelihood CS 6347 Lecture 13 More Maxiu Likelihood Recap Last tie: Introduction to axiu likelihood estiation MLE for Bayesian networks Optial CPTs correspond to epirical counts Today: MLE for CRFs 2 Maxiu Likelihood

More information

Parallelism of MRT Lattice Boltzmann Method based on Multi-GPUs

Parallelism of MRT Lattice Boltzmann Method based on Multi-GPUs Parallelism of MRT Lattice Boltzmann Method based on Multi-GPUs 1 School of Information Engineering, China University of Geosciences (Beijing) Beijing, 100083, China E-mail: Yaolk1119@icloud.com Ailan

More information

Claude Tadonki. MINES ParisTech PSL Research University Centre de Recherche Informatique

Claude Tadonki. MINES ParisTech PSL Research University Centre de Recherche Informatique Claude Tadonki MINES ParisTech PSL Research University Centre de Recherche Informatique claude.tadonki@mines-paristech.fr Monthly CRI Seminar MINES ParisTech - CRI June 06, 2016, Fontainebleau (France)

More information

Computational Chemistry I

Computational Chemistry I Computational Chemistry I Text book Cramer: Essentials of Quantum Chemistry, Wiley (2 ed.) Chapter 3. Post Hartree-Fock methods (Cramer: chapter 7) There are many ways to improve the HF method. Most of

More information

Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method

Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method NUCLEAR SCIENCE AND TECHNIQUES 25, 0501 (14) Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method XU Qi ( 徐琪 ), 1, YU Gang-Lin ( 余纲林 ), 1 WANG Kan ( 王侃 ),

More information

Weile Jia 1, Long Wang 1, Zongyan Cao 1, Jiyun Fu 1, Xuebin Chi 1, Weiguo Gao 2, Lin-Wang Wang 3

Weile Jia 1, Long Wang 1, Zongyan Cao 1, Jiyun Fu 1, Xuebin Chi 1, Weiguo Gao 2, Lin-Wang Wang 3 A plane wave pseudopotential density functional theory molecular dynamics code on multi-gpu machine - GPU Technology Conference, San Jose, May 17th, 2012 Weile Jia 1, Long Wang 1, Zongyan Cao 1, Jiyun

More information

Multicore Parallelization of Determinant Quantum Monte Carlo Simulations

Multicore Parallelization of Determinant Quantum Monte Carlo Simulations Multicore Parallelization of Determinant Quantum Monte Carlo Simulations Andrés Tomás, Che-Rung Lee, Zhaojun Bai, Richard Scalettar UC Davis SIAM Conference on Computation Science & Engineering Reno, March

More information

PuReMD-GPU: A Reactive Molecular Dynamic Simulation Package for GPUs

PuReMD-GPU: A Reactive Molecular Dynamic Simulation Package for GPUs Purdue University Purdue e-pubs Department of Computer Science Technical Reports Department of Computer Science 2012 PuReMD-GPU: A Reactive Molecular Dynamic Simulation Package for GPUs Sudhir B. Kylasa

More information

e-companion ONLY AVAILABLE IN ELECTRONIC FORM

e-companion ONLY AVAILABLE IN ELECTRONIC FORM OPERATIONS RESEARCH doi 10.1287/opre.1070.0427ec pp. ec1 ec5 e-copanion ONLY AVAILABLE IN ELECTRONIC FORM infors 07 INFORMS Electronic Copanion A Learning Approach for Interactive Marketing to a Custoer

More information

CRYPTOGRAPHIC COMPUTING

CRYPTOGRAPHIC COMPUTING CRYPTOGRAPHIC COMPUTING ON GPU Chen Mou Cheng Dept. Electrical Engineering g National Taiwan University January 16, 2009 COLLABORATORS Daniel Bernstein, UIC, USA Tien Ren Chen, Army Tanja Lange, TU Eindhoven,

More information

MURCIA: Fast parallel solvent accessible surface area calculation on GPUs and application to drug discovery and molecular visualization

MURCIA: Fast parallel solvent accessible surface area calculation on GPUs and application to drug discovery and molecular visualization MURCIA: Fast parallel solvent accessible surface area calculation on GPUs and application to drug discovery and molecular visualization Eduardo J. Cepas Quiñonero Horacio Pérez-Sánchez Wolfgang Wenzel

More information

Bus transit = 20 ns (one way) access, each module cannot be accessed faster than 120 ns. So, the maximum bandwidth is

Bus transit = 20 ns (one way) access, each module cannot be accessed faster than 120 ns. So, the maximum bandwidth is 32 Flynn: Coputer Architecture { The Solutions Chapter 6. Meory Syste Design Proble 6. The eory odule uses 64 4 M b chips for 32MB of data and 8 4 M b chips for ECC. This allows 64 bits + 8 bits ECC to

More information

ab initio Electronic Structure Calculations

ab initio Electronic Structure Calculations ab initio Electronic Structure Calculations New scalability frontiers using the BG/L Supercomputer C. Bekas, A. Curioni and W. Andreoni IBM, Zurich Research Laboratory Rueschlikon 8803, Switzerland ab

More information

Intro to ab initio methods

Intro to ab initio methods Lecture 2 Part A Intro to ab initio methods Recommended reading: Leach, Chapters 2 & 3 for QM methods For more QM methods: Essentials of Computational Chemistry by C.J. Cramer, Wiley (2002) 1 ab initio

More information

Measuring freeze-out parameters on the Bielefeld GPU cluster

Measuring freeze-out parameters on the Bielefeld GPU cluster Measuring freeze-out parameters on the Bielefeld GPU cluster Outline Fluctuations and the QCD phase diagram Fluctuations from Lattice QCD The Bielefeld hybrid GPU cluster Freeze-out conditions from QCD

More information

In 1830, Auguste Comte wrote in his work

In 1830, Auguste Comte wrote in his work N o v e l A r c h i t e c t u r e s Graphical Processing Units for Quantum Chemistry The authors provide a brief overview of electronic structure theory and detail their experiences implementing quantum

More information

Jacobi-Based Eigenvalue Solver on GPU. Lung-Sheng Chien, NVIDIA

Jacobi-Based Eigenvalue Solver on GPU. Lung-Sheng Chien, NVIDIA Jacobi-Based Eigenvalue Solver on GPU Lung-Sheng Chien, NVIDIA lchien@nvidia.com Outline Symmetric eigenvalue solver Experiment Applications Conclusions Symmetric eigenvalue solver The standard form is

More information

GPU Computing Activities in KISTI

GPU Computing Activities in KISTI International Advanced Research Workshop on High Performance Computing, Grids and Clouds 2010 June 21~June 25 2010, Cetraro, Italy HPC Infrastructure and GPU Computing Activities in KISTI Hongsuk Yi hsyi@kisti.re.kr

More information

Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver

Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver Sherry Li Lawrence Berkeley National Laboratory Piyush Sao Rich Vuduc Georgia Institute of Technology CUG 14, May 4-8, 14, Lugano,

More information

Multiphase Flow Simulations in Inclined Tubes with Lattice Boltzmann Method on GPU

Multiphase Flow Simulations in Inclined Tubes with Lattice Boltzmann Method on GPU Multiphase Flow Simulations in Inclined Tubes with Lattice Boltzmann Method on GPU Khramtsov D.P., Nekrasov D.A., Pokusaev B.G. Department of Thermodynamics, Thermal Engineering and Energy Saving Technologies,

More information

Parallel Sparse Tensor Decompositions using HiCOO Format

Parallel Sparse Tensor Decompositions using HiCOO Format Figure sources: A brief survey of tensors by Berton Earnshaw and NVIDIA Tensor Cores Parallel Sparse Tensor Decompositions using HiCOO Format Jiajia Li, Jee Choi, Richard Vuduc May 8, 8 @ SIAM ALA 8 Outline

More information

AUTHOR QUERY FORM. Fax: For correction or revision of any artwork, please consult

AUTHOR QUERY FORM. Fax: For correction or revision of any artwork, please consult Our reference: YJCPH 52 P-authorquery-v7 AUTHOR QUERY FORM Journal: YJCPH Please e-mail or fax your responses and any corrections to: Article Number: 52 E-mail: corrections.esch@elsevier.vtex.lt Fax: +

More information

GPU acceleration of Newton s method for large systems of polynomial equations in double double and quad double arithmetic

GPU acceleration of Newton s method for large systems of polynomial equations in double double and quad double arithmetic GPU acceleration of Newton s method for large systems of polynomial equations in double double and quad double arithmetic Jan Verschelde joint work with Xiangcheng Yu University of Illinois at Chicago

More information

Study on Markov Alternative Renewal Reward. Process for VLSI Cell Partitioning

Study on Markov Alternative Renewal Reward. Process for VLSI Cell Partitioning Int. Journal of Math. Analysis, Vol. 7, 2013, no. 40, 1949-1960 HIKARI Ltd, www.-hikari.co http://dx.doi.org/10.12988/ia.2013.36142 Study on Markov Alternative Renewal Reward Process for VLSI Cell Partitioning

More information

Block AIR Methods. For Multicore and GPU. Per Christian Hansen Hans Henrik B. Sørensen. Technical University of Denmark

Block AIR Methods. For Multicore and GPU. Per Christian Hansen Hans Henrik B. Sørensen. Technical University of Denmark Block AIR Methods For Multicore and GPU Per Christian Hansen Hans Henrik B. Sørensen Technical University of Denmark Model Problem and Notation Parallel-beam 3D tomography exact solution exact data noise

More information

Computation of Large Sparse Aggregated Areas for Analytic Database Queries

Computation of Large Sparse Aggregated Areas for Analytic Database Queries Computation of Large Sparse Aggregated Areas for Analytic Database Queries Steffen Wittmer Tobias Lauer Jedox AG Collaborators: Zurab Khadikov Alexander Haberstroh Peter Strohm Business Intelligence and

More information

Heterogeneous programming for hybrid CPU-GPU systems: Lessons learned from computational chemistry

Heterogeneous programming for hybrid CPU-GPU systems: Lessons learned from computational chemistry Heterogeneous programming for hybrid CPU-GPU systems: Lessons learned from computational chemistry and Eugene DePrince Argonne National Laboratory (LCF and CNM) (Eugene moved to Georgia Tech last week)

More information

Protein Structure Prediction on GPU: a Declarative Approach in a Multi-agent Framework

Protein Structure Prediction on GPU: a Declarative Approach in a Multi-agent Framework Protein Structure Prediction on GPU: a Declarative Approach in a Multi-agent Framework Federico Campeotto (1),(2) Agostino Dovier (2) Enrico Pontelli (1) campe8@nmsu.edu agostino.dovier@uniud.it epontell@cs.nmsu.edu

More information

Molecular Mechanics: The Ab Initio Foundation

Molecular Mechanics: The Ab Initio Foundation Molecular Mechanics: The Ab Initio Foundation Ju Li GEM4 Summer School 2006 Cell and Molecular Mechanics in BioMedicine August 7 18, 2006, MIT, Cambridge, MA, USA 2 Outline Why are electrons quantum? Born-Oppenheimer

More information

A MESHSIZE BOOSTING ALGORITHM IN KERNEL DENSITY ESTIMATION

A MESHSIZE BOOSTING ALGORITHM IN KERNEL DENSITY ESTIMATION A eshsize boosting algorith in kernel density estiation A MESHSIZE BOOSTING ALGORITHM IN KERNEL DENSITY ESTIMATION C.C. Ishiekwene, S.M. Ogbonwan and J.E. Osewenkhae Departent of Matheatics, University

More information

GPU-accelerated Computing at Scale. Dirk Pleiter I GTC Europe 10 October 2018

GPU-accelerated Computing at Scale. Dirk Pleiter I GTC Europe 10 October 2018 GPU-accelerated Computing at Scale irk Pleiter I GTC Europe 10 October 2018 Outline Supercomputers at JSC Future science challenges Outlook and conclusions 2 3 Supercomputers at JSC JUQUEEN (until 2018)

More information

Case Study: Quantum Chromodynamics

Case Study: Quantum Chromodynamics Case Study: Quantum Chromodynamics Michael Clark Harvard University with R. Babich, K. Barros, R. Brower, J. Chen and C. Rebbi Outline Primer to QCD QCD on a GPU Mixed Precision Solvers Multigrid solver

More information

On Lotka-Volterra Evolution Law

On Lotka-Volterra Evolution Law Advanced Studies in Biology, Vol. 3, 0, no. 4, 6 67 On Lota-Volterra Evolution Law Farruh Muhaedov Faculty of Science, International Islaic University Malaysia P.O. Box, 4, 570, Kuantan, Pahang, Malaysia

More information

Accelerating Model Reduction of Large Linear Systems with Graphics Processors

Accelerating Model Reduction of Large Linear Systems with Graphics Processors Accelerating Model Reduction of Large Linear Systems with Graphics Processors P. Benner 1, P. Ezzatti 2, D. Kressner 3, E.S. Quintana-Ortí 4, Alfredo Remón 4 1 Max-Plank-Institute for Dynamics of Complex

More information

On the Mixed Discretization of the Time Domain Magnetic Field Integral Equation

On the Mixed Discretization of the Time Domain Magnetic Field Integral Equation On the Mixed Discretization of the Tie Doain Magnetic Field Integral Equation H. A. Ülkü 1 I. Bogaert K. Cools 3 F. P. Andriulli 4 H. Bağ 1 Abstract Tie doain agnetic field integral equation (MFIE) is

More information

Massively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem

Massively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem Massively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem Katharina Kormann 1 Klaus Reuter 2 Markus Rampp 2 Eric Sonnendrücker 1 1 Max Planck Institut für Plasmaphysik 2 Max Planck Computing

More information

A CPU-GPU Hybrid Implementation and Model-Driven Scheduling of the Fast Multipole Method

A CPU-GPU Hybrid Implementation and Model-Driven Scheduling of the Fast Multipole Method A CPU-GPU Hybrid Implementation and Model-Driven Scheduling of the Fast Multipole Method Jee Choi 1, Aparna Chandramowlishwaran 3, Kamesh Madduri 4, and Richard Vuduc 2 1 ECE, Georgia Tech 2 CSE, Georgia

More information

The electronic structure of materials 2 - DFT

The electronic structure of materials 2 - DFT Quantum mechanics 2 - Lecture 9 December 19, 2012 1 Density functional theory (DFT) 2 Literature Contents 1 Density functional theory (DFT) 2 Literature Historical background The beginnings: L. de Broglie

More information

Reactivity and Organocatalysis. (Adalgisa Sinicropi and Massimo Olivucci)

Reactivity and Organocatalysis. (Adalgisa Sinicropi and Massimo Olivucci) Reactivity and Organocatalysis (Adalgisa Sinicropi and Massimo Olivucci) The Aldol Reaction - O R 1 O R 1 + - O O OH * * H R 2 R 1 R 2 The (1957) Zimmerman-Traxler Transition State Counterion (e.g. Li

More information

arxiv: v1 [physics.data-an] 19 Feb 2017

arxiv: v1 [physics.data-an] 19 Feb 2017 arxiv:1703.03284v1 [physics.data-an] 19 Feb 2017 Model-independent partial wave analysis using a massively-parallel fitting framework L Sun 1, R Aoude 2, A C dos Reis 2, M Sokoloff 3 1 School of Physics

More information

Introduction to Benchmark Test for Multi-scale Computational Materials Software

Introduction to Benchmark Test for Multi-scale Computational Materials Software Introduction to Benchmark Test for Multi-scale Computational Materials Software Shun Xu*, Jian Zhang, Zhong Jin xushun@sccas.cn Computer Network Information Center Chinese Academy of Sciences (IPCC member)

More information

Beam dynamics calculation

Beam dynamics calculation September 6 Beam dynamics calculation S.B. Vorozhtsov, Е.Е. Perepelkin and V.L. Smirnov Dubna, JINR http://parallel-compute.com Outline Problem formulation Numerical methods OpenMP and CUDA realization

More information

Topic 5a Introduction to Curve Fitting & Linear Regression

Topic 5a Introduction to Curve Fitting & Linear Regression /7/08 Course Instructor Dr. Rayond C. Rup Oice: A 337 Phone: (95) 747 6958 E ail: rcrup@utep.edu opic 5a Introduction to Curve Fitting & Linear Regression EE 4386/530 Coputational ethods in EE Outline

More information

Ab initio calculations for potential energy surfaces. D. Talbi GRAAL- Montpellier

Ab initio calculations for potential energy surfaces. D. Talbi GRAAL- Montpellier Ab initio calculations for potential energy surfaces D. Talbi GRAAL- Montpellier A theoretical study of a reaction is a two step process I-Electronic calculations : techniques of quantum chemistry potential

More information

The Fast Multipole Method in molecular dynamics

The Fast Multipole Method in molecular dynamics The Fast Multipole Method in molecular dynamics Berk Hess KTH Royal Institute of Technology, Stockholm, Sweden ADAC6 workshop Zurich, 20-06-2018 Slide BioExcel Slide Molecular Dynamics of biomolecules

More information

Two case studies of Monte Carlo simulation on GPU

Two case studies of Monte Carlo simulation on GPU Two case studies of Monte Carlo simulation on GPU National Institute for Computational Sciences University of Tennessee Seminar series on HPC, Feb. 27, 2014 Outline 1 Introduction 2 Discrete energy lattice

More information

SEISMIC FRAGILITY ANALYSIS

SEISMIC FRAGILITY ANALYSIS 9 th ASCE Specialty Conference on Probabilistic Mechanics and Structural Reliability PMC24 SEISMIC FRAGILITY ANALYSIS C. Kafali, Student M. ASCE Cornell University, Ithaca, NY 483 ck22@cornell.edu M. Grigoriu,

More information

The linear sampling method and the MUSIC algorithm

The linear sampling method and the MUSIC algorithm INSTITUTE OF PHYSICS PUBLISHING INVERSE PROBLEMS Inverse Probles 17 (2001) 591 595 www.iop.org/journals/ip PII: S0266-5611(01)16989-3 The linear sapling ethod and the MUSIC algorith Margaret Cheney Departent

More information

Jim Held, Ph.D., Intel Fellow & Director Emerging Technology Research, Intel Labs. HPC User Forum April 18, 2018

Jim Held, Ph.D., Intel Fellow & Director Emerging Technology Research, Intel Labs. HPC User Forum April 18, 2018 Jim Held, Ph.D., Intel Fellow & Director Emerging Technology Research, Intel Labs HPC User Forum April 18, 2018 Quantum Computing: Key Concepts Superposition Classical Physics Quantum Physics v Entanglement

More information

2 Electronic structure theory

2 Electronic structure theory Electronic structure theory. Generalities.. Born-Oppenheimer approximation revisited In Sec..3 (lecture 3) the Born-Oppenheimer approximation was introduced (see also, for instance, [Tannor.]). We are

More information

GEM4 Summer School OpenCourseWare

GEM4 Summer School OpenCourseWare GEM4 Summer School OpenCourseWare http://gem4.educommons.net/ http://www.gem4.org/ Lecture: Molecular Mechanics by Ju Li. Given August 9, 2006 during the GEM4 session at MIT in Cambridge, MA. Please use

More information

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems TR-0-07 A Comparison of the Performance of ::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems Ang Li, Omkar Deshmukh, Radu Serban, Dan Negrut May, 0 Abstract ::GPU is a

More information

Walter Kohn was awarded with the Nobel Prize in Chemistry in 1998 for his development of the density functional theory.

Walter Kohn was awarded with the Nobel Prize in Chemistry in 1998 for his development of the density functional theory. Walter Kohn was awarded with the Nobel Prize in Chemistry in 1998 for his development of the density functional theory. Walter Kohn receiving his Nobel Prize from His Majesty the King at the Stockholm

More information

A method to determine relative stroke detection efficiencies from multiplicity distributions

A method to determine relative stroke detection efficiencies from multiplicity distributions A ethod to deterine relative stroke detection eiciencies ro ultiplicity distributions Schulz W. and Cuins K. 2. Austrian Lightning Detection and Inoration Syste (ALDIS), Kahlenberger Str.2A, 90 Vienna,

More information

Bert de Groot, Udo Schmitt, Helmut Grubmüller

Bert de Groot, Udo Schmitt, Helmut Grubmüller Computergestützte Biophysik I Bert de Groot, Udo Schmitt, Helmut Grubmüller Max Planck-Institut für biophysikalische Chemie Theoretische und Computergestützte Biophysik Am Fassberg 11 37077 Göttingen Tel.:

More information

High-Performance Computing, Planet Formation & Searching for Extrasolar Planets

High-Performance Computing, Planet Formation & Searching for Extrasolar Planets High-Performance Computing, Planet Formation & Searching for Extrasolar Planets Eric B. Ford (UF Astronomy) Research Computing Day September 29, 2011 Postdocs: A. Boley, S. Chatterjee, A. Moorhead, M.

More information

Julian Merten. GPU Computing and Alternative Architecture

Julian Merten. GPU Computing and Alternative Architecture Future Directions of Cosmological Simulations / Edinburgh 1 / 16 Julian Merten GPU Computing and Alternative Architecture Institut für Theoretische Astrophysik Zentrum für Astronomie Universität Heidelberg

More information

Plaquette Renormalized Tensor Network States: Application to Frustrated Systems

Plaquette Renormalized Tensor Network States: Application to Frustrated Systems Workshop on QIS and QMP, Dec 20, 2009 Plaquette Renormalized Tensor Network States: Application to Frustrated Systems Ying-Jer Kao and Center for Quantum Science and Engineering Hsin-Chih Hsiao, Ji-Feng

More information

Software optimization for petaflops/s scale Quantum Monte Carlo simulations

Software optimization for petaflops/s scale Quantum Monte Carlo simulations Software optimization for petaflops/s scale Quantum Monte Carlo simulations A. Scemama 1, M. Caffarel 1, E. Oseret 2, W. Jalby 2 1 Laboratoire de Chimie et Physique Quantiques / IRSAMC, Toulouse, France

More information

GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications

GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications Christopher Rodrigues, David J. Hardy, John E. Stone, Klaus Schulten, Wen-Mei W. Hwu University of Illinois at Urbana-Champaign

More information

S0214 : GPU Based Stacking Sequence Generation For Composite Skins Using GA

S0214 : GPU Based Stacking Sequence Generation For Composite Skins Using GA S0214 : GPU Based Stacking Sequence Generation For Composite Skins Using GA Date: 16th May 2012 Wed, 3pm to 3.25pm(Adv. Session) Sathyanarayana K., Manish Banga, and Ravi Kumar G. V. V. Engineering Services,

More information

Optimized LU-decomposition with Full Pivot for Small Batched Matrices S3069

Optimized LU-decomposition with Full Pivot for Small Batched Matrices S3069 Optimized LU-decomposition with Full Pivot for Small Batched Matrices S369 Ian Wainwright High Performance Consulting Sweden ian.wainwright@hpcsweden.se Based on work for GTC 212: 1x speed-up vs multi-threaded

More information

GPU Accelerated Markov Decision Processes in Crowd Simulation

GPU Accelerated Markov Decision Processes in Crowd Simulation GPU Accelerated Markov Decision Processes in Crowd Simulation Sergio Ruiz Computer Science Department Tecnológico de Monterrey, CCM Mexico City, México sergio.ruiz.loza@itesm.mx Benjamín Hernández National

More information

A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization)

A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization) A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization) Schodinger equation: Hψ = Eψ Choose a basis set of wave functions Two cases: Orthonormal

More information

First, a look at using OpenACC on WRF subroutine advance_w dynamics routine

First, a look at using OpenACC on WRF subroutine advance_w dynamics routine First, a look at using OpenACC on WRF subroutine advance_w dynamics routine Second, an estimate of WRF multi-node performance on Cray XK6 with GPU accelerators Based on performance of WRF kernels, what

More information