The Nanoscience End-Station and Petascale Computing


1 The Nanoscience End-Station and Petascale Computing
Thomas C. Schulthess
Computer Science and Mathematics Division & Center for Nanophase Materials Sciences
DANSE kickoff meeting, Aug 2006

2 SNS, CNMS, and NCCS - relevant user facilities
SNS: increase in neutron scattering capability (flux & instrument sensitivity)
- materials science
- soft materials
- magnetism
- macromolecular systems
- molecular biophysics
- structural proteomics
CNMS:
- functional nanomaterials
- macromolecular systems
- nanofabrication
- nanocatalysis
- nanomaterials theory: transport, magnetism/spintronics, carbon nanofibers, catalysis, electronic structure, atomistic simulations
NCCS (National Center for Computational Sciences):
- IBM P4 (5 TF); SGI/Xeon (9 TF)
- Cray X1E (18 TF), 1K vector processors
- Cray XT3 (25 TF), 5K Opterons
- Outlook: 100 TF this fall; 250 TF in 2007/8; 1000 TF in 2008/9

3 National Center for Computational Sciences - Leadership Computing Facility (LCF) systems, Feb. 2006:
- Cray XT3 "Jaguar": 25 TF, 44 TB memory, 240 TB shared disk
- Cray X1E "Phoenix": 18 TF, 2 TB memory, 32 TB shared disk
- SGI Altix "Ram": 1.5 TF, 256 CPUs at 1.5 GHz, 2 TB memory, 36 TB shared disk
- IBM SP4 "Cheetah": 4.5 TF, 864 CPUs at 1.3 GHz, 1.1 TB memory, 32 TB shared disk
- SGI Linux "OIC": 8 TF, 1376 CPUs at 3.4 GHz, 2.6 TB memory, 80 TB shared disk
- IBM Linux "NSTG": 0.3 TF, 56 CPUs at 3 GHz, 76 GB memory, 4.5 TB shared disk
- Visualization cluster: 0.5 TF, 128 CPUs at 2.2 GHz, 128 GB memory, 9 TB shared disk
- IBM HPSS: 5 TB disk, 5 PB backup storage
Supercomputers in total: 24,880 CPUs, 52 TB memory, 58 TFlops; many storage devices supported.
Scientific Visualization Lab: 27-projector Power Wall.
Test systems: 96-processor Cray XT3; 32-processor Cray X1E; 16-processor SGI Altix.
Evaluation platforms: 144-processor Cray XD1 with FPGAs; SRC Mapstation; ClearSpeed.

4 LCF plan for the next 5 years
- Cray X1E (vector architecture: global memory, powerful CPUs): 18 TF
- Cray XT3 (cluster architecture: low latency, high bandwidth, scalability): 25 TF, growing to 100 TF and then 250 TF
- IBM Blue Gene: BG/L at LLNL reaches 360 TF with 128K cores; an IBM BG system at ANL (TBD) would achieve a petaflop with ~500K cores
Estimating the petaflop scale - a possible ORNL scenario: ~25K sockets with 4 cores each, i.e. ~100K CPUs, reaching 1000 TF.
Whatever happens, we will have to deal with ~100K cores on the petaflop-scale systems at the end of the decade!

5 Characteristics of Computational Nanoscience
- Interdisciplinary (like most science in the 21st century): builds on established domains like physics, chemistry, materials science, and biology (legacy codes).
- High-performance computing will be a key component, providing many opportunities.
- Computer architectures are increasingly complex and specialized; it will take large teams to use them.
- Since nanoscience is still an emerging field, computational nanoscience has to be extensible and reconfigurable.

6 Large Scientific User Facilities
Neutron reflectometer (facility + instrumentation: ultra-high vacuum station, sample) -> users: high-impact science.
Computational end-station for nano- & materials science (facility + instrumentation: open-source repository, generic toolkit, unified I/O systems, optimized kernels; built by materials/chemistry/physics : math : computer science teams) -> HPC users: high-impact science.

7 Computational Endstation for Nanoscience
Step 1: end-station allocation on the NLCF {X1E: 300 Kh; XT3: 3.5 Mh; SGI Altix: unlimited} for high-impact projects
- High-temperature superconductivity (production) - Maier, Kent, Jarrell, Schulthess
- Spintronics (production) - Alvarez, Moreo, Dagotto, Schulthess
- Nanomagnetism (production) - Eisenbach, Nicholson, Stocks, Kent, Schulthess
- Physicochemical mechanism of mutating DNA under radiation (pilot) - Kent, Landman (UGA)
- Molecular electronics (pilot) - Bernholc et al. (NCSU)
Step 2: systematically evolve software into high-performance, stable, readily accessible instrumentation
- high-performance kernels
- generic toolkit for nanoscience (extending C++/STL)
- unified I/O system (XML based, incl. tools for accessibility from Fortran legacy codes)
- visualization
Step 3: integrate with the user program of ORNL's Center for Nanophase Materials Sciences (CNMS)

8 What do we really need to study FePt nanoparticles (and other nanosystems)?
Anisotropy-plus-Zeeman energy of a particle with moment $m$ at angle $\Theta$ in a field $H$ applied at angle $\Theta_H$:
$E = KV \sin^2\Theta - mH \cos(\Theta - \Theta_H)$
Take advantage of the (atomic) degrees of freedom $(\vec{s}_1, \vec{s}_2, \ldots, \vec{s}_N)$ in order to manipulate macroscopic properties:
$m = \frac{1}{N} \sum_i \vec{s}_i$
$F(T, m) = E(T, m) - k_B T \ln W(E, m)$
The last line is just $F = E - TS$ for the macrostate $(T, m)$, with the entropy taken from the number of spin configurations, $S = k_B \ln W(E, m)$: computing the density of states $W$ gives the free-energy surface at all temperatures at once.

9 The basic idea of our approach
$F(T, X) = E(T_0, X) - k_B T \ln W(E, X)$
Compute the energy with ab initio codes:
- LSMS: > 80% efficiency, runs on ~1000 units
- VASP: ~50% efficiency, runs on ~500 units
(1 unit = 1 core, 4 cores, ...)
Compute the density of states with the extended Wang-Landau method [Zhou, Schulthess, Torbrügge, and Landau, Phys. Rev. Lett. (2006)], with a driver steering LSMS and VASP.
With LSMS on a petaflop machine, the magnetic free-energy surface of a 500-atom nanoparticle becomes possible in 2009.
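To make the "driver" concrete, here is a minimal single-walker Wang-Landau sketch in Python. It is an illustration only, not the extended algorithm of the Zhou et al. paper: `energy` stands in for an LSMS/VASP evaluation, and `propose`, `e_bins`, and the flatness threshold are assumed, illustrative choices.

```python
import math
import random

def wang_landau(energy, propose, x0, e_bins, flatness=0.8, ln_f_final=1e-8):
    """Estimate ln W(E) over the bins e_bins with a single Wang-Landau walker.

    energy(x)  -> total energy of configuration x (here a stand-in for a
                  full LSMS or VASP run)
    propose(x) -> a trial configuration derived from x
    """
    n = len(e_bins) - 1
    ln_w = [0.0] * n              # running estimate of ln W(E)
    hist = [0] * n                # visit histogram for the flatness check
    ln_f = 1.0                    # modification factor ln f, refined over time

    def bin_of(e):                # energy -> bin index, None if outside window
        for i in range(n):
            if e_bins[i] <= e < e_bins[i + 1]:
                return i
        return None

    x = x0
    i = bin_of(energy(x0))
    assert i is not None, "x0 must lie inside the energy window"
    while ln_f > ln_f_final:
        x_new = propose(x)
        j = bin_of(energy(x_new))
        # accept the move with probability min(1, W(E_old) / W(E_new))
        if j is not None and random.random() < math.exp(min(0.0, ln_w[i] - ln_w[j])):
            x, i = x_new, j
        ln_w[i] += ln_f           # every visit raises ln W in the current bin
        hist[i] += 1
        if min(hist) > flatness * sum(hist) / n:   # histogram flat enough?
            hist = [0] * n
            ln_f *= 0.5                            # f -> sqrt(f)
    return ln_w
```

Once ln W has converged, F(T) follows from the formula above; in the real setup every call to `energy` is itself a large parallel ab initio run, which is why petaflop-scale resources matter.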

10 Kohn-Sham Density Functional Theory (DFT)
Self-consistent eigenproblem for $\{\epsilon_i\}$ and $\{\psi_i\}$:
$[-\tfrac{1}{2}\nabla^2 + V(r)]\,\psi_i = \epsilon_i \psi_i$, with $V = F[\rho] + \ldots$ and $\rho(r) = \sum_i |\psi_i(r)|^2$
Some terms are easy in reciprocal space, others are easy in real space. The Hamiltonian is conveniently evaluated in a plane-wave basis, $\psi = \sum_G C_G e^{iG \cdot r}$, using FFTs for the transformations.
Many codes: VASP, PWSCF/ESPRESSO, CPMD, PARATEC, CASTEP, QBOX, ABINIT, ...
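A compact NumPy sketch of that real-space/reciprocal-space split, glossing over normalization conventions and all PAW/pseudopotential terms; the inputs `g2` (|G|^2 on the reciprocal grid) and `v_r` (the local potential on the real-space grid) are assumed to be built elsewhere.

```python
import numpy as np

def apply_hamiltonian(c_g, g2, v_r):
    """Apply H = -1/2 laplacian + V to plane-wave coefficients C_G.

    c_g : complex coefficients on the reciprocal-space grid
    g2  : |G|^2 on the same grid (the kinetic term is diagonal here)
    v_r : local potential V(r) on the real-space grid
    """
    kinetic = 0.5 * g2 * c_g               # easy in reciprocal space
    psi_r = np.fft.ifftn(c_g)              # psi(r) = sum_G C_G e^{iG.r}
    potential = np.fft.fftn(v_r * psi_r)   # easy in real space
    return kinetic + potential

def density(orbital_coeffs):
    """rho(r) = sum_i |psi_i(r)|^2 from a list of orbital coefficient grids."""
    rho = 0.0
    for c_g in orbital_coeffs:
        psi_r = np.fft.ifftn(c_g)
        rho = rho + np.abs(psi_r) ** 2
    return rho
```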

11 Parallel FFT layouts
[Figure: plane waves chosen within the cutoff radius (sphere of diameter 2·Ecut) - a sparse basis in reciprocal (frequency) space - distributed either as 1 grid over four processors or as 4 grids on four processors.]
Different distribution methods can be combined in a hybrid parallelization. "All bands simultaneously" methods are essential: the FFTs are small but many, e.g. a ( )^3 grid.
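To show what distributing one grid over several processors looks like, here is a minimal transpose-based slab decomposition using mpi4py and NumPy. This is a generic sketch (assuming N is divisible by the number of ranks), not VASP's actual data layout.

```python
import numpy as np
from mpi4py import MPI

def parallel_fft3(local, comm):
    """3D FFT of an N x N x N grid distributed in z-slabs, one slab per rank.

    local : complex array of shape (N // P, N, N), this rank's z-planes,
            axes ordered (z, y, x).
    Returns the transform distributed in y-slabs, shape (N, N // P, N).
    """
    P = comm.Get_size()
    nz, N, _ = local.shape                 # nz = N // P

    # Stage 1: 2D FFTs over the locally complete (y, x) planes.
    a = np.fft.fftn(local, axes=(1, 2))

    # Stage 2: all-to-all transpose so every rank gets all z for its y-slab.
    # Chunk p of the send buffer is the y-block destined for rank p.
    ny = N // P
    send = np.ascontiguousarray(
        a.reshape(nz, P, ny, N).transpose(1, 0, 2, 3))   # (P, nz, ny, N)
    recv = np.empty_like(send)
    comm.Alltoall(send, recv)
    # Blocks arrive ordered by source rank, i.e. in global z order.
    b = recv.reshape(P * nz, ny, N)        # axes (z, y, x), z now complete

    # Stage 3: 1D FFTs along the now-local z axis.
    return np.fft.fft(b, axis=0)
```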

12 VASP
- Very popular plane-wave DFT code
- PAW and ultrasoft-pseudopotential DFT
- Fortran 90, MPI, BLAS, LAPACK, SCALAPACK
- Canonical 3D parallel FFTs
- Here: no major surgery, no heroics, small diffs

13 Fe399Pt408 benchmark
- Exclude initialization; 3 iterations (50+ needed for convergence)
- Spin-polarized ferromagnetic solution; LDA
- Fe 8+ and Pt 10+ cores; orbitals per spin include unoccupied states
- PAW, 19.6 Ry / 268 eV cutoff
- 30.8 x 30.8 x 29.7 Angstrom supercell; 126x126x120 FFT grid (defaults)
- Gamma-point code (halved grid); ~ plane waves/orbital
- Hybrid parallel

14 Timings: Davidson
[Figure: time (s) vs. number of processors for X1E, XT3, and P690 - faster is better.]
At 256 processors: the X1E is 1.7x faster than the XT3 and 3.3x faster than the P690.

15 Timings: RMM-DIIS
[Figure: time (s) vs. number of processors for X1E and P690 - faster is better.]
At 256 processors: the X1E is 2.8x faster than the P690.

16 Profiling VASP on the Cray X1E
~20% of peak at 128 MSPs (whole application); ~25% BLAS+LAPACK, ~25% FFTs.
Two key problems:
- Scaling: limited by global linear algebra (eigenvector solutions in the subspace rotations)
- Single-processor (MSP on the Cray X1E) performance: limited by a short average vector length (33 at 128 MSPs)

17 Scaling
[Figure: time (s) for EDDIAG and ORTHCH vs. number of MSPs.]
Not a platform-specific problem: the turn-over is due to the SCALAPACK solve for all eigenvectors of a dense, diagonally dominant ~5000x5000 matrix - a small matrix compared to the number of processors.
We need improved algorithms and tuned code, e.g. iterative Jacobi methods (Ian Bush/Daresbury). Suggestions welcome!
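For reference, the textbook cyclic Jacobi iteration the slide alludes to, as a NumPy sketch for clarity - not Ian Bush's parallel implementation. Its appeal on many processors is that rotations touching disjoint row pairs can run concurrently; a production code would update only the affected rows and columns rather than forming the full rotation matrix as done here.

```python
import numpy as np

def jacobi_eigh(A, tol=1e-10, max_sweeps=30):
    """Diagonalize a real symmetric matrix A with cyclic Jacobi rotations."""
    n = A.shape[0]
    A = A.copy()
    V = np.eye(n)                          # accumulates the eigenvectors
    for _ in range(max_sweeps):
        if np.sqrt(np.sum(np.tril(A, -1) ** 2)) < tol:
            break                          # off-diagonal norm small: done
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p, q]) < 1e-30:
                    continue
                # 2x2 rotation angle that zeroes A[p, q]
                theta = 0.5 * np.arctan2(2 * A[p, q], A[q, q] - A[p, p])
                c, s = np.cos(theta), np.sin(theta)
                J = np.eye(n)              # full matrix only for clarity
                J[p, p] = J[q, q] = c
                J[p, q], J[q, p] = s, -s
                A = J.T @ A @ J
                V = V @ J
    return np.diag(A), V                   # eigenvalues, eigenvectors (columns)
```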

18 Single-MSP performance
BLAS performs well (>10 GFLOP/s); performance is limited by short vector lengths in the FFTs (generic) and in the real-space pseudopotential evaluation (code-specific). Here we focus on FFT performance.
The current code is poorly structured for the X1E - there is no easy access to lots of data at once - and this is common to other DFT codes:
- In the PAW method, the FFT dimensions are small
- No blocking or multiple transforms
- No exploitation of data locality, e.g. when data sits on one processor
- Explicitly MPI; awkward to insert CAF

19 Plane-wave FFT module
Today: application code -> 3D FFT -> 1D FFT + MPI.
Future: application code -> multiple-FFT module -> N-D FFT -> MPI or ?? (see the sketch below).
Advantage: other DFT codes benefit as well. Norm Troullier (Cray) has written vectorized multi-streaming FFTs - not connected yet.
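The interface change is easy to sketch: instead of one small 3D FFT per orbital, the application hands the module all bands at once, so the library sees many independent transforms (and, on a vector machine, long vectors). A NumPy illustration of the two call styles; the function names are made up for the example.

```python
import numpy as np

def fft_orbitals_one_by_one(coeffs):
    """Today: one small 3D FFT per orbital - short vectors, little reuse."""
    return [np.fft.ifftn(c) for c in coeffs]

def fft_orbitals_batched(coeffs):
    """A 'multiple FFT' interface: one call transforms every band at once.

    coeffs : array of shape (n_bands, nx, ny, nz); batching over the band
    axis is what exposes the parallelism to the FFT library.
    """
    return np.fft.ifftn(coeffs, axes=(1, 2, 3))
```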

20 Software development: direct user-community involvement - the natural but also modern path
- Today: application codes sit directly on basic libraries (BLAS, FFT, etc.)
- Current research (Ψ-Mag, ALPS; using Cray, BG/L): generic toolkits and optimized kernels underneath the application codes
- Future: a common, XML-based I/O system (prototype) shared with the user community and other software frameworks - a combination of user-developed code and a code repository
Components: high-performance kernels; a generic toolkit for nanoscience (extending C++/STL); a unified I/O system (XML based, incl. tools for accessibility from Fortran legacy codes); visualization.
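A toy illustration of what such a unified XML-based I/O layer could look like, using only the Python standard library; the schema (`nanoscience_run`, `param`, `moments`) is hypothetical, and a real system would add the Fortran-accessible reader the slide mentions.

```python
import xml.etree.ElementTree as ET

def write_run(filename, params, moments):
    """Write one run's parameters and per-atom moments as self-describing XML."""
    root = ET.Element("nanoscience_run")
    p = ET.SubElement(root, "parameters")
    for key, value in params.items():
        ET.SubElement(p, "param", name=key).text = str(value)
    m = ET.SubElement(root, "moments", unit="mu_B")
    for i, mu in enumerate(moments):
        ET.SubElement(m, "atom", index=str(i)).text = "%.4f" % mu
    ET.ElementTree(root).write(filename)

# e.g.: write_run("fept_run.xml", {"code": "VASP", "cutoff_eV": "268"}, [2.31, 2.28, 0.41])
```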

21 Results
[Figure: the 807-atom particle, atoms colored by moment.]
807 atoms, 128 MSPs, 600 CPU hours: ferromagnetic electronic structure + forces, at publishable accuracy.

22 Strong size effects in magnetic moments
[Figure: magnetic moment vs. cluster size for 43-, 55-, and 201-atom clusters.]
Clear non-bulk behavior in small clusters. But: AFM or ferrimagnetic states are lowest in energy, by O(10 meV/atom), for relaxed geometries.

23 [Figure: magnetic moment (μ_B) vs. distance from centre (Å) for the 807-atom particle, fully relaxed vs. bulk geometry, with Fe-bulk and Pt-bulk reference lines.]
Near-surface Fe atoms have an enhanced moment. Relaxations can be significant - AF spins appear!

24 Proton transfer in H2O on TiO2 - interpretation of quasielastic neutron scattering
VASP runs with ~1000 atoms reaching ~10 ns to study proton transfer in water on TiO2 (a CNMS user project by Jorge Sofo); calculations turn around in about one month.

25 Nanoscience end-station
Supported and maintained by the NTI of the CNMS.
Capability computing: LCF Cray supercomputers. Capacity computing: a multi-teraflop Beowulf cluster and an allocation at NERSC.
Supported capabilities:
- MD, MC (flexible models with the Ψ-Mag toolkit)
- QMC, quantum cluster methods, Hubbard and spin-fermion models
- DFT (LDA, SIC-LSD): VASP and other electronic structure codes
- Future plans: D-QMC, AF-QMC
Available to users via CNMS user projects.

26 The team / collaborators
End-station concept:
- ORNL: Peter Cummings (CNMS), Malcolm Stocks (MST)
- Ames Lab: Bruce Harmon
FePt nanoparticles and generalized Wang-Landau:
- ORNL: Paul Kent (CSMD), Chenggang Zhou (CNMS), Mark Fahey (NCCS), Don Nicholson (CSMD), Markus Eisenbach (MST)
- Univ. of Georgia at Athens: David Landau
- Cray: Nathan Wichmann, Norm Troullier, Jeff Larkin, and John Levesque
Software infrastructure:
- ORNL: Mike Summers (CSED), Xiuping Tao (CSM) - and the above
- Florida State: Greg Brown
- Univ. of Tennessee: Tom Swain and Kirk Sayer

27 Acknowledgment
This research was conducted at the Center for Nanophase Materials Sciences, which is sponsored by the Division of Scientific User Facilities of the United States Department of Energy (DOE). It was supported in part by the Laboratory Directed Research and Development fund at ORNL. The research was enabled by computational resources of the National Center for Computational Sciences, which is sponsored by the Office of Advanced Scientific Computing Research.

