The Nanoscience End-Station and Petascale Computing
1 The Nanoscience End-Station and Petascale Computing
Thomas C. Schulthess
Computer Science and Mathematics Division & Center for Nanophase Materials Science
DANSE kickoff meeting, Aug , 2006
2 SNS, CNMS, and NCCS - relevant user facilities
SNS: increase in neutron scattering capability (flux & instrument sensitivity)
- materials science
- soft materials
- magnetism
- macromolecular systems
- molecular biophysics
- structural proteomics
CNMS:
- functional nanomaterials
- macromolecular systems
- nanofabrication
- nanocatalysis
- nanomaterials theory
  > transport
  > magnetism/spintronics
  > carbon nanofibers
  > catalysis
  > electronic structure
  > atomistic simulations
NCCS (National Center for Computational Sciences):
- IBM P4 (5TF); SGI/Xeon (9TF)
- Cray X1E (18TF) - 1K vector processors
- Cray XT3 (25TF) - 5K Opterons
- Outlook: 100TF this fall; 250TF 2007/8; 1000TF in 2008/9
3 National Center for Computational Sciences
Leadership Computing Facility (LCF) systems (Feb):
- Cray XT3 "Jaguar": 25TF
- Cray X1E "Phoenix": 18TF
- SGI Altix "Ram": 1.5TF, 256 CPUs at 1.5GHz, 2TB memory
- IBM SP4 "Cheetah": 4.5TF, 864 CPUs at 1.3GHz, 1.1TB memory
- SGI Linux "OIC": 8TF, 1376 CPUs at 3.4GHz, 2.6TB memory
- IBM Linux "NSTG": 0.3TF, 56 CPUs at 3GHz, 76GB memory
- Visualization cluster: 0.5TF, 128 CPUs at 2.2GHz, 128GB memory
- IBM HPSS backup storage: 5 PB
Supercomputers in total: 24,880 CPUs, 52TB memory, 58 TFlops; many storage devices supported; 240TB shared disk
Scientific Visualization Lab: 27-projector Power Wall
Test systems: 96-processor Cray XT3, 32-processor Cray X1E, 16-processor SGI Altix
Evaluation platforms: 144-processor Cray XD1 with FPGAs, SRC Mapstation, Clearspeed
4 LCF plan for the next 5 years
- Cray X1E: vector architecture, global memory, powerful CPUs
- Cray XT3: cluster architecture, low latency, high bandwidth, scalability
- IBM Blue Gene: 100K CPUs, ~MB per CPU
Estimating the petaflop scale:
- BG/L@LLNL: 360TF with 128K cores; an IBM BG (ANL) would achieve a petaflop with ~500K (TBD) cores
- Possible ORNL scenario: 25K sockets with 4 cores each
Roadmap: 18TF and 25TF today, then 100TF, 250TF, 1000TF
Whatever happens, we have to deal with ~100K cores for petaflop-scale systems at the end of the decade!
5 Characteristics of Computational Nanoscience
- Interdisciplinary (like most science in the 21st century)
- Builds on established domains like physics, chemistry, materials science, and biology (legacy codes)
- High performance computing will be a key component, providing many opportunities
- Computer architectures are increasingly complex and specialized; it will take large teams to use them
- Since nanoscience is still an emerging field, computational nanoscience has to be extensible and reconfigurable
6 Large Scientific User Facilities
Neutron reflectometer: ultra-high vacuum station and sample; users do high-impact science with the facility's instrumentation.
Computational Endstation for Nano- & Materials Science:
- Materials, Chemistry, Physics : Math : Computer Science
- Open source repository
- Generic tool kit
- Unified I/O systems
- Optimized kernels
HPC users: high-impact science
7 Computational Endstation for Nanoscience
Step 1: endstation allocation on the NLCF {X1E: 300Kh; XT3: 3.5Mh; SGI Altix: unlimited} for high-impact projects
- High temperature superconductivity (production) - Maier, Kent, Jarrell, Schulthess
- Spintronics (production) - Alvarez, Moreo, Dagotto, Schulthess
- Nanomagnetism (production) - Eisenbach, Nicholson, Stocks, Kent, Schulthess
- Physicochemical mechanism of mutating DNA under radiation (pilot) - Kent, Landman (UGA)
- Molecular electronics (pilot) - Bernholc et al. (NCSU)
Step 2: systematically evolve software into high-performance, stable, readily accessible instrumentation
- high performance kernels
- generic toolkit for nanoscience (extending C++/STL)
- unified I/O system (XML based, incl. tools for accessibility from Fortran legacy codes)
- visualization
Step 3: integrate with the user program of ORNL's Center for Nanophase Materials Sciences (CNMS)
8 What do we really need to study FePt nanoparticles (and other nanosystems)?
Stoner-Wohlfarth-type energy of a particle with moment m at angle Θ to the easy axis, in a field H applied at angle Θ_H:
E = KV sin²Θ − mH cos(Θ_H − Θ)
Take advantage of the (atomic) degrees of freedom (s_1, s_2, ..., s_N) in order to manipulate macroscopic properties:
m = (1/N) Σ_i s_i
F(T, m) = E(T, m) − k_B T ln W(E, m)
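The anisotropy energy above can be evaluated numerically; the following sketch (with made-up values for K, V, m, and H in arbitrary units, not parameters from the talk) locates the energy minima of a single-domain particle on a grid of moment orientations:

```python
import numpy as np

# Stoner-Wohlfarth-type energy E(theta) = K*V*sin^2(theta) - m*H*cos(theta_H - theta).
# All parameter values here are illustrative, in arbitrary units.
def sw_energy(theta, K=1.0, V=1.0, m=1.0, H=0.2, theta_H=0.0):
    return K * V * np.sin(theta) ** 2 - m * H * np.cos(theta_H - theta)

# Scan moment orientations on a fine grid and find the global minimum.
thetas = np.linspace(0.0, 2.0 * np.pi, 100001)
E = sw_energy(thetas)
print("global minimum at theta =", thetas[np.argmin(E)])  # field along the easy axis -> 0.0
```

With the small field applied along the easy axis (theta_H = 0), the global minimum sits at Θ = 0 and a metastable minimum at Θ = π, which is the barrier picture that makes the free energy F(T, m) the interesting quantity.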
9 The basic idea of our approach
F(T, X) = E(T_0, X) − k_B T ln W(E, X)
Compute the energy with ab initio codes:
- LSMS: >80% efficiency, runs on ~1000 units
- VASP: ~50% efficiency, runs on ~500 units
(1 unit = 1 core, 4 cores, ...)
Compute the density of states with the extended Wang-Landau method [Zhou, Schulthess, Torbrügge, and Landau, Phys. Rev. Lett., 2006], which acts as a driver for LSMS and VASP.
With LSMS and a petaflop system: the magnetic free energy surface for a 500-atom nanoparticle becomes possible in 2009.
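The Wang-Landau idea behind this slide can be sketched on a toy model. The stand-in below estimates the density of states of a tiny 2D Ising lattice with plain NumPy; the lattice size, sweep counts, and modification-factor schedule are illustrative only, and in the actual scheme the energy would come from LSMS or VASP rather than a spin Hamiltonian:

```python
import numpy as np

rng = np.random.default_rng(0)

L = 4
N = L * L
spins = np.ones((L, L), dtype=int)

def energy(s):
    # Nearest-neighbour Ising energy with periodic boundary conditions.
    return -int(np.sum(s * np.roll(s, 1, axis=0)) + np.sum(s * np.roll(s, 1, axis=1)))

E_min, E_max = -2 * N, 2 * N          # possible energies come in steps of 4
idx = lambda E: (E - E_min) // 4

log_g = np.zeros(idx(E_max) + 1)      # running estimate of ln g(E)
f = 1.0                               # Wang-Landau modification increment for ln g
E = energy(spins)

for stage in range(6):                # halve f a few times (no flatness check in this toy)
    for step in range(20_000):
        i, j = rng.integers(0, L, size=2)
        spins[i, j] *= -1
        E_new = energy(spins)
        # Accept the flip with probability min(1, g(E)/g(E_new)).
        if np.log(rng.random()) < log_g[idx(E)] - log_g[idx(E_new)]:
            E = E_new
        else:
            spins[i, j] *= -1         # reject: undo the flip
        log_g[idx(E)] += f
    f *= 0.5

# Once ln g(E) is known, free energies follow without further sampling:
# F(T) = -k_B T ln sum_E g(E) exp(-E / k_B T)
```

The point of the method, as on the slide, is that the random walk visits all energies (including the rare ground state and checkerboard extremes), after which F(T, X) is available for every temperature at once.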
10 Kohn-Sham Density Functional Theory (DFT)
Self-consistent eigenproblem for {ε_i}, {ψ_i}:
[−½∇² + V(r)] ψ_i = ε_i ψ_i
V = F[ρ] + ...
ρ(r) = Σ_i |ψ_i(r)|²
Some terms are easy in reciprocal space, others easy in real space. The Hamiltonian is conveniently evaluated in a plane-wave basis, using FFTs for the transformations:
ψ = Σ_G C_G e^{iGr}
Many codes: VASP, PWSCF/ESPRESSO, CPMD, PARATEC, CASTEP, QBOX, ABINIT, ...
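The plane-wave representation above maps directly onto an FFT: an orbital is stored as coefficients C_G on a reciprocal-space grid, and ψ(r) = Σ_G C_G e^{iGr} is one inverse transform away. A minimal sketch with a made-up grid size and random coefficients standing in for a converged orbital:

```python
import numpy as np

n = 16                                   # FFT grid points per dimension (illustrative)
rng = np.random.default_rng(1)

# Random complex plane-wave coefficients C_G standing in for a real orbital.
C_G = rng.standard_normal((n, n, n)) + 1j * rng.standard_normal((n, n, n))

# Reciprocal space -> real space: psi(r) = sum_G C_G e^{iGr} on the grid.
psi_r = np.fft.ifftn(C_G) * n**3         # undo numpy's 1/N normalization

# Density contribution rho(r) = |psi(r)|^2, as in the Kohn-Sham equations.
rho = np.abs(psi_r) ** 2

# Parseval's theorem: sum_G |C_G|^2 equals the cell average of |psi(r)|^2.
print(np.allclose(np.sum(np.abs(C_G) ** 2), np.mean(rho)))  # prints True
```

Going the other way (np.fft.fftn) transforms V(r)ψ(r) back to reciprocal space, which is why a DFT iteration is dominated by pairs of forward/inverse FFTs per orbital.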
11 Parallel FFT layouts
Layouts: 1 grid distributed over four processors, or 4 grids on four processors.
Plane waves are chosen within a cutoff radius (Ecut): the basis is sparse in reciprocal (frequency) space.
Different distribution methods can be combined (hybrid parallel); all-bands simultaneous methods are essential.
FFTs are small but many, e.g. a ( )^3 grid.
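The "sparse basis within a cutoff" point can be made concrete by counting grid points inside the cutoff sphere. The sketch below uses an illustrative grid size and cutoff (in units where a G-vector has integer components and kinetic energy |G|²/2), not values from the benchmark:

```python
import numpy as np

n = 32                                   # FFT grid points per dimension (illustrative)
Ecut = 20.0                              # plane-wave cutoff, same units as |G|^2/2

# Signed integer frequencies of the FFT grid along each dimension:
# 0, 1, ..., n/2-1, -n/2, ..., -1
freqs = np.fft.fftfreq(n, d=1.0 / n)
gx, gy, gz = np.meshgrid(freqs, freqs, freqs, indexing="ij")
g2 = gx**2 + gy**2 + gz**2

# Only G-vectors with kinetic energy below the cutoff carry coefficients.
mask = 0.5 * g2 <= Ecut
print(f"{int(mask.sum())} of {n**3} grid points inside the cutoff sphere "
      f"({100 * mask.mean():.1f}%)")
```

Only a few percent of the full grid carries data, which is why the distribution of the sphere across processors (and the dense real-space grid it transforms to) dominates the parallel FFT design.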
12 VASP
- Very popular plane-wave DFT code
- PAW, ultrasoft pseudopotential DFT
- F90, MPI, BLAS, LAPACK, SCALAPACK
- Canonical 3D parallel FFTs
Here: no major surgery, no heroics, small diffs.
13 Fe399Pt408 Benchmark
- Exclude initialization; 3 iterations (50+ for convergence)
- Spin-polarized ferromagnetic solution; LDA
- Fe 8+, Pt 10+ cores; orbitals per spin incl. unoccupied
- PAW, 19.6Ry/268eV cutoff
- 30.8x30.8x29.7 Angstrom supercell
- 126x126x120 FFT grid (defaults)
- Gamma-point code (halved grid)
- ~ plane waves/orbital
- Hybrid parallel
14 Timings: Davidson
[Plot: time in seconds vs. number of processors for X1E, XT3, and P690; faster is better.]
At 256 processors: X1E 1.7x XT3, 3.3x P690
15 Timings: RMM-DIIS
[Plot: time in seconds vs. number of processors for X1E and P690; faster is better.]
At 256 processors: X1E 2.8x P690
16 Profiling VASP on Cray X1E
~20% of peak at 128 MSPs (whole application); ~25% BLAS+LAPACK, ~25% FFTs
Two key problems:
- Scaling: limited by global linear algebra (eigenvector solutions in subspace rotations)
- Single-processor (MSP on Cray X1E) performance: limited by short average vector length (33 for 128 MSPs)
17 Scaling
[Plot: time in seconds vs. number of MSPs for EDDIAG and ORTHCH.]
This is not a platform-specific problem. The turn-over is due to the SCALAPACK solve for all eigenvectors of a dense, diagonally dominant ~5000x5000 matrix - a small matrix compared to the number of processors.
Need improved algorithms and tuned code, e.g. iterative Jacobi methods (Ian Bush/Daresbury). Suggestions welcome!
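The operation that limits scaling here is the subspace rotation: diagonalize the Hamiltonian projected into the current subspace and rotate all bands by its eigenvectors. A serial sketch with a small random Hermitian matrix standing in for the ~5000x5000 projected Hamiltonian (sizes are toy values, not the benchmark's):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 200                                   # subspace dimension (toy; ~5000 in the benchmark)

# Random Hermitian stand-in for the Hamiltonian projected into the subspace.
A = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
H_sub = (A + A.conj().T) / 2

# The EDDIAG/ORTHCH steps boil down to a full dense eigensolve like this;
# every processor needs all eigenvectors, which is why the (SCA)LAPACK solve
# stops scaling once the matrix is small relative to the processor count.
eps, U = np.linalg.eigh(H_sub)

# Rotate the bands (columns of a toy coefficient array) into the eigenbasis.
psi = rng.standard_normal((m, 512))
psi_rot = U.conj().T @ psi

# Sanity check: the rotation diagonalizes the subspace Hamiltonian.
print(np.allclose(U.conj().T @ H_sub @ U, np.diag(eps)))  # prints True
```

Iterative schemes such as the Jacobi methods mentioned on the slide attack exactly this step by trading the monolithic solve for many small, parallelizable rotations.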
18 Single MSP performance
BLAS performs well (>10 GFLOP/s). Performance is limited by short vector lengths in the FFTs (generic) and in the real-space pseudopotential evaluation (code specific). Here: focus on FFT performance.
The current code is poorly structured for the X1E - no easy access to lots of data - and this is common to other DFT codes:
- In the PAW method, FFT dimensions are small
- No blocking or multiple transforms
- No exploitation of data locality, e.g. if on 1 processor
- Explicitly MPI; awkward to insert CAF
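The "no blocking or multiple transforms" point is easy to illustrate: many small FFTs can be issued one by one, or as a single batched call over a stacked array. The sketch below compares the two forms in NumPy with made-up sizes; on a vector machine like the X1E, the batched form is what keeps vector lengths long:

```python
import time
import numpy as np

n, batch = 64, 256
# 256 small 64x64 planes, complex-valued (illustrative sizes).
data = np.random.default_rng(3).standard_normal((batch, n, n)) + 0j

t0 = time.perf_counter()
looped = np.stack([np.fft.fft2(plane) for plane in data])   # one transform at a time
t1 = time.perf_counter()
batched = np.fft.fft2(data, axes=(-2, -1))                  # one call for all planes
t2 = time.perf_counter()

print(np.allclose(looped, batched))                         # prints True
print(f"looped: {t1 - t0:.4f}s, batched: {t2 - t1:.4f}s")
```

The results are identical; only the call structure differs, which is exactly the kind of restructuring the FFT-module plan on the next slide is after.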
19 Plane wave FFT module
Today: application code -> 3D FFT -> 1D FFT + MPI
Future: application code -> multiple-FFT module -> ND FFT -> MPI or ??
Advantage: other DFT codes benefit.
Norm Troullier (Cray) has written vectorized multi-streaming FFTs; not connected yet.
20 Software development: direct user community - natural but modern path
The user community and other software frameworks sit on top of a stack of application codes, generic toolkits, optimized kernels, and basic libraries (BLAS, FFT, etc.).
Status:
- Common I/O system: XML I/O prototype
- Generic toolkits: combination of user-developed code and a code repository (Ψ-Mag, ALPS)
- Optimized kernels: current research using Cray, BG/L
Planned components:
- high performance kernels
- generic toolkit for nanoscience (extending C++/STL)
- unified I/O system (XML based, incl. tools for accessibility from Fortran legacy codes)
- visualization
21 Results
[Figure: atoms colored by magnetic moment.]
807 atoms, 128 MSPs, 600 CPU hours: FM electronic structure + forces at publishable accuracy.
22 Strong size effects in magnetic moments
[Figure: magnetic moment for 43-, 55-, and 201-atom clusters.]
Clear non-bulk behavior in small clusters. But: AFM or ferrimagnetic states are lowest in energy, by O(10 meV/atom) for relaxed geometries.
23 [Figure: magnetic moment (μ_B) vs. distance from centre (Å) for the fully relaxed 807-atom particle, with Fe-bulk and Pt-bulk reference lines.]
Near-surface Fe atoms have an enhanced moment. Relaxations can be significant. AF spins!
24 Proton transfer in H2O on TiO2 - interpretation of quasielastic neutron scattering
VASP runs with ~1000 atoms reaching ~10 ns to study proton transfer in water on TiO2. CNMS user project by Jorge Sofo; calculations turn around in about one month.
25 Nanoscience end-station
Supported and maintained by the NTI of the CNMS.
- Capability computing: LCF Cray supercomputers
- Capacity computing: multi-teraflop Beowulf cluster, and an allocation at NERSC
Supported capabilities:
- MD, MC (flexible models with the Ψ-Mag toolkit)
- QMC, quantum cluster methods, Hubbard, spin-fermion
- DFT (LDA, SIC-LSD), VASP, other electronic structure codes
- Future plans: D-QMC, AF-QMC
Available to users via CNMS user projects - see
26 The team / collaborators
End-station concept:
- ORNL: Peter Cummings (CNMS), Malcolm Stocks (MST)
- Ames Lab: Bruce Harmon
FePt nanoparticles and generalized Wang-Landau:
- ORNL: Paul Kent (CSMD), Chenggang Zhou (CNMS), Mark Fahey (NCCS), Don Nicholson (CSMD), Markus Eisenbach (MST)
- Univ. of Georgia at Athens: David Landau
- Cray: Nathan Wichmann, Norm Troullier, Jeff Larkin, and John Levesque
Software infrastructure:
- ORNL: Mike Summers (CSED), Xiuping Tao (CSM) - and the above
- Florida State: Greg Brown
- Univ. of Tennessee: Tom Swain and Kirk Sayer
27 Acknowledgment
This research was conducted at the Center for Nanophase Materials Sciences, which is sponsored by the Division of Scientific User Facilities of the United States Department of Energy (DOE). It was supported in part by the Laboratory Directed Research and Development fund at ORNL. The research was enabled by computational resources of the National Center for Computational Sciences, which is sponsored by the Office of Advanced Scientific Computing Research.
Massively parallel electronic structure calculations with Python software Jussi Enkovaara Software Engineering CSC the finnish IT center for science GPAW Software package for electronic structure calculations
More informationPseudopotentials: design, testing, typical errors
Pseudopotentials: design, testing, typical errors Kevin F. Garrity Part 1 National Institute of Standards and Technology (NIST) Uncertainty Quantification in Materials Modeling 2015 Parameter free calculations.
More informationStochastic Modelling of Electron Transport on different HPC architectures
Stochastic Modelling of Electron Transport on different HPC architectures www.hp-see.eu E. Atanassov, T. Gurov, A. Karaivan ova Institute of Information and Communication Technologies Bulgarian Academy
More informationSPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics
SPARSE SOLVERS FOR THE POISSON EQUATION Margreet Nool CWI, Multiscale Dynamics November 9, 2015 OUTLINE OF THIS TALK 1 FISHPACK, LAPACK, PARDISO 2 SYSTEM OVERVIEW OF CARTESIUS 3 POISSON EQUATION 4 SOLVERS
More informationProjector-Augmented Wave Method:
Projector-Augmented Wave Method: An introduction Peter E. Blöchl Clausthal University of Technology Germany http://www.pt.tu-clausthal.de/atp/ 23. Juli 2003 Why PAW all-electron wave functions (EFG s,
More informationComparing the Efficiency of Iterative Eigenvalue Solvers: the Quantum ESPRESSO experience
Comparing the Efficiency of Iterative Eigenvalue Solvers: the Quantum ESPRESSO experience Stefano de Gironcoli Scuola Internazionale Superiore di Studi Avanzati Trieste-Italy 0 Diagonalization of the Kohn-Sham
More informationScalability Programme at ECMWF
Scalability Programme at ECMWF Picture: Stan Tomov, ICL, University of Tennessee, Knoxville Peter Bauer, Mike Hawkins, George Mozdzynski, Tiago Quintino, Deborah Salmond, Stephan Siemen, Yannick Trémolet
More informationKey concepts in Density Functional Theory (II)
Kohn-Sham scheme and band structures European Theoretical Spectroscopy Facility (ETSF) CNRS - Laboratoire des Solides Irradiés Ecole Polytechnique, Palaiseau - France Present Address: LPMCN Université
More informationBlock Iterative Eigensolvers for Sequences of Dense Correlated Eigenvalue Problems
Mitglied der Helmholtz-Gemeinschaft Block Iterative Eigensolvers for Sequences of Dense Correlated Eigenvalue Problems Birkbeck University, London, June the 29th 2012 Edoardo Di Napoli Motivation and Goals
More informationBasic introduction of NWChem software
Basic introduction of NWChem software Background NWChem is part of the Molecular Science Software Suite Designed and developed to be a highly efficient and portable Massively Parallel computational chemistry
More informationSupporting Information
Supporting Information The Origin of Active Oxygen in a Ternary CuO x /Co 3 O 4 -CeO Catalyst for CO Oxidation Zhigang Liu, *, Zili Wu, *, Xihong Peng, ++ Andrew Binder, Songhai Chai, Sheng Dai *,, School
More informationQuantum Cluster Simulations of Low D Systems
Quantum Cluster Simulations of Low D Systems Electronic Correlations on Many Length Scales M. Jarrell, University of Cincinnati High Perf. QMC Hybrid Method SP Sep. in 1D NEW MEM SP Sep. in 1D 2-Chain
More informationRecent Developments in the ELSI Infrastructure for Large-Scale Electronic Structure Theory
elsi-interchange.org MolSSI Workshop and ELSI Conference 2018, August 15, 2018, Richmond, VA Recent Developments in the ELSI Infrastructure for Large-Scale Electronic Structure Theory Victor Yu 1, William
More informationWRF performance tuning for the Intel Woodcrest Processor
WRF performance tuning for the Intel Woodcrest Processor A. Semenov, T. Kashevarova, P. Mankevich, D. Shkurko, K. Arturov, N. Panov Intel Corp., pr. ak. Lavrentieva 6/1, Novosibirsk, Russia, 630090 {alexander.l.semenov,tamara.p.kashevarova,pavel.v.mankevich,
More informationStudents & Postdocs Collaborators
Advancing first-principle symmetry-guided nuclear modeling for studies of nucleosynthesis and fundamental symmetries in nature Students & Postdocs Collaborators NCSA Blue Waters Symposium for Petascale
More informationWouldn t it be great if
IDEMA DISKCON Asia-Pacific 2009 Spin Torque MRAM with Perpendicular Magnetisation: A Scalable Path for Ultra-high Density Non-volatile Memory Dr. Randall Law Data Storage Institute Agency for Science Technology
More informationReferences. Documentation Manuals Tutorials Publications
References http://siesta.icmab.es Documentation Manuals Tutorials Publications Atomic units e = m e = =1 atomic mass unit = m e atomic length unit = 1 Bohr = 0.5292 Ang atomic energy unit = 1 Hartree =
More informationHYCOM and Navy ESPC Future High Performance Computing Needs. Alan J. Wallcraft. COAPS Short Seminar November 6, 2017
HYCOM and Navy ESPC Future High Performance Computing Needs Alan J. Wallcraft COAPS Short Seminar November 6, 2017 Forecasting Architectural Trends 3 NAVY OPERATIONAL GLOBAL OCEAN PREDICTION Trend is higher
More informationSome notes on efficient computing and setting up high performance computing environments
Some notes on efficient computing and setting up high performance computing environments Andrew O. Finley Department of Forestry, Michigan State University, Lansing, Michigan. April 17, 2017 1 Efficient
More information