MPI at MPI. Jens Saak. Max Planck Institute for Dynamics of Complex Technical Systems, Computational Methods in Systems and Control Theory


1 MPI at MPI. Jens Saak, Max Planck Institute for Dynamics of Complex Technical Systems, Computational Methods in Systems and Control Theory, Magdeburg. November 5, 2010.


5 Parallel Computing Basics. Why Parallel Computing? Once upon a time... the world was simple... and problems were small... all computations could be carried out on a single computer.


10 Parallel Computing Basics. Why Parallel Computing? Then the world grew complicated and problems became large and difficult. Computations became too challenging for a single computer. Therefore problems were split and distributed to multiple computers, jointly finding their solution.


13 Parallel Computing Basics. Types of Parallelism. Quasi parallelism, or multitasking: modern operating systems emulate parallelism via time-slicing. Distributed memory parallelism: computation on separate units, each with its own memory. Shared memory parallelism: multiple computation units access a common memory.


16 Parallel Computing Basics. Implementations and Standards.
Quasi parallelism: the operating system's scheduler, processes, and process threads (e.g. POSIX threads, pthreads).
Distributed memory parallelism: data handling via the Message Passing Interface (MPI), a standard for a collection of operations and their semantics that does NOT prescribe the actual protocol or implementation. Implementations include MVAPICH/MVAPICH2 (MPI-1/MPI-2, BSD license), Open MPI (MPI-1, MPI-2, freely available), and MPICH2 (MPI-1, MPI-2, MPI-2.1, MPI-2.2, freely available).
Shared memory parallelism: the OpenMP API of compiler and preprocessor directives (latest version 3.0, May 2008), supported by GCC, Intel Compilers 10.1 and later, Oracle, IBM, Cray, ...
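As a concrete illustration of the distributed memory model, here is a minimal MPI sketch in C (an illustrative example, not code from the talk): every process holds its own data and communicates explicitly through MPI calls. Build with an MPI wrapper compiler, e.g. `mpicc mpi_hello.c`, and launch with e.g. `mpirun -np 4 ./a.out`.

```c
/* Minimal MPI sketch (illustrative example, not code from the talk). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, sum = 0;

    MPI_Init(&argc, &argv);                 /* start the MPI runtime           */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* id of this process (0..size-1)  */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes       */

    printf("Hello from rank %d of %d\n", rank, size);

    /* Explicit communication: combine the ranks of all processes on rank 0. */
    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("Sum of all ranks: %d\n", sum);

    MPI_Finalize();                         /* shut down the MPI runtime       */
    return 0;
}
```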

17 Parallel Computing at MPI Magdeburg: the otto cluster. [Slide shows the cluster layout: compute nodes node[1-16], node[17-32], and node[33-48], frontend nodes otto[1-2], InfiniBand routers ibr[1-2], the ottoapp Megware ClusterWare appliance, and the connection to the institute network with DNS and NTP (ntp.mpi-magdeburg.mpg.de) services.]

18 Parallel Computing at MPI Magdeburg: otto frontend and service nodes. [Same cluster diagram; this slide highlights the frontend nodes otto[1-2] and the ottoapp Megware appliance.]

19 Parallel Computing at MPI Magdeburg: otto frontend and service nodes. Cluster management: MEGWARE ClusterWare Appliance, IPMI capable (Intelligent Platform Management Interface), integrated with the job queueing system.

20 Parallel Computing at MPI Magdeburg: otto frontend and service nodes. Frontend nodes otto[1-2]: user interaction, binary creation, job queueing, ...

21 Parallel Computing at MPI Magdeburg: otto network and storage. [Same cluster diagram; this slide highlights the service network.]

22 Parallel Computing at MPI Magdeburg: otto network and storage. Storage: Panasas PanFS-8, net capacity 28 TB, connected via 2x 10 GBit Ethernet.

23 Parallel Computing at MPI Magdeburg: otto network and storage. Communication network: Voltaire QDR InfiniBand switches (36 ports each, 8-port interconnect), transfer rate 40 GBit/s.

24 Parallel Computing at MPI Magdeburg: otto compute nodes. [Same cluster diagram; this slide highlights the compute nodes node[1-48].]

25 Parallel Computing at MPI Magdeburg: otto compute nodes. MEGWARE MiriQuid X5600 nodes: Intel Xeon Westmere CPUs (X5600 series), 6 cores per CPU, 12 MB L3 cache, 48 GB DDR3 ECC RAM, peak performance per node 127.68 GFlops.

26 Parallel Computing at MPI Magdeburg: otto compute nodes. MEGWARE Saxonid C6100 nodes: 2x AMD Opteron 6168 Magny-Cours, 1.9 GHz, 12 cores per CPU, 12 MB L3 cache, 96 GB DDR3 ECC RAM, 650 GB SAS hard disk (15000 rpm).

27 Parallel Computing at MPI Magdeburg: otto compute nodes. The cluster exhibits all three types of parallelism: quasi parallel (e.g. threaded I/O), distributed memory (every node has its own RAM), and shared memory (the 12/24 cores of a node access the node's common RAM).
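To illustrate the shared memory case on a single otto node, here is a minimal OpenMP sketch in C (an illustrative example, not from the slides): one directive distributes the loop iterations over the cores of the node, all of which read and write the same arrays in the node's common RAM. Compile with, e.g., `gcc -fopenmp -std=c99 omp_dot.c`.

```c
/* Minimal OpenMP sketch (illustrative example, not code from the talk). */
#include <omp.h>
#include <stdio.h>

#define N 1000000

static double x[N], y[N];

int main(void)
{
    double dot = 0.0;

    /* Initialize the shared arrays in parallel: every thread works on a
     * chunk of the same arrays, which live in the node's common RAM. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        x[i] = 1.0;
        y[i] = 2.0;
    }

    /* Shared-memory dot product: the reduction clause gives each thread a
     * private partial sum and combines them at the end of the loop. */
    #pragma omp parallel for reduction(+:dot)
    for (int i = 0; i < N; i++)
        dot += x[i] * y[i];

    printf("dot = %.1f, using up to %d threads\n", dot, omp_get_max_threads());
    return 0;
}
```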

28 Parallel Computing at MPI Magdeburg: Molecular Simulations and Design. From wavefunctions to proteins in networks: the Molecular Simulations and Design (MSD) group aims at explaining chemical and biological phenomena at the molecular level. For complex phenomena in chemistry and biology, the MSD group develops and applies tools from quantum mechanics, QM/MM hybrid simulations, molecular dynamics, Brownian dynamics, bioinformatics, and protein structural modeling. Contact: Dr. Matthias Stein.

29 Parallel Computing at MPI Magdeburg: Computational Methods in Systems and Control Theory (CSC). Interests: mathematical methods for applications in systems and control theory and their numerical realization, with a focus on numerical linear algebra. Parallel computing is used to accelerate solvers on modern multi-core processors and to solve very large matrix equations, model order reduction problems, or optimal control problems on high performance parallel machines. Parallel-computing-aware software developed in CSC: PLiCMR and C.M.E.S.S.

30 Parallel Computing in CSC: PLiCMR, the Parallel Library in Control and Model Reduction. Distributed memory parallel model order reduction via balanced truncation, singular perturbation approximation, and Hankel-norm approximation. Partners: High Performance Computing and Architectures Group (University Jaume I, Castellón, Spain).


35 Parallel Computing in CSC: C.M.E.S.S., the C-language Matrix Equation Sparse Solver. A C library for solving large sparse matrix equations: Lyapunov equations FX + XF^T = GG^T and FXE^T + EXF^T = GG^T; Riccati equations C + A^T X + XA - XBX = 0 and C + A^T XE + E^T XA - E^T XBXE = 0; model reduction via balanced truncation (first and second order systems) and H2-optimal reduction (IRKA, TSIA); and linear quadratic regulator problems. Currently focused on multi-core desktop computers, i.e., shared memory. See the poster as well.
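For context, a short standard-background sketch (not taken from the slides) of where Lyapunov equations of this form arise in the balanced truncation model reduction mentioned above: the system Gramians are the solutions of a pair of (generalized) Lyapunov equations.

```latex
% Standard background, not from the slides: for a linear system
%   E \dot{x} = A x + B u,   y = C x,
% balanced truncation uses the controllability Gramian P and the
% observability Gramian Q, which solve the generalized Lyapunov equations
\begin{align*}
  A P E^{T} + E P A^{T} + B B^{T} &= 0, \\
  A^{T} Q E + E^{T} Q A + C^{T} C &= 0,
\end{align*}
% i.e. equations of the form F X E^{T} + E X F^{T} = G G^{T} listed above,
% up to the sign convention used for the right-hand side.
```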


40 The End. Read MPI at MPI as: Message Passing Interface at the Max Planck Institute. Meet otto at MPI this November, and let us hope that this story ends: ...and they all computed happily ever after.
