A quick introduction to MPI (Message Passing Interface)


1 A quick introduction to MPI (Message Passing Interface) M1IF - APPD Oguz Kaya Pierre Pradic École Normale Supérieure de Lyon, France 1 / 34 Oguz Kaya, Pierre Pradic M1IF - Presentation MPI

2 Introduction

3 What is MPI? A standardized and portable message-passing system. Started in the '90s, still used today in research and industry. Good theoretical model. Good performance on HPC networks (InfiniBand...).

4 What is MPI? De facto standard for communication in HPC applications.

5 Programming model APIs: C and Fortran APIs. The C++ API was deprecated (MPI-2.2) and removed in MPI-3.0. Environment: many implementations of the standard (mainly Open MPI and MPICH); compiler wrappers (around gcc); a runtime (mpirun).

6 Programming model Compiling: gcc -> mpicc, g++ -> mpic++ / mpicxx, gfortran -> mpifort. Executing: mpirun -n <nb procs> <executable> <args>, e.g. mpirun -n 10 ./a.out. Note: mpiexec and orterun are synonyms of mpirun; see man mpirun for more details.

7 MPI context Context limits: all MPI calls must be nested inside the MPI context delimited by MPI_Init and MPI_Finalize.

#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    /* ... */

    MPI_Finalize();
    return 0;
}

8 MPI context Hello World

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("Hello from proc %d / %d\n", rank, size);

    MPI_Finalize();
    return 0;
}

9 Synchronization Code:

printf("[%d] step 1\n", rank);
MPI_Barrier(MPI_COMM_WORLD);
printf("[%d] step 2\n", rank);

Output:

[0] step 1
[1] step 1
[2] step 1
[3] step 1
[3] step 2
[0] step 2
[2] step 2
[1] step 2

10 Point-to-point communication

11 Send and Receive Sending data:

int MPI_Send(const void *data, int count, MPI_Datatype datatype,
             int destination, int tag, MPI_Comm communicator);

Receiving data:

int MPI_Recv(void *data, int count, MPI_Datatype datatype,
             int source, int tag, MPI_Comm communicator,
             MPI_Status *status);

12 Example

int rank, size;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);

int number;
switch (rank)
{
case 0:
    number = 1;
    MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    break;
case 1:
    MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
    printf("received number: %d\n", number);
    break;
}

13 Asynchronous communications Sending data:

int MPI_Isend(const void *data, int count, MPI_Datatype datatype,
              int destination, int tag, MPI_Comm communicator,
              MPI_Request *request);

Receiving data:

int MPI_Irecv(void *data, int count, MPI_Datatype datatype,
              int source, int tag, MPI_Comm communicator,
              MPI_Request *request);

14 Other functions MPI_Probe, MPI_Iprobe; MPI_Wait, MPI_Waitany, MPI_Waitall; MPI_Test, MPI_Testany, MPI_Testall; MPI_Cancel; MPI_Wtime, MPI_Wtick

15 Simple datatypes

MPI_SHORT              short int
MPI_INT                int
MPI_LONG               long int
MPI_LONG_LONG          long long int
MPI_UNSIGNED_CHAR      unsigned char
MPI_UNSIGNED_SHORT     unsigned short int
MPI_UNSIGNED           unsigned int
MPI_UNSIGNED_LONG      unsigned long int
MPI_UNSIGNED_LONG_LONG unsigned long long int
MPI_FLOAT              float
MPI_DOUBLE             double
MPI_LONG_DOUBLE        long double
MPI_BYTE               char

16 Composed datatypes Composed structures: structures; arrays. Possibilities are almost limitless but sometimes difficult to set up.

17 Collective communications

18 One-to-all communication Broadcast

19 One-to-all communication Broadcast

int MPI_Bcast(void *data, int count, MPI_Datatype datatype,
              int root, MPI_Comm communicator);

20 All-to-one communication Reduce

21 All-to-one communication Reduce

int MPI_Reduce(const void *sendbuf, void *recvbuf, int count,
               MPI_Datatype datatype, MPI_Op operator,
               int root, MPI_Comm communicator);

22 One-to-all communication Scatter

23 One-to-all communication Scatter

int MPI_Scatter(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
                void *recvbuf, int recvcount, MPI_Datatype recvtype,
                int root, MPI_Comm communicator);

24 All-to-one communication Gather

25 All-to-one communication Gather

int MPI_Gather(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
               void *recvbuf, int recvcount, MPI_Datatype recvtype,
               int root, MPI_Comm communicator);

26 All-to-all communication Allreduce

27 All-to-all communication Allreduce

int MPI_Allreduce(const void *sendbuf, void *recvbuf, int count,
                  MPI_Datatype datatype, MPI_Op operator,
                  MPI_Comm communicator);

28 All-to-all communication Allgather

29 All-to-all communication Allgather

int MPI_Allgather(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
                  void *recvbuf, int recvcount, MPI_Datatype recvtype,
                  MPI_Comm communicator);

30 All-to-all communication Alltoall

31 All-to-all communication Alltoall

int MPI_Alltoall(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
                 void *recvbuf, int recvcount, MPI_Datatype recvtype,
                 MPI_Comm communicator);

32 Custom communicators

33 MPI_COMM_WORLD can be split into smaller, more appropriate communicators.

int MPI_Comm_split(MPI_Comm communicator, int color, int key,
                   MPI_Comm *newcommunicator);

34 Example

int rank, size;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);

int hrank, vrank;
int hsize, vsize;
MPI_Comm hcomm, vcomm;
/* p: width of the process grid, defined elsewhere;
   ranks sharing a color end up in the same sub-communicator. */
MPI_Comm_split(MPI_COMM_WORLD, rank % p, rank, &vcomm);
MPI_Comm_split(MPI_COMM_WORLD, rank / p, rank, &hcomm);
MPI_Comm_rank(hcomm, &hrank);
MPI_Comm_size(hcomm, &hsize);
MPI_Comm_rank(vcomm, &vrank);
MPI_Comm_size(vcomm, &vsize);


More information

Nuclear Engineering Analysis

Nuclear Engineering Analysis Chapter! Nuclear Engineering Analysis Nuclear engineering analysis is a very broad subject title. Nuclear engineering and medical physics are disciplines that are ultimately very applied; while the "pure

More information

Parallel Algorithms. A. Legrand. Arnaud Legrand, CNRS, University of Grenoble. LIG laboratory, October 2, / 235

Parallel Algorithms. A. Legrand. Arnaud Legrand, CNRS, University of Grenoble. LIG laboratory, October 2, / 235 Parallel Arnaud Legrand, CNRS, University of Grenoble LIG laboratory, arnaud.legrand@imag.fr October 2, 2011 1 / 235 Outline Parallel Part I Models Part II Communications on a Ring Part III Speedup and

More information

COSC 6374 Parallel Computation

COSC 6374 Parallel Computation COSC 67 Parallel Comptaton Partal Derental Eqatons Edgar Gabrel Fall 0 Nmercal derentaton orward derence ormla From te denton o dervatves one can derve an appromaton or te st dervatve Te same ormla can

More information

A New Dominant Point-Based Parallel Algorithm for Multiple Longest Common Subsequence Problem

A New Dominant Point-Based Parallel Algorithm for Multiple Longest Common Subsequence Problem A New Dominant Point-Based Parallel Algorithm for Multiple Longest Common Subsequence Problem Dmitry Korkin This work introduces a new parallel algorithm for computing a multiple longest common subsequence

More information

Introduction to Programming (Java) 3/12

Introduction to Programming (Java) 3/12 Introduction to Programming (Java) 3/12 Michal Krátký Department of Computer Science Technical University of Ostrava Introduction to Programming (Java) 2008/2009 c 2006 2008 Michal Krátký Introduction

More information

Image Reconstruction And Poisson s equation

Image Reconstruction And Poisson s equation Chapter 1, p. 1/58 Image Reconstruction And Poisson s equation School of Engineering Sciences Parallel s for Large-Scale Problems I Chapter 1, p. 2/58 Outline 1 2 3 4 Chapter 1, p. 3/58 Question What have

More information

Digital Systems EEE4084F. [30 marks]

Digital Systems EEE4084F. [30 marks] Digital Systems EEE4084F [30 marks] Practical 3: Simulation of Planet Vogela with its Moon and Vogel Spiral Star Formation using OpenGL, OpenMP, and MPI Introduction The objective of this assignment is

More information

Go Tutorial. Ian Lance Taylor. Introduction. Why? Language. Go Tutorial. Ian Lance Taylor. GCC Summit, October 27, 2010

Go Tutorial. Ian Lance Taylor. Introduction. Why? Language. Go Tutorial. Ian Lance Taylor. GCC Summit, October 27, 2010 GCC Summit, October 27, 2010 Go Go is a new experimental general purpose programming language. Main developers are: Russ Cox Robert Griesemer Rob Pike Ken Thompson It was released as free software in November

More information

Searching, mainly via Hash tables

Searching, mainly via Hash tables Data structures and algorithms Part 11 Searching, mainly via Hash tables Petr Felkel 26.1.2007 Topics Searching Hashing Hash function Resolving collisions Hashing with chaining Open addressing Linear Probing

More information

COMS 6100 Class Notes

COMS 6100 Class Notes COMS 6100 Class Notes Daniel Solus September 20, 2016 1 General Remarks The Lecture notes submitted by the class have been very good. Integer division seemed to be a common oversight when working the Fortran

More information

Anatomy of SINGULAR talk at p. 1

Anatomy of SINGULAR talk at p. 1 Anatomy of SINGULAR talk at MaGiX@LIX 2011- p. 1 Anatomy of SINGULAR talk at MaGiX@LIX 2011 - Hans Schönemann hannes@mathematik.uni-kl.de Department of Mathematics University of Kaiserslautern Anatomy

More information

Computational Numerical Integration for Spherical Quadratures. Verified by the Boltzmann Equation

Computational Numerical Integration for Spherical Quadratures. Verified by the Boltzmann Equation Computational Numerical Integration for Spherical Quadratures Verified by the Boltzmann Equation Huston Rogers. 1 Glenn Brook, Mentor 2 Greg Peterson, Mentor 2 1 The University of Alabama 2 Joint Institute

More information

Branch Prediction based attacks using Hardware performance Counters IIT Kharagpur

Branch Prediction based attacks using Hardware performance Counters IIT Kharagpur Branch Prediction based attacks using Hardware performance Counters IIT Kharagpur March 19, 2018 Modular Exponentiation Public key Cryptography March 19, 2018 Branch Prediction Attacks 2 / 54 Modular Exponentiation

More information

Scalability of Partial Differential Equations Preconditioner Resilient to Soft and Hard Faults

Scalability of Partial Differential Equations Preconditioner Resilient to Soft and Hard Faults Scalability of Partial Differential Equations Preconditioner Resilient to Soft and Hard Faults K.Morris, F.Rizzi, K.Sargsyan, K.Dahlgren, P.Mycek, C.Safta, O.LeMaitre, O.Knio, B.Debusschere Sandia National

More information

Efficiency of Dynamic Load Balancing Based on Permanent Cells for Parallel Molecular Dynamics Simulation

Efficiency of Dynamic Load Balancing Based on Permanent Cells for Parallel Molecular Dynamics Simulation Efficiency of Dynamic Load Balancing Based on Permanent Cells for Parallel Molecular Dynamics Simulation Ryoko Hayashi and Susumu Horiguchi School of Information Science, Japan Advanced Institute of Science

More information

Simulation und wissenschaftliches. Rechnen (SiwiR)

Simulation und wissenschaftliches. Rechnen (SiwiR) Simulation und wissenschaftliches Rechnen (SiwiR) Christoph Pflaum p. 1/249 Content of the Lecture Why performance optimization? Short introduction to computer architecture and performance problems Performance

More information

Topology aware Cartesian grid mapping with MPI

Topology aware Cartesian grid mapping with MPI Topology aware Cartesian gri mapping with MPI Christoph Niethammer niethammer@hlrs.e Rolf Rabenseifner rabenseifner@hlrs.e High Performance Computing Center (HLRS), University of Stuttgart, Germany HLRS,

More information

arxiv:astro-ph/ v1 12 Apr 1995

arxiv:astro-ph/ v1 12 Apr 1995 arxiv:astro-ph/9504040v1 12 Apr 1995 Parallel Linear General Relativity and CMB Anisotropies A Technical Paper submitted to Supercomputing 95 in HTML format Paul W. Bode bode@alcor.mit.edu Edmund Bertschinger

More information

APPENDIX A FASTFLOW A.1 DESIGN PRINCIPLES

APPENDIX A FASTFLOW A.1 DESIGN PRINCIPLES APPENDIX A FASTFLOW A.1 DESIGN PRINCIPLES FastFlow 1 has been designed to provide programmers with efficient parallelism exploitation patterns suitable to implement (fine grain) stream parallel applications.

More information

Comp 11 Lectures. Mike Shah. June 20, Tufts University. Mike Shah (Tufts University) Comp 11 Lectures June 20, / 38

Comp 11 Lectures. Mike Shah. June 20, Tufts University. Mike Shah (Tufts University) Comp 11 Lectures June 20, / 38 Comp 11 Lectures Mike Shah Tufts University June 20, 2017 Mike Shah (Tufts University) Comp 11 Lectures June 20, 2017 1 / 38 Please do not distribute or host these slides without prior permission. Mike

More information

ACCELERATING WEATHER PREDICTION WITH NVIDIA GPUS

ACCELERATING WEATHER PREDICTION WITH NVIDIA GPUS ACCELERATING WEATHER PREDICTION WITH NVIDIA GPUS Alan Gray, Developer Technology Engineer, NVIDIA ECMWF 18th Workshop on high performance computing in meteorology, 28 th September 2018 ESCAPE NVIDIA s

More information

Convergence Models and Surprising Results for the Asynchronous Jacobi Method

Convergence Models and Surprising Results for the Asynchronous Jacobi Method Convergence Models and Surprising Results for the Asynchronous Jacobi Method Jordi Wolfson-Pou School of Computational Science and Engineering Georgia Institute of Technology Atlanta, Georgia, United States

More information

Entropy of Parallel Execution and Communication

Entropy of Parallel Execution and Communication International Journal of Networking and Computing www.ijnc.org ISSN 2185-2839 (print) ISSN 2185-2847 (online) Volume 7, Number 1, pages 2 28, January 2017 Ernesto Gomez School of Engineering and Computer

More information

Implementation of a preconditioned eigensolver using Hypre

Implementation of a preconditioned eigensolver using Hypre Implementation of a preconditioned eigensolver using Hypre Andrew V. Knyazev 1, and Merico E. Argentati 1 1 Department of Mathematics, University of Colorado at Denver, USA SUMMARY This paper describes

More information

3D Parallel FEM (I) Kengo Nakajima. Programming for Parallel Computing ( ) Seminar on Advanced Computing ( )

3D Parallel FEM (I) Kengo Nakajima. Programming for Parallel Computing ( ) Seminar on Advanced Computing ( ) 3D Parallel FEM (I) Kengo Nakajima Programming for Parallel Computing (616-2057) Seminar on Advanced Computing (616-4009) pfem3d-1 2 Target Application Parallel version of heat3d Using MPI pfem3d-1 3 Installation

More information

Three-Dimensional Explicit Parallel Finite Element Analysis of Functionally Graded Solids under Impact Loading. Ganesh Anandakumar and Jeong-Ho Kim

Three-Dimensional Explicit Parallel Finite Element Analysis of Functionally Graded Solids under Impact Loading. Ganesh Anandakumar and Jeong-Ho Kim Three-Dimensional Eplicit Parallel Finite Element Analsis of Functionall Graded Solids under Impact Loading Ganesh Anandaumar and Jeong-Ho Kim Department of Civil and Environmental Engineering, Universit

More information