CEE 618 Scientific Parallel Computing (Lecture 4): Message-Passing Interface (MPI)

Albert S. Kim
Department of Civil and Environmental Engineering, University of Hawai'i at Manoa
2540 Dole Street, Holmes 383, Honolulu, Hawaii 96822

Table of Contents
1 Cluster progress
2 Introduction to MPI (Message-Passing Interface)
3 Calculation of π using MPI: Basics, Wall Time, Broadcast, Barrier, Data Types, Reduce, Operation Types, Resources

Outline (Section 1): Cluster progress

Cluster progress: my first home-made cluster (UCLA)
1 CPU: Pentium II 450 MHz
2 Memory: 128 MB per PC
3 Network card: Netgear FX310, 100/10 Mbps Ethernet card
4 Switch: Netgear 8-port, 100/10 Mbps Ethernet switch
5 KVM sharing device: Belkin Omni Cube, 4 port

Cluster progress: UH 2001, the second home-made cluster
1 Composed of 16 PCs sharing ONE keyboard, monitor, and mouse
2 Red Hat Linux 7.2 installed (free)
3 Connected to a private network by a data switch
4 More than 30 times faster than a Pentium 1.0 GHz system

Cluster progress: UH 2007, the third cluster, from Dell
1 Linux cluster from Dell Inc., under support from NSF
2 Initially 16 nodes, with 2 Intel(R) Xeon(TM) 2.80 GHz CPUs and 2 GB memory per core
3 Queuing system: Platform Lava → LSF, Platform Computing Inc.
4 Programming languages: Intel FORTRAN 77/90 and Intel C/C++
5 Libraries: BLAS, ATLAS, GotoBLAS, BLACS, LAPACK, ScaLAPACK, OpenMPI

Cluster progress: UH 2007 (continued)
1 GNU & Intel-11.1 compilers, OpenMPI-1.4.1, PBSPro
2 Host name: jaws.mhpcc.hawaii.edu
3 IP addresses: ...

Cluster progress: UH 2013, the system updated
1 The second rack was added.
2 Three additional nodes, with 8 Intel(R) Xeon(R) CPU cores and 2 GB memory per core on each node
3 Currently 56 cores in total, with 2 GB memory each
4 Queuing system: PBS (Portable Batch System), Torque
5 Programming languages: Intel FORTRAN and Intel C/C++
6 Libraries: OpenMPI-1.6.1

Outline (Section 2): Introduction to MPI (Message-Passing Interface)

What is MPI? MESSAGE-PASSING INTERFACE
1 A program library, NOT a language.
2 Called from FORTRAN 77/90, C/C++, and Python (and Java).
3 The most widely used parallel library, but NOT a revolutionary way for parallel computation.
4 A collection of the best features of (many) existing message-passing systems.

How does MPI work?
Figure: Distributed memory system using 4 nodes.
Suppose we have a cluster composed of four computers: alpha, beta, gamma, and delta. (Each computer has one core.) Usually, the first computer (alpha) is the master machine (and file server), e.g., fractal.
You have an MPI code, mympi.f90, in your working directory on alpha.
1 Compile the code (1): mpif90 mympi.f90 -o mympi.x
2 Run it using 4 nodes (2): mpirun -np 4 mympi.x
(1) mpif90 is generated when MPI is installed with specific compilers.
(2) We don't execute mpirun directly. In practice, we will use a Makefile (and later the qsub command).

TCP/IP communication through ssh
MPI uses a default machine file that contains (for example):
  ... → compute node's IP (alpha)
  ... → compute node's IP (beta)
  ... → compute node's IP (gamma)
  ... → compute node's IP (delta)
Basically, when an MPI job is submitted using mpirun, this machine file is read, and node numbers (ranks) are automatically assigned in sequence (not always ordered): alpha → 0, beta → 1, gamma → 2, delta → 3.
In most cases, including ours, a queueing system (PBS, LSF, or others) takes care of job allocation to optimize computational resources. Inter-node communication was through rsh (remote shell) in the past, but is now through ssh (secure shell) (rsh + data encryption).
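In practice, such a machine file (a hostfile) is just a plain list of host names or addresses, one per line; the sketch below is hypothetical and uses Open MPI's optional slots= keyword to say how many processes may run on each host (the course cluster's actual file is not shown in the slides):

# mpihosts -- hypothetical machine file, one host per line
alpha slots=1
beta  slots=1
gamma slots=1
delta slots=1

It would be passed to the launcher as in the Makefile's prun target, e.g., mpirun -hostfile ./mpihosts -np 4 ./mympi.x.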

Basic structure of MPI programs

program mympi
  implicit none
  include 'mpif.h'
  integer :: numprocs, rank, ierr, rc, RESULTLEN
  character(len=20) :: PNAME

  call MPI_INIT(ierr)
  if (ierr /= MPI_SUCCESS) then
     print *, 'Error starting MPI program. Terminating.'
     call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
  end if

  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
  call MPI_GET_PROCESSOR_NAME(PNAME, RESULTLEN, ierr)

  write(*, "('No. of procs = ',1X,i4,1x,', My rank = ',1X,i4,', pname = ',A20)") &
       numprocs, rank, PNAME

  call MPI_FINALIZE(ierr)
end

How many MPI calls? 5

Makefile

mpif90 = /opt/openmpi-intel/bin/mpif90 -limf
mpirun = /opt/openmpi-intel/bin/mpirun
mpiopt = --mca btl tcp,self --mca btl_tcp_if_include eth0

srcroot = mympi
srcfile = $(srcroot).f90
exefile = $(srcroot).x
numprocs = 6

all:
        $(mpif90) $(srcfile) -o $(exefile)

srun:
        $(mpirun) --mca btl tcp,self -np 1 ./$(exefile)

prun:
        $(mpirun) $(mpiopt) -hostfile ./mpihosts -np $(numprocs) ./$(exefile)

edit:
        vim $(srcfile)

PBS: We will discuss how to use Open MPI with PBS later.

Job output file

No. of procs = 6, My rank = 0, pname = compute-...
No. of procs = 6, My rank = 2, pname = compute-...
No. of procs = 6, My rank = 4, pname = compute-...
No. of procs = 6, My rank = 5, pname = compute-...
No. of procs = 6, My rank = 3, pname = compute-...
No. of procs = 6, My rank = 1, pname = compute-1-01

Note that the ranks are not well ordered.

1. call MPI_INIT(ierr)

call MPI_INIT(ierr)
if (ierr /= MPI_SUCCESS) then
   print *, 'Error starting MPI program. Terminating.'
   call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
end if

call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
call MPI_GET_PROCESSOR_NAME(PNAME, RESULTLEN, ierr)

Initializes the MPI execution environment. This function must be called in every MPI program, must be called before any other MPI function, and must be called only once in an MPI program.
ierr = error code, which must be either MPI_SUCCESS (= 0) or an implementation-defined error code.
Where is MPI_SUCCESS (= 0) set? [Ans.] mpif.h (the Fortran counterpart of C's mpi.h)

2. call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)

Terminates (aborts) all MPI processes associated with the communicator. In most MPI implementations it terminates ALL processes regardless of the communicator specified.
MPI_COMM_WORLD = the default communicator; it defines one context and the set of all processes. It is one of the items defined in mpif.h.
rc = error code

3. call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

Determines the rank of the calling process within the communicator: whoami?
Initially, each process is assigned a unique integer rank between 0 and (number of processes - 1), i.e., numprocs - 1, within the communicator MPI_COMM_WORLD. This rank is often referred to as a task ID.
If 8 processors are used, then the ranks are 0, 1, 2, ..., 7.
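As a small illustration (a sketch in the spirit of the codes that follow, not taken verbatim from them), the rank is what lets each process do something different:

call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
if (rank == 0) then
   print *, 'I am the master, rank 0'       ! e.g., read input, collect results
else
   print *, 'I am a worker, rank =', rank   ! e.g., do a rank-specific share of the work
end if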

4. call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)

Determines (or obtains) the number of processes in the group associated with a communicator.
Generally used within the communicator MPI_COMM_WORLD to determine the number of processes (numprocs) being used by your application.
It matches the number after -np in: mpirun -np 4 mympi.x

5. call MPI_GET_PROCESSOR_NAME(PNAME, RESULTLEN, ierr)

Returns the name of the local processor at the time of the call, i.e., PNAME.
RESULTLEN is the character length of PNAME.

6. call MPI_FINALIZE(ierr)

write(*, "('No. of procs = ',1X,i4,1x,', My rank = ',1X,i4,', pname = ',A20)") &
     numprocs, rank, pname

call MPI_FINALIZE(ierr)
end

Terminates the MPI execution environment. This function should be the last MPI routine called in every MPI program; NO other MPI routine may be called after it.
Note that all MPI programs start with MPI_INIT(ierr), do something in between, and end with MPI_FINALIZE(ierr).
Question: Who is/are doing the print above? Ans. all 4 processes (every process executes the write statement).

Summary: rank and the number of processors

If the number of processors is N_procs, then the N_procs processors will have ranks of 0, 1, 2, ..., N_procs - 2, and N_procs - 1.
For example, if 6 processors are used for a parallel MPI calculation, then the ranks of the processors will be 0, 1, 2, 3, 4, and 5.

Random Quiz

What MPI routines are required to generate a minimal parallel code? And how many?
ANS: MPI_INIT and MPI_FINALIZE -- 2.
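A minimal sketch of such a program (assuming only that the MPI Fortran header mpif.h is available), with just the two required calls:

program minimal
  implicit none
  include 'mpif.h'
  integer :: ierr
  call MPI_INIT(ierr)       ! required: start the MPI environment
  ! ... every process executes whatever is placed here ...
  call MPI_FINALIZE(ierr)   ! required: shut the MPI environment down
end program minimal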

Outline (Section 3): Calculation of π using MPI -- Basics, Wall Time, Broadcast, Barrier, Data Types, Reduce, Operation Types, Resources

Mathematical Identity (Basics)

Numerically integrate

    g(x) = \frac{4}{1 + x^2}                                   (1)

from x = 0 to 1. Here, g(x) is f(x) in the reference book [3].

    \int_{x=0}^{x=1} \frac{4}{1+x^2}\, dx
      = 4 \left[ \tan^{-1} x \right]_{x=0}^{x=1}
      = 4 \left( \tan^{-1} 1 - \tan^{-1} 0 \right)
      = 4 \cdot \frac{\pi}{4} = \pi                            (2)

Make your own derivation, substituting x = tan y!

[3] Using MPI: Portable Parallel Programming with the Message-Passing Interface, by William Gropp, Ewing Lusk, and Anthony Skjellum, The MIT Press.
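For completeness, the suggested substitution works out as follows (a standard calculus step, not spelled out on the slide): with x = \tan y one has dx = \sec^2 y \, dy and 1 + \tan^2 y = \sec^2 y, so

    \int_0^1 \frac{4}{1+x^2}\, dx
      = \int_0^{\pi/4} \frac{4}{\sec^2 y} \, \sec^2 y \, dy
      = \int_0^{\pi/4} 4 \, dy
      = 4 \cdot \frac{\pi}{4} = \pi.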

Integration Scheme (Basics)
Figure: Integrating to find the value of π with n = 5, where n is the number of points (or rectangles) for the integration.

The number of integration points and the number of processes (= cores) (Basics)

1 If n = 100 and N_procs = 4, then each process takes care of 25 points:
  Processor 0: n = 1 .. 25
  Processor 1: n = 26 .. 50
  Processor 2: n = 51 .. 75
  Processor 3: n = 76 .. 100
2 However, what if n / N_procs is NOT an integer, for example n = 100 and N_procs = 6? You can try n / N_procs or n / (N_procs - 1). Then how do you handle the remainders?

Optimum load balance (Basics)

If n / N_procs is NOT an integer, one possible optimal way is jumping as many steps as the number of processes (i.e., 6):
  Processor 0: n = 1, 7, 13, ..., 91, 97
  Processor 1: n = 2, 8, 14, ..., 92, 98
  Processor 2: n = 3, 9, 15, ..., 93, 99
  Processor 3: n = 4, 10, 16, ..., 94, 100
  Processor 4: n = 5, 11, 17, ..., 95
  Processor 5: n = 6, 12, 18, ..., 96
This method is generally applicable without any restriction. The assigned tasks are almost identical for every process.
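In code, this cyclic (round-robin) distribution is just a strided loop; the sketch below mirrors the loop used in fpi.f90 (shown in full below), so all names come from that program:

sum = 0.0d0
do i = myid + 1, n, numprocs   ! rank 0 takes i = 1, 1+numprocs, ...; rank 1 takes i = 2, 2+numprocs, ...
   x = h * (dble(i) - 0.5d0)   ! midpoint of the i-th sub-interval of width h
   sum = sum + f(x)
enddo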

fpi.f90 -- the first half (Basics)

program main
  include 'mpif.h'
  double precision :: PI25DT
  parameter (PI25DT = 3.141592653589793238462643d0)   ! reference value of pi (25 digits)
  double precision :: mypi, pi, h, sum, x, f, a
  double precision :: T1, T2
  integer :: n, myid, numprocs, i, rc
  ! function to integrate
  f(a) = 4.d0 / (1.d0 + a*a)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)

  n = 100                      ! number of integration points (example value)
  h = 1.0d0 / dble(n)
  T1 = MPI_WTIME()

  call MPI_BCAST(n, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)

INTEGER Types (Basics)

1 INTEGER*2 (non-standard) (2 bytes = 16 bits): from -32,768 to 32,767 (= 2^15 - 1)
2 INTEGER*4 (4 bytes = 32 bits), the MPI default: from -2,147,483,648 to 2,147,483,647 (= 2^31 - 1)
3 INTEGER*8 (8 bytes = 64 bits): from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 (= 2^63 - 1)

MPI_WTIME() (Wall Time)

T1 = MPI_WTIME()
...
T2 = MPI_WTIME()
T2 - T1

1 Returns the elapsed wall-clock time in seconds (double precision) on the calling processor.
2 The time is measured in seconds since an arbitrary time in the past.
3 Only the difference, T2 - T1, is meaningful.
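A minimal timing sketch in the same spirit as fpi.f90 (myid, ierr, T1, and T2 as declared there; the printed label is illustrative):

T1 = MPI_WTIME()
! ... the work to be timed goes here ...
T2 = MPI_WTIME()
if (myid == 0) print *, 'Elapsed time (s) =', T2 - T1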

MPI_BCAST (Broadcast)

call MPI_BCAST(n, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)

MPI_BCAST(buffer, count, datatype, root, comm, ierr) broadcasts a message from the process with rank "root" to all other processes of the group.
1 buffer - name (or starting address) of the buffer, i.e., the variable (choice)
2 count - the number of entries in the buffer, i.e., how many (integer)
3 datatype - data type of the buffer, such as integer, double precision, ... (handle)
4 root - rank of the broadcast root, i.e., from whom? the broadcaster (integer)
5 comm - communicator (handle), i.e., a communication language; the default is MPI_COMM_WORLD
Note: This MPI_BCAST call is included only to explain how it works in the example code; π would be calculated properly without this call, because each process already knows the value of n.

Example of MPI_BCAST (Broadcast)

program broadcast
  implicit none
  include 'mpif.h'
  integer numprocs, rank, ierr, rc
  integer Nbcast

  call MPI_INIT(ierr)
  if (ierr /= MPI_SUCCESS) then
     print *, 'Error starting MPI program. Terminating.'
     call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
  end if

  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
  Nbcast = -10
  print *, 'I am', rank, 'of', numprocs, 'and Nbcast =', Nbcast
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)
  if (rank == 0) Nbcast = 10
  call MPI_BCAST(Nbcast, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
  print *, 'I am', rank, 'of', numprocs, 'and Nbcast =', Nbcast
  call MPI_FINALIZE(ierr)
end

An MPI program is executed by every process concurrently, with given conditions that can differ from process to process, i.e., be rank-specific.

Outcome of the MPI_BCAST example (Broadcast)

I am 0 of 6 and Nbcast = -10
I am 1 of 6 and Nbcast = -10
I am 2 of 6 and Nbcast = -10
I am 5 of 6 and Nbcast = -10
I am 3 of 6 and Nbcast = -10
I am 4 of 6 and Nbcast = -10
I am 0 of 6 and Nbcast = 10
I am 1 of 6 and Nbcast = 10
I am 2 of 6 and Nbcast = 10
I am 4 of 6 and Nbcast = 10
I am 5 of 6 and Nbcast = 10
I am 3 of 6 and Nbcast = 10

Note: Without calling MPI_BARRIER, the above messages would be highly disordered because the processes compete with each other to print messages to stdout. For each run, change the executable file name.

MPI_BARRIER (Barrier)

call MPI_BARRIER(MPI_COMM_WORLD, ierr)

MPI_BARRIER(comm, ierr) creates a barrier synchronization in a group. Each task, when reaching the MPI_BARRIER call, blocks until all tasks in the group have reached the same MPI_BARRIER call.
Analogy: check that everybody is on the bus, and then drive to the vacation place. No one left home alone!
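One common pattern (a sketch, not from the lecture code; p is an extra integer loop variable) uses a barrier after each rank's turn so that output appears roughly in rank order, although stdout ordering is never strictly guaranteed:

do p = 0, numprocs - 1
   if (rank == p) print *, 'rank', rank, 'reached the checkpoint'
   call MPI_BARRIER(MPI_COMM_WORLD, ierr)   ! let rank p finish before rank p+1 prints
end do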

MPI FORTRAN Data Types

1 MPI_CHARACTER : character(1)
2 MPI_INTEGER : integer(4)
3 MPI_REAL : real(4)
4 MPI_DOUBLE_PRECISION : double precision = real(8)
5 MPI_COMPLEX : complex
6 MPI_LOGICAL : logical
7 MPI_BYTE : 8 binary digits
8 MPI_PACKED : data packed or unpacked with MPI_PACK() / MPI_UNPACK()

Note: C/C++ does not have complex and logical data types.
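The handle must match the Fortran type of the buffer; for example, broadcasting a double-precision array pairs double precision with MPI_DOUBLE_PRECISION (a sketch; xarr is an illustrative name, not from the lecture code):

double precision :: xarr(100)
! rank 0 fills xarr; after the call every rank holds the same 100 values
call MPI_BCAST(xarr, 100, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)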

fpi.f90 -- the second half

  sum = 0.0d0
  do i = myid + 1, n, numprocs
     x = h * (dble(i) - 0.5d0)
     sum = sum + f(x)
  enddo
  mypi = h * sum

  call MPI_REDUCE(mypi, pi, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, MPI_COMM_WORLD, ierr)
  T2 = MPI_WTIME()
  if (myid == 0) then
     write(*, "('pi is approximately: ', F18.16)") pi
     write(*, "('Error is: ', F18.16)") abs(pi - PI25DT)
     write(*, *) "Start time   = ", T1
     write(*, *) "End time     = ", T2
     write(*, *) "Elapsed time = ", T2 - T1
     write(*, *) "The number of processes = ", numprocs
  endif
  call MPI_FINALIZE(rc)
  stop
end

MPI_REDUCE (Reduce)

call MPI_REDUCE(mypi, pi, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, MPI_COMM_WORLD, ierr)

MPI_REDUCE(sendbuf, recvbuf, count, datatype, op, root, comm, ierr) reduces values on all processes to a single value.
1 sendbuf - address of the send buffer on each processor, i.e., a variable (choice)
2 recvbuf - address of the receive buffer on the reducer, i.e., a different variable (choice)
3 count - number of elements in the send buffer, i.e., how many (integer)
4 datatype - data type of the elements of the send buffer, such as integer, real, ... (handle)
5 op - reducing arithmetic operation (handle)
6 root - rank of the root process, i.e., the process that receives the reduced result (integer)
7 comm - communicator, i.e., a communication language; the default is MPI_COMM_WORLD (handle)

Example of MPI_REDUCE (Reduce)

program broadcast
  implicit none
  include 'mpif.h'
  integer :: numprocs, rank, ierr, rc
  integer :: Ireduced, Nreduced

  call MPI_INIT(ierr)
  if (ierr /= MPI_SUCCESS) then
     print *, 'Error starting MPI program. Terminating.'
     call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
  end if

  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
  Ireduced = 1 + rank
  Nreduced = 0
  print *, 'I am', rank, 'of', numprocs, 'and Ireduced =', Ireduced
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)
  call MPI_REDUCE &
       (Ireduced, Nreduced, 1, MPI_INTEGER, MPI_SUM, 0, MPI_COMM_WORLD, ierr)
  if (rank == 0) &
       print *, 'I am', rank, 'of', numprocs, 'and Nreduced =', Nreduced
  call MPI_FINALIZE(ierr)
end

Outcome of the MPI_REDUCE example (Reduce)

I am 0 of 6 and Ireduced = 1
I am 1 of 6 and Ireduced = 2
I am 2 of 6 and Ireduced = 3
I am 3 of 6 and Ireduced = 4
I am 4 of 6 and Ireduced = 5
I am 5 of 6 and Ireduced = 6
I am 0 of 6 and Nreduced = 21

Note that rank = Ireduced - 1. Without calling MPI_BARRIER, the final answer of 21 can appear anywhere in the output.

MPI Operations (Operation Types)

1 MPI_MAX - maximum (integer, real, complex)
2 MPI_MIN - minimum (integer, real, complex)
3 MPI_SUM - sum (integer, real, complex)
4 MPI_PROD - product (integer, real, complex)
5 MPI_LAND - logical AND (logical)
6 MPI_LOR - logical OR (logical)
7 MPI_LXOR - logical XOR (logical)
8 MPI_BAND - bit-wise AND (integer, MPI_BYTE)
9 MPI_BOR - bit-wise OR (integer, MPI_BYTE)
10 MPI_BXOR - bit-wise XOR (integer, MPI_BYTE)
11 MPI_MAXLOC - max value and location (real, complex, double precision)
12 MPI_MINLOC - min value and location (real, complex, double precision)
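For instance, the same call pattern with MPI_MAX could report the slowest process's elapsed time on rank 0 (a sketch reusing T1, T2, myid, and ierr from fpi.f90; Tlocal and Tmax are illustrative names):

double precision :: Tlocal, Tmax
Tlocal = T2 - T1
call MPI_REDUCE(Tlocal, Tmax, 1, MPI_DOUBLE_PRECISION, MPI_MAX, 0, MPI_COMM_WORLD, ierr)
if (myid == 0) print *, 'Maximum elapsed time over all ranks (s) =', Tmax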

Resources and References

1 Gropp, W., Lusk, E., and Skjellum, A., Using MPI: Portable Parallel Programming with the Message-Passing Interface, MIT Press (1997).

Makefile (Resources)

mpif90 = /opt/openmpi-intel/bin/mpif90 -limf
foropt = -diag-disable 8290 -diag-disable 8291 -diag-disable ...
mpirun = /opt/openmpi-intel/bin/mpirun
mpiopt = --mca btl tcp,self --mca btl_tcp_if_include eth0

srcroot = fpi
srcfile = $(srcroot).f90
exefile = $(srcroot).x
numprocs = 6

all:
        $(mpif90) $(foropt) $(srcfile) -o $(exefile)

srun:
        time $(mpirun) --mca btl tcp,self -np 1 ./$(exefile)

prun:
        time $(mpirun) $(mpiopt) -hostfile ./mpihosts -np $(numprocs) ./$(exefile)

edit:
        vim $(srcfile)

Script file (Resources): We will discuss this later.

Job output file, in part (Resources)

pi is approximately: ...
Error is: ...
Start time   = ...
End time     = ...
Elapsed time = ...
The number of processes = 6
... user 0.04 system 0:01.84 elapsed 6% CPU (0 avgtext + 0 avgdata ...)
0 inputs + 0 outputs (0 major + 4125 minor) pagefaults 0 swaps

Lab work

1 The MPI codes are stored at /opt/cee618s13/class04/
2 Copy the examples under the subdirectories MPI-basic, MPI-bcast, MPI-reduce, and MPI-pi, and study them.
3 Type/enter make, followed by make prun.
4 Start your homework.

Speed-up test

1 Change the number of processes of the MPI-pi application from 1 to 8 and see how much speed-up is achieved.
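As a reminder (a standard definition, not stated on the slide), the speed-up on N processes is usually quantified as S(N) = T(1) / T(N), where T(N) is the elapsed wall-clock time reported by MPI_WTIME when running on N processes; ideal scaling gives S(N) = N.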


CS1800: Hex & Logic. Professor Kevin Gold

CS1800: Hex & Logic. Professor Kevin Gold CS1800: Hex & Logic Professor Kevin Gold Reviewing Last Time: Binary Last time, we saw that arbitrary numbers can be represented in binary. Each place in a binary number stands for a different power of

More information

Motors Automation Energy Transmission & Distribution Coatings. Servo Drive SCA06 V1.5X. Addendum to the Programming Manual SCA06 V1.

Motors Automation Energy Transmission & Distribution Coatings. Servo Drive SCA06 V1.5X. Addendum to the Programming Manual SCA06 V1. Motors Automation Energy Transmission & Distribution Coatings Servo Drive SCA06 V1.5X SCA06 V1.4X Series: SCA06 Language: English Document Number: 10003604017 / 01 Software Version: V1.5X Publication Date:

More information

CS425: Algorithms for Web Scale Data

CS425: Algorithms for Web Scale Data CS425: Algorithms for Web Scale Data Most of the slides are from the Mining of Massive Datasets book. These slides have been modified for CS425. The original slides can be accessed at: www.mmds.org Challenges

More information

Porting a sphere optimization program from LAPACK to ScaLAPACK

Porting a sphere optimization program from LAPACK to ScaLAPACK Porting a sphere optimization program from LAPACK to ScaLAPACK Mathematical Sciences Institute, Australian National University. For presentation at Computational Techniques and Applications Conference

More information

TitriSoft 2.5. Content

TitriSoft 2.5. Content Content TitriSoft 2.5... 1 Content... 2 General Remarks... 3 Requirements of TitriSoft 2.5... 4 Installation... 5 General Strategy... 7 Hardware Center... 10 Method Center... 13 Titration Center... 28

More information

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 Star Joins A common structure for data mining of commercial data is the star join. For example, a chain store like Walmart keeps a fact table whose tuples each

More information

Comp 11 Lectures. Mike Shah. July 26, Tufts University. Mike Shah (Tufts University) Comp 11 Lectures July 26, / 40

Comp 11 Lectures. Mike Shah. July 26, Tufts University. Mike Shah (Tufts University) Comp 11 Lectures July 26, / 40 Comp 11 Lectures Mike Shah Tufts University July 26, 2017 Mike Shah (Tufts University) Comp 11 Lectures July 26, 2017 1 / 40 Please do not distribute or host these slides without prior permission. Mike

More information

The Analysis of Microburst (Burstiness) on Virtual Switch

The Analysis of Microburst (Burstiness) on Virtual Switch The Analysis of Microburst (Burstiness) on Virtual Switch Chunghan Lee Fujitsu Laboratories 09.19.2016 Copyright 2016 FUJITSU LABORATORIES LIMITED Background What is Network Function Virtualization (NFV)?

More information

EECS150 - Digital Design Lecture 23 - FFs revisited, FIFOs, ECCs, LSFRs. Cross-coupled NOR gates

EECS150 - Digital Design Lecture 23 - FFs revisited, FIFOs, ECCs, LSFRs. Cross-coupled NOR gates EECS150 - Digital Design Lecture 23 - FFs revisited, FIFOs, ECCs, LSFRs April 16, 2009 John Wawrzynek Spring 2009 EECS150 - Lec24-blocks Page 1 Cross-coupled NOR gates remember, If both R=0 & S=0, then

More information

Databases through Python-Flask and MariaDB

Databases through Python-Flask and MariaDB 1 Databases through Python-Flask and MariaDB Tanmay Agarwal, Durga Keerthi and G V V Sharma Contents 1 Python-flask 1 1.1 Installation.......... 1 1.2 Testing Flask......... 1 2 Mariadb 1 2.1 Software

More information

Recent Progress of Parallel SAMCEF with MUMPS MUMPS User Group Meeting 2013

Recent Progress of Parallel SAMCEF with MUMPS MUMPS User Group Meeting 2013 Recent Progress of Parallel SAMCEF with User Group Meeting 213 Jean-Pierre Delsemme Product Development Manager Summary SAMCEF, a brief history Co-simulation, a good candidate for parallel processing MAAXIMUS,

More information

RRQR Factorization Linux and Windows MEX-Files for MATLAB

RRQR Factorization Linux and Windows MEX-Files for MATLAB Documentation RRQR Factorization Linux and Windows MEX-Files for MATLAB March 29, 2007 1 Contents of the distribution file The distribution file contains the following files: rrqrgate.dll: the Windows-MEX-File;

More information

Crossing the Chasm. On the Paths to Exascale: Presented by Mike Rezny, Monash University, Australia

Crossing the Chasm. On the Paths to Exascale: Presented by Mike Rezny, Monash University, Australia On the Paths to Exascale: Crossing the Chasm Presented by Mike Rezny, Monash University, Australia michael.rezny@monash.edu Crossing the Chasm meeting Reading, 24 th October 2016 Version 0.1 In collaboration

More information

High-performance Technical Computing with Erlang

High-performance Technical Computing with Erlang High-performance Technical Computing with Erlang Alceste Scalas Giovanni Casu Piero Pili Center for Advanced Studies, Research and Development in Sardinia ACM ICFP 2008 Erlang Workshop September 27th,

More information

Projectile Motion Slide 1/16. Projectile Motion. Fall Semester. Parallel Computing

Projectile Motion Slide 1/16. Projectile Motion. Fall Semester. Parallel Computing Projectile Motion Slide 1/16 Projectile Motion Fall Semester Projectile Motion Slide 2/16 Topic Outline Historical Perspective ABC and ENIAC Ballistics tables Projectile Motion Air resistance Euler s method

More information

Divisible load theory

Divisible load theory Divisible load theory Loris Marchal November 5, 2012 1 The context Context of the study Scientific computing : large needs in computation or storage resources Need to use systems with several processors

More information

Parallelism in FreeFem++.

Parallelism in FreeFem++. Parallelism in FreeFem++. Guy Atenekeng 1 Frederic Hecht 2 Laura Grigori 1 Jacques Morice 2 Frederic Nataf 2 1 INRIA, Saclay 2 University of Paris 6 Workshop on FreeFem++, 2009 Outline 1 Introduction Motivation

More information

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2)

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2) INF2270 Spring 2010 Philipp Häfliger Summary/Repetition (1/2) content From Scalar to Superscalar Lecture Summary and Brief Repetition Binary numbers Boolean Algebra Combinational Logic Circuits Encoder/Decoder

More information

Matrix Eigensystem Tutorial For Parallel Computation

Matrix Eigensystem Tutorial For Parallel Computation Matrix Eigensystem Tutorial For Parallel Computation High Performance Computing Center (HPC) http://www.hpc.unm.edu 5/21/2003 1 Topic Outline Slide Main purpose of this tutorial 5 The assumptions made

More information

The Performance Evolution of the Parallel Ocean Program on the Cray X1

The Performance Evolution of the Parallel Ocean Program on the Cray X1 The Performance Evolution of the Parallel Ocean Program on the Cray X1 Patrick H. Worley Oak Ridge National Laboratory John Levesque Cray Inc. 46th Cray User Group Conference May 18, 2003 Knoxville Marriott

More information

Cubefree. Construction Algorithm for Cubefree Groups. A GAP4 Package. Heiko Dietrich

Cubefree. Construction Algorithm for Cubefree Groups. A GAP4 Package. Heiko Dietrich Cubefree Construction Algorithm for Cubefree Groups A GAP4 Package by Heiko Dietrich School of Mathematical Sciences Monash University Clayton VIC 3800 Australia email: heiko.dietrich@monash.edu September

More information

Implementation of a preconditioned eigensolver using Hypre

Implementation of a preconditioned eigensolver using Hypre Implementation of a preconditioned eigensolver using Hypre Andrew V. Knyazev 1, and Merico E. Argentati 1 1 Department of Mathematics, University of Colorado at Denver, USA SUMMARY This paper describes

More information

MPI Implementations for Solving Dot - Product on Heterogeneous Platforms

MPI Implementations for Solving Dot - Product on Heterogeneous Platforms MPI Implementations for Solving Dot - Product on Heterogeneous Platforms Panagiotis D. Michailidis and Konstantinos G. Margaritis Abstract This paper is focused on designing two parallel dot product implementations

More information

CMSC 313 Lecture 16 Announcement: no office hours today. Good-bye Assembly Language Programming Overview of second half on Digital Logic DigSim Demo

CMSC 313 Lecture 16 Announcement: no office hours today. Good-bye Assembly Language Programming Overview of second half on Digital Logic DigSim Demo CMSC 33 Lecture 6 nnouncement: no office hours today. Good-bye ssembly Language Programming Overview of second half on Digital Logic DigSim Demo UMC, CMSC33, Richard Chang Good-bye ssembly

More information

High-Performance Scientific Computing

High-Performance Scientific Computing High-Performance Scientific Computing Instructor: Randy LeVeque TA: Grady Lemoine Applied Mathematics 483/583, Spring 2011 http://www.amath.washington.edu/~rjl/am583 World s fastest computers http://top500.org

More information

Tips Geared Towards R. Adam J. Suarez. Arpil 10, 2015

Tips Geared Towards R. Adam J. Suarez. Arpil 10, 2015 Tips Geared Towards R Departments of Statistics North Carolina State University Arpil 10, 2015 1 / 30 Advantages of R As an interpretive and interactive language, developing an algorithm in R can be done

More information

Algorithms for Collective Communication. Design and Analysis of Parallel Algorithms

Algorithms for Collective Communication. Design and Analysis of Parallel Algorithms Algorithms for Collective Communication Design and Analysis of Parallel Algorithms Source A. Grama, A. Gupta, G. Karypis, and V. Kumar. Introduction to Parallel Computing, Chapter 4, 2003. Outline One-to-all

More information

Model Order Reduction via Matlab Parallel Computing Toolbox. Istanbul Technical University

Model Order Reduction via Matlab Parallel Computing Toolbox. Istanbul Technical University Model Order Reduction via Matlab Parallel Computing Toolbox E. Fatih Yetkin & Hasan Dağ Istanbul Technical University Computational Science & Engineering Department September 21, 2009 E. Fatih Yetkin (Istanbul

More information

Overview: Synchronous Computations

Overview: Synchronous Computations Overview: Synchronous Computations barriers: linear, tree-based and butterfly degrees of synchronization synchronous example 1: Jacobi Iterations serial and parallel code, performance analysis synchronous

More information

Efficient Longest Common Subsequence Computation using Bulk-Synchronous Parallelism

Efficient Longest Common Subsequence Computation using Bulk-Synchronous Parallelism Efficient Longest Common Subsequence Computation using Bulk-Synchronous Parallelism Peter Krusche Department of Computer Science University of Warwick June 2006 Outline 1 Introduction Motivation The BSP

More information

Assignment 4: Object creation

Assignment 4: Object creation Assignment 4: Object creation ETH Zurich Hand-out: 13 November 2006 Due: 21 November 2006 Copyright FarWorks, Inc. Gary Larson 1 Summary Today you are going to create a stand-alone program. How to create

More information

Panorama des modèles et outils de programmation parallèle

Panorama des modèles et outils de programmation parallèle Panorama des modèles et outils de programmation parallèle Sylvain HENRY sylvain.henry@inria.fr University of Bordeaux - LaBRI - Inria - ENSEIRB April 19th, 2013 1/45 Outline Introduction Accelerators &

More information

Timing Results of a Parallel FFTsynth

Timing Results of a Parallel FFTsynth Purdue University Purdue e-pubs Department of Computer Science Technical Reports Department of Computer Science 1994 Timing Results of a Parallel FFTsynth Robert E. Lynch Purdue University, rel@cs.purdue.edu

More information

Goals for Performance Lecture

Goals for Performance Lecture Goals for Performance Lecture Understand performance, speedup, throughput, latency Relationship between cycle time, cycles/instruction (CPI), number of instructions (the performance equation) Amdahl s

More information

Solving the Inverse Toeplitz Eigenproblem Using ScaLAPACK and MPI *

Solving the Inverse Toeplitz Eigenproblem Using ScaLAPACK and MPI * Solving the Inverse Toeplitz Eigenproblem Using ScaLAPACK and MPI * J.M. Badía and A.M. Vidal Dpto. Informática., Univ Jaume I. 07, Castellón, Spain. badia@inf.uji.es Dpto. Sistemas Informáticos y Computación.

More information

INF Models of concurrency

INF Models of concurrency INF4140 - Models of concurrency RPC and Rendezvous INF4140 Lecture 15. Nov. 2017 RPC and Rendezvous Outline More on asynchronous message passing interacting processes with different patterns of communication

More information

High Performance Computing

High Performance Computing Master Degree Program in Computer Science and Networking, 2014-15 High Performance Computing 2 nd appello February 11, 2015 Write your name, surname, student identification number (numero di matricola),

More information

One Optimized I/O Configuration per HPC Application

One Optimized I/O Configuration per HPC Application One Optimized I/O Configuration per HPC Application Leveraging I/O Configurability of Amazon EC2 Cloud Mingliang Liu, Jidong Zhai, Yan Zhai Tsinghua University Xiaosong Ma North Carolina State University

More information

Time. Today. l Physical clocks l Logical clocks

Time. Today. l Physical clocks l Logical clocks Time Today l Physical clocks l Logical clocks Events, process states and clocks " A distributed system a collection P of N singlethreaded processes without shared memory Each process p i has a state s

More information

CPU Scheduling. CPU Scheduler

CPU Scheduling. CPU Scheduler CPU Scheduling These slides are created by Dr. Huang of George Mason University. Students registered in Dr. Huang s courses at GMU can make a single machine readable copy and print a single copy of each

More information

Module 5: CPU Scheduling

Module 5: CPU Scheduling Module 5: CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling Real-Time Scheduling Algorithm Evaluation 5.1 Basic Concepts Maximum CPU utilization obtained

More information

Performance and Application of Observation Sensitivity to Global Forecasts on the KMA Cray XE6

Performance and Application of Observation Sensitivity to Global Forecasts on the KMA Cray XE6 Performance and Application of Observation Sensitivity to Global Forecasts on the KMA Cray XE6 Sangwon Joo, Yoonjae Kim, Hyuncheol Shin, Eunhee Lee, Eunjung Kim (Korea Meteorological Administration) Tae-Hun

More information

CAEFEM v9.5 Information

CAEFEM v9.5 Information CAEFEM v9.5 Information Concurrent Analysis Corporation, 50 Via Ricardo, Thousand Oaks, CA 91320 USA Tel. (805) 375 1060, Fax (805) 375 1061 email: info@caefem.com or support@caefem.com Web: http://www.caefem.com

More information