CEE 618 Scientific Parallel Computing (Lecture 4): Message-Passing Interface (MPI)

Albert S. Kim
Department of Civil and Environmental Engineering, University of Hawai'i at Manoa
2540 Dole Street, Holmes 383, Honolulu, Hawaii 96822

Table of Contents
1 Cluster progress
2 Introduction to MPI (Message-Passing Interface)
3 Calculation of π using MPI: Basics, Wall Time, Broadcast, Barrier, Data Types, Reduce, Operation Types, Resources

Outline (Section 1): Cluster progress

Cluster progress: my first home-made cluster (UCLA)
1 CPU: Pentium II 450 MHz
2 Memory: 128 MB per PC
3 Network card: Netgear FX310, 100/10 Mbps Ethernet card
4 Switch: Netgear 8-port, 100/10 Mbps Ethernet switch
5 KVM sharing device: Belkin Omni Cube, 4 port

Cluster progress: UH 2001, the second home-made cluster
1 Composed of 16 PCs sharing ONE keyboard, monitor, and mouse
2 Red Hat Linux 7.2 installed (free)
3 Connected to a private network by a data switch
4 More than 30 times faster than a Pentium 1.0 GHz system

Cluster progress: UH 2007, the third cluster, from Dell
1 Linux cluster from Dell Inc., under support from NSF
2 Initially 16 nodes, with 2 Intel(R) Xeon(TM) 2.80 GHz CPUs and 2 GB memory per core
3 Queuing system: Platform Lava → LSF, Platform Computing Inc.
4 Programming languages: Intel FORTRAN 77/90 and Intel C/C++
5 Libraries: BLAS, ATLAS, GotoBLAS, BLACS, LAPACK, ScaLAPACK, OpenMPI

Cluster progress: UH 2007 (continued)
1 GNU & Intel-11.1 compilers, OpenMPI-1.4.1, PBSPro
2 Host name: jaws.mhpcc.hawaii.edu
3 IP addresses: ...

Cluster progress: UH 2013, the system updated
1 The second rack was added.
2 Three additional nodes, with 8 Intel(R) Xeon(R) CPU cores and 2 GB memory per core on each node
3 Currently 56 cores in total, with 2 GB memory each
4 Queuing system: PBS (Portable Batch System), Torque
5 Programming languages: Intel FORTRAN and Intel C/C++
6 Libraries: OpenMPI-1.6.1

Outline (Section 2): Introduction to MPI (Message-Passing Interface)

What is MPI? MESSAGE-PASSING INTERFACE
1 A program library, NOT a language.
2 Called from FORTRAN 77/90, C/C++, and Python (and Java).
3 The most widely used parallel library, but NOT a revolutionary way for parallel computation.
4 A collection of the best features of (many) existing message-passing systems.

How does MPI work?
Figure: Distributed memory system using 4 nodes.
Suppose we have a cluster composed of four computers: alpha, beta, gamma, and delta. (Each computer has one core.) Usually, the first computer (alpha) is the master machine (and file server), e.g., fractal.
You have an MPI code, mympi.f90, in your working directory on alpha.
1 Compile the code (1): mpif90 mympi.f90 -o mympi.x
2 Run it using 4 nodes (2): mpirun -np 4 mympi.x
(1) mpif90 is generated when MPI is installed with specific compilers.
(2) We don't execute mpirun directly. In practice, we will use a Makefile (and later the qsub command).

TCP/IP communication through ssh
MPI uses a default machine file that contains (for example):
  ... → compute node's IP (alpha)
  ... → compute node's IP (beta)
  ... → compute node's IP (gamma)
  ... → compute node's IP (delta)
Basically, when an MPI job is submitted using mpirun, this machine file is read, and node numbers (ranks) are automatically assigned in sequence (not always ordered): alpha → 0, beta → 1, gamma → 2, delta → 3.
In most cases, including ours, a queueing system (PBS, LSF, or others) takes care of job allocation to optimize computational resources. Inter-node communication was through rsh (remote shell) in the past, but is now through ssh (secure shell) (rsh + data encryption).
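In practice, such a machine file (a hostfile) is just a plain list of host names or addresses, one per line; the sketch below is hypothetical and uses Open MPI's optional slots= keyword to say how many processes may run on each host (the course cluster's actual file is not shown in the slides):

# mpihosts -- hypothetical machine file, one host per line
alpha slots=1
beta  slots=1
gamma slots=1
delta slots=1

It would be passed to the launcher as in the Makefile's prun target, e.g., mpirun -hostfile ./mpihosts -np 4 ./mympi.x.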

Basic structure of MPI programs

program mympi
  implicit none
  include 'mpif.h'
  integer :: numprocs, rank, ierr, rc, RESULTLEN
  character(len=20) :: PNAME

  call MPI_INIT(ierr)
  if (ierr /= MPI_SUCCESS) then
     print *, 'Error starting MPI program. Terminating.'
     call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
  end if

  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
  call MPI_GET_PROCESSOR_NAME(PNAME, RESULTLEN, ierr)

  write(*, "('No. of procs = ',1X,i4,1x,', My rank = ',1X,i4,', pname = ',A20)") &
       numprocs, rank, PNAME

  call MPI_FINALIZE(ierr)
end

How many MPI calls? 5

Makefile

mpif90 = /opt/openmpi-intel/bin/mpif90 -limf
mpirun = /opt/openmpi-intel/bin/mpirun
mpiopt = --mca btl tcp,self --mca btl_tcp_if_include eth0

srcroot = mympi
srcfile = $(srcroot).f90
exefile = $(srcroot).x
numprocs = 6

all:
        $(mpif90) $(srcfile) -o $(exefile)

srun:
        $(mpirun) --mca btl tcp,self -np 1 ./$(exefile)

prun:
        $(mpirun) $(mpiopt) -hostfile ./mpihosts -np $(numprocs) ./$(exefile)

edit:
        vim $(srcfile)

PBS: We will discuss how to use Open MPI with PBS later.

Job output file

No. of procs = 6, My rank = 0, pname = compute-...
No. of procs = 6, My rank = 2, pname = compute-...
No. of procs = 6, My rank = 4, pname = compute-...
No. of procs = 6, My rank = 5, pname = compute-...
No. of procs = 6, My rank = 3, pname = compute-...
No. of procs = 6, My rank = 1, pname = compute-1-01

Note that the ranks are not well ordered.

1. call MPI_INIT(ierr)

call MPI_INIT(ierr)
if (ierr /= MPI_SUCCESS) then
   print *, 'Error starting MPI program. Terminating.'
   call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
end if

call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
call MPI_GET_PROCESSOR_NAME(PNAME, RESULTLEN, ierr)

Initializes the MPI execution environment. This function must be called in every MPI program, must be called before any other MPI function, and must be called only once in an MPI program.
ierr = error code, which must be either MPI_SUCCESS (= 0) or an implementation-defined error code.
Where is MPI_SUCCESS (= 0) set? [Ans.] mpif.h (the Fortran counterpart of C's mpi.h)

2. call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)

Terminates (aborts) all MPI processes associated with the communicator. In most MPI implementations it terminates ALL processes regardless of the communicator specified.
MPI_COMM_WORLD = the default communicator; it defines one context and the set of all processes. It is one of the items defined in mpif.h.
rc = error code

3. call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

Determines the rank of the calling process within the communicator: whoami?
Initially, each process is assigned a unique integer rank between 0 and (number of processes - 1), i.e., numprocs - 1, within the communicator MPI_COMM_WORLD. This rank is often referred to as a task ID.
If 8 processors are used, then the ranks are 0, 1, 2, ..., 7.
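As a small illustration (a sketch in the spirit of the codes that follow, not taken verbatim from them), the rank is what lets each process do something different:

call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
if (rank == 0) then
   print *, 'I am the master, rank 0'       ! e.g., read input, collect results
else
   print *, 'I am a worker, rank =', rank   ! e.g., do a rank-specific share of the work
end if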

4. call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)

Determines (or obtains) the number of processes in the group associated with a communicator.
Generally used within the communicator MPI_COMM_WORLD to determine the number of processes (numprocs) being used by your application.
It matches the number after -np in: mpirun -np 4 mympi.x

5. call MPI_GET_PROCESSOR_NAME(PNAME, RESULTLEN, ierr)

Returns the name of the local processor at the time of the call, i.e., PNAME.
RESULTLEN is the character length of PNAME.

6. call MPI_FINALIZE(ierr)

write(*, "('No. of procs = ',1X,i4,1x,', My rank = ',1X,i4,', pname = ',A20)") &
     numprocs, rank, pname

call MPI_FINALIZE(ierr)
end

Terminates the MPI execution environment. This function should be the last MPI routine called in every MPI program; NO other MPI routine may be called after it.
Note that all MPI programs start with MPI_INIT(ierr), do something in between, and end with MPI_FINALIZE(ierr).
Question: Who is/are doing the print above? Ans. all 4 processes (every process executes the write statement).

Summary: rank and the number of processors

If the number of processors is N_procs, then the N_procs processors will have ranks of 0, 1, 2, ..., N_procs - 2, and N_procs - 1.
For example, if 6 processors are used for a parallel MPI calculation, then the ranks of the processors will be 0, 1, 2, 3, 4, and 5.

Random Quiz

What MPI routines are required to generate a minimal parallel code? And how many?
ANS: MPI_INIT and MPI_FINALIZE -- 2.
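A minimal sketch of such a program (assuming only that the MPI Fortran header mpif.h is available), with just the two required calls:

program minimal
  implicit none
  include 'mpif.h'
  integer :: ierr
  call MPI_INIT(ierr)       ! required: start the MPI environment
  ! ... every process executes whatever is placed here ...
  call MPI_FINALIZE(ierr)   ! required: shut the MPI environment down
end program minimal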

Outline (Section 3): Calculation of π using MPI -- Basics, Wall Time, Broadcast, Barrier, Data Types, Reduce, Operation Types, Resources

Mathematical Identity (Basics)

Numerically integrate

    g(x) = \frac{4}{1 + x^2}                                   (1)

from x = 0 to 1. Here, g(x) is f(x) in the reference book [3].

    \int_{x=0}^{x=1} \frac{4}{1+x^2}\, dx
      = 4 \left[ \tan^{-1} x \right]_{x=0}^{x=1}
      = 4 \left( \tan^{-1} 1 - \tan^{-1} 0 \right)
      = 4 \cdot \frac{\pi}{4} = \pi                            (2)

Make your own derivation, substituting x = tan y!

[3] Using MPI: Portable Parallel Programming with the Message-Passing Interface, by William Gropp, Ewing Lusk, and Anthony Skjellum, The MIT Press.
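For completeness, the suggested substitution works out as follows (a standard calculus step, not spelled out on the slide): with x = \tan y one has dx = \sec^2 y \, dy and 1 + \tan^2 y = \sec^2 y, so

    \int_0^1 \frac{4}{1+x^2}\, dx
      = \int_0^{\pi/4} \frac{4}{\sec^2 y} \, \sec^2 y \, dy
      = \int_0^{\pi/4} 4 \, dy
      = 4 \cdot \frac{\pi}{4} = \pi.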

Integration Scheme (Basics)
Figure: Integrating to find the value of π with n = 5, where n is the number of points (or rectangles) for the integration.

The number of integration points and the number of processes (= cores) (Basics)

1 If n = 100 and N_procs = 4, then each process takes care of 25 points:
  Processor 0: n = 1 .. 25
  Processor 1: n = 26 .. 50
  Processor 2: n = 51 .. 75
  Processor 3: n = 76 .. 100
2 However, what if n / N_procs is NOT an integer, for example n = 100 and N_procs = 6? You can try n / N_procs or n / (N_procs - 1). Then how do you handle the remainders?

Optimum load balance (Basics)

If n / N_procs is NOT an integer, one possible optimal way is jumping as many steps as the number of processes (i.e., 6):
  Processor 0: n = 1, 7, 13, ..., 91, 97
  Processor 1: n = 2, 8, 14, ..., 92, 98
  Processor 2: n = 3, 9, 15, ..., 93, 99
  Processor 3: n = 4, 10, 16, ..., 94, 100
  Processor 4: n = 5, 11, 17, ..., 95
  Processor 5: n = 6, 12, 18, ..., 96
This method is generally applicable without any restriction. The assigned tasks are almost identical for every process.
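In code, this cyclic (round-robin) distribution is just a strided loop; the sketch below mirrors the loop used in fpi.f90 (shown in full below), so all names come from that program:

sum = 0.0d0
do i = myid + 1, n, numprocs   ! rank 0 takes i = 1, 1+numprocs, ...; rank 1 takes i = 2, 2+numprocs, ...
   x = h * (dble(i) - 0.5d0)   ! midpoint of the i-th sub-interval of width h
   sum = sum + f(x)
enddo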

fpi.f90 -- the first half (Basics)

program main
  include 'mpif.h'
  double precision :: PI25DT
  parameter (PI25DT = 3.141592653589793238462643d0)   ! reference value of pi (25 digits)
  double precision :: mypi, pi, h, sum, x, f, a
  double precision :: T1, T2
  integer :: n, myid, numprocs, i, rc
  ! function to integrate
  f(a) = 4.d0 / (1.d0 + a*a)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)

  n = 100                      ! number of integration points (example value)
  h = 1.0d0 / dble(n)
  T1 = MPI_WTIME()

  call MPI_BCAST(n, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)

INTEGER Types (Basics)

1 INTEGER*2 (non-standard) (2 bytes = 16 bits): from -32,768 to 32,767 (= 2^15 - 1)
2 INTEGER*4 (4 bytes = 32 bits), the MPI default: from -2,147,483,648 to 2,147,483,647 (= 2^31 - 1)
3 INTEGER*8 (8 bytes = 64 bits): from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 (= 2^63 - 1)

MPI_WTIME() (Wall Time)

T1 = MPI_WTIME()
...
T2 = MPI_WTIME()
T2 - T1

1 Returns the elapsed wall-clock time in seconds (double precision) on the calling processor.
2 The time is measured in seconds since an arbitrary time in the past.
3 Only the difference, T2 - T1, is meaningful.
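A minimal timing sketch in the same spirit as fpi.f90 (myid, ierr, T1, and T2 as declared there; the printed label is illustrative):

T1 = MPI_WTIME()
! ... the work to be timed goes here ...
T2 = MPI_WTIME()
if (myid == 0) print *, 'Elapsed time (s) =', T2 - T1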

MPI_BCAST (Broadcast)

call MPI_BCAST(n, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)

MPI_BCAST(buffer, count, datatype, root, comm, ierr) broadcasts a message from the process with rank "root" to all other processes of the group.
1 buffer - name (or starting address) of the buffer, i.e., the variable (choice)
2 count - the number of entries in the buffer, i.e., how many (integer)
3 datatype - data type of the buffer, such as integer, double precision, ... (handle)
4 root - rank of the broadcast root, i.e., from whom? the broadcaster (integer)
5 comm - communicator (handle), i.e., a communication language; the default is MPI_COMM_WORLD
Note: This MPI_BCAST call is included only to explain how it works in the example code; π would be calculated properly without this call, because each process already knows the value of n.

Example of MPI_BCAST (Broadcast)

program broadcast
  implicit none
  include 'mpif.h'
  integer numprocs, rank, ierr, rc
  integer Nbcast

  call MPI_INIT(ierr)
  if (ierr /= MPI_SUCCESS) then
     print *, 'Error starting MPI program. Terminating.'
     call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
  end if

  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
  Nbcast = -10
  print *, 'I am', rank, 'of', numprocs, 'and Nbcast =', Nbcast
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)
  if (rank == 0) Nbcast = 10
  call MPI_BCAST(Nbcast, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
  print *, 'I am', rank, 'of', numprocs, 'and Nbcast =', Nbcast
  call MPI_FINALIZE(ierr)
end

An MPI program is executed by every process concurrently, with given conditions that can differ from process to process, i.e., be rank-specific.

Outcome of the MPI_BCAST example (Broadcast)

I am 0 of 6 and Nbcast = -10
I am 1 of 6 and Nbcast = -10
I am 2 of 6 and Nbcast = -10
I am 5 of 6 and Nbcast = -10
I am 3 of 6 and Nbcast = -10
I am 4 of 6 and Nbcast = -10
I am 0 of 6 and Nbcast = 10
I am 1 of 6 and Nbcast = 10
I am 2 of 6 and Nbcast = 10
I am 4 of 6 and Nbcast = 10
I am 5 of 6 and Nbcast = 10
I am 3 of 6 and Nbcast = 10

Note: Without calling MPI_BARRIER, the above messages would be highly disordered because the processes compete with each other to print messages to stdout. For each run, change the executable file name.

MPI_BARRIER (Barrier)

call MPI_BARRIER(MPI_COMM_WORLD, ierr)

MPI_BARRIER(comm, ierr) creates a barrier synchronization in a group. Each task, when reaching the MPI_BARRIER call, blocks until all tasks in the group have reached the same MPI_BARRIER call.
Analogy: check that everybody is on the bus, and then drive to the vacation place. No one left home alone!
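One common pattern (a sketch, not from the lecture code; p is an extra integer loop variable) uses a barrier after each rank's turn so that output appears roughly in rank order, although stdout ordering is never strictly guaranteed:

do p = 0, numprocs - 1
   if (rank == p) print *, 'rank', rank, 'reached the checkpoint'
   call MPI_BARRIER(MPI_COMM_WORLD, ierr)   ! let rank p finish before rank p+1 prints
end do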

MPI FORTRAN Data Types

1 MPI_CHARACTER : character(1)
2 MPI_INTEGER : integer(4)
3 MPI_REAL : real(4)
4 MPI_DOUBLE_PRECISION : double precision = real(8)
5 MPI_COMPLEX : complex
6 MPI_LOGICAL : logical
7 MPI_BYTE : 8 binary digits
8 MPI_PACKED : data packed or unpacked with MPI_PACK() / MPI_UNPACK()

Note: C/C++ does not have complex and logical data types.
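The handle must match the Fortran type of the buffer; for example, broadcasting a double-precision array pairs double precision with MPI_DOUBLE_PRECISION (a sketch; xarr is an illustrative name, not from the lecture code):

double precision :: xarr(100)
! rank 0 fills xarr; after the call every rank holds the same 100 values
call MPI_BCAST(xarr, 100, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)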

fpi.f90 -- the second half

  sum = 0.0d0
  do i = myid + 1, n, numprocs
     x = h * (dble(i) - 0.5d0)
     sum = sum + f(x)
  enddo
  mypi = h * sum

  call MPI_REDUCE(mypi, pi, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, MPI_COMM_WORLD, ierr)
  T2 = MPI_WTIME()
  if (myid == 0) then
     write(*, "('pi is approximately: ', F18.16)") pi
     write(*, "('Error is: ', F18.16)") abs(pi - PI25DT)
     write(*, *) "Start time   = ", T1
     write(*, *) "End time     = ", T2
     write(*, *) "Elapsed time = ", T2 - T1
     write(*, *) "The number of processes = ", numprocs
  endif
  call MPI_FINALIZE(rc)
  stop
end

MPI_REDUCE (Reduce)

call MPI_REDUCE(mypi, pi, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, MPI_COMM_WORLD, ierr)

MPI_REDUCE(sendbuf, recvbuf, count, datatype, op, root, comm, ierr) reduces values on all processes to a single value.
1 sendbuf - address of the send buffer on each processor, i.e., a variable (choice)
2 recvbuf - address of the receive buffer on the reducer, i.e., a different variable (choice)
3 count - number of elements in the send buffer, i.e., how many (integer)
4 datatype - data type of the elements of the send buffer, such as integer, real, ... (handle)
5 op - reducing arithmetic operation (handle)
6 root - rank of the root process, i.e., the process that receives the reduced result (integer)
7 comm - communicator, i.e., a communication language; the default is MPI_COMM_WORLD (handle)

Example of MPI_REDUCE (Reduce)

program broadcast
  implicit none
  include 'mpif.h'
  integer :: numprocs, rank, ierr, rc
  integer :: Ireduced, Nreduced

  call MPI_INIT(ierr)
  if (ierr /= MPI_SUCCESS) then
     print *, 'Error starting MPI program. Terminating.'
     call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
  end if

  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
  Ireduced = 1 + rank
  Nreduced = 0
  print *, 'I am', rank, 'of', numprocs, 'and Ireduced =', Ireduced
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)
  call MPI_REDUCE &
       (Ireduced, Nreduced, 1, MPI_INTEGER, MPI_SUM, 0, MPI_COMM_WORLD, ierr)
  if (rank == 0) &
       print *, 'I am', rank, 'of', numprocs, 'and Nreduced =', Nreduced
  call MPI_FINALIZE(ierr)
end

Outcome of the MPI_REDUCE example (Reduce)

I am 0 of 6 and Ireduced = 1
I am 1 of 6 and Ireduced = 2
I am 2 of 6 and Ireduced = 3
I am 3 of 6 and Ireduced = 4
I am 4 of 6 and Ireduced = 5
I am 5 of 6 and Ireduced = 6
I am 0 of 6 and Nreduced = 21

Note that rank = Ireduced - 1. Without calling MPI_BARRIER, the final answer of 21 can appear anywhere in the output.

MPI Operations (Operation Types)

1 MPI_MAX - maximum (integer, real, complex)
2 MPI_MIN - minimum (integer, real, complex)
3 MPI_SUM - sum (integer, real, complex)
4 MPI_PROD - product (integer, real, complex)
5 MPI_LAND - logical AND (logical)
6 MPI_LOR - logical OR (logical)
7 MPI_LXOR - logical XOR (logical)
8 MPI_BAND - bit-wise AND (integer, MPI_BYTE)
9 MPI_BOR - bit-wise OR (integer, MPI_BYTE)
10 MPI_BXOR - bit-wise XOR (integer, MPI_BYTE)
11 MPI_MAXLOC - max value and location (real, complex, double precision)
12 MPI_MINLOC - min value and location (real, complex, double precision)
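For instance, the same call pattern with MPI_MAX could report the slowest process's elapsed time on rank 0 (a sketch reusing T1, T2, myid, and ierr from fpi.f90; Tlocal and Tmax are illustrative names):

double precision :: Tlocal, Tmax
Tlocal = T2 - T1
call MPI_REDUCE(Tlocal, Tmax, 1, MPI_DOUBLE_PRECISION, MPI_MAX, 0, MPI_COMM_WORLD, ierr)
if (myid == 0) print *, 'Maximum elapsed time over all ranks (s) =', Tmax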

Resources and References

1 Gropp, W., Lusk, E., and Skjellum, A., Using MPI: Portable Parallel Programming with the Message-Passing Interface, MIT Press (1997).

Makefile (Resources)

mpif90 = /opt/openmpi-intel/bin/mpif90 -limf
foropt = -diag-disable 8290 -diag-disable 8291 -diag-disable ...
mpirun = /opt/openmpi-intel/bin/mpirun
mpiopt = --mca btl tcp,self --mca btl_tcp_if_include eth0

srcroot = fpi
srcfile = $(srcroot).f90
exefile = $(srcroot).x
numprocs = 6

all:
        $(mpif90) $(foropt) $(srcfile) -o $(exefile)

srun:
        time $(mpirun) --mca btl tcp,self -np 1 ./$(exefile)

prun:
        time $(mpirun) $(mpiopt) -hostfile ./mpihosts -np $(numprocs) ./$(exefile)

edit:
        vim $(srcfile)

Script file (Resources): We will discuss this later.

Job output file, in part (Resources)

pi is approximately: ...
Error is: ...
Start time   = ...
End time     = ...
Elapsed time = ...
The number of processes = 6
... user 0.04 system 0:01.84 elapsed 6% CPU (0 avgtext + 0 avgdata ...)
0 inputs + 0 outputs (0 major + 4125 minor) pagefaults 0 swaps

Lab work

1 The MPI codes are stored at /opt/cee618s13/class04/
2 Copy the examples under the subdirectories MPI-basic, MPI-bcast, MPI-reduce, and MPI-pi, and study them.
3 Type/enter make, followed by make prun.
4 Start your homework.

Speed-up test

1 Change the number of processes of the MPI-pi application from 1 to 8 and see how much speed-up is achieved.
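As a reminder (a standard definition, not stated on the slide), the speed-up on N processes is usually quantified as S(N) = T(1) / T(N), where T(N) is the elapsed wall-clock time reported by MPI_WTIME when running on N processes; ideal scaling gives S(N) = N.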


CS1800: Hex & Logic. Professor Kevin Gold

CS1800: Hex & Logic. Professor Kevin Gold CS1800: Hex & Logic Professor Kevin Gold Reviewing Last Time: Binary Last time, we saw that arbitrary numbers can be represented in binary. Each place in a binary number stands for a different power of

More information

Motors Automation Energy Transmission & Distribution Coatings. Servo Drive SCA06 V1.5X. Addendum to the Programming Manual SCA06 V1.

Motors Automation Energy Transmission & Distribution Coatings. Servo Drive SCA06 V1.5X. Addendum to the Programming Manual SCA06 V1. Motors Automation Energy Transmission & Distribution Coatings Servo Drive SCA06 V1.5X SCA06 V1.4X Series: SCA06 Language: English Document Number: 10003604017 / 01 Software Version: V1.5X Publication Date:

More information

CS425: Algorithms for Web Scale Data

CS425: Algorithms for Web Scale Data CS425: Algorithms for Web Scale Data Most of the slides are from the Mining of Massive Datasets book. These slides have been modified for CS425. The original slides can be accessed at: www.mmds.org Challenges

More information

Porting a sphere optimization program from LAPACK to ScaLAPACK

Porting a sphere optimization program from LAPACK to ScaLAPACK Porting a sphere optimization program from LAPACK to ScaLAPACK Mathematical Sciences Institute, Australian National University. For presentation at Computational Techniques and Applications Conference

More information

TitriSoft 2.5. Content

TitriSoft 2.5. Content Content TitriSoft 2.5... 1 Content... 2 General Remarks... 3 Requirements of TitriSoft 2.5... 4 Installation... 5 General Strategy... 7 Hardware Center... 10 Method Center... 13 Titration Center... 28

More information

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 Star Joins A common structure for data mining of commercial data is the star join. For example, a chain store like Walmart keeps a fact table whose tuples each

More information

Comp 11 Lectures. Mike Shah. July 26, Tufts University. Mike Shah (Tufts University) Comp 11 Lectures July 26, / 40

Comp 11 Lectures. Mike Shah. July 26, Tufts University. Mike Shah (Tufts University) Comp 11 Lectures July 26, / 40 Comp 11 Lectures Mike Shah Tufts University July 26, 2017 Mike Shah (Tufts University) Comp 11 Lectures July 26, 2017 1 / 40 Please do not distribute or host these slides without prior permission. Mike

More information

The Analysis of Microburst (Burstiness) on Virtual Switch

The Analysis of Microburst (Burstiness) on Virtual Switch The Analysis of Microburst (Burstiness) on Virtual Switch Chunghan Lee Fujitsu Laboratories 09.19.2016 Copyright 2016 FUJITSU LABORATORIES LIMITED Background What is Network Function Virtualization (NFV)?

More information

EECS150 - Digital Design Lecture 23 - FFs revisited, FIFOs, ECCs, LSFRs. Cross-coupled NOR gates

EECS150 - Digital Design Lecture 23 - FFs revisited, FIFOs, ECCs, LSFRs. Cross-coupled NOR gates EECS150 - Digital Design Lecture 23 - FFs revisited, FIFOs, ECCs, LSFRs April 16, 2009 John Wawrzynek Spring 2009 EECS150 - Lec24-blocks Page 1 Cross-coupled NOR gates remember, If both R=0 & S=0, then

More information

Databases through Python-Flask and MariaDB

Databases through Python-Flask and MariaDB 1 Databases through Python-Flask and MariaDB Tanmay Agarwal, Durga Keerthi and G V V Sharma Contents 1 Python-flask 1 1.1 Installation.......... 1 1.2 Testing Flask......... 1 2 Mariadb 1 2.1 Software

More information

Recent Progress of Parallel SAMCEF with MUMPS MUMPS User Group Meeting 2013

Recent Progress of Parallel SAMCEF with MUMPS MUMPS User Group Meeting 2013 Recent Progress of Parallel SAMCEF with User Group Meeting 213 Jean-Pierre Delsemme Product Development Manager Summary SAMCEF, a brief history Co-simulation, a good candidate for parallel processing MAAXIMUS,

More information

RRQR Factorization Linux and Windows MEX-Files for MATLAB

RRQR Factorization Linux and Windows MEX-Files for MATLAB Documentation RRQR Factorization Linux and Windows MEX-Files for MATLAB March 29, 2007 1 Contents of the distribution file The distribution file contains the following files: rrqrgate.dll: the Windows-MEX-File;

More information

Crossing the Chasm. On the Paths to Exascale: Presented by Mike Rezny, Monash University, Australia

Crossing the Chasm. On the Paths to Exascale: Presented by Mike Rezny, Monash University, Australia On the Paths to Exascale: Crossing the Chasm Presented by Mike Rezny, Monash University, Australia michael.rezny@monash.edu Crossing the Chasm meeting Reading, 24 th October 2016 Version 0.1 In collaboration

More information

High-performance Technical Computing with Erlang

High-performance Technical Computing with Erlang High-performance Technical Computing with Erlang Alceste Scalas Giovanni Casu Piero Pili Center for Advanced Studies, Research and Development in Sardinia ACM ICFP 2008 Erlang Workshop September 27th,

More information

Projectile Motion Slide 1/16. Projectile Motion. Fall Semester. Parallel Computing

Projectile Motion Slide 1/16. Projectile Motion. Fall Semester. Parallel Computing Projectile Motion Slide 1/16 Projectile Motion Fall Semester Projectile Motion Slide 2/16 Topic Outline Historical Perspective ABC and ENIAC Ballistics tables Projectile Motion Air resistance Euler s method

More information

Divisible load theory

Divisible load theory Divisible load theory Loris Marchal November 5, 2012 1 The context Context of the study Scientific computing : large needs in computation or storage resources Need to use systems with several processors

More information

Parallelism in FreeFem++.

Parallelism in FreeFem++. Parallelism in FreeFem++. Guy Atenekeng 1 Frederic Hecht 2 Laura Grigori 1 Jacques Morice 2 Frederic Nataf 2 1 INRIA, Saclay 2 University of Paris 6 Workshop on FreeFem++, 2009 Outline 1 Introduction Motivation

More information

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2)

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2) INF2270 Spring 2010 Philipp Häfliger Summary/Repetition (1/2) content From Scalar to Superscalar Lecture Summary and Brief Repetition Binary numbers Boolean Algebra Combinational Logic Circuits Encoder/Decoder

More information

Matrix Eigensystem Tutorial For Parallel Computation

Matrix Eigensystem Tutorial For Parallel Computation Matrix Eigensystem Tutorial For Parallel Computation High Performance Computing Center (HPC) http://www.hpc.unm.edu 5/21/2003 1 Topic Outline Slide Main purpose of this tutorial 5 The assumptions made

More information

The Performance Evolution of the Parallel Ocean Program on the Cray X1

The Performance Evolution of the Parallel Ocean Program on the Cray X1 The Performance Evolution of the Parallel Ocean Program on the Cray X1 Patrick H. Worley Oak Ridge National Laboratory John Levesque Cray Inc. 46th Cray User Group Conference May 18, 2003 Knoxville Marriott

More information

Cubefree. Construction Algorithm for Cubefree Groups. A GAP4 Package. Heiko Dietrich

Cubefree. Construction Algorithm for Cubefree Groups. A GAP4 Package. Heiko Dietrich Cubefree Construction Algorithm for Cubefree Groups A GAP4 Package by Heiko Dietrich School of Mathematical Sciences Monash University Clayton VIC 3800 Australia email: heiko.dietrich@monash.edu September

More information

Implementation of a preconditioned eigensolver using Hypre

Implementation of a preconditioned eigensolver using Hypre Implementation of a preconditioned eigensolver using Hypre Andrew V. Knyazev 1, and Merico E. Argentati 1 1 Department of Mathematics, University of Colorado at Denver, USA SUMMARY This paper describes

More information

MPI Implementations for Solving Dot - Product on Heterogeneous Platforms

MPI Implementations for Solving Dot - Product on Heterogeneous Platforms MPI Implementations for Solving Dot - Product on Heterogeneous Platforms Panagiotis D. Michailidis and Konstantinos G. Margaritis Abstract This paper is focused on designing two parallel dot product implementations

More information

CMSC 313 Lecture 16 Announcement: no office hours today. Good-bye Assembly Language Programming Overview of second half on Digital Logic DigSim Demo

CMSC 313 Lecture 16 Announcement: no office hours today. Good-bye Assembly Language Programming Overview of second half on Digital Logic DigSim Demo CMSC 33 Lecture 6 nnouncement: no office hours today. Good-bye ssembly Language Programming Overview of second half on Digital Logic DigSim Demo UMC, CMSC33, Richard Chang Good-bye ssembly

More information

High-Performance Scientific Computing

High-Performance Scientific Computing High-Performance Scientific Computing Instructor: Randy LeVeque TA: Grady Lemoine Applied Mathematics 483/583, Spring 2011 http://www.amath.washington.edu/~rjl/am583 World s fastest computers http://top500.org

More information

Tips Geared Towards R. Adam J. Suarez. Arpil 10, 2015

Tips Geared Towards R. Adam J. Suarez. Arpil 10, 2015 Tips Geared Towards R Departments of Statistics North Carolina State University Arpil 10, 2015 1 / 30 Advantages of R As an interpretive and interactive language, developing an algorithm in R can be done

More information

Algorithms for Collective Communication. Design and Analysis of Parallel Algorithms

Algorithms for Collective Communication. Design and Analysis of Parallel Algorithms Algorithms for Collective Communication Design and Analysis of Parallel Algorithms Source A. Grama, A. Gupta, G. Karypis, and V. Kumar. Introduction to Parallel Computing, Chapter 4, 2003. Outline One-to-all

More information

Model Order Reduction via Matlab Parallel Computing Toolbox. Istanbul Technical University

Model Order Reduction via Matlab Parallel Computing Toolbox. Istanbul Technical University Model Order Reduction via Matlab Parallel Computing Toolbox E. Fatih Yetkin & Hasan Dağ Istanbul Technical University Computational Science & Engineering Department September 21, 2009 E. Fatih Yetkin (Istanbul

More information

Overview: Synchronous Computations

Overview: Synchronous Computations Overview: Synchronous Computations barriers: linear, tree-based and butterfly degrees of synchronization synchronous example 1: Jacobi Iterations serial and parallel code, performance analysis synchronous

More information

Efficient Longest Common Subsequence Computation using Bulk-Synchronous Parallelism

Efficient Longest Common Subsequence Computation using Bulk-Synchronous Parallelism Efficient Longest Common Subsequence Computation using Bulk-Synchronous Parallelism Peter Krusche Department of Computer Science University of Warwick June 2006 Outline 1 Introduction Motivation The BSP

More information

Assignment 4: Object creation

Assignment 4: Object creation Assignment 4: Object creation ETH Zurich Hand-out: 13 November 2006 Due: 21 November 2006 Copyright FarWorks, Inc. Gary Larson 1 Summary Today you are going to create a stand-alone program. How to create

More information

Panorama des modèles et outils de programmation parallèle

Panorama des modèles et outils de programmation parallèle Panorama des modèles et outils de programmation parallèle Sylvain HENRY sylvain.henry@inria.fr University of Bordeaux - LaBRI - Inria - ENSEIRB April 19th, 2013 1/45 Outline Introduction Accelerators &

More information

Timing Results of a Parallel FFTsynth

Timing Results of a Parallel FFTsynth Purdue University Purdue e-pubs Department of Computer Science Technical Reports Department of Computer Science 1994 Timing Results of a Parallel FFTsynth Robert E. Lynch Purdue University, rel@cs.purdue.edu

More information

Goals for Performance Lecture

Goals for Performance Lecture Goals for Performance Lecture Understand performance, speedup, throughput, latency Relationship between cycle time, cycles/instruction (CPI), number of instructions (the performance equation) Amdahl s

More information

Solving the Inverse Toeplitz Eigenproblem Using ScaLAPACK and MPI *

Solving the Inverse Toeplitz Eigenproblem Using ScaLAPACK and MPI * Solving the Inverse Toeplitz Eigenproblem Using ScaLAPACK and MPI * J.M. Badía and A.M. Vidal Dpto. Informática., Univ Jaume I. 07, Castellón, Spain. badia@inf.uji.es Dpto. Sistemas Informáticos y Computación.

More information

INF Models of concurrency

INF Models of concurrency INF4140 - Models of concurrency RPC and Rendezvous INF4140 Lecture 15. Nov. 2017 RPC and Rendezvous Outline More on asynchronous message passing interacting processes with different patterns of communication

More information

High Performance Computing

High Performance Computing Master Degree Program in Computer Science and Networking, 2014-15 High Performance Computing 2 nd appello February 11, 2015 Write your name, surname, student identification number (numero di matricola),

More information

One Optimized I/O Configuration per HPC Application

One Optimized I/O Configuration per HPC Application One Optimized I/O Configuration per HPC Application Leveraging I/O Configurability of Amazon EC2 Cloud Mingliang Liu, Jidong Zhai, Yan Zhai Tsinghua University Xiaosong Ma North Carolina State University

More information

Time. Today. l Physical clocks l Logical clocks

Time. Today. l Physical clocks l Logical clocks Time Today l Physical clocks l Logical clocks Events, process states and clocks " A distributed system a collection P of N singlethreaded processes without shared memory Each process p i has a state s

More information

CPU Scheduling. CPU Scheduler

CPU Scheduling. CPU Scheduler CPU Scheduling These slides are created by Dr. Huang of George Mason University. Students registered in Dr. Huang s courses at GMU can make a single machine readable copy and print a single copy of each

More information

Module 5: CPU Scheduling

Module 5: CPU Scheduling Module 5: CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling Real-Time Scheduling Algorithm Evaluation 5.1 Basic Concepts Maximum CPU utilization obtained

More information

Performance and Application of Observation Sensitivity to Global Forecasts on the KMA Cray XE6

Performance and Application of Observation Sensitivity to Global Forecasts on the KMA Cray XE6 Performance and Application of Observation Sensitivity to Global Forecasts on the KMA Cray XE6 Sangwon Joo, Yoonjae Kim, Hyuncheol Shin, Eunhee Lee, Eunjung Kim (Korea Meteorological Administration) Tae-Hun

More information

CAEFEM v9.5 Information

CAEFEM v9.5 Information CAEFEM v9.5 Information Concurrent Analysis Corporation, 50 Via Ricardo, Thousand Oaks, CA 91320 USA Tel. (805) 375 1060, Fax (805) 375 1061 email: info@caefem.com or support@caefem.com Web: http://www.caefem.com

More information