CEE 618 Scientific Parallel Computing (Lecture 4): Message-Passing Interface (MPI)
Albert S. Kim
Department of Civil and Environmental Engineering, University of Hawai'i at Manoa
2540 Dole Street, Holmes 383, Honolulu, Hawaii 96822
Table of Contents
1. Cluster progress
2. Introduction to MPI (Message-Passing Interface)
3. Calculation of π using MPI: Basics, Wall Time, Broadcast, Barrier, Data Types, Reduce, Operation Types, Resources
Cluster progress
My first home-made cluster, UCLA
1. CPU: Pentium II 450 MHz
2. Memory: 128 MB per PC
3. Network card: Netgear FX310, 100/10 Mbps Ethernet card
4. Switch: Netgear 8-port, 100/10 Mbps Ethernet switch
5. KVM sharing device: Belkin OmniCube, 4 ports
UH 2001, the second home-made cluster
1. Composed of 16 PCs sharing ONE keyboard, monitor, and mouse
2. Red Hat Linux 7.2 installed (free)
3. Connected to a private network by a data switch
4. More than 30 times faster than a Pentium 1.0 GHz system
UH 2007, the third, from Dell
1. Linux cluster from Dell Inc. under support from NSF
2. Initially 16 nodes, 2 Intel(R) Xeon(TM) 2.80 GHz CPUs and 2 GB memory per core
3. Queuing system: Platform Lava → LSF, Platform Computing Inc.
4. Programming languages: Intel FORTRAN 77/90 and Intel C/C++
5. Libraries: BLAS, ATLAS, GotoBLAS, BLACS, LAPACK, ScaLAPACK, OpenMPI
1. GNU & Intel-11.1 compilers, OpenMPI-1.4.1, PBSPro
2. Host name: jaws.mhpcc.hawaii.edu
3. IP addresses:
UH 2013, the system updated
1. The second rack was added.
2. Additional 3 nodes, 8 Intel(R) Xeon(R) CPU cores per node, and 2 GB memory per core
3. Currently 56 cores in total, with 2 GB memory each
4. Queuing system: PBS (Portable Batch System), torque
5. Programming languages: Intel FORTRAN and Intel C/C++
6. Libraries: OpenMPI-1.6.1
Introduction to MPI (Message-Passing Interface)
What is MPI?
MESSAGE-PASSING INTERFACE
1. A program library, NOT a language.
2. Called from FORTRAN 77/90, C/C++, and Python (and Java).
3. The most widely used parallel library, but NOT a revolutionary way to do parallel computation.
4. A collection of the best features of (many) existing message-passing systems.
How MPI works
Figure: Distributed memory system using 4 nodes.
Suppose we have a cluster composed of four computers: alpha, beta, gamma, and delta. (Each computer has one core.) Usually, the first computer (alpha) is a master machine (and file server). (E.g., fractal)
You have an MPI code, mympi.f90, in your working directory on alpha.
1. Compile the code: mpif90 mympi.f90 -o mympi.x
2. Run it using 4 nodes: mpirun -np 4 mympi.x
Notes: mpif90 is generated when MPI is installed with specific compilers. We don't execute mpirun directly; in practice, we will use a Makefile (and later the qsub command).
TCP/IP communication through ssh
MPI uses a default machine file that contains (for example) the IP address of each compute node:
→ alpha's IP
→ beta's IP
→ gamma's IP
→ delta's IP
Basically, when an MPI job is submitted using mpirun, this machine file is read, and node numbers (ranks) 0 through 3 are automatically assigned in sequence (not always ordered).
In most cases, including ours, a queueing system (PBS, LSF, or others) takes care of job allocation to optimize computational resources. Inter-node communication went through rsh (remote shell) in the past, but is now through ssh (secure shell; rsh + data encryption).
Basic structure of MPI programs

program mympi
  implicit none
  include 'mpif.h'
  integer :: numprocs, rank, ierr, rc, RESULTLEN
  character(len=20) :: PNAME

  call MPI_INIT(ierr)
  if (ierr /= MPI_SUCCESS) then
     print *, 'Error starting MPI program. Terminating.'
     call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
  end if

  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
  call MPI_GET_PROCESSOR_NAME(PNAME, RESULTLEN, ierr)

  write(*, "('No. of procs = ',1X,i4,1x,', My rank = ',1X,i4,', pname = ',A20)") &
       numprocs, rank, PNAME

  call MPI_FINALIZE(ierr)
end

How many MPI calls? Ans: 5 (MPI_ABORT is reached only if MPI_INIT fails).
Makefile

mpif90 = /opt/openmpi-intel/bin/mpif90 -limf
mpirun = /opt/openmpi-intel/bin/mpirun
mpiopt = --mca btl tcp,self --mca btl_tcp_if_include eth0

srcroot  = mympi
srcfile  = $(srcroot).f90
exefile  = $(srcroot).x
numprocs =

all:
	$(mpif90) $(srcfile) -o $(exefile)

srun:
	$(mpirun) --mca btl tcp,self -np 1 ./$(exefile)

prun:
	$(mpirun) $(mpiopt) --hostfile ./mpihosts -np $(numprocs) ./$(exefile)

edit:
	vim $(srcfile)
PBS
We will discuss how to use OpenMPI with PBS later.
Job output file

No. of procs = 6, My rank = 0, pname = compute-...
No. of procs = 6, My rank = 2, pname = compute-...
No. of procs = 6, My rank = 4, pname = compute-...
No. of procs = 6, My rank = 5, pname = compute-...
No. of procs = 6, My rank = 3, pname = compute-...
No. of procs = 6, My rank = 1, pname = compute-...

Note that ranks are not well ordered.
1. call MPI_INIT(ierr)

call MPI_INIT(ierr)
if (ierr /= MPI_SUCCESS) then
   print *, 'Error starting MPI program. Terminating.'
   call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
end if

call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
call MPI_GET_PROCESSOR_NAME(PNAME, RESULTLEN, ierr)

Initializes the MPI execution environment. This function must be called in every MPI program, must be called before any other MPI function, and must be called only once in an MPI program.
ierr = error code, which must be either MPI_SUCCESS (=0) or an implementation-defined error code.
Where is MPI_SUCCESS (=0) set? [Ans.] mpif.h (mpi.h in C).
2. call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
Terminates (aborts) all MPI processes associated with the communicator. In most MPI implementations it terminates ALL processes regardless of the communicator specified.
MPI_COMM_WORLD = the default communicator; it defines one context and the set of all processes. It is one of the items defined in mpif.h.
rc = error code
3. call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
Determines the rank of the calling process within the communicator: who am I?
Initially, each process is assigned a unique integer rank between 0 and (number of processors − 1), i.e., numprocs − 1, within the communicator MPI_COMM_WORLD. This rank is often referred to as a task ID.
If 8 processors are used, then the ranks are 0, 1, 2, 3, 4, 5, 6, and 7.
4. call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
Determines (or obtains) the number of processes in the group associated with a communicator. Generally used within the communicator MPI_COMM_WORLD to determine the number of processes (numprocs) being used by your own application.
It matches the number after -np in: mpirun -np 4 mympi.x
5. call MPI_GET_PROCESSOR_NAME(PNAME, RESULTLEN, ierr)
Returns the name of the local processor at the time of the call, i.e., PNAME.
RESULTLEN is the character length of PNAME.
6. call MPI_FINALIZE(ierr)

write(*, "('No. of procs = ',1X,i4,1x,', My rank = ',1X,i4,', pname = ',A20)") &
     numprocs, rank, pname

call MPI_FINALIZE(ierr)
end

Terminates the MPI execution environment. This function should be the last MPI routine called in every MPI program; NO other MPI routines may be called after it.
Note that all MPI programs start with MPI_INIT(ierr), do something in between, and end with MPI_FINALIZE(ierr).
Question: Who is/are executing the write statement above? Ans: all processes (all 4, in the four-node example).
Summary: rank and the number of processors
If the number of processors is N_procs, then the N_procs processors will have ranks of 0, 1, 2, ..., N_procs − 2, and N_procs − 1.
For example, if 6 processors are used for a parallel MPI calculation, then the ranks of the processors will be 0, 1, 2, 3, 4, and 5.
Random Quiz
Which MPI routines are required to generate a minimal parallel code? And how many?
ANS: MPI_INIT and MPI_FINALIZE; 2.
Calculation of π using MPI
Mathematical identity
Numerically integrate

    g(x) = 4 / (1 + x^2)                                            (1)

from x = 0 to 1. Here, g(x) is f(x) in the reference book.³

    ∫ from x=0 to 1 of 4/(1 + x²) dx = 4 [tan⁻¹ x] from 0 to 1
                                     = 4 (tan⁻¹ 1 − tan⁻¹ 0)
                                     = 4 (π/4 − 0) = π              (2)

Make your own derivation, substituting x = tan y!
³ Using MPI: Portable Parallel Programming with the Message-Passing Interface, by William Gropp, Ewing Lusk, and Anthony Skjellum, The MIT Press.
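Before any parallelism enters, Eq. (2) can be sanity-checked numerically. A short serial Python sketch (an illustration, not part of the lecture's Fortran code) using the same midpoint rule the slides apply later:

```python
import math

def g(x):
    # Integrand from Eq. (1): g(x) = 4 / (1 + x^2)
    return 4.0 / (1.0 + x * x)

def midpoint_pi(n):
    # Midpoint rule on [0, 1] with n rectangles of width h
    h = 1.0 / n
    return h * sum(g(h * (i - 0.5)) for i in range(1, n + 1))

print(midpoint_pi(100000))  # close to math.pi
```

With n = 100000 the difference from math.pi is far below 1e-8, consistent with the analytic result in Eq. (2).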
Integration scheme
Figure: Integrating to find the value of π with n = 5, where n is the number of points (or rectangles) for the integration.
The number of integration points and the number of processes (= cores)
If n = 100 and N_procs = 4, then each process takes care of 25 points:
Processor 0: n = 1–25
Processor 1: n = 26–50
Processor 2: n = 51–75
Processor 3: n = 76–100
However, what if n/N_procs is NOT an integer, for example n = 100 and N_procs = 6? You can try n/N_procs or n/(N_procs − 1). Then how do you handle the remainders?
Optimum load balance
If n/N_procs is NOT an integer, one possible optimal way is jumping as many steps as the number of processes (i.e., 6):
Processor 0: n = 1, 7, 13, ..., 91, 97
Processor 1: n = 2, 8, 14, ..., 92, 98
Processor 2: n = 3, 9, 15, ..., 93, 99
Processor 3: n = 4, 10, 16, ..., 94, 100
Processor 4: n = 5, 11, 17, ..., 95
Processor 5: n = 6, 12, 18, ..., 96
This method is generally applicable without any restriction. The assigned tasks are almost identical for each process.
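The cyclic assignment above is the stride trick the fpi code uses (do i = myid+1, n, numprocs). A small Python sketch of the same bookkeeping, for n = 100 and 6 processes:

```python
def indices_for_rank(rank, n, nprocs):
    # Cyclic (round-robin) decomposition: rank r handles i = r+1, r+1+nprocs, ...
    return list(range(rank + 1, n + 1, nprocs))

n, nprocs = 100, 6
for r in range(nprocs):
    idx = indices_for_rank(r, n, nprocs)
    print(f"Processor {r}: {idx[0]}, {idx[1]}, ..., {idx[-1]}  ({len(idx)} points)")
```

Every index from 1 to 100 is covered exactly once, and no rank carries more than one extra point, which is why this scheme balances well for any n.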
fpi.f90 — the first half

program main
  include 'mpif.h'
  double precision :: PI25DT
  parameter (PI25DT = 3.141592653589793238462643d0)
  double precision :: mypi, pi, h, sum, x, f, a
  double precision :: T1, T2
  integer :: n, myid, numprocs, i, rc
! function to integrate
  f(a) = 4.d0 / (1.d0 + a*a)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)

  n = ...
  h = 1.0d0 / dble(n)
  T1 = MPI_WTIME()

  call MPI_BCAST(n, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
INTEGER types
1. INTEGER*2 (non-standard) (2 bytes = 16 bits): from −32,768 to 32,767 (= 2^15 − 1)
2. INTEGER*4 (4 bytes = 32 bits), the MPI default: from −2,147,483,648 to 2,147,483,647 (= 2^31 − 1)
3. INTEGER*8 (8 bytes = 64 bits): from −9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 (= 2^63 − 1)
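These ranges follow from two's-complement storage: a k-bit signed integer spans −2^(k−1) to 2^(k−1) − 1. A quick Python check of the arithmetic:

```python
def int_range(bits):
    # Two's-complement range of a signed integer with the given bit width
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

print(int_range(16))  # INTEGER*2
print(int_range(32))  # INTEGER*4, the MPI default
print(int_range(64))  # INTEGER*8
```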
MPI_WTIME()

T1 = MPI_WTIME()
...
T2 = MPI_WTIME()

1. Returns the elapsed wall-clock time in seconds (double precision) on the calling processor.
2. The time is measured since an arbitrary point in the past, so only the difference, T2 − T1, is meaningful.
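The "arbitrary origin, differences only" semantics are the same as any monotonic wall clock. As an analogy only (Python's time.perf_counter, not an MPI routine), the usage pattern mirrors the T1/T2 bracket above:

```python
import time

t1 = time.perf_counter()                   # arbitrary origin, like MPI_WTIME()
work = sum(i * i for i in range(200000))   # some work to time
t2 = time.perf_counter()

elapsed = t2 - t1                          # only the difference T2 - T1 is meaningful
print(f"Elapsed wall time: {elapsed:.6f} s")
```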
MPI_BCAST

call MPI_BCAST(n, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)

MPI_BCAST(buffer, count, datatype, root, comm, ierr) broadcasts a message from the process with rank "root" to all other processes of the group.
1. buffer - name (or starting address) of the buffer, i.e., the variable (choice)
2. count - the number of entries in the buffer, i.e., how many (integer)
3. datatype - data type of the buffer, such as integer, double precision, ... (handle)
4. root - rank of the broadcast root, i.e., from whom? the broadcaster (integer)
5. comm - communicator (handle), i.e., a communication language; the default is MPI_COMM_WORLD
Note: This MPI_BCAST call is only to explain how it works in the example code; π will be properly calculated without this call, because each process already knows the value of n.
Example of MPI_BCAST

program broadcast
  implicit none
  include 'mpif.h'
  integer numprocs, rank, ierr, rc
  integer Nbcast
  call MPI_INIT(ierr)
  if (ierr /= MPI_SUCCESS) then
     print *, 'Error starting MPI program. Terminating.'
     call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
  end if
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
  Nbcast = -10
  print *, 'I am', rank, 'of', numprocs, 'and Nbcast =', Nbcast
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)
  if (rank == 0) Nbcast = 10
  call MPI_BCAST(Nbcast, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
  print *, 'I am', rank, 'of', numprocs, 'and Nbcast =', Nbcast
  call MPI_FINALIZE(ierr)
end

An MPI program is executed by each processor concurrently, with given conditions that could differ from processor to processor, i.e., rank-specific.
Outcome of the MPI_BCAST example

I am 0 of 6 and Nbcast = -10
I am 1 of 6 and Nbcast = -10
I am 2 of 6 and Nbcast = -10
I am 5 of 6 and Nbcast = -10
I am 3 of 6 and Nbcast = -10
I am 4 of 6 and Nbcast = -10
I am 0 of 6 and Nbcast = 10
I am 1 of 6 and Nbcast = 10
I am 2 of 6 and Nbcast = 10
I am 4 of 6 and Nbcast = 10
I am 5 of 6 and Nbcast = 10
I am 3 of 6 and Nbcast = 10

Note: Without calling MPI_BARRIER, the above messages would be highly disordered because the processes compete with each other to print messages to stdout. For each run, change the executable file name.
MPI_BARRIER

call MPI_BARRIER(MPI_COMM_WORLD, ierr)

MPI_BARRIER(comm, ierr) creates a barrier synchronization in a group. Each task, when reaching the MPI_BARRIER call, blocks until all tasks in the group reach the same MPI_BARRIER call. Let's check that everybody is on the bus and then drive to the vacation place. No home alone!
MPI FORTRAN data types
1. MPI_CHARACTER : character(1)
2. MPI_INTEGER : integer(4)
3. MPI_REAL : real(4)
4. MPI_DOUBLE_PRECISION : double precision = real(8)
5. MPI_COMPLEX : complex
6. MPI_LOGICAL : logical
7. MPI_BYTE : 8 binary digits
8. MPI_PACKED : data packed or unpacked with MPI_Pack()/MPI_Unpack()
Note: C/C++ do not have complex and logical data types.
fpi.f90 — the second half

sum = 0.0d0
do i = myid+1, n, numprocs
   x = h * (dble(i) - 0.5d0)
   sum = sum + f(x)
enddo
mypi = h * sum

call MPI_REDUCE(mypi, pi, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, MPI_COMM_WORLD, ierr)
T2 = MPI_WTIME()
if (myid == 0) then
   write(*, "('pi is approximately: ',F18.16)") pi
   write(*, "('Error is: ',F18.16)") abs(pi - PI25DT)
   write(*,*) "Start time = ", T1
   write(*,*) "End time = ", T2
   write(*,*) "Elapsed time = ", T2 - T1
   write(*,*) "The number of processes = ", numprocs
endif
call MPI_FINALIZE(rc)
stop
end
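What the stride loop plus MPI_REDUCE(..., MPI_SUM, ...) computes can be imitated serially. The following Python sketch fakes the six ranks with an ordinary loop and adds the partial sums the way MPI_SUM would (an illustration of the arithmetic only, no real message passing):

```python
import math

def partial_pi(myid, n, numprocs):
    # Each "rank" integrates its cyclic share: i = myid+1, myid+1+numprocs, ...
    h = 1.0 / n
    s = 0.0
    for i in range(myid + 1, n + 1, numprocs):
        x = h * (i - 0.5)
        s += 4.0 / (1.0 + x * x)
    return h * s          # this is mypi on that rank

n, numprocs = 100000, 6
pi = sum(partial_pi(r, n, numprocs) for r in range(numprocs))  # the MPI_SUM reduction
print(pi, abs(pi - math.pi))
```

The six partial results sum to the same value the serial midpoint rule would give, which is exactly why the reduction reproduces the single-process answer.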
67 39 / 48 MPI_REDUCE Calculation of π using MPI Reduce call MPI_REDUCE (mypi, pi, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, MPI_COMM_WORLD, ierr) MPI_REDUCE (sendbuf, recvbuf, count, datatype, op, root, comm, ierr) reduces values on all processes to a single value.
68 39 / 48 MPI_REDUCE Calculation of π using MPI Reduce call MPI_REDUCE (mypi, pi, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, MPI_COMM_WORLD, ierr) MPI_REDUCE (sendbuf, recvbuf, count, datatype, op, root, comm, ierr) reduces values on all processes to a single value. 1 sendbuf - address of send buffer by each processor, i.e., variable (choice)
39 / 48 Calculation of π using MPI (Reduce): MPI_REDUCE

call MPI_REDUCE (mypi, pi, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, MPI_COMM_WORLD, ierr)

MPI_REDUCE (sendbuf, recvbuf, count, datatype, op, root, comm, ierr) reduces values on all processes to a single value.
1 sendbuf - address of the send buffer on each process, i.e., the variable to be reduced (choice)
2 recvbuf - address of the receive buffer on the root, i.e., a different variable that holds the result (choice)
3 count - number of elements in the send buffer, i.e., how many (integer)
4 datatype - data type of the send-buffer elements, such as integer, real, ... (handle)
5 op - reduction arithmetic operation (handle)
6 root - rank of the root process, i.e., the process that receives the reduced result (integer)
7 comm - communicator; the default is MPI_COMM_WORLD (handle)
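Semantically, MPI_REDUCE with MPI_SUM collects one send-buffer value per rank and combines them with the given operation, depositing the result only in the root's receive buffer. The following is a serial Python sketch of that semantics, not real MPI: each list element stands in for one rank's sendbuf, and the partial-sum values are made-up placeholders.

```python
from functools import reduce

def mpi_reduce_sim(send_values, op, root=0):
    """Simulate MPI_REDUCE: combine one send value per rank with `op`;
    only the root's receive buffer gets the result (others stay None)."""
    result = reduce(op, send_values)
    nprocs = len(send_values)
    return [result if rank == root else None for rank in range(nprocs)]

# Each rank contributes mypi (its partial sum of pi); rank 0 receives the total.
partial = [0.5, 0.6, 0.7, 0.8]   # hypothetical per-rank partial sums
recv = mpi_reduce_sim(partial, lambda a, b: a + b, root=0)
print(recv[0])   # the reduced total; in real MPI only rank 0 holds it
```

On a real cluster, each process would call MPI_REDUCE once with its own sendbuf; here the single list plays the role of the whole communicator.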
40 / 48 Calculation of π using MPI (Reduce): Example of MPI_REDUCE

program broadcast
  implicit none
  include 'mpif.h'
  integer :: numprocs, rank, ierr, rc
  integer :: Ireduced, Nreduced
  call MPI_INIT(ierr)
  if (ierr /= MPI_SUCCESS) then
    print *, 'Error starting MPI program. Terminating.'
    call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
  end if
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
  Ireduced = 1 + rank
  Nreduced = 0
  print *, 'I am', rank, 'of', numprocs, 'and Ireduced =', Ireduced
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)
  call MPI_REDUCE &
    (Ireduced, Nreduced, 1, MPI_INTEGER, MPI_SUM, 0, MPI_COMM_WORLD, ierr)
  if (rank == 0) &
    print *, 'I am', rank, 'of', numprocs, 'and Nreduced =', Nreduced
  call MPI_FINALIZE(ierr)
end
41 / 48 Calculation of π using MPI (Reduce): Outcome of the MPI_REDUCE example

I am 0 of 6 and Ireduced = 1
I am 1 of 6 and Ireduced = 2
I am 2 of 6 and Ireduced = 3
I am 3 of 6 and Ireduced = 4
I am 4 of 6 and Ireduced = 5
I am 5 of 6 and Ireduced = 6
I am 0 of 6 and Nreduced = 21

Note that rank = Ireduced - 1. Without calling MPI_BARRIER, the final line reporting the answer of 21 could appear anywhere in the output, because the print statements of different ranks are not ordered.
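The reduced value follows from simple arithmetic: rank r contributes Ireduced = 1 + r, so for 6 processes MPI_SUM collects 1 + 2 + ... + 6. A quick serial check:

```python
numprocs = 6
# Each rank contributes Ireduced = 1 + rank; MPI_SUM adds them on the root.
nreduced = sum(1 + rank for rank in range(numprocs))
print(nreduced)  # 21, matching the example output
```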
42 / 48 Calculation of π using MPI (Operation Types): MPI Operations

1 MPI_MAX maximum (integer, real, complex)
2 MPI_MIN minimum (integer, real, complex)
3 MPI_SUM sum (integer, real, complex)
4 MPI_PROD product (integer, real, complex)
5 MPI_LAND logical AND (logical)
6 MPI_LOR logical OR (logical)
7 MPI_LXOR logical XOR (logical)
8 MPI_BAND bit-wise AND (integer, MPI_BYTE)
9 MPI_BOR bit-wise OR (integer, MPI_BYTE)
10 MPI_BXOR bit-wise XOR (integer, MPI_BYTE)
11 MPI_MAXLOC max value and location (real, complex, double precision)
12 MPI_MINLOC min value and location (real, complex, double precision)
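Each of these predefined operations is an associative binary operation applied pairwise across the ranks' values. The table below is a serial Python illustration of that behavior, with ordinary Python callables standing in for the MPI handles; it is a sketch, not an MPI binding.

```python
import operator
from functools import reduce

# Serial stand-ins for the predefined MPI reduction operations.
# (MPI_MAXLOC / MPI_MINLOC also return the rank holding the extreme
# value, which this value-only table does not capture.)
MPI_OPS = {
    "MPI_MAX":  max,
    "MPI_MIN":  min,
    "MPI_SUM":  operator.add,
    "MPI_PROD": operator.mul,
    "MPI_LAND": lambda a, b: bool(a) and bool(b),
    "MPI_LOR":  lambda a, b: bool(a) or bool(b),
    "MPI_LXOR": lambda a, b: bool(a) != bool(b),
    "MPI_BAND": operator.and_,
    "MPI_BOR":  operator.or_,
    "MPI_BXOR": operator.xor,
}

values = [3, 1, 4, 1, 5]                      # one value per rank
print(reduce(MPI_OPS["MPI_SUM"], values))     # 14
print(reduce(MPI_OPS["MPI_MAX"], values))     # 5
print(reduce(MPI_OPS["MPI_BXOR"], values))    # 2  (3^1^4^1^5)
```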
43 / 48 Calculation of π using MPI (Resources): Resources and References

1 Gropp et al., Using MPI: Portable Parallel Programming with the Message-Passing Interface, MIT Press (1997)
44 / 48 Calculation of π using MPI (Resources): Makefile

mpif90   = /opt/openmpi-intel/bin/mpif90 -limf
foropt   = -diag-disable 8290 -diag-disable 8291 -diag-disable
mpirun   = /opt/openmpi-intel/bin/mpirun
mpiopt   = -mca btl tcp,self -mca btl_tcp_if_include eth0
srcroot  = fpi
srcfile  = $(srcroot).f90
exefile  = $(srcroot).x
numprocs = 6

all:
	$(mpif90) $(foropt) $(srcfile) -o $(exefile)
srun:
	time $(mpirun) -mca btl tcp,self -np 1 ./$(exefile)
prun:
	time $(mpirun) $(mpiopt) -hostfile ./mpihosts -np $(numprocs) ./$(exefile)
edit:
	vim $(srcfile)
45 / 48 Calculation of π using MPI (Resources): Script file

We will discuss this later.
46 / 48 Calculation of π using MPI (Resources): Job output file (part)

pi is approximately:          Error is:
Start time =    End time =    Elapsed time =
The number of processes =
user 0.04 system 0:01.84 elapsed 6%CPU (0avgtext+0avgdata m
0inputs+0outputs (0major+4125minor)pagefaults 0swaps
47 / 48 Calculation of π using MPI: Lab work

1 MPI codes are stored at /opt/cee618s13/class04/
2 Copy the examples under the subdirectories MPI-basic, MPI-bcast, MPI-reduce, and MPI-pi, and study them.
3 Type make followed by make prun.
4 Start your homework.
48 / 48 Calculation of π using MPI: Speed-up test

1 Change the number of processes of the MPI-pi application from 1 to 8 and see how much speed-up is achieved.
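When timing these runs, speed-up S = T(1)/T(p) and parallel efficiency E = S/p are the standard figures of merit. A small sketch for tabulating them; the wall times below are made-up placeholders, not measurements from the cluster:

```python
def speedup(t_serial, t_parallel):
    """Speed-up S = T(1) / T(p)."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, nprocs):
    """Parallel efficiency E = S / p; 1.0 means ideal scaling."""
    return speedup(t_serial, t_parallel) / nprocs

# Hypothetical wall times (seconds) for 1 to 8 processes
times = {1: 8.0, 2: 4.2, 4: 2.3, 8: 1.4}
for p, t in times.items():
    s = speedup(times[1], t)
    e = efficiency(times[1], t, p)
    print(f"p={p}: S={s:.2f}, E={e:.2f}")
```

Communication overhead in MPI-pi usually keeps E below 1.0 as the process count grows, which is exactly what this test is meant to reveal.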
More informationOne Optimized I/O Configuration per HPC Application
One Optimized I/O Configuration per HPC Application Leveraging I/O Configurability of Amazon EC2 Cloud Mingliang Liu, Jidong Zhai, Yan Zhai Tsinghua University Xiaosong Ma North Carolina State University
More informationTime. Today. l Physical clocks l Logical clocks
Time Today l Physical clocks l Logical clocks Events, process states and clocks " A distributed system a collection P of N singlethreaded processes without shared memory Each process p i has a state s
More informationCPU Scheduling. CPU Scheduler
CPU Scheduling These slides are created by Dr. Huang of George Mason University. Students registered in Dr. Huang s courses at GMU can make a single machine readable copy and print a single copy of each
More informationModule 5: CPU Scheduling
Module 5: CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling Real-Time Scheduling Algorithm Evaluation 5.1 Basic Concepts Maximum CPU utilization obtained
More informationPerformance and Application of Observation Sensitivity to Global Forecasts on the KMA Cray XE6
Performance and Application of Observation Sensitivity to Global Forecasts on the KMA Cray XE6 Sangwon Joo, Yoonjae Kim, Hyuncheol Shin, Eunhee Lee, Eunjung Kim (Korea Meteorological Administration) Tae-Hun
More informationCAEFEM v9.5 Information
CAEFEM v9.5 Information Concurrent Analysis Corporation, 50 Via Ricardo, Thousand Oaks, CA 91320 USA Tel. (805) 375 1060, Fax (805) 375 1061 email: info@caefem.com or support@caefem.com Web: http://www.caefem.com
More information