Parallel Programming in C with MPI and OpenMP


1 Parallel Programming in C with MPI and OpenMP Michael J. Quinn

2 Chapter 13 Finite Difference Methods

3 Outline
- Ordinary and partial differential equations
- Finite difference methods
- Vibrating string problem
- Steady state heat distribution problem

4 Ordinary and Partial Differential Equations
- Ordinary differential equation (ODE): an equation containing derivatives of a function of one variable
- Partial differential equation (PDE): an equation containing partial derivatives of a function of two or more variables, where each derivative is taken with respect to one of those variables

5 Examples of Phenomena Modeled by PDEs
- Air flow over an aircraft wing
- Blood circulation in the human body
- Water circulation in an ocean
- Bridge deformations as it carries traffic
- Evolution of a thunderstorm
- Oscillations of a skyscraper hit by an earthquake
- Heat transfer

6 Model of Sea Surface Temperature in Atlantic Ocean Courtesy MICOM group at the Rosenstiel School of Marine and Atmospheric Science, University of Miami

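7 Difference Quotients
[Figure: graph of f around x, with x-h, x-h/2, x, x+h/2 and x+h marked on the x-axis, and f(x-h/2), f(x+h/2) and the slope f'(x) marked on the curve]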

8 Formulas for 1st, 2nd Derivatives
$$f'(x) \approx \frac{f(x+h/2) - f(x-h/2)}{h}$$
$$f''(x) \approx \frac{f(x+h) - 2f(x) + f(x-h)}{h^2}$$

9 f'(x) and f''(x)
- f'(x) is often written df/dx
  - First derivative: the rate of change of f in the x direction. If f is location, then f' is speed.
- f''(x) is often written d²f/dx²
  - Second derivative: the rate of change of f' in the x direction. If f' is speed, then f'' is acceleration.

10 Vibrating String Problem
[Figure: string shape at t=0.0 and 1.0, t=0.1 and 0.9, t=0.2 and 0.8, t=0.3 and 0.7, t=0.4 and 0.6, t=]
Vibrating string modeled by a hyperbolic PDE


11 Solution Stored in 2-D Matrix
- Each row represents the state of the string at some point in time
- Each column shows how the position of the string at a particular point changes with time
- In some cases (Jacobi 1D in PA4) we are interested only in the end state (heat transfer), so we keep only two vectors (prev and curr)

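12 Discrete Space, Time Intervals Lead to 2-D Matrix
$u_{i,j} = u(x_i, t_j)$
[Figure: grid with x on the horizontal axis running from 0 to a and t on the vertical axis running from 0 to T]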
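
As a hedged sketch of where an update rule on this grid can come from, assume the string obeys the wave equation with wave speed c, sampled with space step h and time step k. The symbols c, h and k are assumptions; the slides only state that the PDE is hyperbolic. Replacing both second derivatives with the second-difference formula gives:

$$\frac{\partial^2 u}{\partial t^2} = c^2\,\frac{\partial^2 u}{\partial x^2}$$

$$\frac{u_{i,j+1} - 2u_{i,j} + u_{i,j-1}}{k^2} = c^2\,\frac{u_{i+1,j} - 2u_{i,j} + u_{i-1,j}}{h^2}$$

$$u_{i,j+1} = 2(1-L)\,u_{i,j} + L\,(u_{i+1,j} + u_{i-1,j}) - u_{i,j-1}, \qquad L = \left(\frac{ck}{h}\right)^2$$

Under these assumptions, the last line has the same form as the assignment to u[j+1][i] in the sequential C program, with the time index stored first.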

13 Heart of Sequential C Program
u[j+1][i] = 2.0*(1.0-L)*u[j][i] + L*(u[j][i+1] + u[j][i-1]) - u[j-1][i];
[Figure: stencil in which u(i,j+1) is computed from u(i-1,j), u(i,j), u(i+1,j) and u(i,j-1)]

14 Parallel Program Design
- Associate primitive task with each element of matrix
- Examine communication pattern
- Agglomerate tasks in same column
- Static number of identical tasks
- Regular communication pattern
- Strategy: agglomerate columns, assign one block of columns to each task

15 Result of Agglomeration and Mapping

16 Communication Still Needed
- Initial values (in lowest row) are computed without communication
- Values in black cells cannot be computed without access to values held by other tasks

17 Ghost Points
- Ghost points: memory locations used to store redundant copies of data held by neighboring processes
- Allocating ghost points as extra columns simplifies parallel algorithm by allowing same loop to update all cells

18 Matrices Augmented with Ghost Points Lilac cells are the ghost points.

19 Communication in an Iteration
In this iteration, the process is responsible for computing the values of the yellow cells.

20 Computation in an Iteration
In this iteration, the process is responsible for computing the values of the yellow cells. The striped cells are the ones accessed as the yellow cell values are computed.

21 Complexity Analysis
- Computation time per element is constant, so sequential time complexity per iteration is Θ(n)
- Elements divided evenly among processes, so parallel computational complexity per iteration is Θ(n / p)
- During each iteration a process with an interior block sends two messages and receives two messages, so communication complexity per iteration is Θ(1)


22 Replicating Computations
- If only one value is transmitted, the execution time is dominated by communication (message latency)
- We can reduce the number of communications by replicating computations
- If we send k values instead of one, we can advance the simulation k time steps before another communication
- We call these k values ghost or halo elements

23 Replicating Computations
[Figure: without replication, each data exchange is paired with one step (step 1); with replication, each data exchange is paired with two steps (step 1 and step 2)]

24 Communication Time vs. Number of Ghost Points
[Figure: plot of time (vertical axis) against number of ghost points (horizontal axis)]

25 Steady State Heat Distribution Problem
[Figure: rectangular region with an ice bath along one edge and steam along the other three edges]

26 Heart of Sequential C Program
w[i][j] = (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1]) / 4.0;
[Figure: stencil in which w(i,j) is computed from u(i-1,j), u(i+1,j), u(i,j-1) and u(i,j+1)]

27 Parallel Algorithm 1
- Associate primitive task with each matrix element
- Agglomerate tasks in contiguous rows (rowwise block striped decomposition)
- Add rows of ghost points above and below rectangular region controlled by process

28 Example Decomposition grid divided among 4 processors

29 Example Decomposition grid divided among 4 processors Here we can also avoid communication by replicated computation (larger ghost buffers)

30 Example Decomposition grid divided among 16 processors

31 Implementation Details
- Using ghost points around 2-D blocks requires extra copying steps
- Ghost points for left and right sides are not in contiguous memory locations
- An auxiliary buffer must be used when receiving these ghost point values
- Similarly, a buffer must be used when sending a column of values to a neighboring process

32 Summary (1)
- PDEs used to model behavior of a wide variety of physical systems
- Realistic problems yield PDEs too difficult to solve analytically, so scientists solve them numerically
- Two most common numerical techniques for solving PDEs
  - Finite element method
  - Finite difference method

33 Summary (2)
- Finite difference methods
  - Matrix-based methods store matrix explicitly
  - Matrix-free implementations store matrix implicitly
- We have designed and analyzed parallel algorithms based on matrix-free implementations

34 Summary (3)
- Ghost points store copies of values held by other processes
- Explored increasing number of ghost points and replicating computation in order to reduce number of message exchanges
- Optimal number of ghost points depends on characteristics of parallel system
