COSC 4397 Parallel Computation

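COSC 4397
Solving the Laplace Equation with MPI
Spring

Numerical differentiation: forward difference formula

From the definition of derivatives

\[ f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \]

one can derive an approximation for the 1st derivative

\[ f'(x) \approx \frac{f(x+h) - f(x)}{h} \]

[Figure: graph of f over x, marking the points x_n and x_n + h and the values f(x_n) and f(x_n + h)]

The same formula can be obtained from the Taylor series, e.g.

\[ f(x+h) = f(x) + h f'(x) + \frac{h^2}{2} f''(\xi) \]

\[ f'(x) = \frac{f(x+h) - f(x)}{h} - \frac{h}{2} f''(\xi) \]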

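Central Difference Formula

A better formula is derived by looking at the following two terms:

\[ f(x+h) = f(x) + h f'(x) + \frac{h^2}{2} f''(x) + \frac{h^3}{3!} f'''(\xi_1) \qquad (5{:}1) \]

\[ f(x-h) = f(x) - h f'(x) + \frac{h^2}{2} f''(x) - \frac{h^3}{3!} f'''(\xi_2) \qquad (5{:}2) \]

Subtracting equation (5:2) from (5:1) leads to

\[ f'(x) = \frac{1}{2h}\left[ f(x+h) - f(x-h) \right] \quad [...] \]

which is quadratic in h in the error term.

[Figure: graph of f over x, marking the points x_n - h, x_n and x_n + h and the value f(x_n)]

Central Difference Formula for 2nd Derivatives

Extend (5:1) and (5:2) by an additional term:

\[ f(x+h) = f(x) + h f'(x) + \frac{h^2}{2} f''(x) + \frac{h^3}{3!} f'''(x) + \frac{h^4}{4!} f^{(4)}(\xi_1) \]

\[ f(x-h) = f(x) - h f'(x) + \frac{h^2}{2} f''(x) - \frac{h^3}{3!} f'''(x) + \frac{h^4}{4!} f^{(4)}(\xi_2) \]

Adding both equations leads to

\[ f''(x) = \frac{1}{h^2}\left[ f(x+h) - 2 f(x) + f(x-h) \right] \quad [...] \]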
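The three difference formulas above translate directly into code. The following is a minimal C sketch; the function names and the test function `cube` (f(x) = x^3) are illustrative assumptions, not part of the lecture material.

```c
/* Sample function for illustration only: f(x) = x^3, f'(1) = 3, f''(1) = 6. */
double cube(double x)
{
    return x * x * x;
}

/* Forward difference: (f(x+h) - f(x)) / h, error proportional to h. */
double forward_diff(double (*f)(double), double x, double h)
{
    return (f(x + h) - f(x)) / h;
}

/* Central difference for the 1st derivative, error proportional to h^2. */
double central_diff(double (*f)(double), double x, double h)
{
    return (f(x + h) - f(x - h)) / (2.0 * h);
}

/* Central difference for the 2nd derivative. */
double central_diff2(double (*f)(double), double x, double h)
{
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h);
}
```

Calling `forward_diff(cube, 1.0, 1e-3)` and `central_diff(cube, 1.0, 1e-3)` side by side shows the central formula landing noticeably closer to the exact derivative 3 for the same step size.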

Numerical differentiation - summary

Forward difference formula:

\[ f'(x) \approx \frac{f(x+h) - f(x)}{h} \]

Central difference formula for the 1st derivative:

\[ f'(x) \approx \frac{1}{2h}\left[ f(x+h) - f(x-h) \right] \]

Central difference formula for the 2nd derivative:

\[ f''(x) \approx \frac{1}{h^2}\left[ f(x+h) - 2 f(x) + f(x-h) \right] \]

Differential equations - terminology

- Differential equations: equations containing the derivative of a function as a variable
- An ordinary differential equation (ODE) only contains functions of one independent variable
- A partial differential equation (PDE) contains functions of multiple independent variables and their partial derivatives
- The order of a differential equation is that of the highest derivative that it contains
- The goal is to find a function y(t) whose derivatives fulfill the given differential equations, e.g.

\[ y^{(n)}(t) = f\left(t, y, y', y'', \ldots, y^{(n-1)}\right) \]

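Finite Differences Approach for Solving Differential Equations

Idea: replace the derivatives in the DE by an according approximation formula. Typically central differences are used:

\[ y'(t) \approx \frac{1}{2h}\left[ y(t+h) - y(t-h) \right] \]

\[ y''(t) \approx \frac{1}{h^2}\left[ y(t+h) - 2 y(t) + y(t-h) \right] \]

Example: boundary value problem of an ordinary differential equation

\[ \frac{d^2 y}{dx^2} = f\left(x, y, \frac{dy}{dx}\right), \quad y(a) = \alpha, \quad y(b) = \beta, \quad a \le x \le b \]

Finite Differences Approach (II)

For simplicity, let's assume the points are equally spaced:

\[ x_i = a + i h, \quad i = 0, 1, \ldots, n+1, \qquad h = \frac{b - a}{n + 1} \]

A two-point boundary value problem then becomes

\[ y_0 = \alpha \]

\[ \frac{1}{h^2}\left( y_{i+1} - 2 y_i + y_{i-1} \right) = f\left(x_i, y_i, \frac{1}{2h}\left( y_{i+1} - y_{i-1} \right)\right) \qquad (x{:}1) \]

\[ y_{n+1} = \beta \]

Equation (x:1) leads to a system of equations. Solving the system of linear equations gives the solution of the ODE at the distinct points x_0, x_1, ..., x_n, x_{n+1}.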

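Example (I)

Solve the following two-point boundary value problem using the finite difference method with h = 0.2:

\[ \frac{d^2 y}{dx^2} + 2 \frac{dy}{dx} + 10 x = 0, \quad 0 \le x \le 1, \quad y(0) = 1, \quad y(1) = 2 \]

Since h = 0.2, the mesh points are

\[ x_0 = 0, \; x_1 = 0.2, \; x_2 = 0.4, \; x_3 = 0.6, \; x_4 = 0.8, \; x_5 = 1 \]

Thus y_0 = y(x_0) = 1 and y_5 = y(x_5) = 2 are known, while y_1 - y_4 are unknown.

Example (II)

Discrete version of the ODE using central differences:

\[ \frac{1}{h^2}\left( y_{i+1} - 2 y_i + y_{i-1} \right) + 2 \cdot \frac{1}{2h}\left( y_{i+1} - y_{i-1} \right) + 10 x_i = 0 \]

\[ \frac{1}{0.04}\left( y_{i+1} - 2 y_i + y_{i-1} \right) + 2 \cdot \frac{1}{0.4}\left( y_{i+1} - y_{i-1} \right) + 10 x_i = 0 \]

\[ 25\left( y_{i+1} - 2 y_i + y_{i-1} \right) + 5\left( y_{i+1} - y_{i-1} \right) + 10 x_i = 0 \]

\[ 2 y_{i-1} - 5 y_i + 3 y_{i+1} = -x_i \]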
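To make the discrete equation concrete, the sketch below assembles the four equations 2 y_{i-1} - 5 y_i + 3 y_{i+1} = -x_i for i = 1..4 and solves them directly. It uses the Thomas algorithm (tridiagonal Gaussian elimination), which is swapped in here for brevity and is not the solver used in the lecture. The boundary values y(0) = 1 and y(1) = 2 and all function names are assumptions of this sketch.

```c
/* Thomas algorithm for a tridiagonal system of size n.
   lower[i] multiplies y[i-1], diag[i] multiplies y[i], upper[i] multiplies y[i+1].
   diag and rhs are overwritten during the elimination. */
void thomas_solve(int n, const double *lower, double *diag,
                  const double *upper, double *rhs, double *y)
{
    /* forward elimination */
    for (int i = 1; i < n; i++) {
        double m = lower[i] / diag[i - 1];
        diag[i] -= m * upper[i - 1];
        rhs[i]  -= m * rhs[i - 1];
    }
    /* back substitution */
    y[n - 1] = rhs[n - 1] / diag[n - 1];
    for (int i = n - 2; i >= 0; i--)
        y[i] = (rhs[i] - upper[i] * y[i + 1]) / diag[i];
}

/* Assemble and solve the example; y[0..3] receives y_1..y_4. */
void solve_example(double y[4])
{
    const double h = 0.2;
    const double y0 = 1.0, y5 = 2.0;   /* assumed boundary values y(0), y(1) */
    double lower[4], diag[4], upper[4], rhs[4];

    for (int i = 0; i < 4; i++) {
        double x = (i + 1) * h;        /* mesh point x_{i+1} */
        lower[i] = 2.0;
        diag[i]  = -5.0;
        upper[i] = 3.0;
        rhs[i]   = -x;
    }
    rhs[0] -= 2.0 * y0;                /* move the known 2*y_0 to the right-hand side */
    rhs[3] -= 3.0 * y5;                /* move the known 3*y_5 to the right-hand side */

    thomas_solve(4, lower, diag, upper, rhs, y);
}
```

A direct solver is only practical here because the system is tiny and tridiagonal; for large, distributed systems an iterative method built from scalar products and matrix-vector products parallelizes more naturally.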

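Example (III)

\[ i = 1: \quad 2 y_0 - 5 y_1 + 3 y_2 = -0.2 \;\Rightarrow\; -5 y_1 + 3 y_2 = -2.2 \]

\[ i = 2: \quad 2 y_1 - 5 y_2 + 3 y_3 = -0.4 \]

\[ i = 3: \quad 2 y_2 - 5 y_3 + 3 y_4 = -0.6 \]

\[ i = 4: \quad 2 y_3 - 5 y_4 = -6.8 \]

or, in the form A y = b,

\[ \begin{pmatrix} -5 & 3 & 0 & 0 \\ 2 & -5 & 3 & 0 \\ 0 & 2 & -5 & 3 \\ 0 & 0 & 2 & -5 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix} = \begin{pmatrix} -2.2 \\ -0.4 \\ -0.6 \\ -6.8 \end{pmatrix} \]

Solving Ay=b using Bi-CGSTAB

Given A, b and an initial guess y_0:

\[ \begin{aligned} & r_0 = b - A y_0 \\ & \text{Given } \hat{r} \text{ such that } \hat{r}^T r_0 \ne 0 \\ & \rho_0 = \alpha = \omega_0 = 1 \\ & v_0 = p_0 = 0 \\ & \text{for } i = 1, 2, \ldots \\ & \qquad \rho_i = \hat{r}^T r_{i-1} \\ & \qquad \beta = \frac{\rho_i}{\rho_{i-1}} \cdot \frac{\alpha}{\omega_{i-1}} \\ & \qquad p_i = r_{i-1} + \beta \left( p_{i-1} - \omega_{i-1} v_{i-1} \right) \\ & \qquad v_i = A p_i \\ & \qquad \alpha = \frac{\rho_i}{\hat{r}^T v_i} \\ & \qquad s = r_{i-1} - \alpha v_i \\ & \qquad t = A s \\ & \qquad \omega_i = \frac{t^T s}{t^T t} \\ & \qquad y_i = y_{i-1} + \alpha p_i + \omega_i s \\ & \qquad r_i = s - \omega_i t \end{aligned} \]

The two kernels that dominate the algorithm are the scalar product (e.g. \hat{r}^T r_{i-1}, t^T s) and the matrix-vector multiplication (e.g. A p_i, A s).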
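The Bi-CGSTAB iteration above can be written as a short sequential C routine, which makes the two kernels (scalar product and matrix-vector multiplication) easy to spot before parallelizing them. This is a sketch under assumptions: the dense row-major matrix storage, the fixed size `N`, the choice of r-hat equal to r_0, the stopping tests and all function names are illustrative and not taken from the slides.

```c
#define N 4   /* size of the example system (assumption of this sketch) */

/* Scalar product of two vectors of length N. */
double dot(const double *a, const double *b)
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        s += a[i] * b[i];
    return s;
}

/* out = A * x for a dense, row-major N x N matrix. */
void matvec(const double *A, const double *x, double *out)
{
    for (int i = 0; i < N; i++) {
        out[i] = 0.0;
        for (int j = 0; j < N; j++)
            out[i] += A[i * N + j] * x[j];
    }
}

/* Bi-CGSTAB: y holds the initial guess on entry and the solution on exit.
   Returns the number of iterations, or -1 on breakdown / no convergence. */
int bicgstab(const double *A, const double *b, double *y,
             double tol, int max_iter)
{
    double r[N], rhat[N], p[N], v[N], s[N], t[N];
    double rho_old = 1.0, alpha = 1.0, omega = 1.0;

    matvec(A, y, r);
    for (int i = 0; i < N; i++) {
        r[i] = b[i] - r[i];      /* r_0 = b - A y_0 */
        rhat[i] = r[i];          /* choose rhat = r_0 */
        p[i] = 0.0;
        v[i] = 0.0;
    }
    if (dot(r, r) <= tol * tol)
        return 0;

    for (int it = 1; it <= max_iter; it++) {
        double rho = dot(rhat, r);
        if (rho == 0.0)
            return -1;           /* breakdown */
        double beta = (rho / rho_old) * (alpha / omega);
        for (int i = 0; i < N; i++)
            p[i] = r[i] + beta * (p[i] - omega * v[i]);
        matvec(A, p, v);
        alpha = rho / dot(rhat, v);
        for (int i = 0; i < N; i++)
            s[i] = r[i] - alpha * v[i];
        if (dot(s, s) <= tol * tol) {   /* converged after the half step */
            for (int i = 0; i < N; i++)
                y[i] += alpha * p[i];
            return it;
        }
        matvec(A, s, t);
        omega = dot(t, s) / dot(t, t);
        for (int i = 0; i < N; i++) {
            y[i] += alpha * p[i] + omega * s[i];
            r[i] = s[i] - omega * t[i];
        }
        if (dot(r, r) <= tol * tol)
            return it;
        rho_old = rho;
    }
    return -1;
}
```

In a parallel version only `dot` and `matvec` need communication; the vector updates in between are purely local.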

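Scalar product in parallel

Scalar product:

\[ s = \sum_{i=0}^{N-1} a[i] \cdot b[i] \]

Parallel algorithm:

\[ s = \sum_{i=0}^{N/2-1} \left( a[i] \cdot b[i] \right) + \sum_{i=N/2}^{N-1} \left( a[i] \cdot b[i] \right) = \underbrace{\sum_{i=0}^{N/2-1} \left( a_{local}[i] \cdot b_{local}[i] \right)}_{rank = 0} + \underbrace{\sum_{i=0}^{N/2-1} \left( a_{local}[i] \cdot b_{local}[i] \right)}_{rank = 1} \]

- Process with rank=0 holds a(0 ... N/2-1) and b(0 ... N/2-1)
- Process with rank=1 holds a(N/2 ... N-1) and b(N/2 ... N-1)
- Adding the two partial sums requires communication between the processes

Matrix-vector product in parallel

\[ \begin{pmatrix} -5 & 3 & 0 & 0 \\ 2 & -5 & 3 & 0 \\ 0 & 2 & -5 & 3 \\ 0 & 0 & 2 & -5 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} rhs_1 \\ rhs_2 \\ rhs_3 \\ rhs_4 \end{pmatrix} \]

Process 0:

\[ -5 x_1 + 3 x_2 = rhs_1 \]

\[ 2 x_1 - 5 x_2 + 3 x_3 = rhs_2 \]

Process 1:

\[ 2 x_2 - 5 x_3 + 3 x_4 = rhs_3 \]

\[ 2 x_3 - 5 x_4 = rhs_4 \]

- Process 0 needs x_3
- Process 1 needs x_2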
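The split of the scalar product into two partial sums can be emulated in plain C without MPI. In the sketch below, the function names are illustrative assumptions. The final addition stands in for the communication step, which in an MPI program would typically be a reduction such as `MPI_Allreduce` with `MPI_SUM`.

```c
/* Partial scalar product over the locally stored part of a and b. */
double local_dot(const double *a_local, const double *b_local, int n_local)
{
    double s = 0.0;
    for (int i = 0; i < n_local; i++)
        s += a_local[i] * b_local[i];
    return s;
}

/* Emulates two ranks: each computes its local sum, then the partial
   results are combined (the step that needs communication in MPI). */
double two_rank_dot(const double *a, const double *b, int n)
{
    int half = n / 2;
    double s0 = local_dot(a, b, half);                     /* rank 0: 0 .. N/2-1 */
    double s1 = local_dot(a + half, b + half, n - half);   /* rank 1: N/2 .. N-1 */
    return s0 + s1;
}
```

For a = (1, 2, 3, 4) and b = (5, 6, 7, 8) the two partial sums are 17 and 53, giving 70.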

Matrix-vector product in parallel (II)

Introduction of ghost cells

- Process zero stores x_1, x_2 and a ghost copy of x_3
- Process one stores a ghost copy of x_2 and x_3, x_4

Looking at the source code, e.g.

\[ p_i = r_{i-1} + \beta \left( p_{i-1} - \omega_{i-1} v_{i-1} \right) \]

\[ v_i = A p_i \]

Since the vector used in the matrix-vector multiplication changes every iteration, you always have to update the ghost cells before doing the calculation.

Matrix-vector product in parallel (III)

So the parallel algorithm for the same area is:

\[ p_i = r_{i-1} + \beta \left( p_{i-1} - \omega_{i-1} v_{i-1} \right) \]

Update the ghost cells of p, e.g.
- Process 0 sends p(2) to Process 1
- Process 1 sends p(3) to Process 0

\[ v_i = A p_i \]
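The ghost-cell update followed by the local matrix-vector product can be emulated for the 4 x 4 tridiagonal example without MPI. In this sketch the struct layout and function names are illustrative assumptions; the direct assignments in `exchange_ghosts` stand in for the send/receive pair between process 0 and process 1.

```c
/* Local storage of one emulated process:
   x[0] = left ghost, x[1], x[2] = owned entries, x[3] = right ghost.
   A ghost without a neighbor (outer boundary) stays 0. */
typedef struct {
    double x[4];
} LocalVec;

/* Stand-in for the message exchange between the two processes. */
void exchange_ghosts(LocalVec *p0, LocalVec *p1)
{
    p1->x[0] = p0->x[2];   /* process 0 sends its last owned entry (x_2)  */
    p0->x[3] = p1->x[1];   /* process 1 sends its first owned entry (x_3) */
}

/* Local rows of the tridiagonal matrix with bands (2, -5, 3). */
void local_matvec(const LocalVec *p, double out[2])
{
    for (int k = 1; k <= 2; k++)
        out[k - 1] = 2.0 * p->x[k - 1] - 5.0 * p->x[k] + 3.0 * p->x[k + 1];
}
```

Skipping `exchange_ghosts` before `local_matvec` would silently use stale neighbor values, which is why the update has to happen in every iteration.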

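2-D Example: Laplace equation (I)

2-D Laplace equation:

\[ \frac{\partial^2 u(x,y)}{\partial x^2} + \frac{\partial^2 u(x,y)}{\partial y^2} = 0 \]

Central discretization leads to

\[ \frac{u_{i+1,j} - 2 u_{i,j} + u_{i-1,j}}{h^2} + \frac{u_{i,j+1} - 2 u_{i,j} + u_{i,j-1}}{h^2} = 0 \]

[Figure: five-point stencil around the point (i,j) with its neighbors (i-1,j), (i+1,j), (i,j-1) and (i,j+1)]

2-D Example: Laplace equation (II)

- Parallel domain decomposition
- Data exchange at process boundaries required
- Halo cells / ghost cells: copy of the last row/column of data from the neighbor process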
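Solving the discretized equation for u_{i,j} gives the average of its four neighbors, which suggests a simple fixed-point (Jacobi) sweep over the interior points. The slides do not name a specific iteration scheme, so the Jacobi choice, the flat row-major storage with one boundary/halo layer and the function name are assumptions of this sketch.

```c
/* One Jacobi sweep for the 2-D Laplace equation.
   u and unew are (ny+2) x (nx+2) arrays stored row-major in flat memory;
   the outer layer holds boundary values or halo cells and is not updated. */
void jacobi_sweep(int nx, int ny, const double *u, double *unew)
{
    int w = nx + 2;   /* row length including the outer layer */

    for (int j = 1; j <= ny; j++) {
        for (int i = 1; i <= nx; i++) {
            unew[j * w + i] = 0.25 * (u[j * w + (i + 1)] + u[j * w + (i - 1)]
                                    + u[(j + 1) * w + i] + u[(j - 1) * w + i]);
        }
    }
}
```

In a distributed run, the outer layer of each local block would be refreshed by the halo exchange before every sweep.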

Example 2-D Laplace equation (IV)

Process mapping and determining neighbor processes

- np_x: no. of procs in x direction
- np_y: no. of procs in y direction

\[ n_{left} = rank - 1, \qquad n_{right} = rank + 1 \]

\[ n_{up} = rank + np_x, \qquad n_{down} = rank - np_x \]

[Figure: 4 x 3 process grid, each entry giving the rank and its (x,y) coordinates; x increases to the right, y increases upward]

     8 (0,2)    9 (1,2)   10 (2,2)   11 (3,2)
     4 (0,1)    5 (1,1)    6 (2,1)    7 (3,1)
     0 (0,0)    1 (1,0)    2 (2,0)    3 (3,0)

- At boundaries: set the rank of the according neighbor to MPI_PROC_NULL
- A message sent to MPI_PROC_NULL will be ignored by the MPI library
- Easier: use cartesian topology functions

Laplace equation: communication in y-direction

- u(i,j) is stored in a matrix (!!assuming C!!)
- Dimension of u on an inner process (= not being at a boundary): u(nxlocal + 2, nylocal + 2), with
  nxlocal: no. of local points in x direction
  nylocal: no. of local points in y direction
- u(1:nxlocal, 1:nylocal) contains the local data
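The neighbor formulas, together with the boundary rule, can be sketched as a small helper. The names `Neighbors`, `find_neighbors` and `PROC_NULL` are hypothetical; `PROC_NULL` is only a stand-in for `MPI_PROC_NULL`, whose actual value is defined by the MPI implementation. The sketch assumes ranks are numbered row by row in x, as in the 4 x 3 grid above.

```c
#define PROC_NULL (-1)   /* hypothetical stand-in for MPI_PROC_NULL */

typedef struct {
    int left, right, up, down;
} Neighbors;

/* Neighbor ranks in an npx x npy process grid with ranks numbered
   row by row (rank = cy * npx + cx). */
Neighbors find_neighbors(int rank, int npx, int npy)
{
    int cx = rank % npx;   /* x coordinate of this process */
    int cy = rank / npx;   /* y coordinate of this process */
    Neighbors n;

    n.left  = (cx > 0)       ? rank - 1   : PROC_NULL;
    n.right = (cx < npx - 1) ? rank + 1   : PROC_NULL;
    n.down  = (cy > 0)       ? rank - npx : PROC_NULL;
    n.up    = (cy < npy - 1) ? rank + npx : PROC_NULL;
    return n;
}
```

With real MPI, `MPI_Cart_create` followed by `MPI_Cart_shift` returns the same information and already yields `MPI_PROC_NULL` at non-periodic boundaries.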

Laplace equation: communication in y-direction

    MPI_Request req[8];

    /* Storage assumption: u is declared as double u[nylocal+2][nxlocal+2],
       so u[j][i] holds u(i,j) and the nxlocal interior values of one row
       (fixed y index j) are contiguous in memory. */
    MPI_Irecv(&u[nylocal+1][1], nxlocal, MPI_DOUBLE, nup,   tag, comm, &req[0]);
    MPI_Irecv(&u[0][1],         nxlocal, MPI_DOUBLE, ndown, tag, comm, &req[1]);
    MPI_Isend(&u[nylocal][1],   nxlocal, MPI_DOUBLE, nup,   tag, comm, &req[2]);
    MPI_Isend(&u[1][1],         nxlocal, MPI_DOUBLE, ndown, tag, comm, &req[3]);

    // Waitall might be postponed until communication
    // in x-direction has also been posted
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);

Laplace equation: communication in x-direction

Problem: the data which we have to send is not contiguous in memory.

[Figure: logical view of the matrix, next to the layout in memory of the same matrix (in C)]

Laplace equation: communication in x-direction

How to implement the halo-cell exchange in x-direction?

- Send/Recv every element in a separate message
  + works
  - very slow
- Derived datatypes
- Copy the data into a separate vector/array and send this array
  + works
  a more general interface is provided by MPI to pack data into a contiguous buffer before sending

Using derived datatypes

    MPI_Datatype coldat;

    /* Storage assumption: u is declared as double u[nylocal+2][nxlocal+2],
       so u[j][i] holds u(i,j) and consecutive elements of one column
       are nxlocal+2 doubles apart (row length including the halo). */

    // Create a derived datatype describing a column
    // of your matrix
    MPI_Type_vector(nylocal, 1, nxlocal+2, MPI_DOUBLE, &coldat);
    MPI_Type_commit(&coldat);

    // use that datatype for the communication to your left
    // and right neighbors
    MPI_Irecv(&(u[1][0]),         1, coldat, nleft,  tag, comm, &req[4]);
    MPI_Irecv(&(u[1][nxlocal+1]), 1, coldat, nright, tag, comm, &req[5]);
    MPI_Isend(&(u[1][1]),         1, coldat, nleft,  tag, comm, &req[6]);
    MPI_Isend(&(u[1][nxlocal]),   1, coldat, nright, tag, comm, &req[7]);
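A vector datatype with block length 1 describes `count` elements that lie `stride` elements apart in memory. The plain C sketch below performs the same strided access by hand, which is also what the "copy into a separate array" option amounts to. The function names are illustrative assumptions, and no MPI call is involved.

```c
/* Copy count elements that are stride doubles apart, starting at first,
   into a contiguous buffer: the access pattern that
   MPI_Type_vector(count, 1, stride, MPI_DOUBLE, ...) describes. */
void gather_column(const double *first, int count, int stride, double *out)
{
    for (int k = 0; k < count; k++)
        out[k] = first[k * stride];
}

/* Inverse operation: write a contiguous buffer back into a strided column. */
void scatter_column(double *first, int count, int stride, const double *in)
{
    for (int k = 0; k < count; k++)
        first[k * stride] = in[k];
}
```

For a row-major array with rows of length `w`, column `c` starts at `&u[0][c]` and uses `stride = w`; with the derived datatype the MPI library performs this traversal internally and the temporary buffer disappears.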

Packing a message

    MPI_Pack(void *inbuf, int incount, MPI_Datatype dat,
             void *outbuf, int outsize, int *pos, MPI_Comm comm);

- MPI_Pack copies incount elements of type dat from inbuf into the user-provided buffer outbuf
- outbuf has to be large enough to hold the data; outsize is the size of outbuf in bytes
- pos contains the position of the last packed data in outbuf. It has to be initialized to zero before first usage
- MPI_Pack can be called several times to pack independent pieces of data
- Send and receive a message which has been packed using the MPI datatype MPI_PACKED

Packing a message (II)

[Figure: contents of outbuf before packing (pos = 0), after a first MPI_Pack call that packs MPI_INT data (an MPI internal header followed by the packed integers, with pos advanced past them), and after a second MPI_Pack call that packs MPI_FLOAT data (pos advanced again)]

Unpacking a message

    MPI_Unpack(void *inbuf, int insize, int *pos,
               void *outbuf, int outcount, MPI_Datatype dat, MPI_Comm comm);

- MPI_Unpack copies outcount elements of type dat from inbuf into the user-provided buffer outbuf
- inbuf holds the whole message
- pos contains the position of the last unpacked data in inbuf. It has to be initialized to zero before first usage
- MPI_Unpack can be called several times to unpack independent pieces of data

Determining the size of the pack-buffer

    MPI_Pack_size(int incount, MPI_Datatype dat, MPI_Comm comm, int *size);

- MPI_Pack_size returns the size in bytes of the buffer required to pack incount elements of type dat using MPI_Pack
- size might not be identical to incount * sizeof(original datatype)
- Several calls to MPI_Pack_size are required if you plan to pack more than one type of data; sum up the returned sizes
- You can use size e.g. to malloc a buffer

Laplace equation: communication in x-direction (I)

    /* Storage assumption: u is declared as double u[nylocal+2][nxlocal+2],
       so u[j][i] holds u(i,j). Packed data is a byte stream, hence char*. */
    char *sbufleft, *sbufright, *rbufleft, *rbufright;
    int bufsize, posleft = 0, posright = 0;
    int j;

    /* determine the required buffer sizes and allocate the buffers */
    MPI_Pack_size(nylocal, MPI_DOUBLE, comm, &bufsize);
    sbufleft  = malloc(bufsize);
    sbufright = malloc(bufsize);
    rbufleft  = malloc(bufsize);
    rbufright = malloc(bufsize);

    /* Pack the data before sending */
    for (j = 1; j < nylocal+1; j++) {
        MPI_Pack(&u[j][nxlocal], 1, MPI_DOUBLE, sbufright, bufsize, &posright, comm);
        MPI_Pack(&u[j][1],       1, MPI_DOUBLE, sbufleft,  bufsize, &posleft,  comm);
    }

Laplace equation: communication in x-direction (II)

    /* Execute now the real communication */
    MPI_Irecv(rbufleft,  bufsize,  MPI_PACKED, nleft,  tag, comm, &req[0]);
    MPI_Irecv(rbufright, bufsize,  MPI_PACKED, nright, tag, comm, &req[1]);
    MPI_Isend(sbufleft,  posleft,  MPI_PACKED, nleft,  tag, comm, &req[2]);
    MPI_Isend(sbufright, posright, MPI_PACKED, nright, tag, comm, &req[3]);
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);

    /* Unpack the received data into the halo columns */
    posright = posleft = 0;
    for (j = 1; j < nylocal+1; j++) {
        MPI_Unpack(rbufright, bufsize, &posright, &u[j][nxlocal+1], 1, MPI_DOUBLE, comm);
        MPI_Unpack(rbufleft,  bufsize, &posleft,  &u[j][0],         1, MPI_DOUBLE, comm);
    }

Derived data types vs. pack/unpack

Advantages of derived datatypes:
- avoids temporary buffers
- code potentially shorter
- gives the MPI library the possibility to optimize the according operations

Advantages of pack/unpack:
- might lead to performance advantages if the same packed buffer has to be sent to multiple targets
- many users find pack/unpack intuitive: it is similar to simply copying the data items into a temporary buffer

MPI Parallel Programming Summary

- Execution of the same executable np times
- process-specific code sections can rely on the rank of a process
- individual communication (e.g. MPI_Send/Recv) vs. collective communication (e.g. MPI_Bcast)
- blocking communication (e.g. MPI_Send/Recv) vs. non-blocking communication (e.g. MPI_Isend/Irecv)
- group and communicator management (e.g. MPI_Comm_split/MPI_Comm_create)
- process topologies (e.g. MPI_Cart_create)

MPI Parallel Programming Summary (continued)

- Data distribution (e.g. 1-D block column/row wise, 2-D)
- Discontiguous data items (derived data types, pack/unpack)