COSC 4397 Parallel Computation


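COSC 4397
Solving the Laplace Equation with MPI
Spring

Numerical differentiation: forward difference formula

- From the definition of the derivative
  $$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$
  one can derive an approximation for the 1st derivative:
  $$f'(x) \approx \frac{f(x+h) - f(x)}{h}$$
- The same formula can be obtained from the Taylor series, e.g.
  $$f(x+h) = f(x) + h f'(x) + \frac{h^2}{2} f''(\xi)$$
  $$\Rightarrow \quad \frac{f(x+h) - f(x)}{h} - f'(x) = \frac{h}{2} f''(\xi)$$

[Figure: graph of f over the x axis, marking the points x_n and x_n + h with the values f(x_n) and f(x_n + h)]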

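Central Difference Formula

- A better formula is derived by looking at the following two expansions:
  $$f(x+h) = f(x) + h f'(x) + \frac{h^2}{2} f''(x) + \frac{h^3}{3!} f'''(\xi_1) \qquad (5{:}1)$$
  $$f(x-h) = f(x) - h f'(x) + \frac{h^2}{2} f''(x) - \frac{h^3}{3!} f'''(\xi_2) \qquad (5{:}2)$$
- Subtracting equation (5:2) from (5:1) leads to
  $$f'(x) = \frac{1}{2h}\left[ f(x+h) - f(x-h) \right] - h^2 [\ldots]$$
- The error term is quadratic in h.

[Figure: graph of f over the x axis, marking the points x_n - h, x_n and x_n + h]

Central Difference Formula for 2nd Derivatives

- Extend (5:1) and (5:2) by an additional term:
  $$f(x+h) = f(x) + h f'(x) + \frac{h^2}{2} f''(x) + \frac{h^3}{3!} f'''(x) + \frac{h^4}{4!} f^{(4)}(\xi_1)$$
  $$f(x-h) = f(x) - h f'(x) + \frac{h^2}{2} f''(x) - \frac{h^3}{3!} f'''(x) + \frac{h^4}{4!} f^{(4)}(\xi_2)$$
- Adding both equations leads to
  $$f''(x) = \frac{1}{h^2}\left[ f(x+h) - 2 f(x) + f(x-h) \right] - h^2 [\ldots]$$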
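The three difference formulas can be tried out numerically. The following is a minimal sketch in C, not part of the original slides; the function names and the test function `cube` are hypothetical and chosen only for illustration.

```c
/* Hypothetical test function: f(x) = x^3, so f'(x) = 3x^2 and f''(x) = 6x. */
double cube(double x)
{
    return x * x * x;
}

/* Forward difference: approximates f'(x) with an error proportional to h. */
double forward_diff(double (*f)(double), double x, double h)
{
    return (f(x + h) - f(x)) / h;
}

/* Central difference: approximates f'(x) with an error proportional to h^2. */
double central_diff(double (*f)(double), double x, double h)
{
    return (f(x + h) - f(x - h)) / (2.0 * h);
}

/* Central difference for the 2nd derivative, error proportional to h^2. */
double central_diff2(double (*f)(double), double x, double h)
{
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h);
}
```

For `cube` at x = 2 with h = 0.001, the forward difference is off by about 6h, while the central difference is off by about h squared, which illustrates the linear versus quadratic error terms.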

Numerical differentiation - summary

- Forward difference formula:
  $$f'(x) \approx \frac{f(x+h) - f(x)}{h}$$
- Central difference formula for the 1st derivative:
  $$f'(x) \approx \frac{1}{2h}\left[ f(x+h) - f(x-h) \right]$$
- Central difference formula for the 2nd derivative:
  $$f''(x) \approx \frac{1}{h^2}\left[ f(x+h) - 2 f(x) + f(x-h) \right]$$

Differential equations - terminology

- Differential equations: equations containing the derivative of a function as a variable
- An ordinary differential equation (ODE) only contains functions of one independent variable
- A partial differential equation (PDE) contains functions of multiple independent variables and their partial derivatives
- The order of a differential equation is that of the highest derivative that it contains
- The goal is to find a function y(t) whose derivatives fulfill the given differential equation, e.g.
  $$y^{(n)}(t) = f\left(t, y, y', y'', \ldots, y^{(n-1)}\right)$$

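Finite Differences Approach for Solving Differential Equations

- Idea: replace the derivatives in the DE by a corresponding approximation formula
- Typically central differences:
  $$y'(t) \approx \frac{1}{2h}\left[ y(t+h) - y(t-h) \right]$$
  $$y''(t) \approx \frac{1}{h^2}\left[ y(t+h) - 2 y(t) + y(t-h) \right]$$
- Example: boundary value problem of an ordinary differential equation
  $$\frac{d^2 y}{dx^2} = f\left(x, y, \frac{dy}{dx}\right), \qquad a \le x \le b, \qquad y(a) = \alpha, \quad y(b) = \beta$$

Finite Differences Approach (II)

- For simplicity, let's assume the points are equally spaced:
  $$x_i = a + i h, \quad i = 0, \ldots, n+1, \qquad h = \frac{b - a}{n + 1}$$
- A two point boundary value problem then becomes
  $$y_0 = \alpha$$
  $$\frac{1}{h^2}\left( y_{i+1} - 2 y_i + y_{i-1} \right) = f\left( x_i, y_i, \frac{1}{2h}\left( y_{i+1} - y_{i-1} \right) \right), \quad i = 1, \ldots, n \qquad (x{:}1)$$
  $$y_{n+1} = \beta$$
- Equation (x:1) leads to a system of equations
- Solving the system of linear equations gives the solution of the ODE at the distinct points $x_0, x_1, \ldots, x_n, x_{n+1}$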

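Example (I)

- Solve the following two point boundary value problem using the finite difference method with h = 0.2:
  $$\frac{d^2 y}{dx^2} + 2 \frac{dy}{dx} + 10 x = 0, \qquad 0 \le x \le 1$$
  with a prescribed left boundary value $y(0)$ and $y(1) = 2$.
- Since h = 0.2, the mesh points are $x_0 = 0$, $x_1 = 0.2$, $x_2 = 0.4$, $x_3 = 0.6$, $x_4 = 0.8$, $x_5 = 1.0$
- Thus $y_0 = y(x_0) = y(0)$ and $y_5 = y(x_5) = 2$ are known, while $y_1$ to $y_4$ are unknown

Example (II)

- Discrete version of the ODE using central differences:
  $$\frac{1}{h^2}\left( y_{i+1} - 2 y_i + y_{i-1} \right) + \frac{2}{2h}\left( y_{i+1} - y_{i-1} \right) + 10 x_i = 0$$
  $$\frac{1}{0.04}\left( y_{i+1} - 2 y_i + y_{i-1} \right) + \frac{2}{0.4}\left( y_{i+1} - y_{i-1} \right) + 10 x_i = 0$$
  $$25\left( y_{i+1} - 2 y_i + y_{i-1} \right) + 5\left( y_{i+1} - y_{i-1} \right) + 10 x_i = 0$$
  $$20 y_{i-1} - 50 y_i + 30 y_{i+1} = -10 x_i$$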

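Example (III)

- i=1: $20 y_0 - 50 y_1 + 30 y_2 = -10 \cdot 0.2$, i.e. $-50 y_1 + 30 y_2 = -2 - 20\,y(0)$
- i=2: $20 y_1 - 50 y_2 + 30 y_3 = -4$
- i=3: $20 y_2 - 50 y_3 + 30 y_4 = -6$
- i=4: $20 y_3 - 50 y_4 = -68$

or

$$\underbrace{\begin{pmatrix} -50 & 30 & 0 & 0 \\ 20 & -50 & 30 & 0 \\ 0 & 20 & -50 & 30 \\ 0 & 0 & 20 & -50 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix}}_{y} = \underbrace{\begin{pmatrix} -2 - 20\,y(0) \\ -4 \\ -6 \\ -68 \end{pmatrix}}_{b}$$

Solving Ay=b using Bi-CGSTAB

Given $A$, $\mathbf{b}$ and an initial guess $\mathbf{y}_0$ (bold symbols denote vectors):

- $\mathbf{r}_0 = \mathbf{b} - A \mathbf{y}_0$
- Given $\hat{\mathbf{r}}_0$ such that $\hat{\mathbf{r}}_0^T \mathbf{r}_0 \ne 0$
- $\rho_0 = \alpha = \omega_0 = 1$
- $\mathbf{v}_0 = \mathbf{p}_0 = 0$
- for i = 1, 2, ...
    - $\rho_i = \hat{\mathbf{r}}_0^T \mathbf{r}_{i-1}$
    - $\beta = \dfrac{\rho_i}{\rho_{i-1}} \cdot \dfrac{\alpha}{\omega_{i-1}}$
    - $\mathbf{p}_i = \mathbf{r}_{i-1} + \beta \left( \mathbf{p}_{i-1} - \omega_{i-1} \mathbf{v}_{i-1} \right)$
    - $\mathbf{v}_i = A \mathbf{p}_i$
    - $\alpha = \dfrac{\rho_i}{\hat{\mathbf{r}}_0^T \mathbf{v}_i}$
    - $\mathbf{s} = \mathbf{r}_{i-1} - \alpha \mathbf{v}_i$
    - $\mathbf{t} = A \mathbf{s}$
    - $\omega_i = \dfrac{\mathbf{t}^T \mathbf{s}}{\mathbf{t}^T \mathbf{t}}$
    - $\mathbf{y}_i = \mathbf{y}_{i-1} + \alpha \mathbf{p}_i + \omega_i \mathbf{s}$
    - $\mathbf{r}_i = \mathbf{s} - \omega_i \mathbf{t}$

The operations to parallelize are the scalar products (e.g. $\hat{\mathbf{r}}_0^T \mathbf{r}_{i-1}$, $\mathbf{t}^T \mathbf{s}$) and the matrix-vector multiplications ($A \mathbf{p}_i$, $A \mathbf{s}$).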
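The Bi-CGSTAB iteration listed above can be sketched serially for the 4x4 example system. This is a minimal, hedged illustration rather than the course's reference code: the function names (`build_example`, `bicgstab`), the choice of the shadow residual as a copy of the initial residual, the stopping test and the breakdown checks are all assumptions, and the left boundary value is passed in as a parameter.

```c
#define N 4

/* Scalar product of two vectors of length N. */
static double dot(const double *a, const double *b)
{
    double s = 0.0;
    for (int k = 0; k < N; k++) s += a[k] * b[k];
    return s;
}

/* Dense matrix-vector product out = A * x. */
static void matvec(double A[N][N], const double *x, double *out)
{
    for (int i = 0; i < N; i++) {
        out[i] = 0.0;
        for (int j = 0; j < N; j++) out[i] += A[i][j] * x[j];
    }
}

/* Assemble 20*y[i-1] - 50*y[i] + 30*y[i+1] = -10*x_i for h = 0.2.
   y_left and y_right are the boundary values y(0) and y(1). */
void build_example(double y_left, double y_right, double A[N][N], double b[N])
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) A[i][j] = 0.0;
        A[i][i] = -50.0;
        if (i > 0)     A[i][i - 1] = 20.0;
        if (i < N - 1) A[i][i + 1] = 30.0;
        b[i] = -2.0 * (i + 1);      /* -10 * x_i with x_i = 0.2 * (i + 1) */
    }
    b[0]     -= 20.0 * y_left;      /* known boundary terms move to the rhs */
    b[N - 1] -= 30.0 * y_right;
}

/* Bi-CGSTAB; y holds the initial guess on entry and the solution on exit.
   Returns the iteration count, or -1 on breakdown or non-convergence. */
int bicgstab(double A[N][N], const double *b, double *y, double tol, int maxit)
{
    double r[N], rhat[N], p[N], v[N], s[N], t[N];
    double rho_old = 1.0, alpha = 1.0, omega = 1.0;
    double limit = tol * tol * dot(b, b);   /* compare squared norms */

    matvec(A, y, r);
    for (int k = 0; k < N; k++) {
        r[k] = b[k] - r[k];
        rhat[k] = r[k];             /* assumption: shadow residual = r_0 */
        p[k] = v[k] = 0.0;
    }
    if (dot(r, r) <= limit) return 0;

    for (int i = 1; i <= maxit; i++) {
        double rho = dot(rhat, r);
        if (rho == 0.0 || omega == 0.0) return -1;
        double beta = (rho / rho_old) * (alpha / omega);
        for (int k = 0; k < N; k++) p[k] = r[k] + beta * (p[k] - omega * v[k]);
        matvec(A, p, v);
        double denom = dot(rhat, v);
        if (denom == 0.0) return -1;
        alpha = rho / denom;
        for (int k = 0; k < N; k++) s[k] = r[k] - alpha * v[k];
        if (dot(s, s) <= limit) {   /* converged after the first half step */
            for (int k = 0; k < N; k++) y[k] += alpha * p[k];
            return i;
        }
        matvec(A, s, t);
        omega = dot(t, s) / dot(t, t);
        for (int k = 0; k < N; k++) {
            y[k] += alpha * p[k] + omega * s[k];
            r[k] = s[k] - omega * t[k];
        }
        if (dot(r, r) <= limit) return i;
        rho_old = rho;
    }
    return -1;
}
```

A caller would fill `A` and `b` with `build_example`, start from a zero vector `y`, and call `bicgstab(A, b, y, 1e-10, 100)`. The `dot` and `matvec` helpers are exactly the two kernels that the following slides distribute across processes.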

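Scalar product in parallel

- Scalar product:
  $$s = \sum_{i=0}^{N-1} a[i] \cdot b[i]$$
- Parallel algorithm:
  $$s = \sum_{i=0}^{N/2-1} \left( a[i] \cdot b[i] \right) + \sum_{i=N/2}^{N-1} \left( a[i] \cdot b[i] \right) = \underbrace{\sum_{i=0}^{N/2-1} \left( a_{local}[i] \cdot b_{local}[i] \right)}_{rank=0} + \underbrace{\sum_{i=0}^{N/2-1} \left( a_{local}[i] \cdot b_{local}[i] \right)}_{rank=1}$$
- Process with rank=0 holds a(0 ... N/2-1) and b(0 ... N/2-1)
- Process with rank=1 holds a(N/2 ... N-1) and b(N/2 ... N-1)
- Requires communication between the processes

Matrix-vector product in parallel

$$\begin{pmatrix} -50 & 30 & 0 & 0 \\ 20 & -50 & 30 & 0 \\ 0 & 20 & -50 & 30 \\ 0 & 0 & 20 & -50 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} rhs_1 \\ rhs_2 \\ rhs_3 \\ rhs_4 \end{pmatrix}$$

- Process 0:
  $$-50 x_1 + 30 x_2 = rhs_1$$
  $$20 x_1 - 50 x_2 + 30 x_3 = rhs_2$$
- Process 1:
  $$20 x_2 - 50 x_3 + 30 x_4 = rhs_3$$
  $$20 x_3 - 50 x_4 = rhs_4$$
- Process 0 needs $x_3$
- Process 1 needs $x_2$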
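The block-wise splitting of the scalar product can be mimicked in plain C without any MPI calls. The sketch below is only a serial stand-in under that assumption: the loop over `rank` plays the role of the processes, and the running sum plays the role of the communication step (which in an MPI program would typically be a reduction such as `MPI_Allreduce` with `MPI_SUM`). The names `local_dot` and `block_dot` are hypothetical.

```c
/* Partial scalar product over the index range [lo, hi),
   i.e. the part that one process would compute on its local data. */
double local_dot(const double *a, const double *b, int lo, int hi)
{
    double s = 0.0;
    for (int i = lo; i < hi; i++) s += a[i] * b[i];
    return s;
}

/* Serial emulation of nprocs ranks, each owning one contiguous block.
   Adding up the partial sums stands in for the global reduction. */
double block_dot(const double *a, const double *b, int n, int nprocs)
{
    double s = 0.0;
    for (int rank = 0; rank < nprocs; rank++) {
        int lo = (int)((long)n * rank / nprocs);        /* first owned index */
        int hi = (int)((long)n * (rank + 1) / nprocs);  /* one past the last */
        s += local_dot(a, b, lo, hi);
    }
    return s;
}
```

With `nprocs = 2` the two blocks are exactly the index ranges 0 ... N/2-1 and N/2 ... N-1 from the slide.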

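Matrix vector product in parallel (II)

- Introduction of ghost cells:
    - Process zero stores $x_1$, $x_2$ and a ghost cell for $x_3$
    - Process one stores a ghost cell for $x_2$, followed by $x_3$, $x_4$
- Looking at the source code, e.g.
  $$\mathbf{p}_i = \mathbf{r}_{i-1} + \beta \left( \mathbf{p}_{i-1} - \omega_{i-1} \mathbf{v}_{i-1} \right)$$
  $$\mathbf{v}_i = A \mathbf{p}_i$$
  since the vector used in the matrix vector multiplication changes every iteration, you always have to update the ghost cells before doing the calculation

Matrix vector product in parallel (III)

- So the parallel algorithm for the same area is:
    - $\mathbf{p}_i = \mathbf{r}_{i-1} + \beta \left( \mathbf{p}_{i-1} - \omega_{i-1} \mathbf{v}_{i-1} \right)$
    - Update the ghost cells of p, e.g.
        - Process 0 sends p(2) to Process 1
        - Process 1 sends p(3) to Process 0
    - $\mathbf{v}_i = A \mathbf{p}_i$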
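The ghost-cell idea can be illustrated with a serial stand-in for the two processes. In this sketch (an assumption-laden illustration, not MPI code) each "process" stores its two owned entries between one ghost cell on each side, and `exchange_ghosts` replaces the pair of messages by direct copies. The names `local_matvec`, `exchange_ghosts` and `NLOC` are hypothetical.

```c
#define NLOC 2   /* owned entries per process in the 4x4 example */

/* Local rows of a tridiagonal matrix with constant sub-, main- and
   super-diagonal entries lo, di and up.
   xl has NLOC + 2 entries: xl[0] and xl[NLOC + 1] are ghost cells,
   xl[1] ... xl[NLOC] are the owned values. */
void local_matvec(double lo, double di, double up,
                  const double xl[NLOC + 2], double out[NLOC])
{
    for (int k = 0; k < NLOC; k++)
        out[k] = lo * xl[k] + di * xl[k + 1] + up * xl[k + 2];
}

/* Stand-in for the message exchange between rank 0 (x0) and rank 1 (x1). */
void exchange_ghosts(double x0[NLOC + 2], double x1[NLOC + 2])
{
    x0[NLOC + 1] = x1[1];     /* rank 1 "sends" its first owned value */
    x1[0]        = x0[NLOC];  /* rank 0 "sends" its last owned value  */
}
```

Ghost cells at the physical ends of the vector (the left ghost of rank 0 and the right ghost of rank 1) are simply kept at zero, which reproduces the first and last matrix rows. The exchange has to be repeated before every product because the vector changes in each iteration.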

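2-D Example - Laplace equation (I)

- 2-D Laplace equation:
  $$\frac{\partial^2 u(x, y)}{\partial x^2} + \frac{\partial^2 u(x, y)}{\partial y^2} = 0$$
- Central discretization leads to
  $$\frac{u_{i+1,j} - 2 u_{i,j} + u_{i-1,j}}{h^2} + \frac{u_{i,j+1} - 2 u_{i,j} + u_{i,j-1}}{h^2} = 0$$

[Figure: five-point stencil with center (i,j) and neighbors (i-1,j), (i+1,j), (i,j-1), (i,j+1)]

2-D Example: Laplace equation (II)

- Parallel domain decomposition
- Data exchange at process boundaries required
- Halo cells / ghost cells: copy of the last row/column of data from the neighbor process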
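Solving the discretized equation for $u_{i,j}$ shows that each interior value equals the average of its four neighbors. The sketch below applies this update once to every interior point. Note that it uses a plain Jacobi sweep, which is an assumption made for brevity and not the Bi-CGSTAB solver used elsewhere in these slides; the function name and the row-by-row storage (index `j * nx + i`) are hypothetical choices.

```c
/* One Jacobi sweep for the five-point Laplace stencil on an nx-by-ny grid.
   u and unew are stored row by row: point (i, j) lives at index j * nx + i.
   Boundary values are copied unchanged.
   Returns the largest absolute change of any interior point. */
double jacobi_sweep(int nx, int ny, const double *u, double *unew)
{
    double maxdiff = 0.0;

    for (int k = 0; k < nx * ny; k++) unew[k] = u[k];

    for (int j = 1; j < ny - 1; j++) {
        for (int i = 1; i < nx - 1; i++) {
            /* average of the right, left, upper and lower neighbor */
            double v = 0.25 * (u[j * nx + i + 1] + u[j * nx + i - 1] +
                               u[(j + 1) * nx + i] + u[(j - 1) * nx + i]);
            double d = v - u[j * nx + i];
            if (d < 0.0) d = -d;
            if (d > maxdiff) maxdiff = d;
            unew[j * nx + i] = v;
        }
    }
    return maxdiff;
}
```

In a domain-decomposed version each process would run the same loop over its local points after the halo cells have been refreshed from the neighbor processes.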

Example 2-D Laplace equation (IV)

- Process mapping and determining neighbor processes
    - $np_x$: no. of procs in x direction
    - $np_y$: no. of procs in y direction
  $$n_{left} = rank - 1, \qquad n_{right} = rank + 1$$
  $$n_{up} = rank + np_x, \qquad n_{down} = rank - np_x$$

[Figure: 4 x 3 process grid with x and y axes; ranks 0 to 3 in the bottom row, 4 to 7 in the middle row, 8 to 11 in the top row, each labeled with its (x,y) coordinates]

- At boundaries: set the rank of the corresponding neighbor to MPI_PROC_NULL
    - a message sent to MPI_PROC_NULL will be ignored by the MPI library
- Easier: use cartesian topology functions

Laplace equation communication in y-direction

- u(i,j) is stored in a matrix (assuming C!)
- Dimension of u on an inner process (= not being at a boundary): u(nxlocal + 2, nylocal + 2) with
    - nxlocal: no. of local points in x direction
    - nylocal: no. of local points in y direction
- u(1:nxlocal, 1:nylocal) contains the local data
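The neighbor formulas together with the boundary rule can be written down as a small helper. The following sketch avoids MPI entirely so that it stays self-contained: `NO_NEIGHBOR` is a hypothetical stand-in for `MPI_PROC_NULL`, and the struct and function names are illustrative assumptions. It also assumes the row-wise rank numbering shown in the process grid above.

```c
#define NO_NEIGHBOR (-1)   /* hypothetical stand-in for MPI_PROC_NULL */

typedef struct {
    int left, right, up, down;
} neighbors_t;

/* Neighbor ranks of `rank` in an npx-by-npy process grid in which ranks
   are numbered row by row, starting in the bottom left corner. */
neighbors_t find_neighbors(int rank, int npx, int npy)
{
    int px = rank % npx;   /* position in x direction */
    int py = rank / npx;   /* position in y direction */
    neighbors_t n;

    n.left  = (px > 0)       ? rank - 1   : NO_NEIGHBOR;
    n.right = (px < npx - 1) ? rank + 1   : NO_NEIGHBOR;
    n.down  = (py > 0)       ? rank - npx : NO_NEIGHBOR;
    n.up    = (py < npy - 1) ? rank + npx : NO_NEIGHBOR;
    return n;
}
```

In an actual MPI code the cartesian topology functions (`MPI_Cart_create` followed by `MPI_Cart_shift`) return the same information and already yield `MPI_PROC_NULL` at non-periodic boundaries.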

Laplace equation communication in y-direction

    /* Assumes that the nxlocal points of one row are contiguous in memory. */
    MPI_Request req[8];

    /* Receive the halo rows from the upper and the lower neighbor */
    MPI_Irecv(&u[1][nylocal+1], nxlocal, MPI_DOUBLE, nup, tag, comm, &req[0]);
    MPI_Irecv(&u[1][0], nxlocal, MPI_DOUBLE, ndown, tag, comm, &req[1]);

    /* Send the last local row up and the first local row down */
    MPI_Isend(&u[1][nylocal], nxlocal, MPI_DOUBLE, nup, tag, comm, &req[2]);
    MPI_Isend(&u[1][1], nxlocal, MPI_DOUBLE, ndown, tag, comm, &req[3]);

    // Waitall might be postponed until communication
    // in x-direction has also been posted
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);

Laplace equation communication in x-direction

- Problem: the data which we have to send is not contiguous in memory

[Figure: logical view of the matrix and layout in memory of the same matrix (in C)]

Laplace equation communication in x-direction

- How to implement the halo-cell exchange in x-direction?
    - Send/Recv every element in a separate message
        + works
        - very slow
    - derived datatypes
    - copy the data into a separate vector/array and send this array
        + works
        - a more general interface is provided by MPI to pack data into a contiguous buffer before sending

Using derived datatypes

    MPI_Datatype coldat;

    // Create a derived datatype describing a column of your vector:
    // nylocal blocks of 1 element each, nxlocal elements apart
    // (the stride has to match the row length of u as it is allocated)
    MPI_Type_vector(nylocal, 1, nxlocal, MPI_DOUBLE, &coldat);
    MPI_Type_commit(&coldat);

    // Use that datatype for the communication to your left
    // and right neighbors
    MPI_Irecv(&(u[0][1]), 1, coldat, nleft, tag, comm, &req[4]);
    MPI_Irecv(&(u[nxlocal+1][1]), 1, coldat, nright, tag, comm, &req[5]);
    MPI_Isend(&(u[1][1]), 1, coldat, nleft, tag, comm, &req[6]);
    MPI_Isend(&(u[nxlocal][1]), 1, coldat, nright, tag, comm, &req[7]);

Packing a message

    MPI_Pack(void *inbuf, int incount, MPI_Datatype dat,
             void *outbuf, int outsize, int *pos, MPI_Comm comm);

- MPI_Pack copies incount elements of type dat from inbuf into the user provided buffer outbuf
- outbuf has to be large enough to hold the data; outsize is its size in bytes
- pos contains the position of the last packed data in outbuf. It has to be initialized to zero before first usage
- can be called several times to pack independent pieces of data
- Send and receive a message which has been packed using the MPI datatype MPI_PACKED

Packing a message (II)

[Figure: outbuf before packing, with pos = 0; outbuf after a 1st MPI_Pack call with MPI_INT data, now holding an MPI internal header followed by the packed data, with pos advanced; outbuf after a 2nd MPI_Pack call with MPI_FLOAT data, with pos advanced again]

Unpacking a message

    MPI_Unpack(void *inbuf, int insize, int *pos, void *outbuf,
               int outcount, MPI_Datatype dat, MPI_Comm comm);

- MPI_Unpack copies outcount elements of type dat from inbuf into the user provided buffer outbuf
- inbuf holds the whole message
- pos contains the position of the last unpacked data in inbuf. It has to be initialized to zero before first usage
- can be called several times to unpack independent pieces of data

Determining the size of the pack-buffer

    MPI_Pack_size(int incount, MPI_Datatype dat, MPI_Comm comm, int *size);

- MPI_Pack_size returns the size in bytes of the buffer required to pack incount elements of type dat using MPI_Pack
- size might not be identical to incount * sizeof(original datatype)
- several calls to MPI_Pack_size are required if you plan to pack more than one type of data: sum up the returned sizes
- you can use size e.g. to malloc a buffer

Laplace equation communication in x-direction (I)

    double *sbufleft, *sbufright, *rbufleft, *rbufright;
    int i, bufsize, posleft=0, posright=0;

    /* determine the required buffer sizes and
       allocate the buffers (malloc needs <stdlib.h>) */
    MPI_Pack_size(nylocal, MPI_DOUBLE, comm, &bufsize);
    sbufleft  = malloc(bufsize);
    sbufright = malloc(bufsize);
    rbufleft  = malloc(bufsize);
    rbufright = malloc(bufsize);

    /* Pack the data before sending */
    for (i=1; i<nylocal+1; i++) {
        MPI_Pack(&u[nxlocal][i], 1, MPI_DOUBLE, sbufright, bufsize,
                 &posright, comm);
        MPI_Pack(&u[1][i], 1, MPI_DOUBLE, sbufleft, bufsize,
                 &posleft, comm);
    }

Laplace equation communication in x-direction (II)

    /* Execute now the real communication */
    MPI_Irecv(rbufleft, bufsize, MPI_PACKED, nleft, tag, comm, &req[0]);
    MPI_Irecv(rbufright, bufsize, MPI_PACKED, nright, tag, comm, &req[1]);
    MPI_Isend(sbufleft, posleft, MPI_PACKED, nleft, tag, comm, &req[2]);
    MPI_Isend(sbufright, posright, MPI_PACKED, nright, tag, comm, &req[3]);
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);

    /* Unpack the received data into the halo cells */
    posright = posleft = 0;
    for (i=1; i<nylocal+1; i++) {
        MPI_Unpack(rbufright, bufsize, &posright, &u[nxlocal+1][i], 1,
                   MPI_DOUBLE, comm);
        MPI_Unpack(rbufleft, bufsize, &posleft, &u[0][i], 1,
                   MPI_DOUBLE, comm);
    }
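What packing a column amounts to can be shown with two plain C helpers that copy strided data into and out of a contiguous buffer. This is a hedged illustration of the memory access pattern only, with no MPI involved: the names `gather_strided` and `scatter_strided` are hypothetical, and the pair (`count`, `stride`) corresponds to the first and third argument of `MPI_Type_vector` when the block length is 1.

```c
/* Copy `count` elements that lie `stride` elements apart in `src`
   into the contiguous buffer `buf` (comparable to packing a column). */
void gather_strided(const double *src, int count, int stride, double *buf)
{
    for (int k = 0; k < count; k++)
        buf[k] = src[k * stride];
}

/* Inverse operation: copy the contiguous buffer `buf` back into
   `count` locations of `dst` that lie `stride` elements apart
   (comparable to unpacking into a halo column). */
void scatter_strided(const double *buf, int count, int stride, double *dst)
{
    for (int k = 0; k < count; k++)
        dst[k * stride] = buf[k];
}
```

For a matrix stored row by row with `ncols` entries per row, column `c` starts at `&a[c]` and uses `stride = ncols`. A derived datatype lets the MPI library perform this gather internally, whereas with pack/unpack the copy into a temporary buffer is done explicitly by the user.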

Derived data types vs. pack/unpack

- Advantages of derived datatypes:
    - avoids temporary buffers
    - code potentially shorter
    - gives the MPI library the possibility to optimize the corresponding operations
- Advantages of pack/unpack:
    - might lead to performance advantages if the same packed buffer has to be sent to multiple targets
    - many users find pack/unpack intuitive: similar to simply copying the data items into a temporary buffer

MPI Parallel Programming Summary

- Execution of the same executable np times
- process-specific code sections can rely on the rank of a process
- individual communication (e.g. MPI_Send/Recv) vs. collective communication (e.g. MPI_Bcast)
- blocking communication (e.g. MPI_Send/Recv) vs. non-blocking communication (e.g. MPI_Isend/Irecv)
- group and communicator management (e.g. MPI_Comm_split/MPI_Comm_create)
- process topologies (e.g. MPI_Cart_create)

MPI Parallel Programming Summary

- Data distribution (e.g. 1-D block column/row wise, 2-D)
- Discontiguous data items (derived data types, pack/unpack)