Overview: Parallelisation via Pipelining


Beverly McDaniel

1 Overview: Parallelisation via Pipelining
- three types of pipeline
- adding numbers (Type 1)
- performance analysis of pipelines
- insertion sort (Type 2)
- linear system back substitution (Type 3)
- Ref: Wilkinson and Allen, chapter on pipelined computations
COMP00/800 L: Parallelisation via Pipelining 07

2 Pipelining
- already encountered: instruction pipelining at the CPU level
- suits problems that can be divided into a series of sequential tasks, completed one after another
- e.g. a frequency filter in which each process filters one frequency
- three typical scenarios:
  1. more than one instance of the complete problem is to be executed (Type 1)
  2. a series of data items must be processed, each requiring multiple operations (Type 2)
  3. information to start the next process can be passed forward before the process has completed all its internal operations (Type 3)

3 Type 1 Pipelining
[Figure: space-time diagram of m instances of the problem passing through p pipeline stages P0 … P(p−1); horizontal axis: Time]

4 Type 2 Pipelining
[Figure: input sequence d0 … d9 fed into processes P0 … P9 (p = n); space-time diagram of the data items moving through the stages; horizontal axis: Time]

5 Type 3 Pipelining
[Figure: information passed to the next stage before a process has finished; space-time diagrams for even and uneven work per process (P0, P1, …); horizontal axis: Time]

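6 Example Type 1: Adding Numbers
$$s = \sum_{i=0}^{n-1} x_i$$
(one number x_i per process P_i, so n = p)
[Figure: pipeline P0 → P1 → … → P4, each stage adding (Σ) its own number to the running sum]
Code for each process (its own value is in number):
    accumulation = number;
    if (process_id > 0) {
        recv(&accumulation, process_id - 1);   /* running sum from the left */
        accumulation = accumulation + number;  /* add this process's number */
    }
    if (process_id < p - 1)
        send(&accumulation, process_id + 1);   /* pass it to the right */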
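To trace the dataflow of the summation code without a message-passing library, the sketch below simulates the p processes one after another in plain C. It assumes each process P_i holds one value number[i]; the variable link stands in for the recv/send pair between neighbours, and the function name pipelined_sum is hypothetical.

```c
#include <assert.h>

/* Hypothetical sequential simulation of the Type 1 summation pipeline.
   number[i] is the value held by process P_i (assumption: one value per
   process).  `link` plays the role of the message sent from P_{i-1} to P_i. */
double pipelined_sum(const double *number, int p)
{
    double link = 0.0;                         /* value in flight between stages */

    assert(p > 0);
    for (int process_id = 0; process_id < p; process_id++) {
        double accumulation = number[process_id];
        if (process_id > 0) {
            accumulation = link;               /* recv(&accumulation, process_id - 1) */
            accumulation = accumulation + number[process_id];
        }
        if (process_id < p - 1)
            link = accumulation;               /* send(&accumulation, process_id + 1) */
        else
            return accumulation;               /* the last stage holds the total */
    }
    return 0.0;                                /* not reached for p > 0 */
}
```

For example, pipelined_sum((double[]){1, 2, 3, 4, 5}, 5) passes the running sums 1, 3, 6, 10 along the pipeline and returns 15.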

7 General Pipeline Analysis
- assume each process performs a similar action in each pipeline cycle
- work out the computation and communication for one cycle
- compute the total execution time as
  $$t_{total} = (\text{time for one pipeline cycle})(\text{number of cycles}) = (t_{comp} + t_{comm})(m + p - 1)$$
  where m is the number of instances and p the number of pipeline stages (processes); the first instance takes p cycles to pass through all p stages, and each of the remaining m − 1 instances finishes one cycle after its predecessor
- the average time for one computation is then
  $$t_{av} = \frac{t_{total}}{m}$$
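The m + p − 1 cycle count can be checked by building the space-time diagram directly. The sketch below assumes every stage takes exactly one cycle, so instance k (0-based) occupies stage s during cycle k + s; it advances cycle by cycle until no stage is busy. The function name space_time_diagram is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Builds the space-time diagram of a Type 1 pipeline and returns the number
   of cycles it spans.  Assumption: every stage takes exactly one cycle, so
   instance k occupies stage s during cycle k + s.
   If print is non-zero, the diagram is written with P_{p-1} on top. */
int space_time_diagram(int m, int p, int print)
{
    int cycles = 0;

    assert(m > 0 && p > 0);
    /* Advance cycle by cycle until no stage holds an instance. */
    for (int t = 0; ; t++) {
        int busy = 0;
        for (int s = 0; s < p; s++) {
            int k = t - s;                 /* instance in stage s at cycle t */
            if (k >= 0 && k < m)
                busy = 1;
        }
        if (!busy)
            break;
        cycles++;
    }

    if (print) {
        for (int s = p - 1; s >= 0; s--) {
            printf("P%-3d", s);
            for (int t = 0; t < cycles; t++) {
                int k = t - s;
                if (k >= 0 && k < m)
                    printf(" %2d", k);     /* instance number */
                else
                    printf("  .");         /* stage idle */
            }
            printf("\n");
        }
    }
    return cycles;
}
```

Calling space_time_diagram(3, 4, 1) prints four rows (P3 down to P0) spanning 3 + 4 − 1 = 6 cycles and returns 6.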

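8 Summation Analysis
(t_s: message start-up time; t_w: time to transfer one word; t_f: time for one addition)
- single instance: $t_{comp} = t_f$ and $t_{comm} = 2(t_s + t_w)$ (one receive and one send per process)
  $$t_{total} = (2(t_s + t_w) + t_f)\,p$$
  i.e. a time complexity of O(p)
- multiple instances:
  $$t_{total} = (2(t_s + t_w) + t_f)(m + p - 1)$$
  $$t_{av} = \frac{t_{total}}{m} \approx 2(t_s + t_w) + t_f \quad \text{for } m \gg p$$
  i.e. t_av is one pipeline cycle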
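Plugging numbers into the summation model shows t_av approaching one pipeline cycle as m grows. This is a minimal sketch assuming the cycle time 2(t_s + t_w) + t_f (one receive, one send and one addition per stage); the function names and the sample timings are illustrative, not measured values.

```c
#include <assert.h>

/* Hypothetical cost model for the pipelined summation.
   ts: message start-up time, tw: time to transfer one word,
   tf: time for one addition (all in the same arbitrary unit). */
double summation_cycle(double ts, double tw, double tf)
{
    return 2.0 * (ts + tw) + tf;          /* one recv + one send + one add */
}

/* Total time for m instances on a p-stage pipeline: cycle * (m + p - 1). */
double summation_t_total(int m, int p, double ts, double tw, double tf)
{
    return summation_cycle(ts, tw, tf) * (m + p - 1);
}

/* Average time per instance. */
double summation_t_av(int m, int p, double ts, double tw, double tf)
{
    assert(m > 0);
    return summation_t_total(m, p, ts, tw, tf) / m;
}
```

With t_s = 10, t_w = t_f = 1 and p = 8, a single instance costs 23 × 8 = 184 units, while for m = 1000 the average is 23 × 1007 / 1000 ≈ 23.16 units, essentially the 23-unit cycle time.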

9 Example Type 2: Insertion Sort
- the algorithm is like moving a playing card over other cards until its correct location is found
[Figure: list of numbers fed into the pipeline P0 … P4; horizontal axis: Time (cycles)]
Code for each process, after it has stored the first number it receives in x:
    recv(&number, process_id - 1);      /* next number from the left */
    if (number > x) {                   /* keep the larger number */
        send(&x, process_id + 1);       /* pass the smaller one on */
        x = number;
    } else
        send(&number, process_id + 1);
- assuming n numbers, process P_i will
  - receive n − i numbers
  - pass on n − i − 1 numbers
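The compare-and-forward step can be checked with a sequential simulation. Because each process only ever sees the numbers its left neighbour forwards, in forwarding order, running the processes one after another yields the same final state as running them concurrently (timing is not modelled). The function name pipeline_insertion_sort and the MAX_N limit are assumptions of this sketch.

```c
#include <assert.h>

#define MAX_N 64

/* Hypothetical stage-by-stage simulation of the Type 2 insertion-sort pipeline.
   held[i]     : number kept by process P_i at the end (the largest it saw)
   received[i] : how many numbers P_i received */
void pipeline_insertion_sort(const double *in, int n, double *held, int *received)
{
    double buf[MAX_N], next[MAX_N];
    int len = n;

    assert(n > 0 && n <= MAX_N);
    for (int k = 0; k < n; k++)
        buf[k] = in[k];

    for (int i = 0; i < n; i++) {          /* process P_i */
        double x = buf[0];                 /* first number received is kept */
        int out = 0;
        received[i] = len;
        for (int k = 1; k < len; k++) {    /* recv(&number, P_{i-1}) */
            double number = buf[k];
            if (number > x) {              /* keep the larger value */
                next[out++] = x;           /* send(&x, P_{i+1}) */
                x = number;
            } else {
                next[out++] = number;      /* send(&number, P_{i+1}) */
            }
        }
        held[i] = x;
        for (int k = 0; k < out; k++)      /* forwarded numbers are P_{i+1}'s input */
            buf[k] = next[k];
        len = out;
    }
}
```

For the input 4, 1, 3, 5, 2 the processes end up holding 5, 4, 3, 2, 1 (largest in P0), and P_i receives 5 − i numbers, matching the counts above.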

10 Sort Analysis
- sequential (worst-case comparisons):
  $$t_{seq} = (n-1) + (n-2) + \dots + 2 + 1 = \frac{n(n-1)}{2}$$
  i.e. O(n^2): not a great sorting algorithm!
- parallel: each pipeline cycle needs
  $t_{comp} = 1$ (one comparison) and $t_{comm} = 2(t_s + t_w)$ (one receive and one send)
- total execution time (note: p = n here, and n numbers pass through, so m = n):
  $$t_{total} = (t_{comp} + t_{comm})(2n - 1) = (1 + 2(t_s + t_w))(2n - 1)$$
  i.e. overall O(n) scaling
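The gap between the two counts can be made concrete. The sketch below counts the comparisons a standard sequential insertion sort makes (reverse-sorted input gives the worst case n(n − 1)/2) and returns the pipeline cycle count m + p − 1 with m = p = n. Both function names are hypothetical, and the cycle count ignores the cost of each cycle.

```c
#include <assert.h>

/* Sequential insertion sort (ascending) that returns the number of
   element comparisons it made. */
long insertion_sort_comparisons(int *a, int n)
{
    long comparisons = 0;

    assert(n >= 0);
    for (int i = 1; i < n; i++) {
        int key = a[i];
        int j = i - 1;
        while (j >= 0) {
            comparisons++;
            if (a[j] <= key)
                break;                 /* key's position found */
            a[j + 1] = a[j];           /* shift larger element right */
            j--;
        }
        a[j + 1] = key;
    }
    return comparisons;
}

/* Pipeline cycles for n numbers on p = n processes: m + p - 1 with m = p = n. */
long pipeline_sort_cycles(long n)
{
    return 2 * n - 1;
}
```

For n = 1000 reverse-sorted numbers the sequential sort makes 499500 comparisons, while the pipeline needs 1999 cycles.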

11 Pipelined Insertion Sort
[Figure: space-time diagram for P0 … P4: sorting phase (2n − 1 cycles) followed by returning phase (n cycles); horizontal axis: Time]
Discussion point: using the pipelining idea, we have developed a solution in which the number of processing elements matches the number of data items. To what extent is this realistic? Are such algorithms still useful?

12 Example Type 3: Linear Equations
- solve a triangular system of linear equations, in which equation i involves only x_0 … x_i:
$$\begin{aligned}
a_{n-1,0}x_0 + a_{n-1,1}x_1 + a_{n-1,2}x_2 + \dots + a_{n-1,n-1}x_{n-1} &= b_{n-1}\\
&\ \ \vdots\\
a_{2,0}x_0 + a_{2,1}x_1 + a_{2,2}x_2 &= b_2\\
a_{1,0}x_0 + a_{1,1}x_1 &= b_1\\
a_{0,0}x_0 &= b_0
\end{aligned}$$
- the a's and b's are constants; the x's are the unknowns to be solved for

13 Back Substitution
- solve for x_0:
  $$x_0 = \frac{b_0}{a_{0,0}}$$
- solve for x_1 using the value found for x_0:
  $$x_1 = \frac{b_1 - a_{1,0}x_0}{a_{1,1}}$$
- solve for x_2 using the values found for x_0 and x_1:
  $$x_2 = \frac{b_2 - a_{2,0}x_0 - a_{2,1}x_1}{a_{2,2}}$$
- and so on; in general
  $$x_i = \frac{b_i - \sum_{j=0}^{i-1} a_{i,j}x_j}{a_{i,i}}$$
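A small worked instance of the recurrence, with coefficients made up for illustration (n = 3, equations written with the longest one at the top as on the previous slide):

```latex
\begin{aligned}
3x_0 + 2x_1 + x_2 &= 16\\
x_0 + x_1 &= 5\\
2x_0 &= 4
\end{aligned}
\qquad\Longrightarrow\qquad
\begin{aligned}
x_0 &= 4/2 = 2\\
x_1 &= (5 - 1\cdot 2)/1 = 3\\
x_2 &= (16 - 3\cdot 2 - 2\cdot 3)/1 = 4
\end{aligned}
```

Each x_i needs every previously computed x_j, which is what the pipeline on the next slide passes forward.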

14 Back Substitution: Pipeline Solution
[Figure: pipeline P0 → P1 → P2 → P3; P0 computes x0 and passes it on; P1 computes x1 and passes x0, x1 on; P2 computes x2 and passes x0, x1, x2 on; P3 computes x3]

15 Back Substitution Code
Sequential code:
    x[0] = b[0]/a[0][0];
    for (i = 1; i < n; i++) {
        sum = 0;
        for (j = 0; j < i; j++)
            sum = sum + a[i][j]*x[j];
        x[i] = (b[i] - sum)/a[i][i];
    }
Parallel code (one process per unknown):
    i = process_id;
    for (j = 0; j < i; j++) {
        recv(&x[j], process_id - 1);       /* receive x_j from the left */
        if (i < n - 1)
            send(&x[j], process_id + 1);   /* forward it; the last process has no right neighbour */
    }
    sum = 0;
    for (j = 0; j < i; j++)
        sum = sum + a[i][j]*x[j];
    x[i] = (b[i] - sum)/a[i][i];
    if (i < n - 1)
        send(&x[i], process_id + 1);       /* pass the new unknown on */
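The parallel code's dataflow can be simulated sequentially: process P_i sees only the values forwarded by P_{i-1}, keeps a private copy, and appends its own x_i to the stream. This sketch assumes a row-major n × n matrix with a[i*n + j] = 0 for j > i and non-zero diagonal entries; the function name and MAX_N limit are hypothetical, and timing is not modelled.

```c
#include <assert.h>

#define MAX_N 64

/* Hypothetical stage-by-stage simulation of the pipelined back substitution.
   link[] models the stream of x values travelling between neighbours:
   P_i receives x[0..i-1] from P_{i-1}, forwards them unchanged (they stay in
   link[]) and appends its own x[i]. */
void pipelined_back_substitution(int n, const double *a, const double *b, double *x_out)
{
    double link[MAX_N];                    /* values sent from P_{i-1} to P_i */

    assert(n > 0 && n <= MAX_N);
    for (int i = 0; i < n; i++) {          /* process P_i */
        double x[MAX_N];                   /* P_i's private copy of the unknowns */
        double sum = 0.0;
        for (int j = 0; j < i; j++)
            x[j] = link[j];                /* recv(&x[j], P_{i-1}) */
        for (int j = 0; j < i; j++)
            sum = sum + a[i * n + j] * x[j];
        x[i] = (b[i] - sum) / a[i * n + i];
        link[i] = x[i];                    /* send(&x[i], P_{i+1}) */
        x_out[i] = x[i];
    }
}
```

With rows (2, 0, 0), (1, 1, 0), (3, 2, 1) and b = (4, 5, 16), the simulation yields x = (2, 3, 4).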

16 Back Substitution Time Diagram
[Figure: space-time diagram for processes P0, P1, …; annotations "First value passed" and "Final value computed"; horizontal axis: Time]

17 Analysis
- no longer constant work per pipeline stage!
- process P0 performs one division and one send
- process P_i (i ≥ 1) performs i receives and i sends, i multiply/adds, one subtraction and one division, and one final send:
  $$t_{comm} = (2i + 1)(t_s + t_w), \qquad t_{comp} = 2i + 2$$
- much harder to analyse!
- Remark: the systolic array is a pipeline-based architecture; systolic designs have been used to solve linear systems.
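The per-process costs can be tabulated to show how unevenly the work is spread. This is a minimal sketch that follows the slide's counts, which treat every process as having a right-hand neighbour; t_s, t_w and the function names are placeholders, and computation is counted in basic arithmetic operations.

```c
#include <assert.h>

/* Messages: P_0 sends once; P_i receives i values, forwards i values and
   sends its own x_i, i.e. 2i + 1 messages (which also gives 1 for i = 0). */
double stage_comm(int i, double ts, double tw)
{
    assert(i >= 0);
    return (2.0 * i + 1.0) * (ts + tw);
}

/* Operations: P_0 does one division; P_i (i >= 1) does i multiplies, i adds,
   one subtraction and one division, i.e. 2i + 2. */
double stage_comp(int i)
{
    assert(i >= 0);
    return i == 0 ? 1.0 : 2.0 * i + 2.0;
}

/* Total computation over n processes: 1 + sum_{i=1}^{n-1} (2i + 2) = n^2 + n - 1. */
double total_comp(int n)
{
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += stage_comp(i);
    return total;
}
```

For n = 100 the last process performs 200 operations against P0's single division, so no single pipeline-cycle time describes all the stages.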
