Program Performance Metrics

Shanon Joseph
5 years ago
1 Program Performance Metrics
The parallel run time ($T_{par}$) is the time from the moment when computation starts to the moment when the last processor finishes its execution.
The speedup ($S$) is defined as the ratio of the time needed to solve the problem on a single processor ($T_{seq}$) to the time required to solve the same problem on a parallel system with $p$ processors ($T_{par}$): $S = T_{seq} / T_{par}$.
- relative: $T_{seq}$ is the execution time of the parallel algorithm executing on one of the processors of the parallel computer
- real: $T_{seq}$ is the execution time of the best-known algorithm using one of the processors of the parallel computer
- absolute: $T_{seq}$ is the execution time of the best-known algorithm using the best-known computer

2 Program Performance Metrics
The efficiency ($E$) of a parallel program is defined as the ratio of speedup to the number of processors: $E = S / p$.
The cost is usually defined as the product of the parallel run time and the number of processors: $p \cdot T_{par}$.
The scalability of a parallel system is a measure of its capacity to increase speedup in proportion to the number of processors.
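The definitions of speedup, efficiency and cost can be written down directly. The following is a minimal sketch; the function names and the sample timings are hypothetical and only illustrate the ratios defined above.

```python
def speedup(t_seq: float, t_par: float) -> float:
    """Speedup S = T_seq / T_par."""
    return t_seq / t_par


def efficiency(t_seq: float, t_par: float, p: int) -> float:
    """Efficiency E = S / p."""
    return speedup(t_seq, t_par) / p


def cost(t_par: float, p: int) -> float:
    """Cost = parallel run time times number of processors."""
    return t_par * p


if __name__ == "__main__":
    # Hypothetical measurements: 120 s sequential, 20 s on 8 processors.
    t_seq, t_par, p = 120.0, 20.0, 8
    print(speedup(t_seq, t_par))        # 6.0
    print(efficiency(t_seq, t_par, p))  # 0.75
    print(cost(t_par, p))               # 160.0
```

Whether the result is a relative, real or absolute speedup depends only on how `t_seq` was measured, not on the arithmetic.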

3 Communication costs in static interconnection networks
Principal parameters
- startup time ($t_s$)
- per-hop time ($t_h$)
- per-word transfer time ($t_w$)
Routing techniques
- store-and-forward routing
- cut-through routing

4 Communication costs depend on the routing strategy
Store-and-forward routing: the message is passed from processor to processor, and each intermediate processor stores it in its local memory until it has received the whole message.
$t_{comm} = t_s + (m t_w + t_h) l$
Cut-through routing: the message is divided into parts that are forwarded between processors without waiting for the whole message.
$t_{comm} = t_s + l t_h + m t_w$
Here $m$ is the message length in words and $l$ is the number of links the message traverses.
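The two routing strategies can be compared numerically. This sketch assumes the usual cost models, store-and-forward $t_s + (m t_w + t_h) l$ and cut-through $t_s + l t_h + m t_w$, with $m$ the message length in words and $l$ the number of links; the function names and parameter values are hypothetical.

```python
def t_comm_store_and_forward(t_s, t_h, t_w, m, l):
    """Each of the l hops retransmits the whole m-word message."""
    return t_s + (m * t_w + t_h) * l


def t_comm_cut_through(t_s, t_h, t_w, m, l):
    """The message is pipelined, so the m*t_w term is paid only once."""
    return t_s + l * t_h + m * t_w


if __name__ == "__main__":
    # Hypothetical parameters in arbitrary time units.
    params = dict(t_s=10, t_h=1, t_w=2, m=100, l=4)
    print(t_comm_store_and_forward(**params))  # 814
    print(t_comm_cut_through(**params))        # 214
```

For a single hop ($l = 1$) both models give the same value, so the difference only matters for messages that cross several links.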

5 Basic communication operations
- Simple message transfer between two processors
- One-to-all broadcast
- All-to-all broadcast
- One-to-all personalized communication
- All-to-all personalized communication
- Circular shift

6 [Figure: the basic operations and their duals on processors 0, 1, ..., p-1: one-to-all broadcast / single-node accumulation; all-to-all broadcast / multinode accumulation; one-to-all personalized / single-node gather; all-to-all personalized / multinode gather]

7 One-to-all broadcast - SF
a) in a ring with an even number of processors
b) in a ring with an odd number of processors
$T_{one\_to\_all\_b} = (t_s + t_w m) \lceil p/2 \rceil$

8 One-to-all broadcast - SF
in a mesh with wraparound
$T_{one\_to\_all\_b} = 2 (t_s + t_w m) \lceil \sqrt{p}/2 \rceil$

9 One-to-all broadcast - SF
in a hypercube
[Figure: three-dimensional hypercube with nodes 0 to 7 labelled (000) to (111)]
$T_{one\_to\_all\_b} = (t_s + t_w m) \log p$
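To see how the topology affects a store-and-forward one-to-all broadcast, the three cost expressions can be evaluated side by side. The sketch below assumes the standard textbook forms $(t_s + t_w m)\lceil p/2 \rceil$ for a ring, $2(t_s + t_w m)\lceil \sqrt{p}/2 \rceil$ for a wraparound mesh and $(t_s + t_w m)\log p$ for a hypercube; the function names and parameter values are hypothetical, and $p$ is assumed to be both a perfect square and a power of two.

```python
import math


def sf_broadcast_ring(t_s, t_w, m, p):
    """Ring: the message travels ceil(p/2) hops in each direction."""
    return (t_s + t_w * m) * math.ceil(p / 2)


def sf_broadcast_mesh(t_s, t_w, m, p):
    """Wraparound mesh: a row broadcast followed by column broadcasts."""
    side = math.isqrt(p)
    assert side * side == p, "p must be a perfect square"
    return 2 * (t_s + t_w * m) * math.ceil(side / 2)


def sf_broadcast_hypercube(t_s, t_w, m, p):
    """Hypercube: one step per dimension, log2(p) steps in total."""
    d = p.bit_length() - 1
    assert 1 << d == p, "p must be a power of two"
    return (t_s + t_w * m) * d


if __name__ == "__main__":
    t_s, t_w, m, p = 10, 1, 100, 64  # hypothetical values
    print(sf_broadcast_ring(t_s, t_w, m, p))       # 3520
    print(sf_broadcast_mesh(t_s, t_w, m, p))       # 880
    print(sf_broadcast_hypercube(t_s, t_w, m, p))  # 660
```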

10 One-to-all broadcast - SF
procedure ONE_TO_ALL_BC(d, my_id, X);
begin
   mask := 2^d - 1;                      { set all d bits of mask to 1 }
   for i := d-1 downto 0 do              { one step per hypercube dimension }
   begin
      mask := mask XOR 2^i;              { clear bit i of mask }
      if (my_id AND mask) = 0 then       { the lower i bits of my_id are 0 }
         if (my_id AND 2^i) = 0 then
         begin
            msg_destination := my_id XOR 2^i;
            send X to msg_destination;
         endif
         else
         begin
            msg_source := my_id XOR 2^i;
            receive X from msg_source;
         endelse;
   endfor;
end ONE_TO_ALL_BC
Code for the one-to-all broadcast operation in a hypercube (the processor with label 0 broadcasts its message).
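The hypercube broadcast procedure can be traced with a small sequential simulation. This is a sketch, not a message-passing implementation: it assumes the mask is initialised to $2^d - 1$ and that the partner in step $i$ differs in bit $i$ (my_id XOR $2^i$), and it replaces send/receive with a copy between list slots. The function name and the returned schedule are hypothetical.

```python
def one_to_all_broadcast(d, message):
    """Simulate a broadcast from node 0 on a d-dimensional hypercube.

    Returns the final buffer of every node and, per step, the list of
    (sender, receiver) pairs that communicated.
    """
    p = 1 << d
    buffers = [None] * p
    buffers[0] = message              # node 0 is the source
    schedule = []
    mask = p - 1                      # all d bits set
    for i in range(d - 1, -1, -1):    # highest dimension first
        bit = 1 << i
        mask ^= bit                   # clear bit i of the mask
        step = []
        for my_id in range(p):
            # Only nodes whose lower i bits are zero take part;
            # those with bit i clear already hold the message and send.
            if my_id & mask == 0 and my_id & bit == 0:
                dest = my_id ^ bit
                buffers[dest] = buffers[my_id]
                step.append((my_id, dest))
        schedule.append(step)
    return buffers, schedule


if __name__ == "__main__":
    buffers, schedule = one_to_all_broadcast(3, "X")
    for number, step in enumerate(schedule, start=1):
        print(f"step {number}: {step}")
```

The number of active senders doubles in every step, which is why $d = \log p$ steps are enough.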

11 One-to-all broadcast - CT
in a ring
$T_{onetoallbc} = t_s \log p + t_w m \log p + t_h (p - 1)$

12 One-to-all broadcast - CT
in a mesh with wraparound
$T_{onetoallbc} = (t_s + t_w m) \log p + 2 t_h (\sqrt{p} - 1)$

13 One-to-all broadcast - CT
in a balanced binary tree
$T_{onetoallbc} = (t_s + t_w m + t_h (\log p + 1)) \log p$
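With cut-through routing the three topologies differ only in the per-hop term. The sketch below assumes the standard forms $t_s \log p + t_w m \log p + t_h (p-1)$ for a ring, $(t_s + t_w m)\log p + 2 t_h(\sqrt{p} - 1)$ for a wraparound mesh and $(t_s + t_w m + t_h(\log p + 1))\log p$ for a balanced binary tree; function names and parameter values are hypothetical.

```python
import math


def _log2_exact(p):
    """Return log2(p) as an int; p must be a power of two."""
    d = p.bit_length() - 1
    assert 1 << d == p, "p must be a power of two"
    return d


def ct_broadcast_ring(t_s, t_h, t_w, m, p):
    log_p = _log2_exact(p)
    return t_s * log_p + t_w * m * log_p + t_h * (p - 1)


def ct_broadcast_mesh(t_s, t_h, t_w, m, p):
    log_p = _log2_exact(p)
    side = math.isqrt(p)
    assert side * side == p, "p must be a perfect square"
    return (t_s + t_w * m) * log_p + 2 * t_h * (side - 1)


def ct_broadcast_tree(t_s, t_h, t_w, m, p):
    log_p = _log2_exact(p)
    return (t_s + t_w * m + t_h * (log_p + 1)) * log_p


if __name__ == "__main__":
    args = dict(t_s=10, t_h=1, t_w=1, m=100, p=64)  # hypothetical values
    print(ct_broadcast_ring(**args))  # 723
    print(ct_broadcast_mesh(**args))  # 674
    print(ct_broadcast_tree(**args))  # 702
```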

14 All-to-all broadcast - SF
[Figure: all-to-all broadcast on an 8-processor ring, shown for communication steps 1, 2, ..., 7]

15 All-to-all broadcast - SF
procedure ALL_TO_ALL_BC_RING(my_id, my_msg, p, result);
begin
   left := (my_id - 1) mod p;
   right := (my_id + 1) mod p;
   result := my_msg;
   msg := result;
   for i := 1 to p - 1 do
   begin
      send msg to right;
      receive msg from left;
      result := result ∪ msg;           { append the newly received message }
   endfor;
end ALL_TO_ALL_BC_RING;
$T_{alltoallbc} = (t_s + t_w m)(p - 1)$
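The ring procedure can be checked with a sequential simulation in which every processor forwards to its right neighbour the message it received in the previous step. This is a sketch under the assumption that the collected result is the union of all messages seen so far; the function name is hypothetical and messages must be hashable.

```python
def all_to_all_broadcast_ring(messages):
    """Simulate all-to-all broadcast on a ring of len(messages) nodes.

    Returns, for every node, the set of messages it has collected
    after p - 1 communication steps.
    """
    p = len(messages)
    result = [{m} for m in messages]   # each node starts with its own message
    msg = list(messages)               # message each node sends next
    for _ in range(p - 1):
        # Every node receives what its left neighbour sent in this step.
        msg = [msg[(i - 1) % p] for i in range(p)]
        for i in range(p):
            result[i].add(msg[i])
    return result


if __name__ == "__main__":
    collected = all_to_all_broadcast_ring([f"M{i}" for i in range(5)])
    print(sorted(collected[0]))
```

Each step moves one message of size $m$ per link, which is where the $(p - 1)$ factor in the cost comes from.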

16 All-to-all broadcast - SF
[Figure: all-to-all broadcast on 16 processors (0 to 15): initial messages (0) to (15), and the distribution after the first phase, in which the processors of each row hold (0..3), (4..7), (8..11) or (12..15)]

17 All-to-all broadcast - SF
procedure ALL_TO_ALL_BC_MESH(my_id, my_msg, p, result);
begin
   { first phase }
   left := {...}; right := {...};
   result := my_msg;
   msg := result;
   for i := 1 to sqrt(p) - 1 do
   begin
      send msg to right;
      receive msg from left;
      result := result ∪ msg;
   endfor;
   { second phase }
   left := {...}; right := {...};
   msg := result;
   for i := 1 to sqrt(p) - 1 do
   begin
      send msg to right;
      receive msg from left;
      result := result ∪ msg;
   endfor;
end ALL_TO_ALL_BC_MESH;
$T_{alltoallbc} = 2 t_s (\sqrt{p} - 1) + t_w m (p - 1)$
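The two-phase structure of the mesh procedure can be simulated sequentially. This sketch assumes a $\sqrt{p} \times \sqrt{p}$ wraparound mesh with row-major node numbering, a first phase along rows and a second phase along columns, each taking $\sqrt{p} - 1$ steps; the neighbour formulas and function names are assumptions made for illustration, since the source leaves the neighbour expressions elided.

```python
import math


def _ring_phase(result, left_of, steps):
    """Run `steps` ring-style exchange steps; left_of(i) is node i's source."""
    msg = [frozenset(r) for r in result]     # what each node sends first
    for _ in range(steps):
        msg = [msg[left_of(i)] for i in range(len(result))]
        for i, incoming in enumerate(msg):
            result[i] |= incoming


def all_to_all_broadcast_mesh(messages):
    """Simulate all-to-all broadcast on a sqrt(p) x sqrt(p) wraparound mesh."""
    p = len(messages)
    side = math.isqrt(p)
    if side * side != p:
        raise ValueError("number of nodes must be a perfect square")
    result = [{m} for m in messages]

    # Phase 1: all-to-all broadcast inside every row.
    def row_left(i):
        return (i // side) * side + (i % side - 1) % side

    _ring_phase(result, row_left, side - 1)

    # Phase 2: all-to-all broadcast of the row results inside every column.
    def col_up(i):
        return ((i // side - 1) % side) * side + i % side

    _ring_phase(result, col_up, side - 1)
    return result


if __name__ == "__main__":
    collected = all_to_all_broadcast_mesh(list(range(16)))
    print(sorted(collected[5]))
```

In the second phase each transferred message already contains $\sqrt{p}$ original messages, which is why the $t_w$ term grows to $m(p - 1)$ while the startup term stays at $2(\sqrt{p} - 1)$.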

18 All-to-all broadcast - SF
[Figure: all-to-all broadcast on an 8-processor hypercube]
a) Initial distribution of messages
b) Distribution before the second step
c) Distribution before the third step
d) Final distribution of messages
$T_{alltoallbc} = t_s \log p + t_w m (p - 1)$

19 All-to-all broadcast with reduction - SF
procedure ALL_TO_ALL_BC_HCUBE(my_id, my_msg, d, result);
begin
   result := my_msg;
   for i := 0 to d - 1 do                { one exchange per hypercube dimension }
   begin
      partner := my_id XOR 2^i;
      send result to partner;
      receive msg from partner;
      result := result ⊕ msg;           { ⊕: the combining (reduction) operation }
   endfor;
end ALL_TO_ALL_BC_HCUBE;
$T_{alltoallbc} = (t_s + t_w m) \log p$
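The pairwise-exchange pattern on a hypercube can be simulated with a generic combining function. This sketch assumes that the loop visits all $d$ dimensions (bits 0 to $d-1$) and that the partner in step $i$ is my_id XOR $2^i$; with set union as the combiner it behaves as a plain all-to-all broadcast, with addition as an all-to-all reduction. The function name and the `combine` parameter are hypothetical.

```python
import operator


def all_to_all_hypercube(values, combine):
    """Simulate d exchange steps on a hypercube with p = 2**d nodes.

    In step i every node swaps its current result with the neighbour
    that differs in bit i and merges the two with `combine`.
    """
    p = len(values)
    d = p.bit_length() - 1
    if p == 0 or 1 << d != p:
        raise ValueError("number of nodes must be a power of two")
    result = list(values)
    for i in range(d):
        snapshot = list(result)            # values before this step's exchange
        for my_id in range(p):
            partner = my_id ^ (1 << i)
            result[my_id] = combine(snapshot[my_id], snapshot[partner])
    return result


if __name__ == "__main__":
    # Broadcast variant: every node ends up with all eight messages.
    sets = all_to_all_hypercube([{i} for i in range(8)], lambda a, b: a | b)
    print(sorted(sets[3]))
    # Reduction variant: every node ends up with the global sum.
    print(all_to_all_hypercube(list(range(8)), operator.add))
```

With a reduction the exchanged value keeps its size $m$ in every step, whereas with union it doubles, which is the difference between the $(t_s + t_w m)\log p$ and $t_s \log p + t_w m (p - 1)$ cost expressions.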

20 One-to-all personalized - SF
[Figure: one-to-all personalized communication among 8 processors, shown for communication steps 1, 2, ..., 7]
$T_{onetoall\_pers} = (t_s + t_w m)(p - 1)$

21 One-to-all personalized - SF
[Figure: one-to-all personalized communication among 16 processors (0 to 15); the source first distributes the groups (0..3), (4..7), (8..11) and (12..15)]
$T_{onetoall\_pers} = 2 t_s (\sqrt{p} - 1) + t_w m (p - 1)$

22 All-to-all personalized - SF
[Figure: all-to-all personalized communication among 6 processors (0 to 5), messages labelled by index pairs (i,j); communication steps 1 and 2]

23 All-to-all personalized - SF
[Figure: all-to-all personalized communication among 6 processors (0 to 5), continued; communication steps 3, 4 and 5]

24 All-to-all personalized - SF
[Figure: all-to-all personalized communication among 9 processors (0 to 8); each processor holds its messages grouped in triples, e.g. processor 0 holds (0,0),(0,3),(0,6), (0,1),(0,4),(0,7) and (0,2),(0,5),(0,8)]

25 All-to-all personalized - SF
[Figure: all-to-all personalized communication among 9 processors (0 to 8), distribution of the message triples after the first phase]
$T_{alltoall\_pers} = (2 t_s + t_w m p)(\sqrt{p} - 1)$
