Lecture
Physical Organization
- Uniform memory access (UMA) machines
  - All memory is equally far away from all processors
  - Early parallel processors like the NYU Ultracomputer
  - Problem: why go across the network for instructions? For read-only data? What about caches?
- Nonuniform memory access (NUMA) machines
  - Access to local memory is usually 10-1000 times faster than access to nonlocal memory
  - Static and dynamic locality of reference are critical for high performance
  - Compiler support? Architectural support?
- Bus-based symmetric multiprocessors (SMPs): combine both aspects
Logical Organization
- Shared Memory Model (conceptual picture: single address space)
  - Hardware/systems software provide a single address space model to the applications programmer
  - Communication between processors: read/write shared memory locations (put, get)
  - Some systems distinguish between local and remote references
- Distributed Memory Model (Message Passing) (conceptual picture)
  - Each processor has its own address space
  - Communication between processors: messages (like mail)
  - Basic message-passing commands: send, receive
- Key difference: in SMM, a processor can access remote memory locations w/o prearranged participation of the application program on the remote processor
Message Passing
Blocking SEND/RECEIVE: couple data transfer and synchronization
- Sender and receiver rendezvous to exchange data
  - Src: SEND(x, Dst)
  - Dst: RECEIVE(y, Src)
- History: Caltech Cosmic Cube
- The Src field in the RECEIVE command permits Dst to select which processor it wants to receive data from
- Implementation:
  - Src sends a token saying "ready to send"
  - Dst returns a token saying "me too"
  - Data transfer takes place directly between application programs w/o buffering in the O/S
- Motivation: hardware "channels" between processors in early multicomputers
- Problem:
  - The sender cannot push data out and move on
  - The receiver cannot do other work if data is not available yet
  - One possibility: a new command TEST(Src, flag): is there a message from Src?
- Overlapping of computation and communication is critical for performance
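The token handshake described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: Python threads stand in for the Src and Dst processors, semaphores stand in for the "ready to send" and "me too" tokens, and the class name `RendezvousChannel` is hypothetical rather than taken from the lecture.

```python
import threading


class RendezvousChannel:
    """Hypothetical channel for one sender (Src) and one receiver (Dst)."""

    def __init__(self):
        self._slot = None
        self._ready_to_send = threading.Semaphore(0)  # Src's token
        self._me_too = threading.Semaphore(0)         # Dst's token
        self._copied = threading.Semaphore(0)         # data is in place

    def send(self, x):
        self._ready_to_send.release()  # token: "ready to send"
        self._me_too.acquire()         # block until Dst answers "me too"
        self._slot = x                 # direct transfer, no intermediate buffer
        self._copied.release()

    def receive(self):
        self._ready_to_send.acquire()  # block until Src is ready
        self._me_too.release()         # token: "me too"
        self._copied.acquire()         # block until the data has been copied
        return self._slot


def demo():
    chan = RendezvousChannel()
    received = []
    src = threading.Thread(target=lambda: chan.send(42))
    dst = threading.Thread(target=lambda: received.append(chan.receive()))
    dst.start()
    src.start()
    src.join()
    dst.join()
    return received[0]
```

Calling `demo()` returns the value sent by Src. Neither `send` nor `receive` can return until the other side has arrived, which is exactly the coupling of synchronization and data transfer that the slide identifies as the problem.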
Non-blocking SEND/RECEIVE: decouple synchronization from data transfer
- Src: SEND(x, Dst, tag)
- Dst: RECEIVE(y, Src, tag, flag)
- Src can push data out onto the network and move on
- Many variations: return to the application program when data is out on the network? When data has been copied into an O/S buffer?
- The tag field on messages permits the receiver to receive messages in an order different from the order in which they were sent by Src
- RECEIVE does not block
  - flag is set to true by the O/S if data was transferred, false otherwise
  - The applications program can test flag and take the right action
- What if Dst has not done a RECEIVE when data arrives from Src?
  - Data is buffered in O/S buffers at Dst till the application program does a RECEIVE
- Can we eliminate waiting at Src?
- Can we eliminate buffering of data at Dst?
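A small model may help make the roles of the tag and the flag concrete. The sketch below is an assumption-laden illustration, not the lecture's implementation: the class `Mailbox` and its methods `deliver` and `receive` are hypothetical names standing in for the O/S buffers at Dst and the non-blocking RECEIVE.

```python
from collections import deque


class Mailbox:
    """Hypothetical model of the O/S message buffers at Dst."""

    def __init__(self):
        self._buffers = {}  # (src, tag) -> queue of buffered messages

    def deliver(self, src, tag, data):
        # A message arrived from the network: buffer it until a RECEIVE asks.
        self._buffers.setdefault((src, tag), deque()).append(data)

    def receive(self, src, tag):
        # Non-blocking RECEIVE: returns (flag, data) immediately.
        pending = self._buffers.get((src, tag))
        if pending:
            return True, pending.popleft()
        return False, None


def demo():
    box = Mailbox()
    first = box.receive(src=0, tag="b")   # nothing has arrived yet
    box.deliver(0, "a", 1.5)              # sent first by Src
    box.deliver(0, "b", 2.5)              # sent second by Src
    second = box.receive(src=0, tag="b")  # tag selects the later message
    third = box.receive(src=0, tag="a")
    return [first, second, third]
```

In `demo()`, the first call comes back with a false flag instead of blocking, and the two later calls pick up the messages in the opposite order from the one in which they were sent.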
Asynchronous SEND/RECEIVE
- Src: ISEND(x, Dst, tag, flag)
- Dst: IRECEIVE(y, Src, tag, flag)
- SEND returns as soon as the O/S knows what needs to be sent
  - Flag is set by the O/S when the data in x has been shipped out
  - The application program continues, but must test flag before overwriting x
- RECEIVE is non-blocking:
  - Returns before data arrives
  - Tells the O/S to place data in y and set flag after the data is received
  - "Posting" of information to the O/S
  - Flag is written by the O/S and read by the application program on Dst
- Eliminates buffering of data in the Dst O/S area if IRECEIVE is posted before the message arrives at Dst
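The effect of posting an IRECEIVE early can be illustrated with a toy model of the receiving side. Everything here is a hypothetical sketch: `DstOS`, `irecv`, `arrive` and `Flag` are illustrative names, the sending side (ISEND) is not modeled, and a Python list plays the role of the application buffer y.

```python
class Flag:
    """Completion flag: written by the O/S model, read by the application."""

    def __init__(self):
        self.done = False


class DstOS:
    """Hypothetical model of the O/S on the receiving processor."""

    def __init__(self):
        self.posted = {}         # (src, tag) -> (y, flag) for posted IRECEIVEs
        self.buffered = {}       # (src, tag) -> data that arrived too early
        self.os_buffer_copies = 0

    def irecv(self, y, src, tag, flag):
        # Post the receive and return immediately.
        key = (src, tag)
        if key in self.buffered:
            y[:] = self.buffered.pop(key)  # message was already waiting
            flag.done = True
        else:
            self.posted[key] = (y, flag)

    def arrive(self, src, tag, data):
        # Called when a message comes in from the network.
        key = (src, tag)
        if key in self.posted:
            y, flag = self.posted.pop(key)
            y[:] = data                    # straight into the user's buffer
            flag.done = True
        else:
            self.buffered[key] = list(data)
            self.os_buffer_copies += 1


def demo():
    early, y1, f1 = DstOS(), [0, 0], Flag()
    early.irecv(y1, src=0, tag=7, flag=f1)   # posted before arrival
    flag_before_arrival = f1.done
    early.arrive(0, 7, [1, 2])

    late, y2, f2 = DstOS(), [0, 0], Flag()
    late.arrive(0, 7, [1, 2])                # arrives before the post
    late.irecv(y2, src=0, tag=7, flag=f2)
    return (flag_before_arrival, f1.done, y1, early.os_buffer_copies,
            f2.done, y2, late.os_buffer_copies)
```

In the first case the data goes directly into y with no O/S buffering; in the second the message has to be held in an O/S buffer until the receive is posted.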
So far, we have looked at point-to-point communication
Collective communication:
- Patterns of group communication that can be implemented more efficiently than through long sequences of sends and receives
- Important ones:
  - one-to-all broadcast (eg. A*x implemented by rowwise distribution: all processors need x)
  - all-to-one reduction (eg. adding a set of numbers distributed across all processors)
  - all-to-all broadcast: every processor sends a piece of data to every other processor
  - one-to-all personalized communication: one processor sends a different piece of data to all other processors
  - all-to-all personalized communication: each processor does a one-to-all communication
Example: One-to-all broadcast (intuition: think "tree")
(Figure: broadcast among processors 0-7, with messages labeled by phase)
- Messages in each phase do not compete for links
- Assuming message size is small, time to send a message = Ts + h*Th, where
  - Ts = overhead at sender/receiver
  - Th = time per hop
  - h = number of hops
- Total time for broadcast = (Ts + Th*P/2) + (Ts + Th*P/4) + ... = Ts log P + Th (P - 1)
- Reality check: actually, a k-ary tree makes sense because processor 0 can send many messages by the time another processor is ready to participate in the broadcast
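The tree-shaped schedule and its cost can be checked with a short sketch. It assumes P is a power of two, the broadcast starts at processor 0, and the processors sit in a one-dimensional arrangement in which processor i is |i - j| hops from processor j; the function names are hypothetical.

```python
def broadcast_schedule(p):
    """Phases of a one-to-all broadcast from processor 0 on p = 2**k processors.

    Returns a list of phases; each phase is a list of (sender, receiver) pairs.
    """
    if p < 1 or p & (p - 1):
        raise ValueError("p must be a power of two")
    have = [0]                # processors that already hold the data
    phases = []
    dist = p // 2
    while dist >= 1:
        phase = [(src, src + dist) for src in have]
        have += [dst for _, dst in phase]
        phases.append(phase)
        dist //= 2
    return phases


def broadcast_time(p, ts, th):
    """Sum of (Ts + hops * Th) over the phases, with hops = P/2, P/4, ..., 1."""
    total = 0.0
    dist = p // 2
    while dist >= 1:
        total += ts + th * dist
        dist //= 2
    return total
```

For example, `broadcast_schedule(8)` yields three phases with 1, 2 and 4 messages, and `broadcast_time(8, ts=10, th=1)` adds up to 10*3 + 1*7, which agrees with Ts log P + Th (P - 1).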
Other topologies: use the same idea
2D Mesh
- Step 1: Broadcast within the row of the originating processor
- Step 2: Broadcast within each column in parallel
- Time = Ts log P + 2 Th (sqrt(P) - 1)
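As a quick arithmetic check of the mesh cost, the sketch below treats each of the two steps as a tree broadcast over sqrt(P) processors in a line and compares the result with a broadcast over all P processors arranged in one dimension. The function names and the sample values of Ts and Th are assumptions made for illustration.

```python
import math


def line_broadcast_time(p, ts, th):
    """Ts log P + Th (P - 1): tree broadcast over p processors in a line."""
    return ts * math.log2(p) + th * (p - 1)


def mesh_broadcast_time(p, ts, th):
    """Row broadcast followed by parallel column broadcasts on a square mesh."""
    side = math.isqrt(p)
    if side * side != p:
        raise ValueError("p must be a perfect square")
    # Two steps, each over sqrt(P) processors:
    # 2 * (Ts log sqrt(P) + Th (sqrt(P) - 1)) = Ts log P + 2 Th (sqrt(P) - 1)
    return 2 * line_broadcast_time(side, ts, th)
```

With P = 64, Ts = 10 and Th = 1, the mesh needs 74 time units against 123 for the one-dimensional arrangement: the startup term Ts log P is the same, and only the per-hop term shrinks.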
Example: All-to-one reduction
(Figure: reduction among processors 0-7, with messages labeled by phase)
- Messages in each phase do not compete for links
- Purpose: apply a commutative and associative operator (reduction operator) like +, *, AND, OR etc. to the values contained in each node
- Can be viewed as the inverse of one-to-all broadcast
- Same time as one-to-all broadcast
- Important use: determine when all processors are finished working (implementation of "barrier")
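The reduction can be sketched as the broadcast tree run backwards: partners at distance 1 combine first, then distance 2, and so on, until processor 0 holds the result. The sketch assumes the number of values is a power of two and simulates all processors in one list; `all_to_one_reduce` is a hypothetical name.

```python
import operator


def all_to_one_reduce(values, op=operator.add):
    """Combine one value per processor into processor 0.

    Returns (result, number_of_phases).
    """
    vals = list(values)
    p = len(vals)
    if p < 1 or p & (p - 1):
        raise ValueError("number of processors must be a power of two")
    phases = 0
    dist = 1
    while dist < p:
        # In this phase, processor dst + dist sends its partial result to dst.
        for dst in range(0, p, 2 * dist):
            vals[dst] = op(vals[dst], vals[dst + dist])
        dist *= 2
        phases += 1
    return vals[0], phases
```

Summing the values 1 through 8 takes three phases. Passing `operator.and_` over per-processor "finished" flags gives the barrier-style test mentioned on the slide: the result is true only when every processor has finished.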
Example: All-to-all broadcast
(Figure: values circulating among processors 0-7)
- Intuition: cyclic shift register
- Each processor receives a value from one neighbor, stores it away, and sends it to the next neighbor in the next phase
- Total of (P - 1) phases to complete the all-to-all broadcast
- Time = (Ts + Th) (P - 1), assuming message size is small
- The same idea can be applied to meshes as well:
  - first phase: all-to-all broadcast within each row
  - second phase: all-to-all broadcast within each column
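The cyclic-shift intuition translates directly into a small simulation. The sketch assumes the processors form a ring in which processor i always receives from processor i - 1 (wrapping around); `all_to_all_broadcast` is a hypothetical name.

```python
def all_to_all_broadcast(values):
    """Simulate all-to-all broadcast on a ring with one value per processor.

    Returns, for each processor, the values it holds in order of arrival.
    """
    p = len(values)
    known = [[v] for v in values]   # what each processor has stored so far
    in_flight = list(values)        # what each processor forwards next
    for _ in range(p - 1):          # P - 1 phases
        # Every processor passes its most recent value to the next neighbor.
        in_flight = [in_flight[(i - 1) % p] for i in range(p)]
        for i in range(p):
            known[i].append(in_flight[i])
    return known
```

With four processors holding 'a', 'b', 'c', 'd', processor 0 ends up with the values in the order a, d, c, b after three phases, and every other processor likewise holds all four.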
Message-passing Program
MPI: Message-Passing Interface
- Goal: Portable Parallel Programming for Distributed Memory Computers
- Lots of vendors of Distributed Memory Computers: IBM, NCube, Intel, CM-5, ...
- Each vendor had its own communication constructs
  - Porting programs required changing parallel programs even to go from one distributed memory platform to another!
- MPI goal: standardize the syntax and semantics of message passing constructs
- Mid 1994: MPI standard out and several implementations available (SP-2)
Write an MPI program to perform matrix-vector multiply A*b
- Style of programming: Master-Slave
  - one master, several slaves
  - master coordinates activities of slaves
- Master initially owns all rows of A and vector b
- Master broadcasts vector b to all slaves
- Slaves are self-scheduled
  - each slave comes to the master for work
  - master sends a row of the matrix to the slave
  - slave performs the product, returns the result and asks for more work
- Very naive algorithm, but it's a start
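The self-scheduling protocol can be tried out without an MPI installation. The sketch below is not MPI code: Python threads stand in for the master and slave processes (called workers in the code), `queue.Queue` objects stand in for messages, and all function and variable names are hypothetical.

```python
import queue
import threading


def worker(rank, to_master, from_master):
    b = from_master.get()                # vector b, "broadcast" by the master
    to_master.put((rank, None, None))    # come to the master for work
    while True:
        task = from_master.get()
        if task is None:                 # no rows left: stop
            return
        i, row = task
        dot = sum(a_ij * b_j for a_ij, b_j in zip(row, b))
        to_master.put((rank, i, dot))    # return result and ask for more


def master_matvec(a, b, num_workers=3):
    to_master = queue.Queue()
    inboxes = [queue.Queue() for _ in range(num_workers)]
    threads = [threading.Thread(target=worker, args=(r, to_master, inboxes[r]))
               for r in range(num_workers)]
    for t in threads:
        t.start()
    for inbox in inboxes:                # broadcast b to every worker
        inbox.put(list(b))

    result = [None] * len(a)
    next_row = 0
    active = num_workers
    while active:
        rank, i, dot = to_master.get()   # a request, possibly with a result
        if i is not None:
            result[i] = dot
        if next_row < len(a):
            inboxes[rank].put((next_row, a[next_row]))  # hand out one row
            next_row += 1
        else:
            inboxes[rank].put(None)      # tell this worker to stop
            active -= 1
    for t in threads:
        t.join()
    return result
```

For instance, `master_matvec([[1, 2], [3, 4], [5, 6]], [1, 1])` returns `[3, 7, 11]` regardless of which worker happens to pick up which row. Because every row costs a round trip to the master, the master quickly becomes the bottleneck, which is the sense in which the algorithm is naive.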
Key MPI Routines we will use:
- MPI_INIT: Initialize the MPI System
- MPI_COMM_SIZE: Find out how many processes there are
- MPI_COMM_RANK: Who am I?
- MPI_SEND: Send a message
  - MPI_SEND(address, count, datatype, Dst, tag, comm)
  - (address, count, datatype) permits entire data structures to be sent with one command
  - comm identifies the process group
- MPI_RECV: Receive a message (blocking receive)
- MPI_FINALIZE: Terminate MPI
- MPI_BCAST: Broadcast
(Program listings: text not recoverable from the source.)