Chapter 4: Efficient Collective Communication


1. Chapter 4: Efficient Collective Communication

Collective communication: communication amongst a collection of nodes (not just a sender and a receiver).
Examples: one-to-all (broadcast), all-to-one (reduction), all-to-all, scatter/gather, etc.
Optimization can have different goals (the default is the last):
1. Minimize a particular node's time in the collective communication.
2. Exploit known synchronization to reduce total time in the algorithm.
3. Minimize total time in the algorithm, assuming roughly synchronized nodes.
As always, keep our eye on two (possibly conflicting) aims:
1. Minimize time by using as many links as possible (e.g., send more messages).
2. Avoid contention (minimize the number of messages, or use a particular communication pattern).
Avoiding contention requires knowledge of the interconnects amongst the collection.
We assume cut-through (packet) routing, so the communication time is $t_c = t_s + m t_w$, where:
$t_s$: message startup time
$t_w$: per-word transfer time (inverse of bandwidth in words)
$m$: message size in words

2. One-To-All Broadcast and All-To-One Reduction

Broadcast: the source node has an m-length buffer needed by all other nodes.
Reduction: all nodes contribute m-length buffers, which are combined (e.g., summed) into a final m-length buffer left on the destination node.
These operations are duals: run the broadcast in reverse to create the reduction.
Now discuss efficient implementations on various canonical interconnects. Note that a row/column of a 2-D mesh is a ring or a linear array.
A perfect algorithm uses all links all the time. This is not usually possible (e.g., in a broadcast only 1 processor is busy in the 1st step).
We want to send as many messages as possible without contention.
When we get the answer to all nodes (in the collection) in the minimal possible time, we call this the minimum spanning tree.

3. Basics of Broadcast

For a broadcast where the hardware can handle 1 send at a time, the minimum spanning tree is given by recursive doubling:
All nodes possessing the message send to a node that doesn't.
At each stage, the number of senders doubles.
Takes $\log_2 p$ steps, with $2^i$ senders in step $i$ ($0 \le i < \log_2 p$).
In the final step, $p/2$ links are active; even a ring has this many links!
There are ordering choices: to avoid contention, send to the furthest node first.
For reduction, just reverse the steps and add the combine operation!
Cost: $\log_2(p)\,(t_s + t_w m)$ for broadcast, $\log_2(p)\,(t_s + (t_w + t_o) m)$ for reduction, where $t_o$ is the per-word time of the combine operation.
[Figures: contention-free broadcast; contention-free reduction]

4. Recursive Doubling for Mesh Interconnects

On a 2-D mesh, use ring-based recursive doubling within a row (column), and then have all columns (rows) do the same in parallel.
A row/column of a square $p$-node mesh is a $\sqrt{p}$-node linear array.
Cost: $2\log_2(\sqrt{p})\,(t_s + t_w m) = (\log_2\sqrt{p} + \log_2\sqrt{p})(t_s + t_w m) = \log_2(p)\,(t_s + t_w m)$, using $\log_b(x) + \log_b(y) = \log_b(xy)$.
Use the 2-D algorithm to avoid contention, not to change the number of steps!
A 3-step algorithm handles the 3-D mesh.
Simply reverse the direction for reduction.
[Figure: contention-free 2-D mesh broadcast (Intro to Parallel Computing, Fig. 4.5, pg 153)]
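As a rough illustration of the recursive-doubling broadcast of Section 3, the sketch below serially simulates the furthest-first ordering on $p = 2^d$ nodes with node 0 as the source. The function names `rd_broadcast` and `rd_broadcast_cost` are hypothetical (they do not come from the notes), and the simulation only tracks who holds the message; it does not model links or contention.

```c
#include <assert.h>

/* Hypothetical helper: simulate recursive doubling from node 0 on p = 2^d
 * nodes, sending to the furthest node first.  has[n] is set to 1 once node n
 * holds the message; senders_per_step[i] records the senders in step i.
 * Returns the number of steps taken. */
int rd_broadcast(int d, int *has, int *senders_per_step)
{
    assert(d >= 0 && d < 30);
    int p = 1 << d;
    int steps = 0;
    for (int n = 0; n < p; n++)
        has[n] = (n == 0);
    /* The stride starts at p/2 (furthest node) and halves every step. */
    for (int stride = p >> 1; stride >= 1; stride >>= 1) {
        int senders = 0;
        for (int node = 0; node < p; node += 2 * stride) {
            assert(has[node]);          /* every sender already has the data */
            has[node + stride] = 1;
            senders++;
        }
        senders_per_step[steps++] = senders;
    }
    return steps;
}

/* Cost model from the notes: log2(p) * (ts + tw * m), with d = log2(p). */
double rd_broadcast_cost(int d, double ts, double tw, double m)
{
    return d * (ts + tw * m);
}
```

For $d = 3$ this takes 3 steps with 1, 2, and 4 senders, matching the $2^i$ senders per step stated above.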

5. Recursive Doubling on Hypercube Interconnects

A hypercube with $2^d$ nodes is a d-dimensional mesh with 2 nodes in each dimension:
Apply point-to-point communication along each of the d links, and the algorithm is done.
Unlike the mesh/ring, all orderings work, since the hypercube has links in every dimension; however, other than ordering and hops, it is no better than the cheaper mesh [$\log_2(p)\,(t_s + m t_w)$]!
The hypercube broadcast also works for an indirect connection using a balanced binary tree.
[Figures: contention-free hypercube broadcast; contention-free binary tree broadcast (Intro to Parallel Computing, Fig. 4.6)]

6. General One-to-All Broadcast Algorithm

    void oneall(int d, int Iam, int src, int n, void *x)
    {
       int i, viam, mask, abit, vdest, vsrc;

       viam = Iam ^ src;            /* virtual rank: source becomes node 0 */
       mask = (1<<d) - 1;           /* all 1s: only the source is active */
       for (i=d-1; i >= 0; i--)
       {
          abit = (1<<i);            /* active bit for this step */
          mask ^= abit;             /* remove most significant restriction */
          if ((viam & mask) == 0)   /* am I active in this step? */
          {
             if ((viam & abit) == 0)
             {
                vdest = viam ^ abit;
                send(vdest^src, n, x);
             }
             else
             {
                vsrc = viam ^ abit;
                recv(vsrc^src, n, x);
             }
          }
       }
    }

mask indicates who is allowed in: it is initialized to all 1s, so no one is active (except the source, viam = 0).
Each iteration removes the most significant bit restriction, adding new nodes (recursive doubling!).
Of the active nodes, 1/2 are sending and the others receiving: active bit = 0, send; active bit = 1, recv.
Can get send-to-nearest by reversing the i loop (OK on hypercube, not on mesh).
The virtual-rank conversion relies on (x ^ y) ^ y = x.
One algorithm (adapted) gives a contention-free 1-to-all broadcast on ring, mesh, and hypercube.
Since viam allows easy conversion between node 0 and any src/dest, use node 0 in future for clarity!

7. Recursive Halving All-to-One Reduction (dest=0)

It is easy to build a recursive halving all-to-one reduction by reversing the broadcast:

    void AllOneReduce(int d, int Iam, int m, TYPE *X)
    {
       int i, k, mask, abit, dest, src;
       TYPE *buff = alloc(m);       /* m-length buffer for the other's X */

       mask = 0;                    /* all nodes start active */
       for (i=0; i < d; i++)
       {
          abit = (1<<i);
          if ((Iam & mask) == 0)
          {
             if ((Iam & abit) != 0)
             {
                dest = Iam^abit;
                send(dest, m, X);
             }
             else
             {
                src = Iam^abit;
                recv(src, m, buff);
                for (k=0; k < m; k++)
                   X[k] += buff[k];
             }
          }
          mask |= abit;             /* senders drop out of the algorithm */
       }
    }

Can use ^= or |= or += for the mask update by abit.
Since the broadcast pattern is reversed: odd nodes are the first senders, and receivers stay in the algorithm.
Need an m-length buffer to recv the other node's X.
X holds garbage on non-destination nodes; can use another buffer to avoid this.
Can use other reduction operators (e.g., min, max).
Can use an async recv and an extra buffer to overlap communication and computation, but this complicates a simple algorithm!

8. All-to-All Broadcast and All-to-All Reduction

In all-to-all broadcast, all nodes have a unique message to share with everyone.
In all-to-all reduction, each node has $p$ messages, each of which should be reduced to a different processor (e.g., message 0 for node 0, message 1 for node 1).
Example: each node computes a portion of C in gemm, and then the all-to-all reduce puts the answer on the processor owning that particular block of C.
In all-to-all, everybody has something, so we can do $p$ communications per step (max), assuming 1 send at a time.
In one-to-all, not all nodes had data, so the max is $p/2$ communications.
If we did $p$ one-to-alls, we would never use $p$ links in parallel.
Better to have a special all-to-all than to do $p$ one-to-alls.
We assume each node can send and recv at the same time.
[Figure: Intro to Parallel Computing, Fig. 4.8, pg 157]
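The one-to-all routine of Section 6 can be sanity-checked with a small serial simulation. The sketch below is an assumption-laden illustration, not the notes' code: `simulate_oneall` is a hypothetical name, and real send/recv calls are replaced by setting a flag on the destination. It applies the same mask/active-bit test to every node in each step and asserts that every sender already holds the data and that no node receives twice.

```c
#include <assert.h>

/* Hypothetical serial simulation of the hypercube one-to-all broadcast.
 * got[n] becomes 1 when node n holds the message.  Returns the total
 * number of messages sent. */
int simulate_oneall(int d, int src, int *got)
{
    assert(d >= 0 && d < 30);
    int p = 1 << d;
    int sent = 0;
    for (int n = 0; n < p; n++)
        got[n] = (n == src);

    int mask = p - 1;                      /* all 1s: only the source active */
    for (int i = d - 1; i >= 0; i--) {
        int abit = 1 << i;
        mask ^= abit;                      /* admit nodes differing in bit i */
        for (int iam = 0; iam < p; iam++) {
            int viam = iam ^ src;          /* virtual rank relative to src */
            if ((viam & mask) == 0 && (viam & abit) == 0) {
                int dest = (viam ^ abit) ^ src;
                assert(got[iam]);          /* a sender must hold the data */
                assert(!got[dest]);        /* each node receives only once */
                got[dest] = 1;
                sent++;
            }
        }
    }
    return sent;
}
```

For any source on a $2^d$-node hypercube the simulation sends exactly $p - 1$ messages over $d$ steps, which is the behaviour the mask logic is meant to produce.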

9. No Contention, Max Link All-To-All Ring Broadcast

Step 1: send my message right, recv from the left.
Step i: send the message received last time to the right, recv from the left.
At step $p-1$, all nodes have $p$ messages.

    func allallbc_ring(buff)
    {
       left  = (p+Iam-1) % p;
       right = (Iam+1) % p;
       resbuff = buff;
       for (i=1; i < p; i++)
       {
          send buff to right
          recv buff from left
          resbuff = resbuff U buff;
       }
       return(resbuff);
    }

May hang if sends run out of buffer space.
Usually, send/recv from areas in resbuff; do async recv, sync recv.
resbuff is of size $p \times m$.
Cost: $(p-1)(t_s + m t_w)$
Contention for a linear array!
[Figure: Intro to Parallel Computing, Fig. 4.8, pg 157]

10. No Contention, Max Link All-to-All Ring Reduction

All steps except the 1st take the data from the preceding step, add it to the buffer of the node that the message is destined for, and forward it on:

    void aared_ring(int m, TYPE **buffs)
    {
       rbuff = alloc(m);
       left  = (p+Iam-1) % p;
       right = (Iam+1) % p;
       for (i=1; i < p; i++)
       {
          dest = (Iam+i) % p;
          if (i != 1)
          {
             for (k=0; k < m; k++)
                buffs[dest][k] += rbuff[k];
          }
          arecv rbuff from right
          send buffs[dest] to left
       }
       for (k=0; k < m; k++)
          buffs[Iam][k] += rbuff[k];
    }

Overwrites buffs with partial results.
Cost: $(p-1)(t_s + m(t_w + t_o))$
May be possible to reduce to: $(p-1)(t_s + m\max(t_w, t_o))$
[Figure: hand-drawn ring reduction schematic, buffers labeled (#, dest)]

11. Contention-filled All-to-All Mesh Broadcast

Perform the algorithm in two steps:
1. All-to-all ring broadcast along rows [$(\sqrt{p}-1)(t_s + m t_w)$]
2. All-to-all ring broadcast of the combined result along columns [$(\sqrt{p}-1)(t_s + \sqrt{p}\,m t_w)$]
Cost: $(\sqrt{p}-1)(t_s + m t_w) + (\sqrt{p}-1)(t_s + \sqrt{p}\,m t_w) = 2(\sqrt{p}-1)t_s + (\sqrt{p}-1)(\sqrt{p}+1)\,m t_w = 2(\sqrt{p}-1)t_s + (p-1)\,m t_w$
For non-square grids, do step 1 on the shortest dimension.
Since subgrids of grids with wraparound links do not have wraparound, this is likely to be a best algorithm for small messages, where $t_s$ dominates.
Contention is unavoidable without wraparound, but if we treat the mesh as a 1-D ring, messages stay smaller, so contention hurts less, and we finish in the normal time [$(p-1)(t_s + m t_w)$]; this says that if $t_s$ is small, contention savings may make this the better algorithm.
Might modify the 1-D ring so that even rows go in the increasing direction and odd rows go in the decreasing direction.
Note that a sub-array of a ring topology is also not a ring, but rather a linear array, so actually none of these algorithms are contention free in practice unless the whole machine is used.

12. Contention free All-to-All Bcast on Hypercube

Done in $\log_2 p$ steps, with the message size doubling at each step.

    func allallbc_hyp(buff)
    {
       resbuff = buff;
       for (i=0; i < d; i++)
       {
          part = Iam ^ (1<<i);
          send resbuff to part
          recv mesg from part
          resbuff = resbuff U mesg
       }
    }

Cost: $\sum_{i=1}^{\log_2 p}(t_s + 2^{i-1} m t_w) = \log_2(p)\,t_s + (p-1)\,m t_w$
A sub-hypercube has all the needed links.
Will cause contention on all other interconnects.
Does not reduce the dominant $t_w$ term.
Is generally contention-free; the other topologies are not.
[Figure: all-to-all broadcast on 8-node hypercube (Intro to Parallel Computing, Fig. 4.11)]
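To make the ring all-to-all broadcast of Section 9 concrete, here is a minimal serial simulation in which each node contributes a single integer. The names `ring_allgather` and `MAXP` are hypothetical, and the "send right, recv left" of one step is modeled by a two-phase update so that all nodes appear to communicate simultaneously.

```c
#include <assert.h>

#define MAXP 64   /* assumed upper bound on the simulated node count */

/* Hypothetical simulation: after p-1 steps, res[n][k] holds node k's value
 * on every node n.  Returns the number of communication steps. */
int ring_allgather(int p, const int *mine, int res[][MAXP])
{
    assert(p >= 1 && p <= MAXP);
    int cur[MAXP];    /* origin of the message node n forwards next */
    int next[MAXP];
    for (int n = 0; n < p; n++) {
        for (int k = 0; k < p; k++)
            res[n][k] = -1;             /* -1 marks "not yet received" */
        res[n][n] = mine[n];
        cur[n] = n;
    }
    int steps = 0;
    for (int i = 1; i < p; i++) {
        for (int n = 0; n < p; n++)     /* every node sends to the right */
            next[(n + 1) % p] = cur[n];
        for (int n = 0; n < p; n++) {   /* every node stores what arrived */
            cur[n] = next[n];
            res[n][cur[n]] = mine[cur[n]];
        }
        steps++;
    }
    return steps;
}
```

Each step moves one fixed-size message per link, which is why the cost in the notes is $(p-1)(t_s + m t_w)$.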

13. All-to-All Summary

In general, issues such as blocking send/recv complicate broadcasts beyond those described here.
Reduction operations have the added complexity of trying to overlap communication and computation, which complicates them further. If successful, the cost is the same as broadcast ($t_o$ done during $t_w$)!
Unlike the 1-to-all broadcast, we cannot use the hypercube algorithm on all topologies without contention.
There are contention-free broadcasts for ring, 2-D mesh with wraparound, and hypercube interconnects, with dominant term cost $(p-1)\,m t_w$, and startup costs $(p-1)t_s$, $2(\sqrt{p}-1)t_s$, and $\log_2(p)\,t_s$, respectively.
The hypercube remains contention free when subdivided by a power of two, but the mesh and ring do not.
Need process/processor mapping to minimize non-hypercube contention.
Since all have the same dominant term, the ring is probably the best large-case algorithm, since it will cause the least contention due to fixed-size messages.

14. Contention free All-Reduce on Hypercube

An all-reduce (AKA leave-on-all reduction) is the semantic equivalent of performing an all-to-one reduction followed by a one-to-all broadcast.
Done in $\log_2 p$ steps, using bidirectional exchange (similar to all-to-all):

    allred_BE(int d, int m, TYPE *buff)
    {
       wrk = alloc(m);
       for (i=0; i < d; i++)
       {
          part = Iam ^ (1<<i);
          ri = Arecv(part, m, wrk);
          send(part, m, buff);
          wait(ri);
          for (k=0; k < m; k++)
             buff[k] += wrk[k];
       }
    }

Cost: $\log_2(p)\,(t_s + m(t_w + t_o))$
No overlap of $t_o$: each step must send the most recent result.
No contention on the hypercube; other topologies probably use the $\log_2 p$-step all-to-one followed by one-to-all.
Redundant computation is a danger on heterogeneous systems.
Always using $p$ links. Any topology but the hypercube has contention.
[Figures: all-to-all broadcast on 8-node hypercube (Intro to Parallel Computing, Fig. 4.11); schematic for 8-node bidirectional exchange]

15. Prefix Sum (Scan) Operation

With $n_0, n_1, \ldots, n_{p-1}$ (1 per node), compute $s_k = \sum_{i=0}^{k} n_i$ for all nodes $k$.

    func prefix_sum_hcube(my_num)
    {
       res = my_num;
       msg = res;
       d = log2(p);
       for (i=0; i < d; i++)
       {
          part = Iam ^ (1<<i);
          send msg to part
          recv hisnum from part
          msg += hisnum;
          if (part < Iam)
             res += hisnum;
       }
       return(res);
    }

$m = 1$ in the book.
For 1 number, forget contention.
Cost: $< \log_2(p)\,(t_s + m(t_w + t_o))$
For non-hypercube topologies, can use any all-to-all to build it (e.g., mesh-based all-to-all [$< (p-1)(t_s + m(t_w + t_o))$]).
[Figure: prefix sum on 8-node hypercube (Intro to Parallel Computing, Fig. 4.13, pg 168)]
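The hypercube prefix sum of Section 15 can be traced with a short serial simulation. This is only a sketch under stated assumptions: `hcube_prefix_sum` and `MAXP` are hypothetical names, and the exchange with the partner is modeled by first snapshotting every partner's outgoing `msg` so that all nodes use the values from before the step.

```c
#include <assert.h>

#define MAXP 64   /* assumed upper bound on the simulated node count */

/* Hypothetical simulation of the hypercube prefix sum on p = 2^d nodes.
 * num[k] is node k's input; on return res[k] = num[0] + ... + num[k]. */
void hcube_prefix_sum(int d, const long *num, long *res)
{
    assert(d >= 0 && d <= 6);
    int p = 1 << d;
    long msg[MAXP];   /* running sum over the sub-cube seen so far */
    long in[MAXP];    /* value received from the partner this step */
    for (int n = 0; n < p; n++) {
        res[n] = num[n];
        msg[n] = num[n];
    }
    for (int i = 0; i < d; i++) {
        for (int n = 0; n < p; n++)          /* all exchanges happen at once */
            in[n] = msg[n ^ (1 << i)];
        for (int n = 0; n < p; n++) {
            int part = n ^ (1 << i);
            msg[n] += in[n];                 /* sub-cube total always grows */
            if (part < n)
                res[n] += in[n];             /* only lower ranks count */
        }
    }
}
```

The `msg` value always accumulates the whole sub-cube, while `res` only absorbs contributions from lower-numbered partners, which is what turns an all-reduce pattern into a scan.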

16. Scatter (One-to-All Personalized) and Gather

A node sending unique messages to all nodes is a scatter operation, and collapsing unique messages to one node is a gather.
Programmed like one-to-all except with the message size halving at each step (no contention on any topology):

    proc scatter(d, m, TYPE *buff)
    {
       mask = (1<<d) - 1;
       n = m*p/2;                   /* words sent in the first step */
       for (i=d-1; i >= 0; i--)
       {
          abit = (1<<i);
          mask ^= abit;
          if ((Iam & mask) == 0)
          {
             if ((Iam & abit) == 0) /* holder of the data sends upper half */
                send(Iam^abit, n, buff+n);
             else
                recv(Iam^abit, n, buff);
          }
          n >>= 1;
       }
    }

Cost: $\sum_{i=1}^{\log_2 p}\left(t_s + \frac{mp}{2^i} t_w\right) = \log_2(p)\,t_s + mp\,t_w \sum_{i=1}^{\log_2 p}\frac{1}{2^i}$
Note: $\sum_{i=1}^{n}\frac{1}{2^i} = \frac{2^n - 1}{2^n}$
$= \log_2(p)\,t_s + mp\,t_w\left(\frac{2^{\log_2 p} - 1}{2^{\log_2 p}}\right) = \log_2(p)\,t_s + mp\,t_w\left(\frac{p-1}{p}\right) = \log_2(p)\,t_s + m(p-1)\,t_w$
[Figures: scatter and gather (Intro to Parallel Computing, pg 169; Fig. 4.15)]

17. All-to-All Personalized Communication

In all-to-all personalized communication (AKA total exchange), each node has $p$ buffers, with all $p-1$ non-local ones destined for differing nodes.
Like doing $p$ scatter (gather) operations at once.
Useful for FFTs, matrix transpose, database joins.
The communication pattern of total exchange is the same as all-to-all on all interconnects.
The contents and length of the messages are different than in all-to-all.
Will have contention on sub-groups except for the hypercube.
Label individual buffers by their (src,dest) pair.
[Figure: Intro to Parallel Computing, Fig. 4.16]

18. Unidirectional Total Exchange on a Ring

Send all messages destined for other nodes to the right.
Recv a message from the left, take out my piece, forward the rest on.
Stop when messages are empty ($p-1$ steps).
[Figure: Intro to Parallel Computing, Fig. 4.18]

19. Unidirectional All-to-All Personalized on a Ring

Send all messages destined for other nodes to the right.
Recv a message from the left, take out my piece, forward the rest on.
Stop when messages would be empty ($p-1$ steps).
Cost: $\sum_{i=1}^{p-1}(t_s + m(p-i)\,t_w) = (p-1)t_s + m t_w \sum_{i=1}^{p-1} i = (p-1)t_s + m t_w \frac{(p-1+1)(p-1)}{2} = (p-1)\left(t_s + \frac{mp}{2}\,t_w\right)$
Failure to exploit the shortest path doubles the bandwidth needs!!
[Figure: message contents on a 4-node ring at steps i = 1, 2, 3, labeled by (src,dest) pairs]
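The two closed forms above (scatter, and unidirectional ring total exchange) can be checked numerically against their step-by-step sums. The sketch below is illustrative only: the function names are hypothetical, and the parameters `ts`, `tw`, and `m` are simply the $t_s$, $t_w$, and $m$ of the cost model.

```c
#include <assert.h>

/* Scatter on p = 2^d nodes: sum of the per-step costs ts + (m*p/2^i)*tw. */
double scatter_cost_sum(int d, double ts, double tw, double m)
{
    assert(d >= 0 && d < 30);
    double p = (double)(1 << d);
    double n = m * p / 2.0;           /* words sent in the first step */
    double total = 0.0;
    for (int i = 1; i <= d; i++) {
        total += ts + n * tw;
        n /= 2.0;                     /* message size halves every step */
    }
    return total;
}

/* Closed form from the notes: log2(p)*ts + m*(p-1)*tw. */
double scatter_cost_closed(int d, double ts, double tw, double m)
{
    double p = (double)(1 << d);
    return d * ts + m * (p - 1.0) * tw;
}

/* Unidirectional ring total exchange: sum of ts + m*(p-i)*tw, i = 1..p-1. */
double ring_exchange_cost_sum(int p, double ts, double tw, double m)
{
    assert(p >= 1);
    double total = 0.0;
    for (int i = 1; i < p; i++)
        total += ts + m * (double)(p - i) * tw;
    return total;
}

/* Closed form from the notes: (p-1)*(ts + (m*p/2)*tw). */
double ring_exchange_cost_closed(int p, double ts, double tw, double m)
{
    return (p - 1) * (ts + m * p / 2.0 * tw);
}
```

With $t_s = 2$, $t_w = 0.5$, $m = 4$, the scatter on 8 nodes costs 20 and the ring exchange on 4 nodes costs 18 by either route.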

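20. Bidirectional All-to-All Personalized on a Ring

In a ring, two links lead to the same node (left and right), so each node sends half of its messages to the left, and the other half to the right. Write $R = \lceil\frac{p-1}{2}\rceil$ and $L = \lfloor\frac{p-1}{2}\rfloor$.
1. Send the $R$ messages bound for the nearest nodes on the right to the right.
Cost: $\sum_{i=1}^{R}(t_s + m(R - i + 1)\,t_w) = R\left(t_s + m t_w\,\frac{R+1}{2}\right)$
2. Send the $L$ messages bound for the nearest nodes on the left to the left.
Cost: $\sum_{j=1}^{L}(t_s + m(L - j + 1)\,t_w) = L\left(t_s + m t_w\,\frac{L+1}{2}\right)$
Total cost: $(p-1)t_s + m t_w\left(\frac{R(R+1)}{2} + \frac{L(L+1)}{2}\right)$
[Figure: messages sent right (steps i = 1, 2) and left (step j = 1) on a 4-node ring, labeled by (src,dest) pairs]

21. Detailed Bidirectional All-to-All Personalized Cost

Total cost $= (p-1)t_s + m t_w\left(\frac{R(R+1)}{2} + \frac{L(L+1)}{2}\right)$
If $p$ is odd, $R = L = \frac{p-1}{2}$; examine only the $t_w$ term:
$2\cdot\frac{1}{2}\left(\frac{p-1}{2}\right)\left(\frac{p-1}{2} + 1\right) = \left(\frac{p-1}{2}\right)\left(\frac{p+1}{2}\right) = \frac{(p-1)(p+1)}{4}$
Total odd cost $= (p-1)\left(t_s + m\,\frac{p+1}{4}\,t_w\right)$
If $p$ is even, $R = L + 1 = \frac{p}{2}$; examine the $t_w$ term:
$\frac{1}{2}\left(\frac{p}{2}\right)\left(\frac{p}{2} + 1\right) + \frac{1}{2}\left(\frac{p}{2} - 1\right)\left(\frac{p}{2}\right) = \frac{1}{2}\left(\frac{p}{2}\right)(p) = \frac{p^2}{4}$
Total even cost $= (p-1)t_s + m\,\frac{p^2}{4}\,t_w$
Close enough in both cases: $(p-1)t_s + m\,\frac{p^2}{4}\,t_w$
Uni- or bidirectional ring: $t_s = O(p)$, $t_w = O(p^2)$
$p$ scatters [$p(\log_2(p)\,t_s + m(p-1)\,t_w)$]: $t_s = O(p\log_2 p)$, $t_w = O(p^2)$

22. Proof of All-to-All Personalized Optimality on Ring

Assume the average distance each m-length packet travels is: $\frac{\sum_{i=1}^{p-1} i}{p-1} = \frac{p(p-1)/2}{p-1} = \frac{p}{2}$
Directly connected traffic = (# nodes)(buffer length)(distance traveled) $= (p)(m(p-1))\left(\frac{p}{2}\right)$
Since the total number of links to share this load is $p$, the optimal bandwidth cost is: $\frac{t_w\,(p)(m(p-1))\left(\frac{p}{2}\right)}{p} = \frac{m p (p-1)}{2}\,t_w$
Bandwidth cost of the unidirectional ring: $(p-1)\left(\frac{mp}{2}\right)t_w$, which is optimal.
The problem is the assumption about the average distance: it is not true for the best algorithm!
The bidirectional ring needs some buffer sorting, which can add to the cost.
True within a factor of 2, but $p$ scatters is off by a factor of 2 again!
$p$ scatters: $p(\log_2(p)\,t_s + m(p-1)\,t_w) = p\log_2(p)\,t_s + p(p-1)\,m t_w$
All-to-all personalized has a lower order $t_s$ term as well.
NOTE: all all-to-all ring communication is nearest neighbor, so store & forward works as well as cut-through routing!

23. All-to-All Personalized on Square 2-D Mesh

Steps:
1. Assemble the $p$ messages into $c$ groups of $r$ messages destined for each column ($r = c = \sqrt{p}$).
2. Each row does all-to-all personalized on $c$ messages of length $mr$.
3. Each node assembles its messages into $r$ groups of $c$ messages.
4. Each column does all-to-all personalized on $r$ messages of length $cm$.
Cost of step 2 (use the ring cost with $m' = \sqrt{p}\,m$, $p' = \sqrt{p}$): $\left(t_s + t_w\frac{(\sqrt{p}\,m)\sqrt{p}}{2}\right)(\sqrt{p}-1) = \left(t_s + t_w\frac{mp}{2}\right)(\sqrt{p}-1)$
The next phase is the same ($r = c$), so the total cost is: $(2t_s + mp\,t_w)(\sqrt{p}-1)$
Ignores the shuffle cost.
Ignores the lack of wraparound again!
[Figure: Intro to Parallel Computing, Fig. 4.19]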
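The claim that ignoring the shortest path roughly doubles the bandwidth term can be checked by counting words per node. In the sketch below (hypothetical function names; word counts are in units of $m$), the unidirectional count follows the sum of Section 19 and the bidirectional count follows the two sums of Section 20.

```c
#include <assert.h>

/* Unidirectional ring: sum over steps i = 1..p-1 of (p - i) messages. */
long unidir_ring_words(int p)
{
    assert(p >= 1);
    long words = 0;
    for (int i = 1; i < p; i++)
        words += p - i;
    return words;
}

/* Bidirectional ring: ceil((p-1)/2) messages go right and
 * floor((p-1)/2) go left; each direction shrinks by one per step. */
long bidir_ring_words(int p)
{
    assert(p >= 1);
    int right = p / 2;          /* equals ceil((p-1)/2) for integer p >= 1 */
    int left = (p - 1) / 2;     /* equals floor((p-1)/2) */
    long words = 0;
    for (int i = 1; i <= right; i++)
        words += right - i + 1;
    for (int j = 1; j <= left; j++)
        words += left - j + 1;
    return words;
}
```

For $p = 7$ the counts are 21 versus 12, and for $p = 8$ they are 28 versus 16, consistent with $(p^2-1)/4$ for odd $p$ and $p^2/4$ for even $p$.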

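24. Small-message Optimal All-to-All Personalized on Hypercube

Extend the mesh algorithm to the hypercube; for each of the $\log_2 p$ dimensions, split into 2 $(d-1)$-dimensional sub-cubes:
1. Nodes exchange the messages from/to the other sub-cube.
Cost: $\left(t_s + m\,\frac{p}{2}\,t_w\right)\log_2 p$
Rearrangement cost: $m p \log_2(p)\,t_r$, with $t_r$ the per-word rearrangement time.
Traffic = (# nodes)(buffer length)(distance); average distance traveled $= \frac{\log_2 p}{2}$
Traffic $= (p)(m(p-1))\left(\frac{\log_2 p}{2}\right)$
Number of hypercube links: $\frac{p\log_2 p}{2}$
Best direct-communication bandwidth algorithm = (traffic) / (# links) $= \frac{(p)(m(p-1))\left(\frac{\log_2 p}{2}\right)}{\frac{p\log_2 p}{2}} = m(p-1)$
The best algorithm is $O(p)$; this algorithm is $O(p\log_2 p)$ for bandwidth.
This algorithm is $O(\log_2 p)$ on $t_s$, so it is optimal for small messages.
[Figure: Intro to Parallel Computing]

25. BW-Optimal All-to-All Personalized on Hypercube

Each pair of nodes does a bidirectional exchange:

    proc aapers_hyp()
    {
       for (i=1; i < p; i++)
       {
          part = Iam ^ i;
          send M[Iam][part] to part     /* buffers labeled (src,dest) */
          recv M[part][Iam] from part
       }
    }

Overlaps on links are in different directions.
Known as E-cube routing.
Cost: $(t_s + t_w m)(p-1)$; bandwidth is $O(p)$ rather than $O(p\log_2 p)$.
No shuffling required!
The $t_s$ term is $O(p)$ rather than $O(\log_2 p)$.
This algorithm maximizes bandwidth by sending the minimum message size ($m$).
Hurts $t_s$ since it must perform $p-1$ steps.
[Figure: Intro to Parallel Computing]

26. Musings on Contention in All-to-All

Only the one-to-all algorithm is contention free on a subpartition of processors.
The problem is that one of the assumed links (wraparound) is missing, and all-to-all uses all $p$ links in parallel (all-to-one uses $p/2$ max).
This means that one message must span links that are already active.
Both are bad, as the algorithms work by having all nodes active at once.
Can assume this will effectively halve the bandwidth, i.e., a $2\times$ multiplier on $t_w$.
My guess is that the book ignores contention in sub-partitions because the given algorithms are still optimal in the face of contention:
Make the all-to-all broadcast [$(p-1)(t_s + m t_w)$] contention-free by doing $p$ 1-to-alls [$p\log_2(p)\,(t_s + m t_w)$]. Since $p\log_2 p \ge 2(p-1)$ for $p \ge 2$, the all-to-all broadcast with doubled $m$ is better or equal.
$p$ one-to-all ring broadcasts would pipeline for cost $((p-1) + \ldots)(t_s + m t_w)$, but this requires wraparound too!
If an algorithm is optimal on $p$ links, it differs only by a factor of 2 on $p-1$ links, so it is still asymptotically optimal.

27. Circular Shifts on Ring

In a circular q-shift, all nodes send their buffer to node $(\mathrm{Iam}+q) \bmod p$, and recv new data from $(p+\mathrm{Iam}-q) \bmod p$, where $0 < q < p$.
A direct send is best for small messages, but is full of contention (cutting bandwidth). Cost: $t_s + \min(q, p-q)\,m t_w$
To optimize bandwidth, do $\min(q, p-q)$ 1-step shifts to the right/left, respectively.
If $q \le p/2$, shift to the right $q$ times.
If $q > p/2$, shift to the left $p-q$ times (e.g., $p = 8$, $q = 7$: shift left 1).
Cost: $\min(q, p-q)(t_s + m t_w)$
Will have contention on a linear array (effectively doubling $m$).
[Figure: 2-shift on an 8-node ring, showing buffer positions at i = 0, i = 1, and when done]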
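The direction choice for the bandwidth-optimized circular q-shift of Section 27 can be sketched as a serial simulation. The names `ring_qshift` and `MAXP` are hypothetical; each pass of the outer loop models one nearest-neighbor shift performed by all nodes at once.

```c
#include <assert.h>

#define MAXP 64   /* assumed upper bound on the simulated node count */

/* Hypothetical simulation: perform a circular q-shift on a p-node ring using
 * min(q, p-q) single-step shifts.  On return, the value that started on node
 * n sits on node (n+q) % p.  Returns the number of 1-step shifts used. */
int ring_qshift(int p, int q, int *data)
{
    assert(p >= 2 && p <= MAXP && q > 0 && q < p);
    int to_right = (q <= p / 2);        /* pick the shorter direction */
    int steps = to_right ? q : p - q;
    int tmp[MAXP];
    for (int s = 0; s < steps; s++) {
        for (int n = 0; n < p; n++) {
            int nbr = to_right ? (n + 1) % p : (n + p - 1) % p;
            tmp[nbr] = data[n];         /* every node sends to one neighbor */
        }
        for (int n = 0; n < p; n++)
            data[n] = tmp[n];
    }
    return steps;
}
```

With $p = 8$ and $q = 7$ a single left shift suffices, matching the example in the notes.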

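28. Circular q-shift on a Square 2-D Mesh

Assume a row-major grid:
1. Do a $(q \bmod \sqrt{p})$-shift along rows.
2. $q \bmod \sqrt{p}$ columns should have shifted their value down one row, so do this along columns ($q \bmod \sqrt{p}$ columns participate).
3. Do $\lfloor q/\sqrt{p} \rfloor$ shifts along columns.
In practice, shift left/right/up/down for minimum movement.
Max shift per dimension: $\sqrt{p}/2$
Max cost: $(t_s + m t_w)(\sqrt{p} + 1)$
[Figure: Intro to Parallel Computing]

29. Improving Bandwidth Performance by Splitting up Messages

We discussed all-to-all algorithms that are asymptotically bandwidth optimal, but the one-to-all, all-to-one, and all-reduce algorithms are not:
All-reduce is bandwidth optimal for the hypercube only; otherwise do an all-to-one reduction followed by a one-to-all broadcast.
The problem is that, unlike the all-to-all operations, one-to-all etc. do not use all links from the start (recursive doubling), so link usage is crippled.
For large $m$, can split the message into chunks and send them out to bring links in earlier in the process.
This pays additional $t_s$ cost to reduce bandwidth cost.
In practice, implement $t_s$-optimized, $t_w$-optimized, and possibly hybrid algorithms, and switch amongst them depending on $m$.
We can use the asymptotically optimal all-to-all algorithms to build asymptotically optimal algorithms for:
1. One-to-all broadcast
2. All-to-one reduction
3. Non-hypercube all-reduce

30. Asymptotically Optimal One-to-All Broadcast

Scatter and all-to-all broadcast are asymptotically optimal, so build an asymptotically optimal one-to-all broadcast from them by splitting the buffer into $p$ buffers of length $n = m/p$ (costs are for the ring; works on any topology):
Regular one-to-all: $\log_2(p)\,(t_s + m t_w)$; bandwidth term $\log_2(p)\,(m t_w)$
1. Do a scatter on n-length buffers: $\log_2(p)\,t_s + \left(\frac{m}{p}\right)(p-1)\,t_w$
2. Do an all-to-all broadcast on n-length buffers: $(p-1)\,t_s + \left(\frac{m}{p}\right)(p-1)\,t_w$
Total cost $= (\log_2(p) + p - 1)\,t_s + 2\left(\frac{m}{p}\right)(p-1)\,t_w$
$p-1$ extra $t_s$.
Bandwidth term $= 2\left(\frac{p-1}{p}\right)(m t_w)$
Asymptotically bandwidth optimal, since it costs the same order as a single send!
[Figures: scatter [$\log_2(p)\,t_s + m(p-1)\,t_w$] (Intro to Parallel Computing, pg 169); all-to-all broadcast [$(p-1)(t_s + m t_w)$] (Intro to Parallel Computing, Fig. 4.8, pg 157)]

31. Asymptotically Optimal All-to-One Reduction

All-to-all reduction and gather are asymptotically optimal, so build an asymptotically optimal all-to-one reduction from them by splitting the buffer into $p$ buffers of length $n = m/p$ (costs are for the ring; works on any topology):
Regular all-to-one: $\log_2(p)\,(t_s + m t_w)$; bandwidth term $\log_2(p)\,(m t_w)$
1. Do an all-to-all reduction on n-length buffers: $(p-1)\,t_s + \left(\frac{m}{p}\right)(p-1)\,t_w$
2. Do a gather on n-length buffers: $\log_2(p)\,t_s + \left(\frac{m}{p}\right)(p-1)\,t_w$
Total cost $= (\log_2(p) + p - 1)\,t_s + 2\left(\frac{m}{p}\right)(p-1)\,t_w$
$p-1$ extra $t_s$.
Bandwidth term $= 2\left(\frac{p-1}{p}\right)(m t_w)$
Asymptotically bandwidth optimal, since it costs the same order as a single send!
[Figures: all-to-all reduction [$(p-1)(t_s + m t_w)$] (Intro to Parallel Computing, Fig. 4.8); gather [$\log_2(p)\,t_s + m(p-1)\,t_w$] (Intro to Parallel Computing, pg 169)]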
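The trade-off in Section 30 (pay $p - 1$ extra startups to cut the bandwidth term) can be explored with the two cost formulas. The sketch below uses hypothetical function names and takes $d = \log_2 p$ as its argument; it illustrates the cost model only and is not a tuned selection rule.

```c
#include <assert.h>

/* Regular recursive-doubling broadcast: log2(p) * (ts + m*tw). */
double bcast_cost_regular(int d, double ts, double tw, double m)
{
    assert(d >= 0 && d < 30);
    return d * (ts + m * tw);
}

/* Scatter followed by all-to-all broadcast on m/p-length pieces:
 * (log2(p) + p - 1)*ts + 2*(m/p)*(p-1)*tw. */
double bcast_cost_split(int d, double ts, double tw, double m)
{
    assert(d >= 0 && d < 30);
    double p = (double)(1 << d);
    return (d + p - 1.0) * ts + 2.0 * (m / p) * (p - 1.0) * tw;
}

/* Returns 1 if the split algorithm is predicted to be cheaper. */
int bcast_prefers_split(int d, double ts, double tw, double m)
{
    return bcast_cost_split(d, ts, tw, m) < bcast_cost_regular(d, ts, tw, m);
}
```

With $p = 8$ and $t_s = t_w = 1$, a one-word message favors the regular algorithm (6 versus 11.75), while an 8-word message already favors the split version (27 versus 24), which is why practical implementations switch on $m$.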

32. Asymptotically Optimal All-reduce

All-reduce can be built out of an all-to-one reduction followed by a one-to-all broadcast:
All-reduce = (m-length all-to-one reduction) + (m-length one-to-all broadcast)
All-reduce = ([$\frac{m}{p}$-length all-to-all reduce] + [$\frac{m}{p}$-length gather]) + ([$\frac{m}{p}$-length scatter] + [$\frac{m}{p}$-length all-to-all broadcast])
[gather] followed by [scatter] = no-op
All-reduce = ($\frac{m}{p}$-length all-to-all reduce) + ($\frac{m}{p}$-length all-to-all broadcast)
Total cost: $2\left((p-1)\,t_s + \left(\frac{m}{p}\right)(p-1)\,t_w\right)$
$2(p-1)\,t_s$ rather than $\log_2(p)\,t_s$.
Bandwidth term $= 2\left(\frac{p-1}{p}\right)(m t_w)$
Normal all-reduce: $\log_2(p)\,(t_s + t_w m)$; bandwidth term $\log_2(p)\,(m t_w)$
[Figure: Intro to Parallel Computing, Fig. 4.8]

33. All-port Communication

We've been assuming 1 send/recv at a time, but some architectures allow a node to send/recv on all network ports simultaneously.
All-port communication provides speedup, but doesn't change the asymptotic cost much:
Max speedup is $O(\log_2 p)$ for the hypercube.
Only constant speedup for mesh and ring, so no asymptotic difference.
Very difficult to program:
Contention-free routes for all messages must be found.
Must make messages large enough to be split, without compute time dominating.
Even harder to keep efficient when logical nodes differ from physical nodes.
Memory must be able to keep up (best if the multiprocessor has q local memories for all communication).

34. Communication Cost Summary

Operation | Ring time | $t_s$ order | BW order
one-to-all bcast / all-to-one reduc | $\log_2(p)\,(t_s + m t_w)$, or $(\log_2(p) + p - 1)\,t_s + 2\left(\frac{m(p-1)}{p}\right)t_w$ | $O(\log_2 p)$ | $O(1)$
all-to-all bcast/reduc | $(p-1)(t_s + m t_w)$ | $O(p)$ | $O(p)$
all-reduce | $\log_2(p)\,(t_s + m t_w)$, or $2\left((p-1)\,t_s + \left(\frac{m(p-1)}{p}\right)t_w\right)$ | $O(\log_2 p)$ | $O(1)$
scatter/gather | $\log_2(p)\,t_s + m(p-1)\,t_w$ | $O(\log_2 p)$ | $O(p)$
all-to-all pers | $(p-1)\left(t_s + \frac{p}{2}\,m t_w\right)$ | $O(p)$ | $O(p^2)$
circ q-shift | $\min(q, p-q)(t_s + m t_w)$ | $O(p)$ | $O(p)$

Could ask for the above on the hypercube or mesh.
All-to-all personalized bandwidth: $O(p^2)$ for the ring, $O(p\sqrt{p})$ for the mesh, $O(p)$ for the hypercube.
The table in the book (pg 187) on hypercube costs claims $(p-1) = O(1)$!

35. Operation to MPI Name Mapping

Operation | MPI Name
One-to-all broadcast | MPI_Bcast
All-to-one reduction | MPI_Reduce
All-to-all broadcast | MPI_Allgather
All-to-all reduction | MPI_Reduce_scatter
All-reduce | MPI_Allreduce
Gather | MPI_Gather
Scatter | MPI_Scatter
All-to-all personalized | MPI_Alltoall


More information

Improved Capacity Bounds for the Binary Energy Harvesting Channel

Improved Capacity Bounds for the Binary Energy Harvesting Channel Imroved Caacity Bounds for the Binary Energy Harvesting Channel Kaya Tutuncuoglu 1, Omur Ozel 2, Aylin Yener 1, and Sennur Ulukus 2 1 Deartment of Electrical Engineering, The Pennsylvania State University,

More information

2. Sample representativeness. That means some type of probability/random sampling.

2. Sample representativeness. That means some type of probability/random sampling. 1 Neuendorf Cluster Analysis Assumes: 1. Actually, any level of measurement (nominal, ordinal, interval/ratio) is accetable for certain tyes of clustering. The tyical methods, though, require metric (I/R)

More information

HYPERCUBE ALGORITHMS FOR IMAGE PROCESSING AND PATTERN RECOGNITION SANJAY RANKA SARTAJ SAHNI Sanjay Ranka and Sartaj Sahni

HYPERCUBE ALGORITHMS FOR IMAGE PROCESSING AND PATTERN RECOGNITION SANJAY RANKA SARTAJ SAHNI Sanjay Ranka and Sartaj Sahni HYPERCUBE ALGORITHMS FOR IMAGE PROCESSING AND PATTERN RECOGNITION SANJAY RANKA SARTAJ SAHNI 1989 Sanjay Ranka and Sartaj Sahni 1 2 Chapter 1 Introduction 1.1 Parallel Architectures Parallel computers may

More information

A Parallel Algorithm for Minimization of Finite Automata

A Parallel Algorithm for Minimization of Finite Automata A Parallel Algorithm for Minimization of Finite Automata B. Ravikumar X. Xiong Deartment of Comuter Science University of Rhode Island Kingston, RI 02881 E-mail: fravi,xiongg@cs.uri.edu Abstract In this

More information

UPPAAL tutorial What s inside UPPAAL The UPPAAL input languages

UPPAAL tutorial What s inside UPPAAL The UPPAAL input languages UPPAAL tutorial What s inside UPPAAL The UPPAAL inut languages 1 UPPAAL tool Develoed jointly by Usala & Aalborg University >>8,000 downloads since 1999 1 UPPAAL Tool Simulation Modeling Verification 3

More information

Fig. 21: Architecture of PeerSim [44]

Fig. 21: Architecture of PeerSim [44] Sulementary Aendix A: Modeling HPP with PeerSim Fig. : Architecture of PeerSim [] In PeerSim, every comonent can be relaced by another comonent imlementing the same interface, and the general simulation

More information

Modelling and implementation of algorithms in applied mathematics using MPI

Modelling and implementation of algorithms in applied mathematics using MPI Modelling and implementation of algorithms in applied mathematics using MPI Lecture 3: Linear Systems: Simple Iterative Methods and their parallelization, Programming MPI G. Rapin Brazil March 2011 Outline

More information

Understanding DPMFoam/MPPICFoam

Understanding DPMFoam/MPPICFoam Understanding DPMFoam/MPPICFoam Jeroen Hofman March 18, 2015 In this document I intend to clarify the flow solver and at a later stage, the article-fluid and article-article interaction forces as imlemented

More information

Eigenanalysis of Finite Element 3D Flow Models by Parallel Jacobi Davidson

Eigenanalysis of Finite Element 3D Flow Models by Parallel Jacobi Davidson Eigenanalysis of Finite Element 3D Flow Models by Parallel Jacobi Davidson Luca Bergamaschi 1, Angeles Martinez 1, Giorgio Pini 1, and Flavio Sartoretto 2 1 Diartimento di Metodi e Modelli Matematici er

More information

Algorithms for Air Traffic Flow Management under Stochastic Environments

Algorithms for Air Traffic Flow Management under Stochastic Environments Algorithms for Air Traffic Flow Management under Stochastic Environments Arnab Nilim and Laurent El Ghaoui Abstract A major ortion of the delay in the Air Traffic Management Systems (ATMS) in US arises

More information

Clojure Concurrency Constructs, Part Two. CSCI 5828: Foundations of Software Engineering Lecture 13 10/07/2014

Clojure Concurrency Constructs, Part Two. CSCI 5828: Foundations of Software Engineering Lecture 13 10/07/2014 Clojure Concurrency Constructs, Part Two CSCI 5828: Foundations of Software Engineering Lecture 13 10/07/2014 1 Goals Cover the material presented in Chapter 4, of our concurrency textbook In particular,

More information

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Technical Sciences and Alied Mathematics MODELING THE RELIABILITY OF CISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Cezar VASILESCU Regional Deartment of Defense Resources Management

More information
