Physical Organization

Lecture

Physical Organization
Uniform memory access (UMA) machines:
  All memory is equally far away from all processors.
  Early parallel processors like the NYU Ultracomputer.
  Problem: why go across the network for instructions? For read-only data?
  What about caches?
Nonuniform memory access (NUMA) machines:
  Access to local memory is usually far faster than access to nonlocal memory.
  Static and dynamic locality of reference are critical for high performance.
  Compiler support? Architectural support?
Bus-based symmetric multiprocessors (SMPs): combine both aspects.

Logical Organization
Shared Memory Model (conceptual picture: processors sharing a single address space):
  Hardware/systems software provide a single address space model to the applications programmer.
  Communication between processors: read/write shared memory locations (put, get).
  Some systems distinguish between local and remote references.
Distributed Memory Model (Message Passing) (conceptual picture: each processor paired with its own memory):
  Each processor has its own address space.
  Communication between processors: messages (like e-mail).
  Basic message-passing commands: send, receive.
Key difference: in the shared memory model (SMM), a processor can access remote memory locations without prearranged participation of the application program on the remote processor.

Message Passing

Blocking SEND/RECEIVE: couple data transfer and synchronization
Sender and receiver rendezvous to exchange data.
  Src executes SEND(x, Dst); Dst executes RECEIVE(y, Src).
History: Caltech Cosmic Cube.
Motivation: hardware channels between processors in early multicomputers.
Implementation:
  Src sends a token saying "ready to send".
  Dst returns a token saying "me too".
  Data transfer takes place directly between application programs without buffering in the O/S.
The Src field in the RECEIVE command permits Dst to select which processor it wants to receive data from.
Problem:
  The sender cannot push data out and move on.
  The receiver cannot do other work if data is not available yet.
  One possibility: a new command TEST(Src, flag): is there a message from Src?
Overlapping of computation and communication is critical for performance.

Nonblocking SEND/RECEIVE: decouple synchronization from data transfer
  Src executes SEND(x, Dst, tag); Dst executes RECEIVE(y, Src, tag, flag); the message travels over the network.
Src can push data out and move on.
  Many variations: return to the application program when data is out on the network? When data has been copied into an O/S buffer?
The tag field on messages permits the receiver to receive messages in an order different from the order in which they were sent by Src.
RECEIVE does not block.
  flag is set to true by the O/S if data was transferred, false otherwise.
  The applications program can test flag and take the right action.
What if Dst has not done a RECEIVE when data arrives from Src?
  Data is buffered in O/S buffers at Dst until the application program does a RECEIVE.
Can we eliminate waiting at Src?
Can we eliminate buffering of data at Dst?
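The tag and flag behaviour described above can be illustrated with a small Python model. This is only a sketch under stated assumptions: the class `Mailbox` and its methods `deliver` and `receive` are hypothetical names standing in for the O/S buffers at Dst, and no real network or message-passing library is involved.

```python
from collections import deque


class Mailbox:
    """Toy stand-in (hypothetical) for the O/S message buffers at Dst."""

    def __init__(self):
        # (src, tag) -> queue of payloads buffered until a receive asks for them
        self._buffers = {}

    def deliver(self, src, tag, payload):
        """Model a message arriving from the network: the O/S buffers it."""
        self._buffers.setdefault((src, tag), deque()).append(payload)

    def receive(self, src, tag):
        """Nonblocking receive: return (flag, payload) immediately.

        flag is True if a matching message was buffered, False otherwise.
        """
        queue = self._buffers.get((src, tag))
        if queue:
            return True, queue.popleft()
        return False, None


mailbox = Mailbox()
mailbox.deliver(src=0, tag="a", payload=10)   # sent first
mailbox.deliver(src=0, tag="b", payload=20)   # sent second

first = mailbox.receive(src=0, tag="b")    # tag lets Dst pick the later message first
second = mailbox.receive(src=0, tag="a")
third = mailbox.receive(src=0, tag="a")    # nothing buffered: flag is False
```

The receiver never blocks: it inspects the returned flag and decides whether to use the payload or go on with other work.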

Asynchronous SEND/RECEIVE
  Src executes ISEND(x, Dst, tag, flag); Dst executes IRECEIVE(y, Src, tag, flag); the message travels over the network.
SEND returns as soon as the O/S knows what needs to be sent.
  Flag is set by the O/S when the data in x has been shipped out.
  The application program continues, but must test flag before overwriting x.
RECEIVE is nonblocking:
  It returns before the data arrives.
  It tells the O/S to place the data in y and set flag after the data is received.
  The flag is written by the O/S and read by the application program on Dst.
  This is a "posting" of information to the O/S.
Eliminates buffering of data in the Dst O/S area if IRECEIVE is posted before the message arrives at Dst.

So far, we have looked at point-to-point communication.
Collective communication: patterns of group communication that can be implemented more efficiently than through long sequences of sends and receives.
Important ones:
  one-to-all broadcast (e.g., Ax implemented by rowwise distribution: all processors need x)
  all-to-one reduction (e.g., adding a set of numbers distributed across all processors)
  all-to-all broadcast: every processor sends a piece of data to every other processor
  one-to-all personalized communication: one processor sends a different piece of data to all other processors
  all-to-all personalized communication: each processor does a one-to-all communication

Example: One-to-all broadcast (intuition: think "tree")
[Figure: broadcast among eight processors, 0 to 7, completing in three phases.]
Messages in each phase do not compete for links.
Assuming message size is small, time to send a message = Ts + h Th, where
  Ts = overhead at sender/receiver
  Th = time per hop
  h = number of hops
Total time for broadcast = (Ts + Th P/2) + (Ts + Th P/4) + ... = Ts log P + Th (P - 1)
Reality check: actually, a k-ary tree makes sense, because processor 0 can send many messages by the time its first recipient is ready to participate in the broadcast.
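The tree-style schedule and its cost can be sketched in Python. The sketch assumes that P is a power of two, that processor 0 is the source, and that the hop count between processors i and j is |i - j|; the names `broadcast_schedule` and `broadcast_time` are illustrative only.

```python
def broadcast_schedule(p):
    """Return the phases of a one-to-all broadcast from processor 0.

    Each phase is a list of (sender, receiver) pairs. In phase k every
    processor that already holds the data sends it p / 2**k positions away.
    Assumes p is a power of two.
    """
    assert p > 0 and p & (p - 1) == 0, "p must be a power of two"
    phases = []
    holders = [0]
    distance = p // 2
    while distance >= 1:
        phase = [(sender, sender + distance) for sender in holders]
        phases.append(phase)
        holders += [receiver for _, receiver in phase]
        distance //= 2
    return phases


def broadcast_time(p, ts, th):
    """Sum Ts + h*Th over the phases, with h = p/2, p/4, ..., 1."""
    num_phases = p.bit_length() - 1          # log2(p) for a power of two
    return sum(ts + th * (p >> k) for k in range(1, num_phases + 1))


schedule = broadcast_schedule(8)
total = broadcast_time(8, ts=5, th=1)        # 3*Ts + 7*Th
```

For P = 8 the schedule has three phases with 1, 2 and 4 messages, and the summed cost matches the closed form Ts log P + Th (P - 1).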

Other topologies: use the same idea
2D Mesh:
  Step 1: Broadcast within the row of the originating processor.
  Step 2: Broadcast within each column in parallel.
  Time = Ts log P + 2 Th (sqrt(P) - 1)
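A small Python check of the mesh cost follows. It assumes a square mesh whose side sqrt(P) is a power of two, and that each row or column broadcast costs Ts log n + Th (n - 1) for a line of n processors; `mesh_broadcast_time` is a hypothetical helper name.

```python
import math


def mesh_broadcast_time(p, ts, th):
    """Cost of the two-step broadcast on a sqrt(p) x sqrt(p) mesh."""
    side = math.isqrt(p)
    assert p > 0 and side * side == p, "p must be a perfect square"
    assert side & (side - 1) == 0, "mesh side must be a power of two"

    def line_time(n):
        # Tree-style broadcast along one row or column of n processors.
        return ts * (n.bit_length() - 1) + th * (n - 1)

    # Step 1 (one row) followed by step 2 (all columns in parallel).
    return 2 * line_time(side)


mesh_total = mesh_broadcast_time(16, ts=5, th=1)   # 4*Ts + 6*Th
```

The two steps each contribute Ts log sqrt(P) + Th (sqrt(P) - 1), which adds up to Ts log P + 2 Th (sqrt(P) - 1).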

Example: All-to-one reduction
[Figure: reduction among eight processors, 0 to 7.]
Messages in each phase do not compete for links.
Purpose: apply a commutative and associative operator (reduction operator) like +, *, AND, OR, etc. to the values contained in each node.
Can be viewed as the inverse of one-to-all broadcast.
Same time as one-to-all broadcast.
Important use: determine when all processors are finished working (implementation of "barrier").
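The reduction can be sketched as the broadcast tree run backwards. The Python below is a sequential model, assuming P is a power of two and that processor 0 collects the result; `tree_reduce` is an illustrative name, not a library routine.

```python
import operator


def tree_reduce(values, op):
    """Combine values[i] (held by processor i) onto processor 0.

    Returns (result, phases), where each phase lists (sender, receiver)
    pairs. Assumes len(values) is a power of two and op is associative
    and commutative.
    """
    p = len(values)
    assert p > 0 and p & (p - 1) == 0, "number of processors must be a power of two"
    vals = list(values)
    phases = []
    distance = 1
    while distance < p:
        phase = []
        for receiver in range(0, p, 2 * distance):
            sender = receiver + distance
            vals[receiver] = op(vals[receiver], vals[sender])
            phase.append((sender, receiver))
        phases.append(phase)
        distance *= 2
    return vals[0], phases


total, phases = tree_reduce(range(8), operator.add)

# Barrier-style use: AND together each processor's "finished" flag.
all_done, _ = tree_reduce([True, True, False, True], operator.and_)
```

The last phase is the single message from processor 4 to processor 0, mirroring the first phase of the broadcast.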

Example: All-to-all broadcast
[Figure: eight processors, 0 to 7, passing values around a ring.]
Intuition: cyclic shift register.
Each processor receives a value from one neighbor, stores it away, and sends it to the next neighbor in the next phase.
Total of (P - 1) phases to complete the all-to-all broadcast.
Time = (Ts + Th)(P - 1), assuming message size is small.
The same idea can be applied to meshes as well:
  first phase, all-to-all broadcast within each row
  second phase, all-to-all broadcast within each column
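The cyclic-shift intuition can be modeled in a few lines of Python. This sketch assumes the processors form a ring and that every processor forwards to its right-hand neighbor in each phase; `ring_all_to_all` and `ring_all_to_all_time` are hypothetical names.

```python
def ring_all_to_all(values):
    """Simulate all-to-all broadcast on a ring; values[i] starts on processor i.

    Returns known, where known[i] lists the values processor i has stored,
    in the order it received them.
    """
    p = len(values)
    known = [[v] for v in values]
    in_flight = list(values)          # value each processor forwards next
    for _ in range(p - 1):
        # All processors send to their right neighbor at the same time.
        in_flight = [in_flight[(i - 1) % p] for i in range(p)]
        for i in range(p):
            known[i].append(in_flight[i])
    return known


def ring_all_to_all_time(p, ts, th):
    """(P - 1) phases, each a single one-hop message: (Ts + Th)(P - 1)."""
    return (ts + th) * (p - 1)


known = ring_all_to_all(list("abcd"))
```

After P - 1 shifts every processor holds all P values, which is why the phase count is P - 1 rather than P.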

Message-Passing Program

MPI: Message-Passing Interface
Goal: portable parallel programming for distributed memory computers.
Lots of vendors of distributed memory computers: IBM, NCube, Intel, CM-5, ...
Each vendor had its own communication constructs, so porting required changing parallel programs even to go from one distributed memory platform to another!
MPI goal: standardize the syntax and semantics of message-passing constructs.
Mid-1994: MPI standard out and several implementations available (SP-2).

Write an MPI program to perform the matrix-vector multiply A b
Style of programming: Master-Slave
  one master, several slaves
  master coordinates the activities of the slaves
Master initially owns all rows of A and the vector b.
Master broadcasts vector b to all slaves.
Slaves are self-scheduled:
  each slave comes to the master for work
  master sends a row of the matrix to the slave
  slave performs the product, returns the result, and asks for more work
Very naive algorithm, but it's a start.
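The self-scheduling idea can be sketched without MPI as a sequential Python simulation. It is only a model under stated assumptions: slaves are served in the order they ask for work, every row takes the same time, and the names `master_slave_matvec` and `slave_dot` are hypothetical. In a real message-passing run the slaves would compute concurrently.

```python
from collections import deque


def slave_dot(row, b):
    """Work done by a slave: dot product of one matrix row with b."""
    return sum(a * x for a, x in zip(row, b))


def master_slave_matvec(A, b, num_slaves):
    """Simulate the master handing out rows of A one at a time.

    Returns (result, rows_done), where rows_done[s] counts the rows
    processed by slave s.
    """
    assert num_slaves >= 1
    pending = deque(enumerate(A))          # rows the master still owns
    result = [None] * len(A)
    rows_done = [0] * num_slaves
    ready = deque(range(num_slaves))       # slaves waiting for work
    while pending:
        slave = ready.popleft()            # next slave asking for work
        index, row = pending.popleft()     # master sends one row
        result[index] = slave_dot(row, b)  # slave computes and returns it
        rows_done[slave] += 1
        ready.append(slave)                # slave asks for more work
    return result, rows_done


product, rows_done = master_slave_matvec([[1, 2], [3, 4], [5, 6]], [1, 1], num_slaves=2)
```

Because b is sent to every slave once up front, each later message from the master carries only a single row.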

Key MPI routines we will use:
  MPI_INIT: Initialize the MPI system
  MPI_COMM_SIZE: Find out how many processes there are
  MPI_COMM_RANK: Who am I?
  MPI_SEND: Send a message
    MPI_SEND(address, count, datatype, Dst, tag, comm)
    (address, count, datatype) permits entire data structures to be sent with one command
    comm identifies the process group
  MPI_RECV: Receive a message (blocking receive)
  MPI_FINALIZE: Terminate MPI
  MPI_BCAST: Broadcast

[Program listing for the master-slave matrix-vector multiply; the code on these slides did not survive extraction.]