The DOACROSS statement

The DOACROSS statement is a parallel loop similar to DOALL, but it allows producer-consumer type of synchronization. Synchronization is allowed from lower to higher iterations, since it is assumed that lower iterations are selected first by the implicit tasks. If synchronization were not from lower to higher iterations, deadlock could occur. Assume, for example, that the first iteration waits at point w for an event from the second iteration. If there were only one implicit task, it would wait forever at w, since there is no context switching.

Examples of DOACROSS

* Example 1. No delay *

    post(ev(0))
    doacross i=1,n
      a(i) = b(i) + c(i)
      post(ev(i))
      wait(ev(i-1))
      x(i) = a(i-1) + 2
    end doacross

Time lines:

    P1: a(1) x(1) a(4) x(4)
    P2: a(2) x(2) a(5) x(5)
    P3: a(3) x(3) a(6) x(6)

* Example 2. Delay between consecutive iterations *

    post(ev(0))
    doacross i=1,n
      wait(ev(i-1))
      a(i) = b(i) + a(i-1)
      post(ev(i))
      x(i) = a(i) + 2
    end doacross

Time lines:

    P1: a(1) x(1) a(4) x(4)
    P2: a(2) x(2) a(5) x(5)
    P3: a(3) x(3) a(6)
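The post/wait pattern of Example 2 can be imitated with ordinary threads. The following is a minimal Python sketch, not the DOACROSS runtime itself: `threading.Event` stands in for ev(i), and a shared counter hands out iterations in increasing order, which is the assumption that keeps lower-to-higher synchronization free of deadlock. The function name `doacross_prefix`, the worker count, and the array names `a` and `b` are illustrative assumptions.

```python
import threading


def doacross_prefix(b, num_workers=3):
    """Compute a(i) = b(i) + a(i-1) for i = 1..n with post/wait events."""
    n = len(b)
    a = [0] * (n + 1)                      # a[0] plays the role of a(0)
    ev = [threading.Event() for _ in range(n + 1)]
    ev[0].set()                            # post(ev(0))
    next_iter = [1]                        # next iteration to hand out
    lock = threading.Lock()

    def worker():
        while True:
            with lock:                     # lower iterations are selected first
                i = next_iter[0]
                next_iter[0] += 1
            if i > n:
                return
            ev[i - 1].wait()               # wait(ev(i-1))
            a[i] = b[i - 1] + a[i - 1]
            ev[i].set()                    # post(ev(i))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a[1:]
```

With `num_workers=1` the sketch still terminates, because the single worker always completes iteration i-1 before it picks up iteration i.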

* Example 3. Delay between non-consecutive iterations *

    post(ev(0))
    post(ev(1))
    doacross i=2,n
      wait(ev(i-2))
      a(i) = b(i) + a(i-2)
      post(ev(i))
      x(i) = a(i) + 2
    end doacross

Time lines:

    P1: a(1) x(1) a(7) x(7)
    P2: a(2) x(2) a(8) x(8)
    P3: a(3) x(3) a(9) x(9)
    P4: a(4) x(4) a(10) x(10)
    P5: a(5) x(5) a(11)
    P6: a(6) x(6) a(12)

* Example 4. Doubly nested loop *

    doacross i=1,n
      integer j
      do j=1,n
        wait(ev(i-1,j))
        a(i,j) = a(i-1,j) + a(i,j-1)
        post(ev(i,j))
      end do
    end doacross

Time lines:

    P1: a(1,1) a(1,2) a(1,3) a(1,4)
    P2:        a(2,1) a(2,2) a(2,3)
    P3:               a(3,1) a(3,2)
    P4:                      a(4,1)

Execution time of DOACROSS when ordered critical sections have constant execution time.

Consider the loop

    doacross i=1,n
      $order a ... $endorder
      $order b ...
      $order c ...
      $order d ...
      $order e ...
    end doacross

Assume its execution time lines have the following form:

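[Figure: time lines of four consecutive iterations, each executing sections a, b, c, d, e]

which in terms of performance is equivalent to the following time lines:

[Figure: the same four iterations, with iteration k starting (k-1) II after the first]

where a constant delay II between the start of consecutive iterations is evident. This delay is equal to the time of the longest ordered critical section (i.e. II = T(c) in this case).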

The execution time of the previous loop using n processors is T(a)+T(b)+nT(c)+T(d)+T(e), as can be seen next:

[Figure: time lines split into segments of length T(a)+T(b), nT(c) = nII, and T(d)+T(e)]

In general, the execution time when there are as many processors as iterations is

    nII + (B - II) = (n-1)II + B

where B is the execution time of the whole loop body. The speedup is therefore

    S_p = nB / [(n-1)II + B] ≈ B/II   (for large n)
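As a quick numerical check of the speedup expression, the sketch below evaluates nB / [(n-1)II + B] for a few values of n. The function name and the sample values of B and II are illustrative assumptions, not figures from the slides.

```python
def speedup_with_n_processors(n, B, II):
    """Speedup of a DOACROSS with n iterations on n processors.

    B  : execution time of the whole loop body
    II : delay between the starts of consecutive iterations
    """
    sequential = n * B
    parallel = (n - 1) * II + B
    return sequential / parallel


if __name__ == "__main__":
    for n in (1, 4, 100, 10**6):
        # The printed values move toward B / II = 5 as n grows.
        print(n, speedup_with_n_processors(n, B=10, II=2))
```

The bound B/II makes the cost of a long ordered critical section visible: halving II doubles the attainable speedup, while adding processors beyond B/II brings no further gain.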

When there are p < n processors, the execution time of the loop depends on whether B >= pII or not.

Case 1: B >= pII

If p = 3, for the previous loop we have:

    T(loop) = ⌈n/3⌉ B + T(c) ((n-1) mod 3)

[Figure: time lines on three processors, with segments ⌈n/3⌉ B and II marked]

In general the formula is:

    T(loop) = ⌈n/p⌉ B + II ((n-1) mod p)

Case 2: B < pII

For the previous loop, and in general, we have

    T(loop) = nII + B - II

[Figure: time lines with segments B - II, T(a)+T(b), nT(c) = nII, and T(d)+T(e) marked]

From the previous two statements we have that

    T(loop) = if B >= pII then (⌈n/p⌉ - 1) B + II ((n-1) mod p) + B
              else (n-1) II + B

but

    n - 1 = p (⌈n/p⌉ - 1) + (n-1) mod p

therefore

    T(loop) = if B >= pII then (⌈n/p⌉ - 1) B + II ((n-1) mod p) + B
              else (p (⌈n/p⌉ - 1) + (n-1) mod p) II + B

and

    T(loop) = (⌈n/p⌉ - 1) max(B, pII) + II ((n-1) mod p) + B
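The closed form can be cross-checked against a small simulation. The sketch below assumes a simple timing model that the slides imply but do not spell out: iteration i starts II after iteration i-1, and no earlier than the completion of iteration i-p, which occupied the same processor. The function names are illustrative.

```python
def doacross_time(n, p, B, II):
    """Closed form: (ceil(n/p) - 1) * max(B, p*II) + II * ((n-1) mod p) + B."""
    rounds_before_last = (n + p - 1) // p - 1      # ceil(n/p) - 1
    return rounds_before_last * max(B, p * II) + II * ((n - 1) % p) + B


def simulate_doacross(n, p, B, II):
    """Start times under the assumed model; returns the finish time of the loop."""
    start = [0] * n
    for i in range(1, n):
        earliest = start[i - 1] + II               # delay after the previous iteration
        if i >= p:
            # the processor is busy until iteration i-p finishes
            earliest = max(earliest, start[i - p] + B)
        start[i] = earliest
    return start[-1] + B


if __name__ == "__main__":
    for n, p, B, II in [(7, 3, 10, 2), (5, 3, 10, 5)]:
        print(n, p, B, II, doacross_time(n, p, B, II), simulate_doacross(n, p, B, II))
```

The first sample falls under Case 1 (B >= pII) and the second under Case 2 (B < pII); in both the two functions agree.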

Cyclic Dependences -- DOPIPE

Assume a loop with two or more dependence cycles (strongly connected components or π-blocks). The first approach developed for concurrentization of do loops is illustrated below:

    do i=1,n
      a(i) = c(i) + a(i-1)
      b(i) = a(i) + b(i-1)
    end do

    cobegin
      do i=1,n
        a(i) = c(i) + a(i-1)
        V(σ)
      end do
    //
      do i=1,n
        P(σ)
        b(i) = a(i) + b(i-1)
      end do
    coend
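The cobegin/coend translation maps directly onto two threads and a counting semaphore. The following Python sketch is only an illustration of the idea: `threading.Semaphore` plays the role of σ, `release()` stands for V(σ) and `acquire()` for P(σ). The array names `a`, `b`, `c` and the function name `dopipe` are assumptions made for the example.

```python
import threading


def dopipe(c):
    """Run two recurrences as a two-stage pipeline synchronized by a semaphore."""
    n = len(c)
    a = [0] * (n + 1)                  # a[0], b[0] are the initial values
    b = [0] * (n + 1)
    sigma = threading.Semaphore(0)     # counts the a(i) values produced so far

    def stage1():
        for i in range(1, n + 1):
            a[i] = c[i - 1] + a[i - 1]
            sigma.release()            # V(sigma): a(i) is now available

    def stage2():
        for i in range(1, n + 1):
            sigma.acquire()            # P(sigma): block until a(i) exists
            b[i] = a[i] + b[i - 1]

    t1 = threading.Thread(target=stage1)
    t2 = threading.Thread(target=stage2)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return a[1:], b[1:]
```

Each stage is a distinct piece of code bound to its own thread, so changing the number of processors means regenerating the stages rather than adjusting a parameter.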

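i.e. to take a loop with two or more π-blocks such as:

[Figure: dependence graph with several π-blocks]

and execute collections of π-blocks on separate processors in pipeline fashion:

[Figure: collections of π-blocks assigned to separate processors as pipeline stages]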

4.18.1 Execution time of DOPIPE

Assume the dependence graph shown to the right. Assume also that T(c) = max(T(a), T(b), T(c), T(d), T(e)). Then the execution time of the DOPIPE on 4 processors is

    T(a) + T(b) + nT(c) + T(d) + T(e)

[Figure: time lines with segments of length T(a)+T(b), nT(c), and T(d)+T(e)]

DOPIPE and Loop Distribution

Assume a loop with the dependence graph shown on the right. The loop could be distributed to produce:

    do i=1,n
      ...
    end do
    do i=1,n
      ...
    end do

The first loop could be transformed into a DOALL, and the second into a DOPIPE. The resulting time lines would be:

[Figure: time lines of the DOALL followed by the DOPIPE]

However, executing the original loop as a DOPIPE produces the same execution time with fewer processors (if the number of iterations is greater than 4):

[Figure: time lines of the original loop executed as a DOPIPE]

Problems with DOPIPE

1. Processor allocation is fixed at compile-time, i.e. loops are compiled for a fixed number of processors.

Example 1: A loop with the dependence graph shown to the right could be compiled for three processors as:

    cobegin
      do i=1,n
        ...
      end do
    //
      do i=1,n
        ...
      end do
    //
      do i=1,n
        ...
      end do
    coend

but for two processors it should be compiled as

    cobegin
      do i=1,n
        ...
      end do
    //
      do i=1,n
        ...
      end do
    coend

Example 2: The loop

[Figure: dependence graph of the loop]

can be translated into

    cobegin
      do i=1,n
        ...
      end do
    //
      do i=1,n
        ...
      end do
    //
      do i=1,n
        ...
      end do
    coend

or into

    cobegin
      do i=1,n
        ...
      end do
    //
      do i=1,n,2
        cobegin
          ...
        //
          ...
        coend
      end do
    //
      do i=1,n
        ...
      end do
    coend

If the execution time of the replicated statement is unknown (e.g. it includes a while loop), it is not possible to decide at compile-time how many copies of it to run in parallel.

2. There is the need to do packing, which is NP-hard.

Partition: Given a set A ⊂ Z+, is there a subset A' ⊆ A such that Σ(A') = Σ(A - A')?

DOPIPE translation: Given a loop with the following dependence graph

[Figure: dependence graph with nodes a_1, a_2, ..., a_n and b]

with T(b) = (T(a_1) + T(a_2) + ... + T(a_n))/2, compute an optimal schedule of the loop on 3 processors.

Clearly, solving the DOPIPE translation problem also solves Partition.
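One reading of the reduction: if one of the three processors runs the statement whose time is half the total, the remaining statements must be packed onto the other two processors, and the pipeline stage time stays at that half-total only if they split into two groups of equal total time. The brute-force Python sketch below illustrates this on small inputs. It is exponential in the number of statements, and the function names and sample times are assumptions made for the example.

```python
from itertools import combinations


def best_two_way_split(times):
    """Smallest achievable maximum load when packing the times on two processors."""
    total = sum(times)
    best = total
    indices = range(len(times))
    for size in range(len(times) + 1):
        for subset in combinations(indices, size):
            load = sum(times[k] for k in subset)
            best = min(best, max(load, total - load))
    return best


def has_partition(times):
    """True if the times split into two groups with equal sums (Partition)."""
    return 2 * best_two_way_split(times) == sum(times)


if __name__ == "__main__":
    print(has_partition([3, 1, 1, 2, 2, 1]))   # True: 3 + 2 = 1 + 1 + 2 + 1
    print(has_partition([2, 2, 3]))            # False: the total is odd
```

An optimal three-processor schedule therefore answers the Partition question as a side effect, which is why the packing step cannot be expected to be cheap in general.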

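3. Cycles force sequential execution

Example 3

    do i=3,n
      S: a(i) = b(i-2) - 1
      T: b(i) = a(i-3) * k
    end do

[Figure: dependence graph with a cycle between S and T]

Example 4

    do i=1,n
      do j=1,n
        S: a(i,j) = a(i-1,j) + a(i,j-1)
      end do
    end do

[Figure: dependence graph with S depending on itself]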

Cyclic dependences -- DOACROSS

A loop with cyclic dependences can be transformed into a DOACROSS as shown next:

    do i=1,n
      a(i) = c(i) + a(i-1)
      b(i) = a(i) + b(i-1)
    end do

    $doacross order(a,b), share(a,b,c)
    do i=1,n
      $order a
      a(i) = c(i) + a(i-1)
      $endorder
      $order b
      b(i) = a(i) + b(i-1)
      $endorder
    end do

DOACROSS has the advantage that all implicit tasks execute the same code. This facilitates code assignment. Other advantages of the DOACROSS construct over the DOPIPE construct are illustrated in the following examples.

Example 1: The same translation works for two or three processors:

[Figure: time lines on two processors and on three processors]

Example 2: Increasing the number of processors improves performance.

Example 3: When the following loop is executed as a doacross on two processors

    do i=1,n
      S: a(i) = b(i-2) - 1
      T: b(i) = a(i-3) * k
    end do

we get the following time lines (S_i stands for statement S in iteration i):

    Proc. 1: S_1 T_1 S_3 T_3
    Proc. 2: S_2 T_2 S_4 T_4

Cycle shrinking takes place automatically. This is also true in the case of multiply-nested loops, where all that is needed is to use a tuple as the loop index, as in

    doacross (i,j,k) = [1..n_1]..[1..n_2]..[1..n_3]

Example 4: The following loop

    do i=1,n
      do j=1,n
        S: a(i,j) = a(i-1,j) + a(i,j-1)
      end do
    end do

can be translated into the following doacross loop:

    doacross (i,j) = [1..n]..[1..n]
      wait(ev(i-1,j)); wait(ev(i,j-1))
      S: a(i,j) = a(i-1,j) + a(i,j-1)
      post(ev(i,j))
    end doacross
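The two waits in the doacross above produce a wavefront: an iteration can start one step after the later of its two predecessors. The Python sketch below computes the earliest step of each iteration (i,j), assuming unit-time iterations, enough processors, and boundary events ev(0,j) and ev(i,0) that are already posted. None of these assumptions is stated on the slide, and the function name is illustrative.

```python
def wavefront_steps(n):
    """Earliest time step of each iteration (i, j), 1 <= i, j <= n."""
    # Row 0 and column 0 model the boundary events, available at step 0.
    step = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            # wait(ev(i-1,j)); wait(ev(i,j-1)), then one unit of work
            step[i][j] = max(step[i - 1][j], step[i][j - 1]) + 1
    return [row[1:] for row in step[1:]]


if __name__ == "__main__":
    for row in wavefront_steps(4):
        print(row)
```

Iteration (i,j) lands on step i + j - 1, so all iterations on the same anti-diagonal can run together and the whole loop needs 2n - 1 steps instead of n * n.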

The iteration space of the previous loop is:

    S1,1  S1,2  S1,3  S1,4
    S2,1  S2,2  S2,3  S2,4
    S3,1  S3,2  S3,3  S3,4
    S4,1  S4,2  S4,3  S4,4

and its time lines when executed on n processors are:

[Figure: time lines on n processors]

Statement Scheduling and DOACROSS Execution Time

Consider the following dependence graph for the body of a singly-nested do loop.

[Figure: dependence graph over statements S1, S2, S3, S4, S5]

When the DOACROSS body has the original statement order, there is no speedup (S1 of iteration i+1 cannot start executing until S5 of iteration i completes execution). When the body is permuted into the order S1 S4 S5 S2 S3, there will be speedup, as shown in the following time lines:

[Figure: overlapped time lines of successive iterations, each executing S1 S4 S5 S2 S3]

Selecting an optimum statement ordering to minimize the delay is NP-hard (Cytron's PhD thesis). When the doacross is a doubly-nested do loop, the order of the indices (if the loops are interchangeable) also influences the execution time (Tang et al. 1988).

Cyclic dependences -- Loop Pipelining

This method assumes the presence of no if statements and that all dependence distances are 0 or 1 (Aiken and Nicolau 1988). It proceeds by (greedy) scheduling of the body of the loop in parallel for the first iteration, and then for the second, and so on until a pattern is detected. Once the pattern is detected, parallel code can be easily generated, as illustrated next. (The numbers next to the arcs represent dependence distances.)

[Figure: dependence graph over statements A through R, with arcs labeled by distance 0 or 1]

The resulting program can be executed in a VLIW machine or in an asynchronous multiprocessor.

             iteration
time  1      2      3      4      5      6      7
   1  ABC    A      A      -      -      -      -
   2  DEFI   I      I      -      -      -      -
   3  GHJKL  CK     KL     A      -      -      -
   4  M      BDM    M      I      -      -      -
   5  N      EFGN   FN     KL     -      -      -
   6  PQR    PQR    CPQR   M      A      -      -
   7         HJ     DJ     FN     I      -      -
   8                BG     PQR    KL     -      -
   9                E      J      M      A      -
  10                H      C      FN     I      -
  11                       BD     PQR    KL     -
  12                       EG     J      M      A
  13                       H      C      FN     I
  14                              BD     PQR    KL
  15                              EG     J      M
  16                              H      C      FN
  17                                     BD     PQR
  18                                     EG     J
  19                                     H      C
  20                                            BD
  21                                            EG
  22                                            H

Final Program Graph:

    H_1  C_2  F_3  N_3  I_4
    B_2  D_2  P_3  Q_3  R_3  K_4  L_4
    E_2  G_2  J_3  M_4  A_5