
1 Advanced Restructuring Compilers Advanced Topics Spring 2009 Prof. Robert van Engelen

2 Overview
Data and control dependences
The theory and practice of data dependence analysis
K-level loop-carried dependences
Loop-independent dependences
Testing dependence
Dependence equations and the polyhedral model
Dependence solvers
Loop transformations
Loop parallelization
Loop reordering
Unimodular loop transformations

3 Data and Control Dependences
A fundamental property of all computations:
S1  PI = 3.14
S2  R = 5.0
S3  AREA = PI * R ** 2
Data dependence: data is produced and consumed in the correct order
S1  IF (T.EQ.0.0) GOTO S3
S2  A = A / T
S3  CONTINUE
Control dependence: a dependence that arises as a result of control flow

4 Data Dependence Definition
There is a data dependence from statement S1 to statement S2 iff
1) both statements S1 and S2 access the same memory location and at least one of them stores into it, and
2) there is a feasible run-time execution path from statement S1 to S2

5 Data Dependence Classification
S1  X = ...
S2  ... = X
A true dependence (or flow dependence), denoted S1 δ S2

S1  ... = X
S2  X = ...
An anti dependence, denoted S1 δ-1 S2

S1  X = ...
S2  X = ...
An output dependence, denoted S1 δo S2

6 Compiling for Parallelism
Theorem: Any reordering transformation that preserves every dependence in a program preserves the meaning of that program
Bernstein's conditions for loop parallelism:
Iteration I1 does not write into a location that is read by iteration I2
Iteration I2 does not write into a location that is read by iteration I1
Iteration I1 does not write into a location that is written into by iteration I2

7 Fortran's PARALLEL DO
The PARALLEL DO requires that there are no scheduling constraints among its iterations
When is it safe to convert a DO to a PARALLEL DO?

PARALLEL DO I = 1, N
  A(I+1) = A(I) + B(I)      (not OK: cross-iteration flow dependence)

PARALLEL DO I = 1, N
  A(I) = A(I) + B(I)        (OK)

PARALLEL DO I = 1, N
  A(I-1) = A(I) + B(I)      (not OK: cross-iteration anti dependence)

PARALLEL DO I = 1, N
  S = S + B(I)              (not OK: reduction on S)

8 Dependences
Whereas the PARALLEL DO requires the programmer to check whether loop iterations can be executed in parallel, an optimizing compiler must analyze loop dependences for automatic loop parallelization and vectorization
There are multiple levels of granularity of parallelism a programmer or compiler can exploit, for example:
Instruction-level parallelism (ILP)
Synchronous (e.g. SIMD, also vector and SSE) loop parallelism (fine grain): the aim is to parallelize inner loops as vector ops
Asynchronous (e.g. MIMD, work sharing) loop parallelism (coarse grain): the aim is to schedule outer loops as tasks
Task parallelism (coarse grain): the aim is to run different tasks (threads) in parallel

9 Advanced Compiler Technology
Powerful restructuring compilers transform loops into parallel or vector code
They must analyze data dependences in loop nests besides straight-line code
DO I = 1, N
  DO J = 1, N
    C(J,I) = 0.0
    DO K = 1, N
      C(J,I) = C(J,I) + A(J,K)*B(K,I)
parallelize:
PARALLEL DO I = 1, N
  DO J = 1, N
    C(J,I) = 0.0
    DO K = 1, N
      C(J,I) = C(J,I) + A(J,K)*B(K,I)
vectorize:
DO I = 1, N
  DO J = 1, N, 64
    C(J:J+63,I) = 0.0
    DO K = 1, N
      C(J:J+63,I) = C(J:J+63,I) + A(J:J+63,K)*B(K,I)

10 Synchronous Parallel Machines
[Figure: an example 4x4 processor mesh]
SIMD architecture: processors execute in lock-step, issuing the same instruction every clock
SIMD architecture: simpler (and cheaper) than MIMD
Programming model: PRAM (parallel random access machine)
Processors operate in lockstep, executing the same instruction on different portions of the data space
The application should exhibit data parallelism
Control flow requires masking operations (similar to predicated instructions), which can be inefficient

11 Asynchronous Parallel Computers
[Figures: a shared-memory machine (PEs p1..p4 on a bus with a shared memory) and a distributed-memory machine (PEs p1..p4 with local memories m1..m4 connected by a bus, crossbar, or network)]
Multiprocessors: shared memory is directly accessible by each PE (processing element)
Multicomputers: distributed memory requires message passing between PEs
SMPs: shared memory within small clusters of processors; message passing is needed across clusters

12 Valid Transformations
A transformation is said to be valid for the program to which it applies if it preserves all dependences in the program

Original:
DO I = 1, 3
S1  A(I+1) = B(I)
S2  B(I+1) = A(I)

Valid (statements reordered; both loop-carried dependences are preserved):
DO I = 1, 3
  B(I+1) = A(I)
  A(I+1) = B(I)

Invalid (loop reversed; the level-1 dependences are violated):
DO I = 3, 1, -1
  A(I+1) = B(I)
  B(I+1) = A(I)

13 Fundamental Theorem of Dependence
Any reordering transformation that preserves every dependence in a program preserves the meaning of that program

14 Dependence in Loops
DO I = 1, N
S1  A(I+1) = A(I) + B(I)
The statement instances S1[i] for iterations i = 1, ..., N represent the loop execution
S1[1]  A(2) = A(1) + B(1)
S1[2]  A(3) = A(2) + B(2)
S1[3]  A(4) = A(3) + B(3)
S1[4]  A(5) = A(4) + B(4)
Statement instances with flow dependences:
S1[1] δ S1[2]
S1[2] δ S1[3]
...
S1[N-1] δ S1[N]

15 Iteration Vector
Definition: Given a nest of n loops, the iteration vector i of a particular iteration of the innermost loop is a vector of integers i = (i1, i2, ..., in), where ik represents the iteration number for the loop at nesting level k
The set of all possible iteration vectors spans an iteration space over the loop statement instances Sj[i]

16 Iteration Vector Example
DO I = 1, 3
  DO J = 1, I
S1    A(I,J) = A(J,I)
The iteration space of the statement at S1 is the set of iteration vectors {(1,1), (2,1), (2,2), (3,1), (3,2), (3,3)}
Example: at iteration i = (2,1), statement instance S1[i] assigns the value of A(1,2) to A(2,1)
[Figure: the iteration space plotted in the (I,J) plane]

17 Iteration Vector Ordering
The iteration vectors are naturally ordered according to a lexicographic order
For example, iteration (1,2) precedes (2,1) and (2,2)
Definition: Iteration i precedes iteration j, denoted i < j, iff
1) i[1:n-1] < j[1:n-1], or
2) i[1:n-1] = j[1:n-1] and in < jn
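
As a concrete illustration (my own sketch, not from the slides), this lexicographic test in Fortran, assuming iteration vectors are stored as integer arrays of length N:

  ! Hypothetical helper: .TRUE. iff iteration vector IV lexicographically
  ! precedes iteration vector JV (both of length N)
  LOGICAL FUNCTION LEXPREC(IV, JV, N)
    INTEGER, INTENT(IN) :: N, IV(N), JV(N)
    INTEGER :: K
    LEXPREC = .FALSE.
    DO K = 1, N
      IF (IV(K) < JV(K)) THEN      ! first differing component decides
        LEXPREC = .TRUE.
        RETURN
      ELSE IF (IV(K) > JV(K)) THEN
        RETURN
      END IF
    END DO                          ! all components equal: not strictly before
  END FUNCTION LEXPREC

For the slide's example, LEXPREC((/1,2/), (/2,1/), 2) returns .TRUE.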

18 Cross-Iteration Dependence
Definition: There exists a dependence from S1 to S2 in a loop nest iff there exist two iteration vectors i and j such that
1) i < j and there is a path from S1 to S2,
2) S1 accesses memory location M on iteration i and S2 accesses memory location M on iteration j, and
3) one of these accesses is a write

19 Dependence Example
DO I = 1, 3
  DO J = 1, I
S1    A(I,J) = A(J,I)
Show that the loop has no cross-iteration dependence
Answer: there are no iteration vectors i and j in {(1,1), (2,1), (2,2), (3,1), (3,2), (3,3)} such that i < j and either S1 in iteration i writes to the same element of A that is read at S1 in iteration j, or S1 in iteration i reads an element of A that is written at S1 in iteration j

20 Dependence Distance and Direction Vectors
Definition: Given a dependence from S1 on iteration i to S2 on iteration j, the dependence distance vector d(i, j) is defined as d(i, j) = j - i
Definition: Given a dependence from S1 on iteration i and S2 on iteration j, the dependence direction vector D(i, j) is defined for the kth component as
D(i, j)k = <  if d(i, j)k > 0
D(i, j)k = =  if d(i, j)k = 0
D(i, j)k = >  if d(i, j)k < 0
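
A minimal Fortran sketch of these two definitions (my own illustration, not from the slides), assuming iteration vectors as integer arrays:

  ! Hypothetical sketch: distance vector DVEC and direction vector DIRV
  ! for a dependence from iteration IV to iteration JV
  SUBROUTINE DEPVEC(IV, JV, N, DVEC, DIRV)
    INTEGER, INTENT(IN)    :: N, IV(N), JV(N)
    INTEGER, INTENT(OUT)   :: DVEC(N)
    CHARACTER, INTENT(OUT) :: DIRV(N)
    INTEGER :: K
    DO K = 1, N
      DVEC(K) = JV(K) - IV(K)       ! d(i,j) = j - i
      IF (DVEC(K) > 0) THEN
        DIRV(K) = '<'
      ELSE IF (DVEC(K) == 0) THEN
        DIRV(K) = '='
      ELSE
        DIRV(K) = '>'
      END IF
    END DO
  END SUBROUTINE DEPVEC

For i = (1,1) and j = (2,1) this yields d = (1,0) and D = (<,=), matching Example 1 below.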

21 Direction of Dependence
There is a flow dependence if the iteration vector of the write is lexicographically less than the iteration vector of the read
In other words, if the direction vector is lexicographically positive:
D(i, j) = (=, =, ..., =, <, ...)
that is, all leading components are =, the first non-= component is <, and the components after it may be <, =, or >

22 Representing Dependences with Data Dependence Graphs
DO I = 1, ...
S1  A(I) = B(I) * 5
S2  C(I+1) = C(I) + A(I)
Static data dependences for the accesses to A and C: S1 δ(=) S2 and S2 δ(<) S2
S1 δ(=) S2 means S1[i] δ S2[i] for all i
S2 δ(<) S2 means S2[i] δ S2[j] for all i, j with i < j
It is generally infeasible to represent all data dependences that arise in a program
Usually only static data dependences are recorded
A data dependence graph compactly represents the data dependences in a loop nest
[Figure: data dependence graph with edge S1 -(=)-> S2 and self-edge S2 -(<)-> S2]

23 Example 1
DO I = 1, 3
  DO J = 1, I
S1    A(I+1,J) = A(I,J)
Flow dependence between S1 and itself on:
i = (1,1) and j = (2,1): d(i, j) = (1,0), D(i, j) = (<, =)
i = (2,1) and j = (3,1): d(i, j) = (1,0), D(i, j) = (<, =)
i = (2,2) and j = (3,2): d(i, j) = (1,0), D(i, j) = (<, =)
[Figure: iteration space in the (I,J) plane; dependence graph with self-edge S1 -(<,=)-> S1]

24 Example 2
DO I = 1, 4
  DO J = 1, 4
S1    A(I,J+1) = A(I-1,J)
Distance vector is (1,1): S1 δ(<,<) S1
[Figure: iteration space in the (I,J) plane; dependence graph with self-edge S1 -(<,<)-> S1]

25 Example 3
DO I = 1, 4
  DO J = 1, 5-I
S1    A(I+1,J+I-1) = A(I,J+I-1)
Distance vector is (1,-1): S1 δ(<,>) S1
[Figure: iteration space in the (I,J) plane; dependence graph with self-edge S1 -(<,>)-> S1]

26 Example 4
DO I = 1, 4
  DO J = 1, 4
S1    B(I) = B(I) + A(I,J)
S2    A(I+1,J) = A(I,J)
Distance vectors are (0,1) and (1,0):
S1 δ(=,<) S1
S2 δ(<,=) S1
S2 δ(<,=) S2
[Figure: iteration space in the (I,J) plane; dependence graph over S1 and S2]

27 Loop-Independent Dependences
Definition: Statement S1 has a loop-independent dependence on S2 iff there exist two iteration vectors i and j such that
1) S1 refers to memory location M on iteration i, S2 refers to M on iteration j, and i = j
2) there is a control flow path from S1 to S2 within the iteration
That is, the direction vector contains only = components

28 K-Level Loop-Carried Dependences
Definition: Statement S1 has a loop-carried dependence on S2 iff
1) there exist two iteration vectors i and j such that S1 refers to memory location M on iteration i and S2 refers to M on iteration j, and
2) the direction vector is lexicographically positive, that is, D(i, j) contains a < as its leftmost non-= component
A loop-carried dependence from S1 to S2 is
1) lexically forward if S2 appears after S1 in the loop body
2) lexically backward if S2 appears before S1 (or if S1 = S2)
The level of a loop-carried dependence is the index of the leftmost non-= component of D(i, j) for the dependence
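
A small sketch of how the level could be computed from a direction vector (my own illustration, not from the slides):

  ! Hypothetical sketch: the level of a loop-carried dependence, i.e. the
  ! index of the leftmost non-'=' component of direction vector DIRV;
  ! returns 0 if all components are '=' (a loop-independent dependence)
  INTEGER FUNCTION DEPLEVEL(DIRV, N)
    INTEGER, INTENT(IN)   :: N
    CHARACTER, INTENT(IN) :: DIRV(N)
    INTEGER :: K
    DEPLEVEL = 0
    DO K = 1, N
      IF (DIRV(K) /= '=') THEN
        DEPLEVEL = K
        RETURN
      END IF
    END DO
  END FUNCTION DEPLEVEL

For D = (=,=,<) this returns 3, matching Example 2 below.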

29 Example 1
DO I = 1, 3
S1  A(I+1) = F(I)
S2  F(I+1) = A(I)
S1 δ(<) S2 and S2 δ(<) S1
The loop-carried dependence S1 δ(<) S2 is lexically forward
The loop-carried dependence S2 δ(<) S1 is lexically backward
All loop-carried dependences are of level 1, because D(i, j) = (<) for every dependence

30 Example 2
DO I = 1, 10
  DO J = 1, 10
    DO K = 1, 10
S1      A(I,J,K+1) = A(I,J,K)
All loop-carried dependences are of level 3: D(i, j) = (=, =, <)
Level-k dependences are sometimes denoted by Sx δk Sy, so the dependence S1 δ(=,=,<) S1 can also be written as the level-3 dependence S1 δ3 S1

31 Combining Direction Vectors
A loop nest can have multiple different directions at the same loop level k
We abbreviate such a level-k direction vector component with * to denote any direction

DO I = 1, 9
S1  A(I) = ...
S2  ... = A(10-I)
S1 δ(*) S2, combining S1 δ(<) S2, S1 δ(=) S2, and S1 δ(>) S2

DO I = 1, 10
S1  S = S + A(I)
S1 δ(*) S1

DO I = 1, 10
  DO J = 1, 10
S1    A(J) = A(J)+1
S1 δ(*,=) S1

32 How to Determine Dependences?
A system of Diophantine dependence equations is set up and solved to test the direction of dependence (flow/anti)
Proving that there is (no) solution to the equations means there is (no) dependence
Most dependence solvers solve Diophantine equations over affine array index expressions of the form
a1 i1 + a2 i2 + ... + an in + a0
where the ij are loop index variables and the ak are integer constants
Non-linear subscripts are problematic:
Symbolic terms
Function values
Indirect indexing
etc.

33 Dependence Equation: Testing Flow Dependence
DO I = 1, N
S1  A(f(I)) = A(g(I))     (write on the left, read on the right)
To determine flow dependence: prove that the dependence equation f(α) = g(β) has a solution for iteration vectors α and β such that α < β

DO I = 1, N
S1  A(I+1) = A(I)
α+1 = β has solution α = β-1

DO I = 1, N
S1  A(2*I+1) = A(2*I)
2α+1 = 2β has no solution

34 Dependence Equation: Testing Anti Dependence
DO I = 1, N
S1  A(f(I)) = A(g(I))     (write on the left, read on the right)
To determine anti dependence: prove that the dependence equation f(α) = g(β) has a solution for iteration vectors α and β such that β < α

DO I = 1, N
S1  A(I+1) = A(I)
α+1 = β has no solution with β < α

DO I = 1, N
S1  A(2*I) = A(2*I+2)
2α = 2β+2 has solution α = β+1

35 Dependence Equation Examples
To determine flow dependence: prove there are iteration vectors α < β such that f(α) = g(β)

DO I = 1, N
  DO J = 1, N
S1    A(I-1) = A(J)
With α = (αI, αJ) and β = (βI, βJ), the equation αI - 1 = βJ has a solution

DO I = 1, N
S1  A(5) = A(I)
5 = β is a solution if 5 ≤ N

DO I = 1, N
S1  A(5) = A(6)
5 = 6 has no solution

36 ZIV, SIV, and MIV
DO I = ...
  DO J = ...
    DO K = ...
S1      A(5,I+1,J) = A(N,I,K)
The first subscript pair (5, N) is a ZIV pair, the second (I+1, I) an SIV pair, and the third (J, K) an MIV pair
ZIV (zero index variable) subscript pairs contain no loop index variables
SIV (single index variable) subscript pairs contain a single loop index variable
MIV (multiple index variable) subscript pairs contain multiple loop index variables

37 Example
DO I = 1, 4
  DO J = 1, 5-I
S1    A(I+1,J+I-1) = A(I,J+I-1)
The first subscript pair (I+1, I) is an SIV pair; the second (J+I-1, J+I-1) is an MIV pair
Distance vector is (1,-1): S1 δ(<,>) S1
The first distance vector component is easy to determine (SIV), but the second is not so easy (MIV)

38 Separability and Coupled Subscripts
DO I = ...
  DO J = ...
    DO K = ...
S1      A(I,J,J) = A(I,J,K)
The SIV pair is separable; the MIV pair gives coupled indices
When testing multidimensional arrays, subscripts are separable if their indices do not occur in other subscripts
ZIV and SIV subscript pairs are separable
MIV pairs may give rise to coupled subscript groups

39 Dependence Testing Overview
1. Partition the subscripts into separable and coupled groups
2. Classify each subscript as ZIV, SIV, or MIV
3. For each separable subscript, apply the applicable single-subscript test (ZIV, SIV, or MIV) to prove independence or produce direction vectors for the dependence
4. For each coupled group, apply a multiple-subscript test
5. If any test yields independence, no further testing is needed
6. Otherwise, merge all dependence vectors into one set

40 Loop Normalization
DO I = 2, 100
  DO J = 1, I-1
S1    A(I,J) = A(J,I)
normalizes to
DO i = 0, 98
  DO j = 0, i+2-2
S1    A(i+2,j+1) = A(j+1,i+2)
Some tests use normalized loop iteration spaces
Set the lower bound to zero (or one) by adjusting the limits and the occurrences of the loop variable
For an arbitrary loop with loop index I, loop bounds L and U, and stride S, the normalized iteration number is i = (I-L)/S
A normalized dependence system consists of a dependence equation along with a set of normalized constraints
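
A quick worked instance of the formula: the outer loop above has L = 2 and S = 1, so iteration I maps to i = (I-2)/1 = I-2, giving the normalized range i = 0, ..., 98 shown in the rewritten nest. For a strided loop such as DO I = 5, 55, 10 (my own example, not from the slides), I = 5, 15, ..., 55 maps to i = (I-5)/10 = 0, 1, ..., 5.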

41 Dependence System: Formulating Flow Dependence
Find α < β such that f(α) = g(β), where α = (αI, αJ) and β = (βI, βJ)
DO i = 0, 98
  DO j = 0, i+2-2
S1    A(i+2,j+1) = A(j+1,i+2)
Dependence equations:
αI + 2 = βJ + 1
αJ + 1 = βI + 2
Loop constraints (assuming normalized loop iteration spaces):
0 ≤ αI, βI ≤ 98
0 ≤ αJ ≤ αI
0 ≤ βJ ≤ βI
Constraint for the (<,*) dependence direction:
αI < βI
A dependence system consists of a dependence equation along with a set of constraints:
the solution must lie within the loop bounds,
the solution must be integer, and
we need a dependence distance or direction vector (flow/anti)

42 Dependence System: Matrix Notation
The polyhedral model: solving a linear (affine) dependence system in matrix notation, with the unknowns collected in x = (αI, βI, αJ, βJ)
Polyhedron { x : Ax ≤ c } for some matrix A and bounds vector c
A contains the loop constraints (inequalities)
Rewrite the dependence equation Ax = b as a set of inequalities: Ax ≤ b and -Ax ≤ -b
If no point lies in the polyhedron (a polytope) then there is no solution and thus there is no dependence

43 Fourier-Motzkin Projection
[Figure: a system of linear inequalities Ax ≤ b drawn in the (x1, x2) plane, together with its projections onto the x1 and x2 axes]

44 Fourier-Motzkin Variable Elimination (FMVE)
FMVE procedure to eliminate an unknown from a system Ax ≤ b:
1. Select an unknown xj
2. L = { i : aij < 0 }   (rows giving lower bounds on xj)
3. U = { i : aij > 0 }   (rows giving upper bounds on xj)
4. If L = ∅ or U = ∅ then xj is unconstrained: delete it
5. For i ∈ L ∪ U: A[i] := A[i] / |aij|, bi := bi / |aij|
6. For each i ∈ L and each k ∈ U, add the new inequality A[i] + A[k] ≤ bi + bk
7. Delete the old rows in L and U
Example: selecting x2 with L = {3,4} and U = {1,2} yields a new system over x1 alone, here max(1/2,-2) ≤ x1 ≤ min(11,7)
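
A tiny worked instance (my own, not the slide's): to eliminate x2 from { x1 - x2 ≤ 0, x2 ≤ 5 }, the first inequality is a lower bound on x2 (set L, coefficient -1) and the second an upper bound (set U, coefficient +1); both coefficients are already ±1, and adding the two rows cancels x2, leaving the projected system x1 ≤ 5.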

45 ZIV Test
The ZIV test is a simple and quick test that compares two subscripts containing no loop index variables
If the expressions are proved unequal, no dependence can exist

DO I = 1, N
S1  A(5) = A(6)

K = 10
DO I = 1, N
S1  A(5) = A(K)

K = 10
DO I = 1, N
  DO J = 1, N
S1    A(I,5) = A(I,K)

46 Strong SIV Test
Requires subscript pairs of the form aI+c1 and aI'+c2 for the def at iteration I and the use at iteration I', respectively
The dependence equation aI+c1 = aI'+c2 has a solution if the dependence distance d = (c1 - c2)/a is integer and |d| ≤ U - L, with L and U the loop bounds

DO I = 1, N
S1  A(I+1) = A(I)

DO I = 1, N
S1  A(2*I+2) = A(2*I)

DO I = 1, N
S1  A(I+N) = A(I)
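
A minimal Fortran sketch of this test (my own illustration, assuming a > 0):

  ! Hypothetical sketch of the strong SIV test for subscripts a*I+c1
  ! (write) and a*I+c2 (read) in a loop DO I = L, U: a dependence exists
  ! iff the distance d = (c1-c2)/a is an integer with |d| <= U-L
  LOGICAL FUNCTION STRONGSIV(A, C1, C2, L, U, D)
    INTEGER, INTENT(IN)  :: A, C1, C2, L, U   ! assumes A > 0
    INTEGER, INTENT(OUT) :: D
    STRONGSIV = .FALSE.
    D = 0
    IF (MOD(C1 - C2, A) /= 0) RETURN   ! distance not integer: independent
    D = (C1 - C2) / A
    STRONGSIV = (ABS(D) <= U - L)      ! distance must fit the iteration space
  END FUNCTION STRONGSIV

For the first loop above (a = 1, c1 = 1, c2 = 0) it reports a dependence at distance d = 1.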

47 Weak-Zero SIV Test
Requires subscript pairs of the form a1 I + c1 and a2 I' + c2 with either a1 = 0 or a2 = 0
If a2 = 0, the dependence equation I = (c2 - c1)/a1 has a solution if (c2 - c1)/a1 is integer and L ≤ (c2 - c1)/a1 ≤ U
If a1 = 0, the case is similar

DO I = 1, N
S1  A(I) = A(1)

DO I = 1, N
S1  A(2*I+2) = A(2)

DO I = 1, N
S1  A(1) = A(I) + A(1)
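
The corresponding sketch for the a2 = 0 case (my own illustration, assuming a1 is nonzero):

  ! Hypothetical sketch of the weak-zero SIV test for a1*I+c1 vs. the
  ! constant c2 in a loop DO I = L, U: a dependence exists iff
  ! (c2-c1)/a1 is an integer lying within the loop bounds
  LOGICAL FUNCTION WEAKZEROSIV(A1, C1, C2, L, U)
    INTEGER, INTENT(IN) :: A1, C1, C2, L, U   ! assumes A1 /= 0
    INTEGER :: I
    WEAKZEROSIV = .FALSE.
    IF (MOD(C2 - C1, A1) /= 0) RETURN
    I = (C2 - C1) / A1
    WEAKZEROSIV = (I >= L .AND. I <= U)
  END FUNCTION WEAKZEROSIV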

48 GCD Test
Requires that f(α) and g(β) are affine:
f(α) = a0 + a1 α1 + ... + an αn
g(β) = b0 + b1 β1 + ... + bn βn
Reordering gives a linear Diophantine equation
a1 α1 - b1 β1 + ... + an αn - bn βn = b0 - a0
which has a solution iff gcd(a1, ..., an, b1, ..., bn) divides b0 - a0
Which of these loops have dependences?

DO I = 1, N
S1  A(2*I+1) = A(2*I)

DO I = 1, N
S1  A(4*I+1) = A(2*I+3)

DO I = 1, 10
  DO J = 1, 10
S1    A(1+2*I+20*J) = A(2+20*I+2*J)
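
A minimal Fortran sketch of the GCD test (my own illustration, assuming at least one nonzero coefficient; loop bounds are deliberately ignored, which is why the test is conservative):

  ! Hypothetical sketch: the Diophantine equation
  ! a1*x1 + ... + am*xm = c has an integer solution iff
  ! gcd(a1,...,am) divides c
  LOGICAL FUNCTION GCDTEST(COEF, M, C)
    INTEGER, INTENT(IN) :: M, COEF(M), C
    INTEGER :: G, K, T, R
    G = 0
    DO K = 1, M                      ! fold gcd over all coefficients
      T = ABS(COEF(K))
      DO WHILE (T /= 0)              ! Euclid's algorithm: G := gcd(G, T)
        R = MOD(G, T)
        G = T
        T = R
      END DO
    END DO
    GCDTEST = (G /= 0 .AND. MOD(C, G) == 0)
  END FUNCTION GCDTEST

For the first loop, the equation is 2α - 2β = -1; gcd(2,2) = 2 does not divide -1, so the test reports independence, consistent with slide 33.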

49 Banerjee Test (1)
IF (M > 0) THEN
  DO I = 1, 10
S1    A(I) = A(I+M) + B(I)
The dependence equation α = β + M is rewritten as -M + α - β = 0
Constraints: 1 ≤ M, 0 ≤ α ≤ 9, 0 ≤ β ≤ 9
f(α) and g(β) are affine:
f(x1, x2, ..., xn) = a0 + a1 x1 + ... + an xn
g(y1, y2, ..., yn) = b0 + b1 y1 + ... + bn yn
The dependence equation a0 - b0 + a1 x1 - b1 y1 + ... + an xn - bn yn = 0 has no solution if 0 does not lie between the lower bound L and upper bound U of the equation's LHS

50 Banerjee Test (2)
IF (M > 0) THEN
  DO I = 1, 10
S1    A(I) = A(I+M) + B(I)
-M + α - β = 0, with constraints 1 ≤ M, 0 ≤ α ≤ 9, 0 ≤ β ≤ 9
Test for (*) dependence:
step            L             U
original eq.    -M + α - β    -M + α - β
eliminate β     -M + α - 9    -M + α - 0
eliminate α     -M - 9        -M + 9
eliminate M     -∞            8
Possible dependence, because 0 lies within [-∞, 8]

51 Banerjee Test (3)
-M + α - β = 0
Test for (<) dependence: assume α ≤ β - 1
Constraints: 1 ≤ M, 0 ≤ α ≤ β-1 ≤ 8, 1 ≤ α+1 ≤ β ≤ 9
step            L             U
original eq.    -M + α - β    -M + α - β
eliminate β     -M + α - 9    -M + α - (α+1)
eliminate α     -M - 9        -M - 1
eliminate M     -∞            -2
No dependence, because 0 does not lie within [-∞, -2]

52 Banerjee Test (4)
-M + α - β = 0
Test for (>) dependence: assume β ≤ α - 1
Constraints: 1 ≤ M, 1 ≤ β+1 ≤ α ≤ 9, 0 ≤ β ≤ α-1 ≤ 8
step            L                 U
original eq.    -M + α - β        -M + α - β
eliminate β     -M + α - (α-1)    -M + α - 0
eliminate α     -M + 1            -M + 9
eliminate M     -∞                8
Possible dependence, because 0 lies within [-∞, 8]

53 Banerjee Test (5)
-M + α - β = 0
Test for (=) dependence: assume α = β
Constraints: 1 ≤ M, 0 ≤ α = β ≤ 9
step            L             U
original eq.    -M + α - β    -M + α - β
eliminate β     -M + α - α    -M + α - α
eliminate α     -M            -M
eliminate M     -∞            -1
No dependence, because 0 does not lie within [-∞, -1]
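
A minimal Fortran sketch of the underlying bound computation (my own illustration; it assumes finite lower/upper bounds on every unknown, whereas the slides let M grow without bound, corresponding to an infinite limit):

  ! Hypothetical sketch of the Banerjee bounds: for an equation
  ! a1*x1 + ... + am*xm = c with box constraints LO(k) <= xk <= HI(k),
  ! a real-valued solution exists iff LB <= c <= UB
  SUBROUTINE BANERJEE(COEF, LO, HI, M, LB, UB)
    INTEGER, INTENT(IN)  :: M, COEF(M), LO(M), HI(M)
    INTEGER, INTENT(OUT) :: LB, UB
    INTEGER :: K
    LB = 0
    UB = 0
    DO K = 1, M
      IF (COEF(K) >= 0) THEN         ! positive coefficient: min at LO, max at HI
        LB = LB + COEF(K)*LO(K)
        UB = UB + COEF(K)*HI(K)
      ELSE                           ! negative coefficient: min at HI, max at LO
        LB = LB + COEF(K)*HI(K)
        UB = UB + COEF(K)*LO(K)
      END IF
    END DO
  END SUBROUTINE BANERJEE

With the (=) constraint α = β folded in, the example's LHS reduces to -M; for any finite range M in [1, Mmax] the sketch returns [-Mmax, -1], which never contains 0, matching the slide.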

54 Testing for All Direction Vectors
Refine the dependence between a pair of statements by testing for the most general set of direction vectors and, upon failure to prove independence, refining each component recursively:
(*,*,*)  yes (may depend)
  (<,*,*)  yes     (=,*,*)     (>,*,*)
    (<,<,*)     (<,=,*)  yes     (<,>,*)  no (indep)
      (<,=,<)     (<,=,=)     (<,=,>)

55 Direction Vectors and Reordering Transformations (1)
Definitions:
A reordering transformation is any program transformation that changes the execution order of the code, without adding or deleting any statement executions
A reordering transformation preserves a dependence if it preserves the relative execution order of the source and sink of that dependence
Theorem: A reordering transformation is valid if, after it is applied, the new dependence vectors are lexicographically positive, i.e. none of the new direction vectors has a leftmost non-= component that is >

56 Direction Vectors and Reordering Transformations (2)
A reordering transformation preserves all level-k dependences if
1) it preserves the iteration order of the level-k loop,
2) it does not interchange any loop at level < k to a position inside the level-k loop, and
3) it does not interchange any loop at level > k to a position outside the level-k loop

57 Examples
DO I = 1, 10
S1  F(I+1) = A(I)
S2  A(I+1) = F(I)
Both S1 δ(<) S2 and S2 δ(<) S1 are level-1 dependences, so the statements may be swapped:
DO I = 1, 10
S2  A(I+1) = F(I)
S1  F(I+1) = A(I)

DO I = 1, 10
  DO J = 1, 10
    DO K = 1, 10
S1      A(I+1,J+2,K+3) = A(I,J,K) + B
S1 δ(<,<,<) S1 is a level-1 dependence, so the inner loops may be interchanged and even reversed:
DO I = 1, 10
  DO K = 10, 1, -1
    DO J = 1, 10
S1      A(I+1,J+2,K+3) = A(I,J,K) + B

58 Direction Vectors and Reordering Transformations (3)
A reordering transformation preserves a loop-independent dependence between statements S1 and S2 if it does not move statement instances between iterations and preserves the relative order of S1 and S2

59 Examples
DO I = 1, 10
S1  A(I) = ...
S2  ... = A(I)
S1 δ(=) S2 is a loop-independent dependence; the loop may be reversed as long as S1 and S2 stay in order:
DO I = 10, 1, -1
S1  A(I) = ...
S2  ... = A(I)

DO I = 1, 10
  DO J = 1, 10
S1    A(I,J) = ...
S2    ... = A(I,J)
S1 δ(=,=) S2 is a loop-independent dependence; both loops may be reversed:
DO J = 10, 1, -1
  DO I = 10, 1, -1
S1    A(I,J) = ...
S2    ... = A(I,J)

60 Loop Fission
Loop fission (or loop distribution) splits a single loop into multiple loops
It enables vectorization, and enables parallelization of separate loops when the original loop is sequential
Loop fission must preserve all dependence relations of the original loop

S1  DO I = 1, 10
S2    DO J = 1, 10
S3      A(I,J) = B(I,J) + C(I,J)
S4      D(I,J) = A(I,J-1) * 2.0
with S3 δ(=,<) S4, fission of the J loop gives
S1  DO I = 1, 10
S2    DO J = 1, 10
S3      A(I,J) = B(I,J) + C(I,J)
Sx    DO J = 1, 10
S4      D(I,J) = A(I,J-1) * 2.0
which still has S3 δ(=,<) S4, and vectorizes/parallelizes to
S1  PARALLEL DO I = 1, 10
S3    A(I,1:10) = B(I,1:10) + C(I,1:10)
S4    D(I,1:10) = A(I,0:9) * 2.0

61 Loop Fission: Algorithm
S1  DO I = 1, 10
S2    A(I) = A(I) + B(I-1)
S3    B(I) = C(I-1)*X + Z
S4    C(I) = 1/B(I)
S5    D(I) = SQRT(C(I))
S6  ENDDO
Dependences: S3 δ(<) S2, S4 δ(<) S3, S3 δ(=) S4, S4 δ(=) S5
Compute the acyclic condensation of the dependence graph to find a legal order of the loops: S3 and S4 form a cycle and stay together; the condensation orders {S3, S4} before S2 and before S5
[Figure: dependence graph over S2..S5 and its acyclic condensation]
S1  DO I = 1, 10
S3    B(I) = C(I-1)*X + Z
S4    C(I) = 1/B(I)
Sx  ENDDO
Sy  DO I = 1, 10
S2    A(I) = A(I) + B(I-1)
Sz  ENDDO
Su  DO I = 1, 10
S5    D(I) = SQRT(C(I))
Sv  ENDDO

62 Loop Fusion
Loop fusion combines two consecutive loops with the same IV and loop bounds into one
The fused loop must preserve all dependence relations of the original loops
Fusion enables more effective scalar optimizations in the fused loop, but may reduce temporal locality

S1  B(1) = T(1)*X(1)
S2  DO I = 2, N
S3    B(I) = T(I)*X(I)
S4  ENDDO
S5  DO I = 2, N
S6    A(I) = B(I) - B(I-1)
S7  ENDDO
The original code has dependences S1 δ S6 and S3 δ S6; fusion gives
S1  B(1) = T(1)*X(1)
Sx  DO I = 2, N
S3    B(I) = T(I)*X(I)
S6    A(I) = B(I) - B(I-1)
Sy  ENDDO
with dependences S1 δ S6, S3 δ(=) S6, and S3 δ(<) S6

63 Example
S1  DO I = 1, N
S2    A(I) = B(I) + 1
S3  ENDDO
S4  DO I = 1, N
S5    C(I) = A(I)/2
S6  ENDDO
S7  DO I = 1, N
S8    D(I) = 1/C(I+1)
S9  ENDDO
Which of the three fused loops is legal?
a)
S1  DO I = 1, N
S2    A(I) = B(I) + 1
S3  ENDDO
Sx  DO I = 1, N
S5    C(I) = A(I)/2
S8    D(I) = 1/C(I+1)
Sy  ENDDO
b)
Sx  DO I = 1, N
S2    A(I) = B(I) + 1
S5    C(I) = A(I)/2
Sy  ENDDO
S7  DO I = 1, N
S8    D(I) = 1/C(I+1)
S9  ENDDO
c)
Sx  DO I = 1, N
S2    A(I) = B(I) + 1
S5    C(I) = A(I)/2
S8    D(I) = 1/C(I+1)
Sy  ENDDO

64 Alignment for Fusion
Alignment changes the iteration bounds of one loop to enable fusion when dependences would otherwise prevent it

S1  DO I = 1, N
S2    B(I) = T(I)/C
S3  ENDDO
S4  DO I = 1, N
S5    A(I) = B(I+1) - B(I-1)
S6  ENDDO
with S2 δ S5; align the first loop:
S1  DO I = 0, N-1
S2    B(I+1) = T(I+1)/C
S3  ENDDO
S4  DO I = 1, N
S5    A(I) = B(I+1) - B(I-1)
S6  ENDDO
then peel and fuse:
Sx  B(1) = T(1)/C
S1  DO I = 1, N-1
S2    B(I+1) = T(I+1)/C
S5    A(I) = B(I+1) - B(I-1)
S6  ENDDO
Sy  A(N) = B(N+1) - B(N-1)
Loop dependences: S2 δ(=) S5 and S2 δ(<) S5

65 Reversal for Fusion
Loop reversal reverses the direction of the iteration; it is only legal for loops that carry no dependence
Reversal can enable loop fusion by ensuring the dependences between the loop statements are preserved

S1  DO I = 1, N
S2    B(I) = T(I)*X(I)
S3  ENDDO
S4  DO I = 1, N
S5    A(I) = B(I+1)
S6  ENDDO
with S2 δ S5; reverse both loops:
S1  DO I = N, 1, -1
S2    B(I) = T(I)*X(I)
S3  ENDDO
S4  DO I = N, 1, -1
S5    A(I) = B(I+1)
S6  ENDDO
then fuse:
S1  DO I = N, 1, -1
S2    B(I) = T(I)*X(I)
S5    A(I) = B(I+1)
S6  ENDDO
with S2 δ(<) S5

66 Loop Interchange
S1  DO I = 1, N
S2    DO J = 1, M
S3      DO K = 1, L
S4        A(I+1,J+1,K) = A(I,J,K) + A(I,J+1,K+1)
S5      ENDDO
S6    ENDDO
S7  ENDDO
S4 δ(<,<,=) S4 and S4 δ(<,=,>) S4
Compute the direction matrix and find which columns can be permuted without violating the dependence relations of the original loop nest
Direction matrix (one row per dependence, one column per loop):
< < =
< = >
Interchanging the I and J columns gives the rows (<,<,=) and (=,<,>), which are still lexicographically positive: valid
Permuting the columns into (J,K,I) order gives the rows (<,=,<) and (=,>,<), where the second row leads with > after =: invalid

67 Loop Parallelization
It is valid to convert a sequential loop to a parallel loop if the loop carries no dependence
S1  DO I = 1, 4
      DO J = 1, 4
        A(I,J+1) = A(I,J)
S1 δ(=,<) S1: the dependence is carried by the J loop, so the I loop can be parallelized:
S1  PARALLEL DO I = 1, 4
      DO J = 1, 4
        A(I,J+1) = A(I,J)

68 Vectorization (1)
A single-statement loop that carries no dependence can be vectorized:
DO I = 1, 4
S1  X(I) = X(I) + C
vectorizes to the Fortran 90 array statement
S1  X(1:4) = X(1:4) + C
The vector operation adds (C,C,C,C) to X(1:4) elementwise and assigns the result to X(1:4)

69 Vectorization (2)
DO I = 1, N
S1  X(I+1) = X(I) + C
This loop has a loop-carried dependence S1 δ(<) S1, so vectorizing it is invalid:
S1  X(2:N+1) = X(1:N) + C
is not a valid vectorization of the loop, because the array statement uses only the old values of X

70 Vectorization (3)
Because only single loop statements can be vectorized, loops with multiple statements must first be transformed using the loop distribution transformation
This works when the loop has no loop-carried dependence or has only lexically forward flow dependences
DO I = 1, N
S1  A(I+1) = B(I) + C
S2  D(I) = A(I) + E
distributes to
DO I = 1, N
S1  A(I+1) = B(I) + C
DO I = 1, N
S2  D(I) = A(I) + E
and vectorizes to
S1  A(2:N+1) = B(1:N) + C
S2  D(1:N) = A(1:N) + E

71 Vectorization (4)
When a loop has backward flow dependences and no loop-independent dependences, interchange the statements to enable loop distribution
DO I = 1, N
S1  D(I) = A(I) + E
S2  A(I+1) = B(I) + C
interchange and distribute:
DO I = 1, N
S2  A(I+1) = B(I) + C
DO I = 1, N
S1  D(I) = A(I) + E
then vectorize:
S2  A(2:N+1) = B(1:N) + C
S1  D(1:N) = A(1:N) + E

72 Preliminary Transformations to Support Dependence Testing
Dependence tests that determine distance and direction vectors require array subscripts in a standard form
Preliminary transformations put more subscripts into affine form, where subscripts are linear integer functions of the loop induction variables:
Loop normalization
Forward substitution
Constant propagation
Induction-variable substitution
Scalar expansion

73 Forward Substitution and Constant Propagation
N = 2
DO I = 1, 100
  K = I + N
S1  A(K) = A(K) + 5
becomes
DO I = 1, 100
S1  A(I+2) = A(I+2) + 5
Forward substitution and constant propagation help put array subscripts into a form suitable for dependence analysis

74 Induction Variable Substitution
Example loop with IVs:
I = 0
J = 1
while (I < N)
  I = I + 1
  ... = A[J]
  J = J + 2
  K = 2*I
  A[K] = ...
endwhile
After IV substitution (IVS); note the affine indexes:
for i = 0 to N-1
S1:  ... = A[2*i+1]
S2:  A[2*i+2] = ...
endfor
Dependence test: the GCD test solves the dependence equation 2*id - 2*iu = -1; since 2 does not divide 1, there is no data dependence
After parallelization:
forall (i = 0, N-1)
  ... = A[2*i+1]
  A[2*i+2] = ...
endforall

75 Example (1)
Before and after induction-variable substitution of KI in the inner loop:
INC = 2
KI = 0
DO I = 1, 100
  DO J = 1, 100
    KI = KI + INC
    U(KI) = U(KI) + W(J)
  S(I) = U(KI)
becomes
INC = 2
KI = 0
DO I = 1, 100
  DO J = 1, 100
    U(KI+J*INC) = U(KI+J*INC) + W(J)
  KI = KI + 100*INC
  S(I) = U(KI)

76 Example (2)
Before and after induction-variable substitution of KI in the outer loop:
INC = 2
KI = 0
DO I = 1, 100
  DO J = 1, 100
    U(KI+J*INC) = U(KI+J*INC) + W(J)
  KI = KI + 100*INC
  S(I) = U(KI)
becomes
INC = 2
KI = 0
DO I = 1, 100
  DO J = 1, 100
    U(KI+(I-1)*100*INC+J*INC) = U(KI+(I-1)*100*INC+J*INC) + W(J)
  S(I) = U(KI+I*100*INC)
KI = KI + 100*100*INC

77 Example (3)
Before and after constant propagation of KI and INC:
INC = 2
KI = 0
DO I = 1, 100
  DO J = 1, 100
    U(KI+(I-1)*100*INC+J*INC) = U(KI+(I-1)*100*INC+J*INC) + W(J)
  S(I) = U(KI+I*100*INC)
KI = KI + 100*100*INC
becomes
DO I = 1, 100
  DO J = 1, 100
    U(I*200+J*2-200) = U(I*200+J*2-200) + W(J)
  S(I) = U(I*200)
KI = 20000
The subscripts are now affine

78 Scalar Expansion
Scalar expansion breaks anti-dependence relations by expanding or promoting a scalar into an array
Scalar anti-dependence relations prevent certain loop transformations, such as loop fission and loop interchange

S1  DO I = 1, N
S2    T = A(I) + B(I)
S3    C(I) = T + 1/T
S4  ENDDO
with S2 δ(=) S3 and the anti dependence S3 δ-1(<) S2; expansion gives
Sx  IF (N > 0) THEN
Sy    ALLOC Tx(1:N)
S1    DO I = 1, N
S2      Tx(I) = A(I) + B(I)
S3      C(I) = Tx(I) + 1/Tx(I)
S4    ENDDO
Sz    T = Tx(N)
Su  ENDIF
leaving only S2 δ(=) S3

79 Example
S1  DO I = 1, 10
S2    T = A(I,1)
S3    DO J = 2, 10
S4      T = T + A(I,J)
S5    ENDDO
S6    B(I) = T
S7  ENDDO
Dependences: S2 δ(=) S4, S4 δ(=,<) S4, S4 δ(=) S6, and the anti dependence S6 δ-1(<) S2
Scalar expansion:
S1  DO I = 1, 10
S2    Tx(I) = A(I,1)
S3    DO J = 2, 10
S4      Tx(I) = Tx(I) + A(I,J)
S5    ENDDO
S6    B(I) = Tx(I)
S7  ENDDO
with S2 δ(=) S4, S4 δ(=,<) S4, S4 δ(=) S6
Loop fission:
S1  DO I = 1, 10
S2    Tx(I) = A(I,1)
Sx  ENDDO
S1  DO I = 1, 10
S3    DO J = 2, 10
S4      Tx(I) = Tx(I) + A(I,J)
S5    ENDDO
Sy  ENDDO
Sz  DO I = 1, 10
S6    B(I) = Tx(I)
S7  ENDDO
with S2 δ S4, S4 δ(=,<) S4, S4 δ S6
Vectorization:
S2  Tx(1:10) = A(1:10,1)
S3  DO J = 2, 10
S4    Tx(1:10) = Tx(1:10) + A(1:10,J)
S5  ENDDO
S6  B(1:10) = Tx(1:10)
with S2 δ S4, S4 δ(<,=) S4, S4 δ S6

80 Unimodular Loop Transformations
A unimodular matrix U is a matrix with integer entries and determinant ±1
Such a matrix maps an object onto another object with exactly the same number of integer points in it
The set of all unimodular matrices forms a group:
The identity matrix is the identity element
The inverse U⁻¹ of U exists and is also unimodular
Unimodular matrices are closed under multiplication: a composition of unimodular transformations is unimodular

81 Unimodular Loop Transformations (cont'd)
Many loop transformations can be represented as matrix operations using unimodular matrices, including loop interchange, loop reversal, and loop skewing
DO K = 1, 100
  DO L = 1, 50
    A(K,L) = ...
Interchange, U_interchange = [0 1; 1 0]:
DO L = 1, 50
  DO K = 1, 100
    A(K,L) = ...
Reversal, U_reversal = [-1 0; 0 1]:
DO K = -100, -1
  DO L = 1, 50
    A(-K,L) = ...
Skewing, U_skewing = [1 0; p 1]:
DO K = 1, 100
  DO L = 1+p*K, 50+p*K
    A(K,L-p*K) = ...

82 Unimodular Loop Transformations (cont'd)
The unimodular transformation is also applied to the dependence vectors: if the result is lexicographically positive, the transformation is legal
DO K = 2, N
  DO L = 1, N-1
    A(L,K) = A(L+1,K-1)
The loop has distance vector d(i,j) = (1,-1) for the writes at iterations i = (i1,i2) and the reads at iterations j = (j1,j2)
U_interchange d = [0 1; 1 0] (1,-1)ᵀ = (-1,1)ᵀ, which is lexicographically negative, so interchange is not legal here
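
A minimal Fortran sketch of this legality check (my own illustration, not from the slides), using the interchange matrix and distance vector of the example:

  ! Hypothetical sketch: apply a unimodular matrix U to a distance
  ! vector D and check that the result is lexicographically positive
  PROGRAM UNICHECK
    INTEGER :: U(2,2), D(2), DNEW(2), K
    LOGICAL :: LEGAL
    U = RESHAPE((/ 0, 1, 1, 0 /), (/ 2, 2 /))   ! loop interchange
    D = (/ 1, -1 /)
    DNEW = MATMUL(U, D)
    LEGAL = .FALSE.
    DO K = 1, 2                      ! leftmost nonzero component decides
      IF (DNEW(K) > 0) THEN
        LEGAL = .TRUE.
        EXIT
      ELSE IF (DNEW(K) < 0) THEN
        EXIT
      END IF
    END DO
    PRINT *, 'transformed distance vector:', DNEW, ' legal:', LEGAL
  END PROGRAM UNICHECK

Running this prints the transformed vector (-1, 1) and legal: F, confirming that interchange is illegal for this loop.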

83 Unimodular Loop Transformations (cont'd)
For a loop nest with iteration space IS = { I : A I ≤ b }, the transformed loop nest is given by A U⁻¹ I' ≤ b, where the new loop body has indices I' = U I
After transformation, IS' = { I' : A U⁻¹ I' ≤ b }
The transformed loop nest must be normalized by means of Fourier-Motzkin elimination to ensure that the loop bounds are affine expressions in the more outer loop indices
Example:
DO M = 1, 3
  DO N = M+1, 4
    A(M,N) = ...
with [K; L] = [0 1; 1 0] [M; N] (interchange), FMVE on the constraints 1 ≤ L ≤ 3, L ≤ K-1, 2 ≤ K ≤ 4 gives
DO K = 2, 4
  DO L = 1, MIN(3,K-1)
    A(L,K) = ...

84 Further Reading
[Wolfe] Chapters 5, 7-9


More information

Matrices. Chapter Definitions and Notations

Matrices. Chapter Definitions and Notations Chapter 3 Matrices 3. Definitions and Notations Matrices are yet another mathematical object. Learning about matrices means learning what they are, how they are represented, the types of operations which

More information

Design of Distributed Systems Melinda Tóth, Zoltán Horváth

Design of Distributed Systems Melinda Tóth, Zoltán Horváth Design of Distributed Systems Melinda Tóth, Zoltán Horváth Design of Distributed Systems Melinda Tóth, Zoltán Horváth Publication date 2014 Copyright 2014 Melinda Tóth, Zoltán Horváth Supported by TÁMOP-412A/1-11/1-2011-0052

More information

Section Summary. Sequences. Recurrence Relations. Summations Special Integer Sequences (optional)

Section Summary. Sequences. Recurrence Relations. Summations Special Integer Sequences (optional) Section 2.4 Section Summary Sequences. o Examples: Geometric Progression, Arithmetic Progression Recurrence Relations o Example: Fibonacci Sequence Summations Special Integer Sequences (optional) Sequences

More information

Hermite normal form: Computation and applications

Hermite normal form: Computation and applications Integer Points in Polyhedra Gennady Shmonin Hermite normal form: Computation and applications February 24, 2009 1 Uniqueness of Hermite normal form In the last lecture, we showed that if B is a rational

More information

DEFINITION OF COLLAPSIBLE GRAPHS

DEFINITION OF COLLAPSIBLE GRAPHS Appendix A DEFINITION OF COLLAPSIBLE GRAPHS In the following discussion, i, j, and k are nodes. The indegree of a node i, which will be called indeg (i), is the number of incoming arcs. The outdegree of

More information

Vector Lane Threading

Vector Lane Threading Vector Lane Threading S. Rivoire, R. Schultz, T. Okuda, C. Kozyrakis Computer Systems Laboratory Stanford University Motivation Vector processors excel at data-level parallelism (DLP) What happens to program

More information

Pipelined Computations

Pipelined Computations Chapter 5 Slide 155 Pipelined Computations Pipelined Computations Slide 156 Problem divided into a series of tasks that have to be completed one after the other (the basis of sequential programming). Each

More information

ELEMENTARY LINEAR ALGEBRA

ELEMENTARY LINEAR ALGEBRA ELEMENTARY LINEAR ALGEBRA K R MATTHEWS DEPARTMENT OF MATHEMATICS UNIVERSITY OF QUEENSLAND First Printing, 99 Chapter LINEAR EQUATIONS Introduction to linear equations A linear equation in n unknowns x,

More information

Computational Linear Algebra

Computational Linear Algebra Computational Linear Algebra PD Dr. rer. nat. habil. Ralf Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2017/18 Part 2: Direct Methods PD Dr.

More information

Lemma 8: Suppose the N by N matrix A has the following block upper triangular form:

Lemma 8: Suppose the N by N matrix A has the following block upper triangular form: 17 4 Determinants and the Inverse of a Square Matrix In this section, we are going to use our knowledge of determinants and their properties to derive an explicit formula for the inverse of a square matrix

More information

Lecture 9: Numerical Linear Algebra Primer (February 11st)

Lecture 9: Numerical Linear Algebra Primer (February 11st) 10-725/36-725: Convex Optimization Spring 2015 Lecture 9: Numerical Linear Algebra Primer (February 11st) Lecturer: Ryan Tibshirani Scribes: Avinash Siravuru, Guofan Wu, Maosheng Liu Note: LaTeX template

More information

Model Checking: An Introduction

Model Checking: An Introduction Model Checking: An Introduction Meeting 3, CSCI 5535, Spring 2013 Announcements Homework 0 ( Preliminaries ) out, due Friday Saturday This Week Dive into research motivating CSCI 5535 Next Week Begin foundations

More information

5.7 Cramer's Rule 1. Using Determinants to Solve Systems Assumes the system of two equations in two unknowns

5.7 Cramer's Rule 1. Using Determinants to Solve Systems Assumes the system of two equations in two unknowns 5.7 Cramer's Rule 1. Using Determinants to Solve Systems Assumes the system of two equations in two unknowns (1) possesses the solution and provided that.. The numerators and denominators are recognized

More information

The simplex algorithm

The simplex algorithm The simplex algorithm The simplex algorithm is the classical method for solving linear programs. Its running time is not polynomial in the worst case. It does yield insight into linear programs, however,

More information

Fundamental Algorithms

Fundamental Algorithms Chapter 2: Sorting, Winter 2018/19 1 Fundamental Algorithms Chapter 2: Sorting Jan Křetínský Winter 2018/19 Chapter 2: Sorting, Winter 2018/19 2 Part I Simple Sorts Chapter 2: Sorting, Winter 2018/19 3

More information

CS-206 Concurrency. Lecture 13. Wrap Up. Spring 2015 Prof. Babak Falsafi parsa.epfl.ch/courses/cs206/

CS-206 Concurrency. Lecture 13. Wrap Up. Spring 2015 Prof. Babak Falsafi parsa.epfl.ch/courses/cs206/ CS-206 Concurrency Lecture 13 Wrap Up Spring 2015 Prof. Babak Falsafi parsa.epfl.ch/courses/cs206/ Created by Nooshin Mirzadeh, Georgios Psaropoulos and Babak Falsafi EPFL Copyright 2015 EPFL CS-206 Spring

More information

Fundamental Algorithms

Fundamental Algorithms Fundamental Algorithms Chapter 2: Sorting Harald Räcke Winter 2015/16 Chapter 2: Sorting, Winter 2015/16 1 Part I Simple Sorts Chapter 2: Sorting, Winter 2015/16 2 The Sorting Problem Definition Sorting

More information

COMP 633: Parallel Computing Fall 2018 Written Assignment 1: Sample Solutions

COMP 633: Parallel Computing Fall 2018 Written Assignment 1: Sample Solutions COMP 633: Parallel Computing Fall 2018 Written Assignment 1: Sample Solutions September 12, 2018 I. The Work-Time W-T presentation of EREW sequence reduction Algorithm 2 in the PRAM handout has work complexity

More information

SOLVING LINEAR SYSTEMS

SOLVING LINEAR SYSTEMS SOLVING LINEAR SYSTEMS We want to solve the linear system a, x + + a,n x n = b a n, x + + a n,n x n = b n This will be done by the method used in beginning algebra, by successively eliminating unknowns

More information

Algorithms PART II: Partitioning and Divide & Conquer. HPC Fall 2007 Prof. Robert van Engelen

Algorithms PART II: Partitioning and Divide & Conquer. HPC Fall 2007 Prof. Robert van Engelen Algorithms PART II: Partitioning and Divide & Conquer HPC Fall 2007 Prof. Robert van Engelen Overview Partitioning strategies Divide and conquer strategies Further reading HPC Fall 2007 2 Partitioning

More information

Projection in Logic, CP, and Optimization

Projection in Logic, CP, and Optimization Projection in Logic, CP, and Optimization John Hooker Carnegie Mellon University Workshop on Logic and Search Melbourne, 2017 Projection as a Unifying Concept Projection is a fundamental concept in logic,

More information

Conventional Matrix Operations

Conventional Matrix Operations Overview: Graphs & Linear Algebra Peter M. Kogge Material based heavily on the Class Book Graph Theory with Applications by Deo and Graphs in the Language of Linear Algebra: Applications, Software, and

More information

Model Order Reduction via Matlab Parallel Computing Toolbox. Istanbul Technical University

Model Order Reduction via Matlab Parallel Computing Toolbox. Istanbul Technical University Model Order Reduction via Matlab Parallel Computing Toolbox E. Fatih Yetkin & Hasan Dağ Istanbul Technical University Computational Science & Engineering Department September 21, 2009 E. Fatih Yetkin (Istanbul

More information

Abstractions and Decision Procedures for Effective Software Model Checking

Abstractions and Decision Procedures for Effective Software Model Checking Abstractions and Decision Procedures for Effective Software Model Checking Prof. Natasha Sharygina The University of Lugano, Carnegie Mellon University Microsoft Summer School, Moscow, July 2011 Lecture

More information