Loop Scheduling and Software Pipelining \course\cpeg421-08s\topic-7.ppt 1
- Gillian Watts
2 Reading List
- Slides: Topic 7 and 7a
- Other papers as assigned in class or homework
3 ABET Outcome
- Ability to apply knowledge of basic code generation techniques (e.g., loop scheduling and software pipelining) to solve code generation problems.
- Ability to identify, formulate, and solve loop scheduling problems using software pipelining techniques.
- Ability to analyze the basic algorithms behind these techniques and to conduct experiments that show their effectiveness.
- Ability to use a modern compiler development platform and tools to practice the above.
- Knowledge of contemporary issues on this topic.
4 Outline
- Brief overview
- Problem formulation of the modulo scheduling problem
- Solution methods
- Summary
5 General Compiler Framework
[Figure: compiler pipeline from Source to Executable. ME: Inter-Procedural Optimization (IPA), Loop Nest Optimization (LNO), Global Optimization (OPT). BE/CG: innermost loop scheduling, global instruction scheduling, register allocation, local instruction scheduling, supported by architecture models.]
- Good IPO
- Good LNO
- Good global optimization
- Good integration of IPO/LNO/OPT
- Smooth information passing between FE and CG
- Complete and flexible support of inner-loop scheduling (SWP), instruction scheduling, and register allocation
6 Questions?
- How to formulate the loop scheduling problem?
- How to model it?
- How to solve it?
7 Questions (cont'd)
- Instruction scheduling for code without loops: a review
- Dependence graphs may become cyclic, so the critical path length is less obvious!
- Is the problem becoming harder?
- What new insights are required to formulate and solve it?
8 Challenges of Loop Scheduling
[Figure: a DDG with cycles]
Right strategy?
9 Observations
- Execution of good loops tends to be regular and repetitive: a pattern may appear.
- This gives the cyclic scheduling problem a new twist!
- How to derive the pattern efficiently?
10 Problem Formulation (I)
Given a weighted dependence graph, derive a schedule that is time-optimal under a machine model M.
Def: A schedule S of a loop L is time-optimal if, among all legal schedules of L, no other schedule is faster than S.
Note: There may be more than one time-optimal schedule.
11 A Short Tour of Data Dependence Graphs for Loops
12 Basic Concept and Motivation
- Data dependence between two accesses:
  - They touch the same memory location
  - There exists an execution path between them
  - At least one of them is a write
- Three types of data dependence
- Dependence graphs
- Things are not simple when dealing with loops
13 Types of Data Dependence
- Flow dependence (δ): X = ... followed by ... = X
- Anti-dependence (δ^-1): ... = X followed by X = ...
- Output dependence (δ^0): X = ... followed by X = ...
14 Data Dependence
Example 1:
S1: A = 0
S2: B = A
S3: C = A + D
S4: D = 2
[Figure: dependence graph over S1, S2, S3, S4]
Notation: Sx δ Sy means Sy depends on Sx.
15 Data Dependence (cont'd)
Example 2:
S1: A = 0
S2: B = A
S3: A = B + 1
S4: C = A
[Figure: dependence graph over S1, S2, S3, S4]
S1 δ^0 S3: output dependence
S2 δ^-1 S3: anti-dependence
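The three dependence types can be found mechanically for straight-line code. The following is a minimal sketch, assuming each statement writes exactly one scalar and reads a set of scalars; the function name `dependences` and the tuple layout are hypothetical, not part of the slides. It reports only dependences that are not killed by an intervening write.

```python
def dependences(stmts):
    """stmts: list of (name, written_var, set_of_read_vars) in program order.
    Returns a sorted list of (kind, source, sink, variable) tuples."""
    deps = set()
    for j, (sink, w_j, reads_j) in enumerate(stmts):
        # Flow: the most recent earlier write of a variable reaches this read.
        for var in reads_j:
            for i in range(j - 1, -1, -1):
                if stmts[i][1] == var:
                    deps.add(("flow", stmts[i][0], sink, var))
                    break
        # Anti and output: walk back to the previous write of the variable written here.
        for i in range(j - 1, -1, -1):
            src, w_i, reads_i = stmts[i]
            if w_j in reads_i:
                deps.add(("anti", src, sink, w_j))
            if w_i == w_j:
                deps.add(("output", src, sink, w_j))
                break
    return sorted(deps)


# Example 2 from the slide.
example2 = [
    ("S1", "A", set()),
    ("S2", "B", {"A"}),
    ("S3", "A", {"B"}),
    ("S4", "C", {"A"}),
]
for dep in dependences(example2):
    print(dep)
```

On Example 2 this reports the output dependence S1 to S3 and the anti-dependence S2 to S3 named on the slide, plus the flow dependences S1 to S2, S2 to S3, and S3 to S4.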
16 Should We Consider Input Dependence?
... = X
... = X
Is the reading of the same X important? Well, it may be (e.g., if we intend to group the two reads together for cache optimization)!
17 Subscript Variables
Extension of def-use chains to employ a more precise treatment of arrays, especially in iterative loops.
DO I = 1, N
  A(I + 1) = X(I + 1) + B(I)
  X(I) = A(I) * 5
ENDDO
18 Dependence Graph (cont'd)
Applications:
- register allocation
- instruction scheduling
- loop scheduling
- vectorization
- parallelization
- memory hierarchy optimization
19 Data Dependence in Loops: An Example
Find the dependence relations due to the array X in the program below:
(1) for I = 2 to 9 do
(2)   X[I] = Y[I] + Z[I]
(3)   A[I] = X[I-1] + 1
(4) end for
Solution: To find the data dependence relations in a simple loop, we can unroll the loop and see which statement instances depend on which others:
     I = 2                 I = 3                 I = 4
(2)  X[2] = Y[2] + Z[2]    X[3] = Y[3] + Z[3]    X[4] = Y[4] + Z[4]
(3)  A[2] = X[1] + 1       A[3] = X[2] + 1       A[4] = X[3] + 1
20 Data Dependence in Loops (cont'd)
In our example, there is a loop-carried, lexically forward flow dependence relation from S2 to S3, with dependence distance = 1.
[Figure: data dependence graph for the statements in the loop, edge S2 -> S3 labeled (1)]
- Loop-carried vs. loop-independent
- Lexically forward vs. lexically backward
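The unrolling argument can be replayed in a few lines. This is only a sketch for this particular loop shape (a write to X[I] followed by a read of X[I-1]); the function name is hypothetical.

```python
def loop_carried_flow_distances(lo, hi):
    """Replay  (2) X[I] = ...;  (3) A[I] = X[I-1] + 1  for I = lo..hi and
    collect the iteration distance of every flow dependence on X."""
    last_write = {}      # element index of X -> iteration that wrote it
    distances = set()
    for i in range(lo, hi + 1):
        last_write[i] = i                # statement (2) writes X[i]
        src = last_write.get(i - 1)      # statement (3) reads X[i-1]
        if src is not None:
            distances.add(i - src)       # sink iteration minus source iteration
    return distances


print(loop_carried_flow_distances(2, 9))  # {1}
```

Every dependence found has distance 1: the value written by statement (2) in one iteration is read by statement (3) in the next.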
21 An Example
for i = 0 to N - 1 do
  a: a[i] = a[i - 1] + R[i];
  b: b[i] = a[i] + c[i - 1];
  c: c[i] = b[i] + 1;
end;
[Figure: DDG of the loop body]
Note: We use a token here to represent a flow dependence of distance = 1.
Assume each operation takes 1 cycle and there is only one addition unit!
time  i = 0  i = 1  i = 2  i = 3 ...
0     a
1     b
2     c      a
3            b
4            c      a
So, the initiation interval II = 2 cycles.
22 Software Pipeline: Concept
Software pipelining is a technique that reduces the execution time of important loops by interleaving operations from many iterations to optimize the use of resources.
[Figure: overlapped iterations along a time axis; operations ldf, fadds, stf, sub, cmp, bg]
23 The Structure of the SWP Code
% prologue
a[0] = a[-1] + R[0];
% pattern
for i = 0 to N-2 do
  b[i] = a[i] + c[i - 1];
  a[i + 1] = a[i] + R[i + 1];
  c[i] = b[i] + 1;
end;
% epilogue
b[N - 1] = a[N - 1] + c[N - 2];
c[N - 1] = b[N - 1] + 1
[Figure: prologue; pattern (b_i, c_i, a_i+1); epilogue]
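A quick way to convince oneself that the prologue/pattern/epilogue form computes the same values as the original loop is to run both. The sketch below assumes N >= 1 and passes the live-in values a[-1] and c[-1] as explicit parameters (`a_init`, `c_init`), which are hypothetical names introduced for illustration.

```python
def original(R, a_init, c_init):
    n = len(R)
    a, b, c = [0] * n, [0] * n, [0] * n
    for i in range(n):
        a[i] = (a[i - 1] if i > 0 else a_init) + R[i]
        b[i] = a[i] + (c[i - 1] if i > 0 else c_init)
        c[i] = b[i] + 1
    return a, b, c


def pipelined(R, a_init, c_init):
    n = len(R)
    a, b, c = [0] * n, [0] * n, [0] * n
    # prologue
    a[0] = a_init + R[0]
    # pattern: i = 0 .. N-2
    for i in range(n - 1):
        b[i] = a[i] + (c[i - 1] if i > 0 else c_init)
        a[i + 1] = a[i] + R[i + 1]
        c[i] = b[i] + 1
    # epilogue
    b[n - 1] = a[n - 1] + (c[n - 2] if n > 1 else c_init)
    c[n - 1] = b[n - 1] + 1
    return a, b, c


R = [3, 1, 4, 1, 5]
print(original(R, 10, 7) == pipelined(R, 10, 7))  # True
```

The pattern body starts a[i + 1] one iteration early, which is what lets b and c of iteration i overlap with a of iteration i + 1.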
24 Software Pipeline (cont'd)
What limits the speed of a loop?
- Data dependences: recurrence initiation interval (rec_mii)
- Processor resources: resource initiation interval (res_mii)
[Figure: overlapped iterations along a time axis with the initiation interval marked; operations ldf, fadds, stf, sub, cmp, bg]
25 Previous Approaches
- Approach I (Operational): Emulate the loop execution under the machine model; a pattern will eventually occur [AikenNic88, EbciogluNic89, GaoEtAl91]
- Approach II (Periodic scheduling): Formulate the scheduling problem as a periodic scheduling problem and find an optimal solution [Lam88, RauEtAl81, GovindAltmanGao94]
26 Periodic Schedule (Modulo Scheduling)
The time (cycle) at which the i-th instance of operation v is scheduled:
$t(i, v) = T \cdot i + A[v]$, where $T = II$
so $t(i + 1, v) - t(i, v) = T \cdot (i + 1) - T \cdot i = T$
For our example: $t(i, v) = 2i + A[v]$, where $A[a] = 1$, $A[b] = 0$, $A[c] = 1$
Question: Is this an optimal schedule?
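As a rough check of the periodic form, the sketch below tests whether a pair (T, A) respects every dependence edge: an edge u -> v with distance d and latency w requires A[v] - A[u] + T*d >= w. The edge list is read off the example loop, and the function names are hypothetical. Two offset vectors are tried: one read off the earlier timeline (a, b, c at cycles 0, 1, 2), and the slide's A = (1, 0, 1) under the assumption that operation a in pattern iteration i computes a[i+1], as in the pipelined loop. Resource constraints are ignored here.

```python
from itertools import product

# (source, sink, dependence distance, latency) for the example loop.
EDGES = [("a", "a", 1, 1), ("a", "b", 0, 1), ("b", "c", 0, 1), ("c", "b", 1, 1)]
# Assumption: with a re-indexed as in the pattern (iteration i computes a[i+1]),
# b reads a value produced one pattern iteration earlier, so a -> b has distance 1.
KERNEL_EDGES = [("a", "a", 1, 1), ("a", "b", 1, 1), ("b", "c", 0, 1), ("c", "b", 1, 1)]


def is_legal(T, A, edges):
    """t(i, v) = T*i + A[v] is legal iff t(i+d, v) >= t(i, u) + w on every edge."""
    return all(A[v] - A[u] + T * d >= w for u, v, d, w in edges)


def exists_legal(T, edges, max_offset=4):
    """Brute-force search over small offset vectors for a legal schedule of period T."""
    names = sorted({n for u, v, _, _ in edges for n in (u, v)})
    return any(
        is_legal(T, dict(zip(names, offs)), edges)
        for offs in product(range(max_offset + 1), repeat=len(names))
    )


print(is_legal(2, {"a": 0, "b": 1, "c": 2}, EDGES))         # True
print(is_legal(2, {"a": 1, "b": 0, "c": 1}, KERNEL_EDGES))  # True
print(exists_legal(1, EDGES))                               # False
```

No period T = 1 works because the cycle b -> c -> b has total latency 2 and total distance 1, so as far as dependences are concerned T = 2 cannot be improved.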
27 Periodic Schedule (cont'd)
Yes, the schedule $t(i, v) = 2i + A[v]$ is time-optimal, with II = 2!
28 Restating the Problem
Given the DDG of a loop L, how do we determine the fastest computation rate of L, whose reciprocal is the minimum initiation interval (MII)?
Hint: Consider the critical cycles as well as the critical resource usage in L.
29 Recurrence MII (RecMII): An Example
30 An Example (Revisited): RecMII
for i = 0 to N - 1 do
  a: a[i] = a[i - 1] + R[i];
  b: b[i] = a[i] + c[i - 1];
  c: c[i] = b[i] + 1;
end;
[Figure: DDG of the loop body]
Assume each operation takes 1 cycle and there is only one addition unit!
time  i = 0  i = 1  i = 2  i = 3 ...
0     a
1     b
2     c      a
3            b
4            c      a
So, RecMII = 2 cycles.
31 Maximum Computation Rate
Hint: now one must think about deadlines.
Theorem: The maximum computation rate of a loop is bounded by the following ratio:
$r_{opt} = \min_{C} \{ D_C / W_C \}$
where C ranges over the dependence cycles, $D_C$ is the total dependence distance along C, and $W_C$ is the total execution time of C, i.e.,
$D_C = \sum_{i \in C} d_i$, $W_C = \sum_{i \in C} w_i$
($d_i$ is the dependence distance along edge i in C; $w_i$ is the weight of edge i in C)
32 RecMII (cont'd)
The optimal period is RecMII = $T_{opt} = 1 / r_{opt}$.
Def: A cycle is critical if its period equals RecMII.
Note: A loop may have multiple critical cycles!
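For small graphs the bound can be evaluated directly by enumerating simple cycles. The sketch below is an illustration under stated assumptions, not a production algorithm: the helper names are hypothetical, cycle enumeration is exponential in general, and every cycle is assumed to have a positive total distance.

```python
import math
from fractions import Fraction


def simple_cycles(edges):
    """Yield each simple cycle once, as a list of edges (u, v, distance, weight)."""
    nodes = sorted({n for u, v, _, _ in edges for n in (u, v)})
    out = {n: [e for e in edges if e[0] == n] for n in nodes}

    def dfs(start, node, path, seen):
        for e in out[node]:
            nxt = e[1]
            if nxt == start:
                yield path + [e]
            elif nxt > start and nxt not in seen:
                # Only visit nodes ordered after the start to avoid duplicates.
                yield from dfs(start, nxt, path + [e], seen | {nxt})

    for s in nodes:
        yield from dfs(s, s, [], {s})


def rec_mii(edges):
    """Return (RecMII, critical cycles): the largest W_C / D_C over all cycles C."""
    periods = [
        (Fraction(sum(e[3] for e in c), sum(e[2] for e in c)), c)
        for c in simple_cycles(edges)
    ]
    t_opt = max(p for p, _ in periods)
    critical = [c for p, c in periods if p == t_opt]
    return math.ceil(t_opt), critical


# DDG of the example loop: (source, sink, distance, weight).
EDGES = [("a", "a", 1, 1), ("a", "b", 0, 1), ("b", "c", 0, 1), ("c", "b", 1, 1)]
mii, critical = rec_mii(EDGES)
print(mii)       # 2
print(critical)  # the cycle b -> c -> b
```

Taking the maximum of W_C / D_C is the same as taking the reciprocal of the minimum of D_C / W_C in the theorem; the ceiling reflects that an initiation interval is a whole number of cycles.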
34 How About Machine Resource and Register Constraints?
- Numbers and types of FUs, etc. (ResMII)
- Number of registers
35 Software Pipelining
Review of previous work [RauFisher93], [Rau94]
Minimum initiation interval (MII):
MII = max{RecMII, ResMII}
where
- RecMII is determined by the critical recurrence cycles in the DDG
- ResMII is determined by resource constraints (numbers and types of FUs)
36 The Resource-Constrained MII (ResMII)
- Can be calculated by totaling, for each resource, the usage requirements imposed by one iteration of the loop
- An exact value can be obtained by bin-packing the resource reservation tables (expensive)
- Usually a lower bound is enough as a starting point for the search process
- Is the price higher with complex hardware pipelines?
Consider a simple example of n nodes and 1 FU: what is ResMII?
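The "totaling" lower bound can be sketched as follows. The data layout (a per-operation map from resource to busy cycles, and a map from resource to unit count) and the function names are assumptions made for illustration; the bin-packing refinement is not attempted.

```python
import math


def res_mii(usage, units):
    """Lower bound on II from resources.
    usage[op][r]: cycles that op occupies resource r in one iteration.
    units[r]: number of identical copies of resource r."""
    total = {}
    for op_usage in usage.values():
        for r, cycles in op_usage.items():
            total[r] = total.get(r, 0) + cycles
    return max((math.ceil(total[r] / units[r]) for r in total), default=1)


def mii(rec, res):
    """MII = max{RecMII, ResMII}."""
    return max(rec, res)


# n = 5 single-cycle operations sharing one functional unit.
usage = {f"op{k}": {"fu": 1} for k in range(5)}
print(res_mii(usage, {"fu": 1}))  # 5
```

For n single-cycle nodes and one FU the bound is n, since the unit is busy for n cycles per iteration.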
37 Reservation Table: An Example
[Figure: (a) a pipeline M with stages 1, 2, 3; (b) the reservation table of M]
38 A Quiz
- What is the RT of a fully pipelined adder with 3 stages?
- How about an unpipelined adder?
- How to compute ResMII in each case?
39 Modulo Reservation Table
- The modulo reservation table has length II only
- Each entry in the table records a sequence of reservations that corresponds to a sequence of slots every II cycles
Hint: think about your weekly calendar
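A modulo reservation table can be sketched as one row per resource with II columns, where cycle c maps to column c mod II. The class below is a minimal illustration; its name, methods, and the (resource, offset) encoding of an operation's reservation table are hypothetical choices.

```python
class ModuloReservationTable:
    """One row per resource, II columns; cell (r, c) is busy at cycles c, c + II, ..."""

    def __init__(self, ii, resources):
        self.ii = ii
        self.rows = {r: [None] * ii for r in resources}

    def _cells(self, usage, start):
        # usage: list of (resource, offset from the issue cycle) pairs for one operation
        return [(r, (start + off) % self.ii) for r, off in usage]

    def fits(self, usage, start):
        cells = self._cells(usage, start)
        # An operation can collide with its own next instance when it holds
        # a resource for more than II cycles.
        no_self_overlap = len(set(cells)) == len(cells)
        return no_self_overlap and all(self.rows[r][c] is None for r, c in cells)

    def reserve(self, name, usage, start):
        if not self.fits(usage, start):
            return False
        for r, c in self._cells(usage, start):
            self.rows[r][c] = name
        return True


mrt = ModuloReservationTable(2, ["adder"])
print(mrt.reserve("A", [("adder", 0)], 0))  # True
print(mrt.fits([("adder", 0)], 2))          # False: cycle 2 maps to column 0
print(mrt.fits([("adder", 0)], 3))          # True: cycle 3 maps to column 1
```

As with a weekly calendar, booking a slot once blocks the same slot in every later period.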
40 Modulo Scheduling
[Flowchart]
1. Start with II = MII.
2. Choose the next operation x from the list L.
3. Try to place x in a time slot i within II.
   - Failed: set II = II + 1 and start over.
   - Succeeded: remove x from L.
4. Is L = <>? If no, go to step 2. If yes, a legal schedule is found.
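The flowchart can be sketched as a small greedy scheduler. This is a simplified illustration, not the algorithm of any cited paper: it assumes every operation occupies one unit of a single resource class for one cycle at issue (fully pipelined units), takes operations in the given order with no backtracking, and uses hypothetical names throughout. The two-adder machine in the usage example is likewise an assumption.

```python
def try_ii(ops, edges, units, ii):
    """Greedy placement for a fixed II; returns {op: start cycle} or None on failure."""
    used = {r: [0] * ii for r in units}   # modulo reservation table (unit counts)
    t = {}
    for x, res in ops.items():
        lo, hi = 0, None
        for u, v, d, w in edges:
            if u == x and v == x:
                if ii * d < w:            # self-recurrence cannot meet this II
                    return None
            elif v == x and u in t:       # scheduled predecessor: earliest start
                lo = max(lo, t[u] + w - ii * d)
            elif u == x and v in t:       # scheduled successor: deadline
                bound = t[v] - w + ii * d
                hi = bound if hi is None else min(hi, bound)
        last = lo + ii - 1 if hi is None else min(hi, lo + ii - 1)
        for c in range(lo, last + 1):     # at most II distinct columns to try
            if used[res][c % ii] < units[res]:
                used[res][c % ii] += 1
                t[x] = c
                break
        else:
            return None
    return t


def modulo_schedule(ops, edges, units, mii, max_ii=64):
    """ops: {name: resource} in priority order; edges: (u, v, distance, latency)."""
    for ii in range(mii, max_ii + 1):     # II = II + 1 after each failed attempt
        t = try_ii(ops, edges, units, ii)
        if t is not None:
            return ii, t
    return None


OPS = {"a": "adder", "b": "adder", "c": "adder"}
EDGES = [("a", "a", 1, 1), ("a", "b", 0, 1), ("b", "c", 0, 1), ("c", "b", 1, 1)]
print(modulo_schedule(OPS, EDGES, {"adder": 2}, mii=2))
```

With the assumed two adders the search succeeds at II = 2 with a, b, c at cycles 0, 1, 2. Because placement is greedy, a failure at some II does not prove that no schedule exists for it.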
41 Heuristic Method for Modulo Scheduling
Why might a simple variant of list scheduling not work?
Hint: consider the deadline constraints of operations in a cycle
42 Counter Example I: List Scheduling May Fail!
[Figure: (a) DDG with a cycle (labels: [1], A 4, C 2, B 2, D 4); (b) partial schedule over A, B, C, D with MII = RecMII = 4; (c) valid schedule over A, B, C, D with MII = RecMII = 4]
Note for (b): if simple list scheduling is used, B cannot be scheduled because of the deadline set by scheduling C: deadlock!
Note for (c): we cannot fire C as early as possible!
43 Example I (cont'd)
The previous figure shows an example demonstrating the problem with greedy scheduling in the presence of recurrences.
(a) The data dependence graph with a cycle.
(b) The resulting partial schedule when C is scheduled greedily. B cannot be scheduled.
(c) The resulting valid schedule when C is not scheduled greedily but is delayed by two cycles.
44 Example 2: Problems with Greedy Scheduling due to ResMII
[Figure: (a) DDG over A1, C2, A3, A4, M5, M6 (A: non-pipelined adders; M: non-pipelined multipliers); (b) a greedy schedule, which cannot schedule A4 with ResMII = 6; (c) a non-greedy schedule, which achieves ResMII = 6. Both schedules are tables with columns Adder, Mult, Bus.]
45 Example 2 (cont'd)
The previous figure shows an example demonstrating the problem with greedy scheduling in the presence of complex reservation tables.
(a) The data dependence graph, without cycles.
(b) The resulting partial schedule when A1, M6, C2, and A3 are scheduled greedily. A4 cannot be scheduled.
(c) The resulting valid schedule when A3 is scheduled one cycle later.
46 Example 3: Infeasibility of MII
[Figure: (a) three operations (labels: A1 2, M2 3, MA3 5) and their reservation tables, each with columns Adder, Mult, Bus]
Note: ResMII = 2. But is there a legal schedule under II = 2?
Note: The presence of complex reservation tables.
47 Example 3 (cont'd)
[Figure: (b) MRT (Adder, Mult, Bus) for MII = 2, II = 2: MA3 cannot fit under II = 2; (c) MRT (Adder, Mult, Bus) for MII = 2, II = 3: a feasible schedule for II = 3]
Note: It is possible that there is no valid schedule at MII!
48 Infeasibility of MII (cont'd)
The previous slide shows an example demonstrating the infeasibility of the MII in the presence of complex reservation tables.
(a) The three operations and their reservation tables.
(b) The MRT corresponding to the dead-end partial schedule for an II of 2 after A1 and M2 have been scheduled.
(c) The MRT corresponding to a valid schedule for an II of 3.
49 Example 4: Infeasibility of MII
Assume: fully pipelined adders.
[Figure: (a) DDG over A1, A2, A3, A4 with edge weights of 2 and a dependence distance [2]; ResMII = 4, RecMII = 4; (b) MRT with columns Adder, Mult, Bus]
Note: It is possible that there is no valid schedule at MII! And the reservation table here is simple!
50 Example 4 (cont'd)
The previous slide shows an example demonstrating the infeasibility of the MII due to the interaction between the recurrence constraints and the resource usage constraints.
(a) The data dependence graph.
(b) The MRT corresponding to the dead-end partial schedule for an II of 4 after A1 and A2 have been scheduled.
51 How to Derive a Best Feasible Schedule?
It is possible to do so via exhaustive search. But it is expensive!
52 Software Pipelining: A Taxonomy
- Operational approach
  - Heuristic (Aiken88, AikenNic88, Ebcioglu89, etc.)
  - Formal model (GaoWonNin91) PLDI'91
- Periodic scheduling (modulo scheduling)
  - Inexact method (heuristic) (RauGla81, Lam88, RauEtAl92, Huff93, DehnertTow93, Rau94, WanEis93, StouchininEtAl00)
  - Exact method
    - Basic formulation (DongenGao92) CONPAR'92
    - Register optimal (NingGao91, NingGao93, Ning93) POPL'93
    - ILP based
    - Resource constrained (GovindAltGao94) Micro27
    - "Showdown" (RuttenbergGao StouchininWoody96) PLDI'96
    - Exhaustive search EuroPar'96
    - Resource & register (GovindAltGao95, Altman95, PLDI'95, EichenbergerDav95) (Altman95)
- FSA-based method (model hardware pipeline with sharing and hazards)
  - FSA co-scheduling formulation (GovindAltmanGao'96)
  - FSA construction method (GovindAltmanGao98)
  - FSA heuristic/optimization (ZhangGovindRyanGao99)
  - Theory of co-scheduling (GovindAltmanGao00)
53 Advanced Topics
- Consider register constraints
- Realistic pipeline architecture constraints (with structural hazards)
- Loop body with conditionals
- Multi-dimensional loops
More informationCS 4407 Algorithms Lecture 2: Iterative and Divide and Conquer Algorithms
CS 4407 Algorithms Lecture 2: Iterative and Divide and Conquer Algorithms Prof. Gregory Provan Department of Computer Science University College Cork 1 Lecture Outline CS 4407, Algorithms Growth Functions
More informationCS 161 Summer 2009 Homework #2 Sample Solutions
CS 161 Summer 2009 Homework #2 Sample Solutions Regrade Policy: If you believe an error has been made in the grading of your homework, you may resubmit it for a regrade. If the error consists of more than
More informationCSE. 1. In following code. addi. r1, skip1 xor in r2. r3, skip2. counter r4, top. taken): PC1: PC2: PC3: TTTTTT TTTTTT
CSE 560 Practice Problem Set 4 Solution 1. In this question, you will examine several different schemes for branch prediction, using the following code sequence for a simple load store ISA with no branch
More informationAsymptotic Running Time of Algorithms
Asymptotic Complexity: leading term analysis Asymptotic Running Time of Algorithms Comparing searching and sorting algorithms so far: Count worst-case of comparisons as function of array size. Drop lower-order
More informationCSE 431/531: Analysis of Algorithms. Dynamic Programming. Lecturer: Shi Li. Department of Computer Science and Engineering University at Buffalo
CSE 431/531: Analysis of Algorithms Dynamic Programming Lecturer: Shi Li Department of Computer Science and Engineering University at Buffalo Paradigms for Designing Algorithms Greedy algorithm Make a
More informationHomework #2: assignments/ps2/ps2.htm Due Thursday, March 7th.
Homework #2: http://www.cs.cornell.edu/courses/cs612/2002sp/ assignments/ps2/ps2.htm Due Thursday, March 7th. 1 Transformations and Dependences 2 Recall: Polyhedral algebra tools for determining emptiness
More informationCIS 4930/6930: Principles of Cyber-Physical Systems
CIS 4930/6930: Principles of Cyber-Physical Systems Chapter 11 Scheduling Hao Zheng Department of Computer Science and Engineering University of South Florida H. Zheng (CSE USF) CIS 4930/6930: Principles
More information/ : Computer Architecture and Design
16.482 / 16.561: Computer Architecture and Design Summer 2015 Homework #5 Solution 1. Dynamic scheduling (30 points) Given the loop below: DADDI R3, R0, #4 outer: DADDI R2, R1, #32 inner: L.D F0, 0(R1)
More informationLESSON 7.2 FACTORING POLYNOMIALS II
LESSON 7.2 FACTORING POLYNOMIALS II LESSON 7.2 FACTORING POLYNOMIALS II 305 OVERVIEW Here s what you ll learn in this lesson: Trinomials I a. Factoring trinomials of the form x 2 + bx + c; x 2 + bxy +
More informationFall 2011 Prof. Hyesoon Kim
Fall 2011 Prof. Hyesoon Kim Add: 2 cycles FE_stage add r1, r2, r3 FE L ID L EX L MEM L WB L add add sub r4, r1, r3 sub sub add add mul r5, r2, r3 mul sub sub add add mul sub sub add add mul sub sub add
More informationDigital Logic Design: a rigorous approach c
Digital Logic Design: a rigorous approach c Chapter 13: Decoders and Encoders Guy Even Moti Medina School of Electrical Engineering Tel-Aviv Univ. May 17, 2018 Book Homepage: http://www.eng.tau.ac.il/~guy/even-medina
More informationReal-time Scheduling of Periodic Tasks (2) Advanced Operating Systems Lecture 3
Real-time Scheduling of Periodic Tasks (2) Advanced Operating Systems Lecture 3 Lecture Outline The rate monotonic algorithm (cont d) Maximum utilisation test The deadline monotonic algorithm The earliest
More informationBin packing and scheduling
Sanders/van Stee: Approximations- und Online-Algorithmen 1 Bin packing and scheduling Overview Bin packing: problem definition Simple 2-approximation (Next Fit) Better than 3/2 is not possible Asymptotic
More informationStatic Program Analysis
Static Program Analysis Xiangyu Zhang The slides are compiled from Alex Aiken s Michael D. Ernst s Sorin Lerner s A Scary Outline Type-based analysis Data-flow analysis Abstract interpretation Theorem
More informationTiming Driven Power Gating in High-Level Synthesis
Timing Driven Power Gating in High-Level Synthesis Shih-Hsu Huang and Chun-Hua Cheng Department of Electronic Engineering Chung Yuan Christian University, Taiwan Outline Introduction Motivation Our Approach
More informationPerformance, Power & Energy
Recall: Goal of this class Performance, Power & Energy ELE8106/ELE6102 Performance Reconfiguration Power/ Energy Spring 2010 Hayden Kwok-Hay So H. So, Sp10 Lecture 3 - ELE8106/6102 2 What is good performance?
More informationECE 5775 (Fall 17) High-Level Digital Design Automation. Scheduling: Exact Methods
ECE 5775 (Fall 17) High-Level Digital Design Automation Scheduling: Exact Methods Announcements Sign up for the first student-led discussions today One slot remaining Presenters for the 1st session will
More informationModule 1: Analyzing the Efficiency of Algorithms
Module 1: Analyzing the Efficiency of Algorithms Dr. Natarajan Meghanathan Associate Professor of Computer Science Jackson State University Jackson, MS 39217 E-mail: natarajan.meghanathan@jsums.edu Based
More informationAnalysis of clocked sequential networks
Analysis of clocked sequential networks keywords: Mealy, Moore Consider : a sequential parity checker an 8th bit is added to each group of 7 bits such that the total # of 1 bits is odd for odd parity if
More informationAnalysis of Algorithm Efficiency. Dr. Yingwu Zhu
Analysis of Algorithm Efficiency Dr. Yingwu Zhu Measure Algorithm Efficiency Time efficiency How fast the algorithm runs; amount of time required to accomplish the task Our focus! Space efficiency Amount
More informationCOMP 515: Advanced Compilation for Vector and Parallel Processors. Vivek Sarkar Department of Computer Science Rice University
COMP 515: Advanced Compilation for Vector and Parallel Processors Vivek Sarkar Department of Computer Science Rice University vsarkar@rice.edu COMP 515 Lecture 10 10 February 2009 Announcement Feb 17 th
More informationAlgorithms and Programming I. Lecture#1 Spring 2015
Algorithms and Programming I Lecture#1 Spring 2015 CS 61002 Algorithms and Programming I Instructor : Maha Ali Allouzi Office: 272 MSB Office Hours: T TH 2:30:3:30 PM Email: mallouzi@kent.edu The Course
More informationExam Spring Embedded Systems. Prof. L. Thiele
Exam Spring 20 Embedded Systems Prof. L. Thiele NOTE: The given solution is only a proposal. For correctness, completeness, or understandability no responsibility is taken. Sommer 20 Eingebettete Systeme
More informationCMP N 301 Computer Architecture. Appendix C
CMP N 301 Computer Architecture Appendix C Outline Introduction Pipelining Hazards Pipelining Implementation Exception Handling Advanced Issues (Dynamic Scheduling, Out of order Issue, Superscalar, etc)
More informationData-Flow Analysis. Compiler Design CSE 504. Preliminaries
Data-Flow Analysis Compiler Design CSE 504 1 Preliminaries 2 Live Variables 3 Data Flow Equations 4 Other Analyses Last modifled: Thu May 02 2013 at 09:01:11 EDT Version: 1.3 15:28:44 2015/01/25 Compiled
More informationQuantum Computing Approach to V&V of Complex Systems Overview
Quantum Computing Approach to V&V of Complex Systems Overview Summary of Quantum Enabled V&V Technology June, 04 Todd Belote Chris Elliott Flight Controls / VMS Integration Discussion Layout I. Quantum
More informationReal-time Scheduling of Periodic Tasks (1) Advanced Operating Systems Lecture 2
Real-time Scheduling of Periodic Tasks (1) Advanced Operating Systems Lecture 2 Lecture Outline Scheduling periodic tasks The rate monotonic algorithm Definition Non-optimality Time-demand analysis...!2
More informationMath Models of OR: Branch-and-Bound
Math Models of OR: Branch-and-Bound John E. Mitchell Department of Mathematical Sciences RPI, Troy, NY 12180 USA November 2018 Mitchell Branch-and-Bound 1 / 15 Branch-and-Bound Outline 1 Branch-and-Bound
More informationHardware Acceleration of the Tate Pairing in Characteristic Three
Hardware Acceleration of the Tate Pairing in Characteristic Three CHES 2005 Hardware Acceleration of the Tate Pairing in Characteristic Three Slide 1 Introduction Pairing based cryptography is a (fairly)
More informationSection Summary. Sequences. Recurrence Relations. Summations Special Integer Sequences (optional)
Section 2.4 Section Summary Sequences. o Examples: Geometric Progression, Arithmetic Progression Recurrence Relations o Example: Fibonacci Sequence Summations Special Integer Sequences (optional) Sequences
More informationHardware Design I Chap. 4 Representative combinational logic
Hardware Design I Chap. 4 Representative combinational logic E-mail: shimada@is.naist.jp Already optimized circuits There are many optimized circuits which are well used You can reduce your design workload
More informationReal-time Systems: Scheduling Periodic Tasks
Real-time Systems: Scheduling Periodic Tasks Advanced Operating Systems Lecture 15 This work is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License. To view a copy of
More informationDiscrete-Time Systems
FIR Filters With this chapter we turn to systems as opposed to signals. The systems discussed in this chapter are finite impulse response (FIR) digital filters. The term digital filter arises because these
More information