Exploring the Potential of Instruction-Level Parallelism of Exposed Datapath Architectures with Buffered Processing Units

Anoop Bhagyanath and Klaus Schneider
Embedded Systems Chair, University of Kaiserslautern
ACSD

Outline
1. Motivation
2. Queue-based Code Generation
3. Mapping to SMT
4. Preliminary Results
5. Future Work

Motivation
Instruction-level parallelism (ILP)
[Figure: an expression over inputs including x3 and x4, shown as an expression tree / dataflow graph, with 2 registers (using ld/st), and on a VLIW machine (4R 2W ports). Step counts shown: 3 steps, 5 steps, 4 steps.]

Conventional Architectures
ILP is restricted by the limited number of registers and ports in the register file:
- the compiler spills variables to main memory
- the number of instructions packed into a VLIW word is limited
Increasing the number of registers is difficult because of:
- instruction format encoding
- register file wiring

Exposed Datapath Architectures
Synchronous Control Asynchronous Dataflow (SCAD)
- grid of processing units (PUs)
- FIFO buffers (queues) at the inputs and outputs of the PUs
Exposed datapath architectures in general
- the compiler also moves values from one PU to another, bypassing registers
- although bypassing is used, the code generators still use register mappings
- examples: TTA, MOVE-PRO, TRIPS, WaveScalar, STA, FlexCore, etc.

SCAD Architecture
[Figure: PUs with input (I) and output (O) buffers, driven by move instructions.]
- the move instruction bus (MIB) fills address slots
- the data transport network (DTN) fills data slots
- a PU fires if enough data is available at its input buffer heads
- application-specific: arbitrary functionality in the PUs, choice of interconnect

Queue-based Code Generation

Code Generation for Queue Machine
[Figure, four animation steps: expression tree with root z1, inner nodes y1 and y2, and leaves including x3 and x4. Executed nodes per step: (…, …, x3, x4), then y1, then y2, then z1.]
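The level-by-level execution order in the animation can be sketched in Python. This is only an illustration under stated assumptions: the operators, the two leaf names x1 and x2, and the input values are hypothetical (the slides do not preserve them), and the sketch only handles trees whose leaves all sit on the same level.

```python
from collections import deque

# Hypothetical expression tree: z1 = y1 * y2, y1 = x1 + x2, y2 = x3 - x4.
# Each inner node maps to (operator, left child, right child).
TREE = {
    "z1": ("*", "y1", "y2"),
    "y1": ("+", "x1", "x2"),
    "y2": ("-", "x3", "x4"),
}
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def level_order_program(root):
    """Return the nodes ordered from the deepest level up, left to right."""
    levels, frontier = [], [root]
    while frontier:
        levels.append(frontier)
        frontier = [c for n in frontier if n in TREE for c in TREE[n][1:]]
    return [n for level in reversed(levels) for n in level]


def run_queue_machine(program, inputs):
    """Execute a queue program: operands leave at the head, results join the tail."""
    queue = deque()
    for node in program:
        if node in TREE:
            op = TREE[node][0]
            left, right = queue.popleft(), queue.popleft()
            queue.append(OPS[op](left, right))
        else:
            queue.append(inputs[node])  # leaf: load the input value
    return queue.popleft()


program = level_order_program("z1")
result = run_queue_machine(program, {"x1": 1, "x2": 2, "x3": 7, "x4": 4})
```

With these inputs the leaves are enqueued first, y1 and y2 each consume the two values at the queue head, and z1 consumes their results.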

Computation Overhead for Basic Blocks
[Figure: a DAG computing y1 and y2 is transformed in stages: given DAG, levelized DAG, planar DAG, level-planar DAG, queue program. The transformed DAGs contain additional D and S nodes.]
Queue program: load,1; load,2; add 2; dup 2; dup 1; swap; dup 1; mul 1; add 1; store y1; store y2

Depth-First Traversal
Current compilers order nodes by depth-first traversal to minimize register usage:
- optimal code for expression trees: Sethi-Ullman algorithm, polynomial time
- optimal code for directed acyclic graphs (DAGs): proved to be NP-complete
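The register-need labeling at the core of the Sethi-Ullman algorithm can be sketched as follows. This is a simplified variant that assumes a load-store machine in which every leaf is first loaded into a register; the tuple encoding of trees is a hypothetical choice made for this example.

```python
def sethi_ullman(node):
    """Return the number of registers needed to evaluate an expression tree
    without spilling. A tree is a leaf name (str) or a tuple (op, left, right)."""
    if isinstance(node, str):
        return 1  # simplification: every leaf is loaded into a register
    _, left, right = node
    need_left, need_right = sethi_ullman(left), sethi_ullman(right)
    # Evaluating the more demanding subtree first lets its registers be reused;
    # only when both subtrees need equally many is one extra register required.
    return max(need_left, need_right) if need_left != need_right else need_left + 1


balanced = ("*", ("+", "x1", "x2"), ("-", "x3", "x4"))
chain = ("+", ("+", ("+", "a", "b"), "c"), "d")
```

Under this labeling the balanced tree needs three registers, while the left-leaning chain needs only two.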

DFT in Queue Machine
[Figure, three animation steps on the same expression tree (root z1, inner nodes y1 and y2, leaves including x3 and x4). Executed nodes per step: (…, …), then y1, then x3, x4.]
After these steps the queue holds its entries in the wrong order to execute y2!
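Why a depth-first order breaks on a queue machine can be checked with a small symbolic simulation. The tree shape follows the slides, but the leaf names x1 and x2 and both helper functions are assumptions made for this sketch.

```python
from collections import deque

# Hypothetical tree: z1 = f(y1, y2), y1 = f(x1, x2), y2 = f(x3, x4).
CHILDREN = {"z1": ("y1", "y2"), "y1": ("x1", "x2"), "y2": ("x3", "x4")}


def post_order(node):
    """Depth-first (post-order) traversal: children first, then the node itself."""
    order = []
    for child in CHILDREN.get(node, ()):
        order += post_order(child)
    return order + [node]


def first_queue_violation(program):
    """Return (node, queue heads) for the first node whose operands are not
    at the head of the FIFO queue, or None if the program is queue-legal."""
    queue = deque()
    for node in program:
        if node in CHILDREN:
            heads = (queue.popleft(), queue.popleft())
            if heads != CHILDREN[node]:
                return node, heads
        queue.append(node)  # the node's value joins the tail of the queue
    return None


dft_violation = first_queue_violation(post_order("z1"))
level_violation = first_queue_violation(["x1", "x2", "x3", "x4", "y1", "y2", "z1"])
```

In the depth-first order, y1 is still at the head of the queue when y2 is due, so y2 would read y1 and x3 instead of x3 and x4. The level order passes the same check.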

Queue code to SCAD code (MBMV)
Queue instruction    Corresponding SCAD move instructions
...
(load,1)             [->inp1; load->opc; 1->cps]
...
(add 1)              [out->inp1; out->inp2; add->opc; 1->cps]
...
(dup 2)              [out->inp1; dup->opc; 2->cps]
...
(swap)               [out->inp1; out->inp2; swap->opc]
...
(store y1)           [y1->inp1; out->inp2; store->opc]
...
Is the SCAD code optimal?

Reduced Computation Overhead for SCAD
[Figure: a given DAG computing y1 and y2 from x1 and x2, next to the same DAG with a PU assigned to each node.]
Queue machine: one queue
- single total order of all nodes
SCAD machine: multiple queues
- multiple partial orders of nodes
- less computation overhead
SAT-based SCAD code generation (MEMOCODE)
- resource-optimal
- at most 4 PUs for basic blocks of up to 15 instructions

Mapping to SMT

Problem statement
Given a basic block (in three-address SSA code), a SCAD machine with p universal PUs and 1 load-store unit, and a desired execution time t: determine whether the basic block can be executed on the SCAD machine in time t without any computation overhead.
Relations
- $\alpha_{i,j}$: variable $x_i$ is assigned to PU $j$
- $\theta_{i,j}$: variable $x_i$ is scheduled in time slot $j$

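Constraints
Binary values:
$$\bigwedge_{i=0}^{n-1} \bigwedge_{j=0}^{p} 0 \le \alpha_{i,j} \le 1 \quad\text{and}\quad \bigwedge_{i=0}^{n-1} \bigwedge_{j=0}^{t-1} 0 \le \theta_{i,j} \le 1 \tag{1}$$
Schedule exactly once:
$$\bigwedge_{i=0}^{n-1} \sum_{j=0}^{t-1} \theta_{i,j} = 1 \tag{2}$$
Unique PU assignment:
$$\bigwedge_{i=0}^{n-1} \sum_{j=0}^{p} \alpha_{i,j} = 1 \tag{3}$$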

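Constraints (continued)
Data dependency
For each node $x_i$, $\tau_i$ is the time slot in which the node is scheduled:
$$\tau_i = \sum_{j=0}^{t-1} j \cdot \theta_{i,j}$$
For every instruction $x_t = x_l \,\mathrm{op}\, x_r$:
$$\tau_t - \tau_l \ge \delta_l \;\wedge\; \tau_t - \tau_r \ge \delta_r \tag{4}$$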
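As a rough illustration of the binary-value, schedule-once, PU-assignment and data-dependency constraints, the following pure-Python checker tests whether given 0/1 matrices satisfy them. It is not an SMT encoding. It assumes that the dependency constraint reads "target slot minus operand slot is at least the operand's latency" and that delta holds per-node latencies; the function names and the example block are hypothetical.

```python
def tau(theta_row):
    """Time slot of a node from its one-hot row: tau_i = sum_j j * theta[i][j]."""
    return sum(j * bit for j, bit in enumerate(theta_row))


def check_schedule(theta, alpha, deps, delta):
    """Check a candidate solution given as 0/1 matrices.

    theta is n x t (node i runs in slot j), alpha is n x (p + 1) (node i on PU j).
    deps maps a target node index to its (left, right) operand indices;
    delta[i] is the assumed latency of node i.
    """
    binary = all(bit in (0, 1) for m in (theta, alpha) for row in m for bit in row)
    once = all(sum(row) == 1 for row in theta)    # each node scheduled exactly once
    one_pu = all(sum(row) == 1 for row in alpha)  # each node on exactly one PU
    ordered = all(                                # operands finish before the target
        tau(theta[target]) - tau(theta[operand]) >= delta[operand]
        for target, operands in deps.items()
        for operand in operands
    )
    return binary and once and one_pu and ordered


# Hypothetical block: x2 = x0 op x1, two time slots, one universal PU plus load-store unit.
alpha = [[0, 1], [0, 1], [1, 0]]
good = check_schedule([[1, 0], [1, 0], [0, 1]], alpha, {2: (0, 1)}, [1, 1, 1])
bad = check_schedule([[1, 0], [0, 1], [0, 1]], alpha, {2: (0, 1)}, [1, 1, 1])
```

The second schedule is rejected because x1 and x2 share a time slot, which leaves no room for the unit latency of x1. The buffer-ordering constraint is deliberately not covered by this sketch.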

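Constraints (continued)
Ordering variables in buffers
For every pair of instructions $x_t = x_l \,\mathrm{op}\, x_r$ and $x_{t'} = x_{l'} \,\mathrm{op}\, x_{r'}$:
[Figure: nodes $x_t$ and $x_{t'}$ with their operands $x_l$, $x_{l'}$, $x_r$, $x_{r'}$.]
Buffer constraint:
$$\beta_{t,t'} \Rightarrow \left( \begin{array}{l} \big( (\tau_t < \tau_{t'}) \Rightarrow (\beta_{l,l'} \Rightarrow \tau_l \le \tau_{l'}) \wedge (\beta_{r,r'} \Rightarrow \tau_r \le \tau_{r'}) \big) \;\wedge \\ \big( (\tau_{t'} < \tau_t) \Rightarrow (\beta_{l,l'} \Rightarrow \tau_{l'} \le \tau_l) \wedge (\beta_{r,r'} \Rightarrow \tau_{r'} \le \tau_r) \big) \end{array} \right) \tag{5}$$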

Preliminary Results
Performance: unit latency for all nodes in the basic block

Preliminary Results (continued)
Performance: 90% cache hit probability
- hit: 1 cycle
- miss: 10 cycles
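For orientation, the expected latency of a single load under this cache model works out as follows. This is a back-of-the-envelope figure, not a result from the talk, and it assumes hits and misses are independent with the stated probabilities.

```latex
\mathbb{E}[\text{load latency}] = 0.9 \cdot 1 + 0.1 \cdot 10 = 1.9 \ \text{cycles}
```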

Future Work
- analyze buffer sizes
- hardness of the optimal code generation problem
- efficient heuristics

Thank You! Questions?


More information

Bayesian Networks. Motivation

Bayesian Networks. Motivation Bayesian Networks Computer Sciences 760 Spring 2014 http://pages.cs.wisc.edu/~dpage/cs760/ Motivation Assume we have five Boolean variables,,,, The joint probability is,,,, How many state configurations

More information

4th year Project demo presentation

4th year Project demo presentation 4th year Project demo presentation Colm Ó héigeartaigh CASE4-99387212 coheig-case4@computing.dcu.ie 4th year Project demo presentation p. 1/23 Table of Contents An Introduction to Quantum Computing The

More information

High Performance Computing

High Performance Computing Master Degree Program in Computer Science and Networking, 2014-15 High Performance Computing 2 nd appello February 11, 2015 Write your name, surname, student identification number (numero di matricola),

More information

Computer Science. Questions for discussion Part II. Computer Science COMPUTER SCIENCE. Section 4.2.

Computer Science. Questions for discussion Part II. Computer Science COMPUTER SCIENCE. Section 4.2. COMPUTER SCIENCE S E D G E W I C K / W A Y N E PA R T I I : A L G O R I T H M S, T H E O R Y, A N D M A C H I N E S Computer Science Computer Science An Interdisciplinary Approach Section 4.2 ROBERT SEDGEWICK

More information

Computational Boolean Algebra. Pingqiang Zhou ShanghaiTech University

Computational Boolean Algebra. Pingqiang Zhou ShanghaiTech University Computational Boolean Algebra Pingqiang Zhou ShanghaiTech University Announcements Written assignment #1 is out. Due: March 24 th, in class. Programming assignment #1 is out. Due: March 24 th, 11:59PM.

More information

CHAPTER 3 FUNDAMENTALS OF COMPUTATIONAL COMPLEXITY. E. Amaldi Foundations of Operations Research Politecnico di Milano 1

CHAPTER 3 FUNDAMENTALS OF COMPUTATIONAL COMPLEXITY. E. Amaldi Foundations of Operations Research Politecnico di Milano 1 CHAPTER 3 FUNDAMENTALS OF COMPUTATIONAL COMPLEXITY E. Amaldi Foundations of Operations Research Politecnico di Milano 1 Goal: Evaluate the computational requirements (this course s focus: time) to solve

More information

QuIDD-Optimised Quantum Algorithms

QuIDD-Optimised Quantum Algorithms QuIDD-Optimised Quantum Algorithms by S K University of York Computer science 3 rd year project Supervisor: Prof Susan Stepney 03/05/2004 1 Project Objectives Investigate the QuIDD optimisation techniques

More information

Data Structures in Java

Data Structures in Java Data Structures in Java Lecture 21: Introduction to NP-Completeness 12/9/2015 Daniel Bauer Algorithms and Problem Solving Purpose of algorithms: find solutions to problems. Data Structures provide ways

More information

Energy-efficient Mapping of Big Data Workflows under Deadline Constraints

Energy-efficient Mapping of Big Data Workflows under Deadline Constraints Energy-efficient Mapping of Big Data Workflows under Deadline Constraints Presenter: Tong Shu Authors: Tong Shu and Prof. Chase Q. Wu Big Data Center Department of Computer Science New Jersey Institute

More information

Design for Testability

Design for Testability Design for Testability Outline Ad Hoc Design for Testability Techniques Method of test points Multiplexing and demultiplexing of test points Time sharing of I/O for normal working and testing modes Partitioning

More information

CODE GENERATION REGISTER ALLOCATION. Goal. Interplay between. Translate intermediate code into target code

CODE GENERATION REGISTER ALLOCATION. Goal. Interplay between. Translate intermediate code into target code CODE GENERATION Goal Translate intermediate code into target code Interplay between Register Allocation Instruction Selection Instruction Scheduling 1 REGISTER ALLOCATION 1 REGISTER ALLOCATION Motivation

More information

Problem-Solving via Search Lecture 3

Problem-Solving via Search Lecture 3 Lecture 3 What is a search problem? How do search algorithms work and how do we evaluate their performance? 1 Agenda An example task Problem formulation Infrastructure for search algorithms Complexity

More information

Priority queues implemented via heaps

Priority queues implemented via heaps Priority queues implemented via heaps Comp Sci 1575 Data s Outline 1 2 3 Outline 1 2 3 Priority queue: most important first Recall: queue is FIFO A normal queue data structure will not implement a priority

More information

VLSI Design. [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI DEsign.1

VLSI Design. [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI DEsign.1 VLSI Design Adder Design [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI DEsign.1 Major Components of a Computer Processor Devices Control Memory Input Datapath

More information

Unit 6: Branch Prediction

Unit 6: Branch Prediction CIS 501: Computer Architecture Unit 6: Branch Prediction Slides developed by Joe Devie/, Milo Mar4n & Amir Roth at Upenn with sources that included University of Wisconsin slides by Mark Hill, Guri Sohi,

More information

Mm7 Intro to distributed computing (jmp) Mm8 Backtracking, 2-player games, genetic algorithms (hps) Mm9 Complex Problems in Network Planning (JMP)

Mm7 Intro to distributed computing (jmp) Mm8 Backtracking, 2-player games, genetic algorithms (hps) Mm9 Complex Problems in Network Planning (JMP) Algorithms and Architectures II H-P Schwefel, Jens M. Pedersen Mm6 Advanced Graph Algorithms (hps) Mm7 Intro to distributed computing (jmp) Mm8 Backtracking, 2-player games, genetic algorithms (hps) Mm9

More information

Complexity: Some examples

Complexity: Some examples Algorithms and Architectures III: Distributed Systems H-P Schwefel, Jens M. Pedersen Mm6 Distributed storage and access (jmp) Mm7 Introduction to security aspects (hps) Mm8 Parallel complexity (hps) Mm9

More information

8.5 Sequencing Problems. Chapter 8. NP and Computational Intractability. Hamiltonian Cycle. Hamiltonian Cycle

8.5 Sequencing Problems. Chapter 8. NP and Computational Intractability. Hamiltonian Cycle. Hamiltonian Cycle Chapter 8 NP and Computational Intractability 8.5 Sequencing Problems Basic genres. Packing problems: SET-PACKING, INDEPENDENT SET. Covering problems: SET-COVER, VERTEX-COVER. Constraint satisfaction problems:

More information

Environment (E) IBP IBP IBP 2 N 2 N. server. System (S) Adapter (A) ACV

Environment (E) IBP IBP IBP 2 N 2 N. server. System (S) Adapter (A) ACV The Adaptive Cross Validation Method - applied to polling schemes Anders Svensson and Johan M Karlsson Department of Communication Systems Lund Institute of Technology P. O. Box 118, 22100 Lund, Sweden

More information

6.5.3 An NP-complete domino game

6.5.3 An NP-complete domino game 26 Chapter 6. Complexity Theory 3SAT NP. We know from Theorem 6.5.7 that this is true. A P 3SAT, for every language A NP. Hence, we have to show this for languages A such as kcolor, HC, SOS, NPrim, KS,

More information

Chapter 8. NP and Computational Intractability

Chapter 8. NP and Computational Intractability Chapter 8 NP and Computational Intractability Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. Acknowledgement: This lecture slide is revised and authorized from Prof.

More information

Communication-avoiding LU and QR factorizations for multicore architectures

Communication-avoiding LU and QR factorizations for multicore architectures Communication-avoiding LU and QR factorizations for multicore architectures DONFACK Simplice INRIA Saclay Joint work with Laura Grigori INRIA Saclay Alok Kumar Gupta BCCS,Norway-5075 16th April 2010 Communication-avoiding

More information

Automatic Verification of Parameterized Data Structures

Automatic Verification of Parameterized Data Structures Automatic Verification of Parameterized Data Structures Jyotirmoy V. Deshmukh, E. Allen Emerson and Prateek Gupta The University of Texas at Austin The University of Texas at Austin 1 Outline Motivation

More information

Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters

Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters H. Köstler 2nd International Symposium Computer Simulations on GPU Freudenstadt, 29.05.2013 1 Contents Motivation walberla software concepts

More information