Optimization techniques for developing embedded applications on single-chip multiprocessor platforms
- Lucas Anthony
Transcription
Luca Benini, DEIS, Università di Bologna

Embedded Systems
[Figure: microprocessor market shares, general-purpose systems vs. embedded systems; labels 99% and 1%.]
Example area: automotive electronics
What is automotive electronics? Vehicle functions implemented with electronics:
Body electronics
System electronics: chassis, engine
Information/entertainment

Automotive electronics market size
[Figure: cost of electronics per car ($) and market size ($ billions) over time.]
2006: 25% of the total cost of a car will be electronics.
[...]% of future innovations in vehicles: based on electronic embedded systems.
Automotive electronics platform example
[Figure. Source: Expanding automotive electronic systems, IEEE Computer, Jan]

Digital convergence: mobile example
Communication, entertainment, computing, broadcasting, imaging, telematics.
One device, multiple functions.
Center of ubiquitous media network.
Smart mobile device: next driver for the semiconductor industry.
4th-generation and next-generation networks
Includes: [...], WiMAX (802.16), HSDPA, TDD UMTS, UMTS and future versions of UMTS.

SoC: enabler for digital convergence
4G/5G, DMB, WiBro, etc.
Future SoC compared with today: >100X performance, low power, complexity, storage.
Systems on chip
Moore's law provides exponential growth of resources, but design does not become easier.
Deep submicron (DSM) problems: wire vs. transistor speed, power, signal integrity.
Design productivity gap: IP re-use, platforms, NoCs.
Verification technologies.

The evolution of SoC platforms
2 cores: Philips Nexperia PNX8850 SoC platform for high-end digital video (2001).
General-purpose scalable RISC processor (MIPS): 50 to 300 MHz, 32-bit or 64-bit.
Scalable VLIW media processor (TriMedia): 100 to 300 MHz, 32-bit or 64-bit.
Library of device IP blocks: image coprocessors, DSPs, UART, 1394, USB.
[Figure: Nexperia DVP system silicon. A MIPS CPU (PRxxxx) and a TriMedia CPU (TM-xxxx), each with I$ and D$ and its own device IP blocks on a PI bus, connected through the MMI and the DVP memory bus to SDRAM; Nexperia system buses.]
Running forward
6 cores: Motorola's MSC8126 SoC platform for 3G base stations (late 2003).
Four 350/400 MHz StarCore SC140 DSP extended cores.
16 ALUs: 5600/6400 MMACS.
1436 KB of internal SRAM and multi-level memory hierarchy.
Internal DMA controller supports 16 TDM unidirectional channels.
Two internal coprocessors (TCOP and VCOP) provide special-purpose processing capability in parallel with the core processors.

Application pull
[Figure (source: IMEC): compute efficiency required by applications vs. year of introduction, with levels marked at 5 GOPS/W, 100 GOPS/W and 1 TOPS/W. Applications shown include mobile baseband, UWB, A/V streaming, H264 encoding and decoding, Gbit radio, image, sign, gesture, expression and emotion recognition, dictation, 3D gaming, 3D TV, collision avoidance and autonomous driving.]
MPSoC platform evolution
[Figure: layered stack from applications through software optimization, middleware, RTOS, API and run-time controller, down to mapping onto a tile-based design. Each tile holds a bus-based multiprocessor with local memory hierarchy, network interface, router, and power and test management (V, Vt, Fclk, IL); peripherals, I/O and 3D stacked main memory surround the tiles.]
Tile at 45 nm: <4 mm², 30 Mtr, <1 GHz. Today's SoCs could fit in 1 tile!

SoC: Solution-on-a-Chip
[Figure: SoC target system for a mobile terminal. Application software, system/service middleware, RTOS, HAL, chip software, IP, process.]
Requires design of hardware AND software.
Design as optimization
Design space: the set of all possible design choices.
Constraints: solutions that we are not willing to accept.
Cost function: a property we are interested in (execution time, power, reliability, ...).

Hardware synthesis
[Figure: synthesis flow from application (amplitude in dB vs. frequency) to algorithm (signal-flow graph with delay elements D and coefficients c1 to c8) to architecture (interconnect, general-purpose processor, signal processor, ASIC, memory, MCM), through high-level synthesis followed by logic and physical synthesis.]
Behavioral synthesis
Input: control/data flow graph (CDFG). Output: an implementation built from registers, multipliers and adders.
Allocation, assignment, and scheduling:
Allocation: how much? (e.g. 2 adders, 1 shifter, 24 registers)
Assignment: where? (e.g. shifter 1)
Schedule: when? (e.g. time slot 4)
These techniques are well understood and mature.
Resource constraints
[Figure: two alternative schedules (Schedule 1 and Schedule 2) of the same operations over 4 control steps under a resource constraint.]

Scheduling under resource constraints
Intractable problem.
Algorithms:
Exact: integer linear programming; Hu's algorithm (restrictive assumptions).
Approximate: list scheduling; force-directed scheduling.
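As an illustration of the approximate approach, here is a minimal sketch of list scheduling for unit-delay operations. The data encoding (`ops`, `preds`, `limits`) and the priority rule (longest path to a sink) are assumptions made for this example, not taken from the slides.

```python
from collections import defaultdict
from functools import lru_cache


def list_schedule(ops, preds, limits):
    """Resource-constrained list scheduling for unit-delay operations.

    ops:    dict op name -> resource type (hypothetical encoding)
    preds:  dict op name -> iterable of predecessor op names
    limits: dict resource type -> units available per control step
    Returns dict op name -> control step (1-based).
    Assumes an acyclic graph and at least one unit of every used resource.
    """
    succs = defaultdict(list)
    for op, ps in preds.items():
        for p in ps:
            succs[p].append(op)

    @lru_cache(maxsize=None)
    def depth(op):
        # Priority: length of the longest path from op to a sink.
        return 1 + max((depth(s) for s in succs[op]), default=0)

    start = {}
    step = 1
    while len(start) < len(ops):
        used = defaultdict(int)
        # Ready: unscheduled ops whose predecessors all ran in earlier steps.
        ready = [o for o in ops if o not in start
                 and all(start.get(p, step) < step for p in preds.get(o, ()))]
        for o in sorted(ready, key=lambda o: (-depth(o), o)):
            if used[ops[o]] < limits[ops[o]]:
                used[ops[o]] += 1
                start[o] = step
        step += 1
    return start


if __name__ == "__main__":
    ops = {"m1": "mul", "m2": "mul", "m3": "mul", "a1": "alu"}
    preds = {"m3": ["m1", "m2"], "a1": ["m3"]}
    print(list_schedule(ops, preds, {"mul": 1, "alu": 1}))
    print(list_schedule(ops, preds, {"mul": 2, "alu": 1}))
```

With one multiplier the two independent multiplications are serialized; with two they share the first control step, which shortens the schedule by one step. The heuristic gives no optimality guarantee in general.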
ILP formulation
Binary decision variables:
$$X = \{ x_{il},\ i = 1, 2, \ldots, n;\ l = 1, 2, \ldots, \lambda + 1 \}$$
$x_{il}$ is TRUE only when operation $v_i$ starts in step $l$ of the schedule (i.e. $l = t_i$).
$\lambda$ is an upper bound on latency.
Start time of operation $v_i$: $t_i = \sum_l l \cdot x_{il}$

ILP formulation: constraints
Operations start only once:
$$\sum_l x_{il} = 1, \quad i = 1, 2, \ldots, n$$
Sequencing relations must be satisfied:
$$t_i \ge t_j + d_j \quad \forall (v_j, v_i) \in E$$
$$\sum_l l \cdot x_{il} - \sum_l l \cdot x_{jl} - d_j \ge 0 \quad \forall (v_j, v_i) \in E$$
Resource bounds must be satisfied. Simple case (unit delay):
$$\sum_{i : T(v_i) = k} x_{il} \le a_k, \quad k = 1, 2, \ldots, n_{res};\ \forall l$$
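ILP formulation
$$\min \sum_l l \cdot x_{nl}$$
such that
$$\sum_l x_{il} = 1, \quad i = 1, 2, \ldots, n$$
$$\sum_l l \cdot x_{il} - \sum_l l \cdot x_{jl} - d_j \ge 0, \quad i, j = 1, 2, \ldots, n,\ (v_j, v_i) \in E$$
$$\sum_{i : T(v_i) = k}\ \sum_{m = l - d_i + 1}^{l} x_{im} \le a_k, \quad k = 1, 2, \ldots, n_{res};\ l = 0, 1, \ldots, \lambda$$

Example
Resource constraints: 2 ALUs and 2 multipliers, so $a_1 = 2$ and $a_2 = 2$.
Single-cycle operations: $d_i = 1\ \forall i$.
[Figure: sequencing graph from the source NOP (0) through multiplications, ALU operations and a comparison to the sink NOP (n).]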
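Example
Operations start only once:
$$x_{1,1} = 1$$
$$x_{6,1} + x_{6,2} = 1$$
Sequencing relations must be satisfied (written as $t_j - t_i + d_j \le 0$):
$$x_{6,1} + 2x_{6,2} - 2x_{7,2} - 3x_{7,3} + 1 \le 0$$
$$2x_{9,2} + 3x_{9,3} + 4x_{9,4} - 5x_{N,5} + 1 \le 0$$
Resource bounds must be satisfied:
$$x_{1,1} + x_{2,1} + x_{6,1} + x_{8,1} \le 2$$
$$x_{3,2} + x_{6,2} + x_{7,2} + x_{8,2} \le 2$$

Example
[Figure: the scheduled sequencing graph, from the source NOP (0) through TIME 1 to TIME 4 to the sink NOP (n).]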
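To make the three constraint families concrete, the sketch below encodes a candidate schedule as the binary variables $x_{il}$ and checks it against them, then finds the minimum-latency schedule by plain enumeration. It is an illustration only: a real flow would hand the model to an ILP solver, and the tiny graph, the function names and the 0-based operation indices are assumptions of this example.

```python
from itertools import product


def satisfies_ilp(t, d, edges, rtype, a, lam):
    """Check start steps t (1-based) against the ILP constraints.

    d[i]: delay of op i; edges: (j, i) pairs meaning v_j precedes v_i;
    rtype[i]: resource type of op i; a: dict type -> bound a_k;
    lam: upper bound on latency (steps range over 1..lam+1).
    """
    n = len(t)
    steps = range(1, lam + 2)
    x = {(i, l): int(t[i] == l) for i in range(n) for l in steps}
    # Operations start only once.
    if any(sum(x[i, l] for l in steps) != 1 for i in range(n)):
        return False
    # Sequencing: sum_l l*x_il - sum_l l*x_jl - d_j >= 0.
    start = [sum(l * x[i, l] for l in steps) for i in range(n)]
    if any(start[i] - start[j] - d[j] < 0 for j, i in edges):
        return False
    # Resource bounds: ops of type k that are executing in step l.
    for k, bound in a.items():
        for l in steps:
            busy = sum(x.get((i, m), 0)
                       for i in range(n) if rtype[i] == k
                       for m in range(l - d[i] + 1, l + 1))
            if busy > bound:
                return False
    return True


def min_latency(d, edges, rtype, a, lam):
    """Minimize the start step of the sink (last op) by enumeration."""
    n = len(d)
    steps = range(1, lam + 2)
    feasible = (t for t in product(steps, repeat=n)
                if satisfies_ilp(t, d, edges, rtype, a, lam))
    return min((t[n - 1] for t in feasible), default=None)


if __name__ == "__main__":
    # Three unit-delay multiplications feeding a zero-delay sink NOP.
    d = [1, 1, 1, 0]
    edges = [(0, 3), (1, 3), (2, 3)]
    rtype = ["mul", "mul", "mul", "nop"]
    print(min_latency(d, edges, rtype, {"mul": 2}, lam=3))
```

Enumeration is exponential in the number of operations, which is exactly why the slides call the problem intractable and list heuristics next to the exact methods.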
Resource-efficient application mapping for MPSoCs
Multimedia applications. Given a platform:
1. Achieve a specified throughput
2. Minimize usage of shared resources
Approach: resource assignment and scheduling.

The system
Nodes 1 to N, each with a processor, a tightly-coupled memory of limited size and a bus interface.
Each node runs tasks A, B, ..., N with WCETs Ta, Tb, ..., Tn within a maximum time-wheel period T.
On-chip memory: assumed to be infinite.
Shared system bus: maximum bus bandwidth.
The application
Signal processing pipeline of tasks T0, T1, T2, T3, ..., T7 with a throughput constraint.
Each task is characterized by its WCET and its memory requirements:
Queues for inter-processor communication: in TCM, for efficiency reasons.
Program data: in TCM (if space) or on-chip memory.
Internal state: in TCM (if space) or on-chip memory.

Communication-aware allocation and scheduling for stream-oriented MPSoCs
How should the pipeline T0 ... T7 be mapped onto a message-oriented MPSoC architecture (ARM7 cores with local scratchpad memories, private memories and a shared bus)?
Simplifying assumptions vs. predictability.
Efficient solutions in reasonable time.
Pure ILP formulations are suitable only for small task sets.
Widespread use of heuristics.
Master problem model
Assignment of tasks and memory slots (master problem). Decision variables:
$T_{ij} = 1$ if task $i$ executes on processor $j$, 0 otherwise;
$Y_{ij} = 1$ if task $i$ allocates its program data on the memory of processor $j$, 0 otherwise;
$Z_{ij} = 1$ if task $i$ allocates its internal state on the memory of processor $j$, 0 otherwise;
$X_{ij} = 1$ if task $i$ executes on processor $j$ and task $i+1$ does not (or vice versa, per the link constraint), 0 otherwise.
Each task must be allocated to exactly one processor:
$$\sum_j T_{ij} = 1 \quad \text{for all } i$$
Link between variables X and T (can be linearized):
$$X_{ij} = |T_{ij} - T_{i+1,j}| \quad \text{for all } i \text{ and } j$$
If a task is not allocated to a processor, neither are its required memories:
$$T_{ij} = 0 \Rightarrow Y_{ij} = 0 \text{ and } Z_{ij} = 0$$
Objective function (communication cost, to be minimized):
$$\sum_i \sum_j \left[ mem_i (T_{ij} - Y_{ij}) + state_i (T_{ij} - Z_{ij}) + data_i X_{ij} / 2 \right]$$

Improvement of the model
With the proposed model, the allocation solver tends to pack all tasks on a single processor and all required memory on the local memory, so as to have ZERO communication cost: a TRIVIAL SOLUTION.
To improve the model, we add a relaxation of the subproblem to the master problem. For each set $S$ of consecutive tasks whose durations sum to more than the real-time requirement, we impose that the tasks cannot all run on the same processor:
$$\sum_{i \in S} WCET_i > RT \ \Rightarrow\ \sum_{i \in S} T_{ij} \le |S| - 1 \quad \text{for all } j$$
Sub-problem model
Task scheduling with static resource assignment (subproblem).
We have to schedule the tasks, so we have to decide when they start.
Activity starting time: $Start_i :: [0..Deadline_i]$
Precedence constraints: $Start_i + Dur_i \le Start_j$
Real-time constraints, for all activities $i$ running on the same processor: $Start_i + Dur_i \le RT$
Cumulative constraints on resources:
processors are unary resources: cumulative([Start], [Dur], [1], 1)
memories are additive resources: cumulative([Start], [Dur], [MR], C)
What about the bus?
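The cumulative constraint states that, at every instant, the total requirement of the running activities stays within the resource capacity. The checker below is a minimal sketch of that semantics, assuming integer or float start times and half-open execution intervals; it mirrors the argument order used above but is not the API of any particular CP solver.

```python
def cumulative_ok(start, dur, req, cap):
    """Return True if cumulative(start, dur, req, cap) holds.

    Activity i occupies req[i] units of the resource during the half-open
    interval [start[i], start[i] + dur[i]). The load only increases at
    start instants, so checking those instants is sufficient.
    """
    for t in start:
        load = sum(r for s, d, r in zip(start, dur, req) if s <= t < s + d)
        if load > cap:
            return False
    return True


if __name__ == "__main__":
    # Processor as a unary resource: requirement 1 each, capacity 1.
    print(cumulative_ok([0, 2], [2, 3], [1, 1], 1))   # back-to-back tasks
    print(cumulative_ok([0, 1], [2, 3], [1, 1], 1))   # overlapping tasks
    # Memory as an additive resource: requirements add up to the capacity.
    print(cumulative_ok([0, 0, 1], [2, 2, 2], [3, 4, 2], 8))
```

A CP solver propagates this constraint during search instead of checking it after the fact, but the feasibility condition is the same.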
Bus model: unary resource
Granularity: clock cycle. An arbitration mechanism decides the bus allocation.
[Figure: bandwidth (bit/s) vs. time over the execution time of task i and task j; the state reads and state writes of the two tasks each occupy the maximum bus bandwidth in turn.]

Bus model: additive resource
Task 0 accesses input data: BW = MaxBW/NoProc.
[Figure: bandwidth (bit/s) vs. time; the state reads and writes of task i and task j and the program data traffic (size of program data over TaskExecTime) add up below the maximum bus bandwidth.]
The model does not hold under heavy bus congestion (more than 65% of total bandwidth).
Bus traffic has to be minimized.
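A rough sketch of how the additive model can be checked for a given schedule: each task is assumed to request a constant bandwidth equal to the data it moves divided by its execution time, and the requests of overlapping tasks are summed. The task tuples, the function names and the constant-rate assumption are illustrative; only the 65% threshold comes from the slides.

```python
def peak_bus_utilization(tasks, max_bw):
    """tasks: list of (start, duration, data_moved) tuples.

    Each task is modeled as a constant bandwidth request of
    data_moved / duration while it runs. Returns the peak of the summed
    requests as a fraction of max_bw.
    """
    peak = 0.0
    for t0, _, _ in tasks:
        load = sum(data / dur for s, dur, data in tasks if s <= t0 < s + dur)
        peak = max(peak, load)
    return peak / max_bw


def additive_model_valid(tasks, max_bw, threshold=0.65):
    """True if the peak request stays within the validity threshold."""
    return peak_bus_utilization(tasks, max_bw) <= threshold


if __name__ == "__main__":
    tasks = [(0, 10, 200), (5, 10, 300)]   # hypothetical numbers
    print(peak_bus_utilization(tasks, max_bw=100))
    print(additive_model_valid(tasks, max_bw=100))
    print(additive_model_valid(tasks, max_bw=60))
```

A schedule that fails this check is outside the region where the additive abstraction was validated, so its predicted execution times should not be trusted.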
No-good generation
Master problem (IP solver): assignment of tasks and memory slots. Subproblem (CP solver): task scheduling with static resource assignment. The master passes its solution to the subproblem, which returns either a schedule or a no-good.
If no feasible schedule exists for the allocation provided by the master, a no-good is generated. We use a simple BUT EFFECTIVE one: identify the conflicting resources CR. For each $R \in CR$, let $ST_R$ be the set of tasks allocated on $R$, and add the cut
$$\sum_{i \in ST_R} T_{iR} \le |ST_R| - 1$$
Other cuts are also possible [Hooker, Constraints 2005], but these are enough for our case and easy to extract.

Computational efficiency
CP and IP formulations simplified.
The hybrid approach clearly outperforms pure CP and IP techniques.
Search time bounded to 15 minutes.
Pure CP and IP found a solution in only 50% of the instances.
The hybrid approach always found a solution.
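The master/subproblem interaction can be sketched as a loop that alternates allocation and feasibility checking, adding one no-good per conflicting resource. The version below is a toy under strong assumptions: the master is solved by enumeration rather than by an IP solver, its cost is just the number of processor changes along the pipeline, and the subproblem is reduced to a per-processor load check against the real-time bound `rt`. None of these simplifications come from the slides.

```python
from itertools import product


def allocate_with_nogoods(wcet, n_proc, rt):
    """Toy decomposition loop with no-good cuts.

    Returns (assignment, cost) or None if every allocation is cut off.
    assignment[i] is the processor of task i.
    """
    n = len(wcet)
    cuts = []  # each cut (R, ST_R): tasks in ST_R must not all sit on R
    while True:
        # Master: cheapest allocation that violates no cut.
        best = None
        for assign in product(range(n_proc), repeat=n):
            if any(all(assign[i] == r for i in st) for r, st in cuts):
                continue
            cost = sum(assign[i] != assign[i + 1] for i in range(n - 1))
            if best is None or cost < best[0]:
                best = (cost, assign)
        if best is None:
            return None
        cost, assign = best
        # Subproblem (simplified): find the conflicting resources.
        conflicts = []
        for r in range(n_proc):
            st = frozenset(i for i in range(n) if assign[i] == r)
            if sum(wcet[i] for i in st) > rt:
                conflicts.append((r, st))
        if not conflicts:
            return assign, cost
        # No-good: sum over ST_R of T_iR <= |ST_R| - 1.
        cuts.extend(conflicts)


if __name__ == "__main__":
    print(allocate_with_nogoods([4, 4, 4], n_proc=2, rt=8))
```

Each cut removes at least the allocation that produced it, so the loop terminates; the first allocation that passes the subproblem is optimal for the master cost because cheaper ones were all proven infeasible.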
Validation of bus model
Requesting more than 65% of the theoretical maximum bandwidth causes the additive model to fail. The threshold is lower (50%) in the presence of communication hotspots.
Benefits of the additive model: task execution time is almost independent of bus utilization, so performance predictability is greatly enhanced.

Validation of optimizer solutions
Maximum error lower than 10%.
Average error equal to 4.7%, with a standard deviation of 0.08.
The optimizer turns out to be conservative in predicting infeasibility.
The flow was successfully applied to a GSM benchmark.
Energy-efficient application mapping for MPSoCs
Multimedia applications. Given a platform:
1. Achieve a specified throughput
2. Minimize power consumption

Exploiting voltage supply
Supply voltage impacts power and performance.
Circuit slowdown: $T = 1/f = K / (V_{dd} - V_t)^{a}$
Cubic power savings: $P = C_{eff} \cdot V_{dd}^{2} \cdot f$
Just-in-time computation: stretch execution time up to the maximum tolerable.
[Figure: power vs. available time, comparing fixed voltage with shutdown against variable voltage.]
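The two formulas can be combined into a small just-in-time calculation: invert the delay model to find the lowest supply voltage that still meets the deadline, then compare the energy with running at the maximum voltage and shutting down. The numeric parameters below are made up for illustration, and leakage and voltage-transition overheads are ignored.

```python
def cycle_time(vdd, vt, k, a):
    """Cycle time from the slide's delay model T = K / (Vdd - Vt)^a."""
    return k / (vdd - vt) ** a


def min_vdd_for_deadline(cycles, deadline, vt, k, a):
    """Lowest Vdd at which `cycles` clock cycles finish by `deadline`."""
    f_req = cycles / deadline            # required clock frequency
    return vt + (k * f_req) ** (1.0 / a)


def dynamic_energy(c_eff, vdd, cycles):
    """E = P * t with P = C_eff * Vdd^2 * f and t = cycles / f."""
    return c_eff * vdd ** 2 * cycles


if __name__ == "__main__":
    # Hypothetical parameters: 200 MHz at 1.8 V, Vt = 0.4 V, a = 2.
    vt, a, v_max = 0.4, 2.0, 1.8
    k = 5e-9 * (v_max - vt) ** a
    cycles, deadline, c_eff = 1_000_000, 10e-3, 1e-9

    v_jit = min_vdd_for_deadline(cycles, deadline, vt, k, a)
    e_max = dynamic_energy(c_eff, v_max, cycles)
    e_jit = dynamic_energy(c_eff, v_jit, cycles)
    print(f"Vdd = {v_jit:.3f} V, energy ratio = {e_jit / e_max:.3f}")
```

Energy per task depends on $V_{dd}^2$ only, so stretching execution into the available slack at a lower voltage beats finishing early at full voltage and shutting down, as long as the ignored overheads stay small.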
Platform with frequency scaling
[Figure: cores A to D clocked at f1, f2, f3, ... and connected through FIFO slaves to an AHB bridge; a programmable device drives the clock tree generator from f_ref.]
Multiple clocks across the platform.
A memory-mapped device allows dynamic scaling.

Scheduling and voltage scaling
Different voltages give different frequencies: a CPU with $V_{dd}$ and $V_{bs}$ inputs runs at f1, f2 or f3.
Energy/speed trade-offs: varying the voltages.
Mapping and scheduling: given (fastest frequency).
[Figure: power vs. time for tasks τ1, τ2, τ3 up to the deadline; the slack left at the fastest frequency is used to run the tasks at lower power.]
Continuous voltage scaling
Given:
Application: set of interacting processes.
Platform: set of nodes, each having supply voltage ($V_{dd}$) and body bias voltage ($V_{bs}$) inputs.
Mapping and schedule table (including timing constraints).
[Figure: architecture and mapping (CPU1, CPU2 and CPU3 with $V_{dd}$ and $V_{bs}$ inputs, plus a bus, running τ0 to τ5) and the schedule table per processor, with slack before the deadline.]

Solving CVS
Determine the voltage levels $V_{dd}$ and $V_{bs}$ for each process such that system energy is minimized and deadlines are satisfied.
[Figure: input and output power profiles of τ1, τ2, τ3 up to the deadline. Height: voltage level; area: energy. The output schedule uses the slack.]
Assessment: convex nonlinear problem, solvable in polynomial time with arbitrarily good precision.
Reference: A. Andrei, Overhead-Conscious Voltage Selection for Dynamic and Leakage Energy Reduction of Time-Constrained Systems, technical report, Linköping University, 2004.
Formulation
[Figure: τ0 on CPU1, communicating over the bus with τ1 and τ2 on CPU2.]
Minimize (energy due to processes plus overhead due to voltage changes):
$$E[\tau_0] + E[\tau_1] + E[\tau_2] + E_{OH}[\tau_1\text{-}\tau_2]$$
such that the precedence relationships hold:
$$T_{start}[\tau_0] + T_{exe}[\tau_0] \le T_{start}[\tau_1]$$
$$T_{start}[\tau_1] + T_{exe}[\tau_1] + T_{oh}[\tau_1\text{-}\tau_2] \le T_{start}[\tau_2]$$
and the deadlines are met:
$$T_{start}[\tau_2] + T_{exe}[\tau_2] \le DL[\tau_2]$$

More realistic setting
Discrete f and V, with variable transition delays.
Multiple processors: MPSoC platforms (computation, communication, storage).
Optimize allocation and scheduling: allocate tasks to processors, set frequency and voltage, schedule.
Problem decomposition
Master problem: allocation and frequency assignment, by integer programming. Memory constraints; objective function: power. Avoid trivially infeasible solutions.
Subproblem: scheduling, by constraint programming. Real-time constraint; threshold for bus utilization; objective function: makespan.
The master passes a valid allocation to the subproblem; the subproblem returns no-goods.
Much tougher: many more decision variables, and the cost function depends on both master and subproblem.

Solver performance
Hundreds of decision variables.
Much beyond ILP solver or CP solver capability.
Accuracy of results
Against cycle-accurate functional simulation, including power estimation (STM 0.13 µm).

Pitfalls
Beware of abstraction! Abstraction noise even of exact results.
The cost of modeling: how to determine the coefficients.
Validation: do not forget to validate via functional simulation.
More informationGMU, ECE 680 Physical VLSI Design 1
ECE680: Physical VLSI Design Chapter VII Timing Issues in Digital Circuits (chapter 10 in textbook) GMU, ECE 680 Physical VLSI Design 1 Synchronous Timing (Fig. 10 1) CLK In R Combinational 1 R Logic 2
More informationLecture 13: Sequential Circuits, FSM
Lecture 13: Sequential Circuits, FSM Today s topics: Sequential circuits Finite state machines 1 Clocks A microprocessor is composed of many different circuits that are operating simultaneously if each
More informationDistributed Joint Offloading Decision and Resource Allocation for Multi-User Mobile Edge Computing: A Game Theory Approach
Distributed Joint Offloading Decision and Resource Allocation for Multi-User Mobile Edge Computing: A Game Theory Approach Ning Li, Student Member, IEEE, Jose-Fernan Martinez-Ortega, Gregorio Rubio Abstract-
More informationDigital Integrated Circuits A Design Perspective
Digital Integrated Circuits A Design Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic Designing Sequential Logic Circuits November 2002 Sequential Logic Inputs Current State COMBINATIONAL
More informationDigital Integrated Circuits A Design Perspective. Semiconductor. Memories. Memories
Digital Integrated Circuits A Design Perspective Semiconductor Chapter Overview Memory Classification Memory Architectures The Memory Core Periphery Reliability Case Studies Semiconductor Memory Classification
More informationEDF Feasibility and Hardware Accelerators
EDF Feasibility and Hardware Accelerators Andrew Morton University of Waterloo, Waterloo, Canada, arrmorton@uwaterloo.ca Wayne M. Loucks University of Waterloo, Waterloo, Canada, wmloucks@pads.uwaterloo.ca
More informationSemiconductor memories
Semiconductor memories Semiconductor Memories Data in Write Memory cell Read Data out Some design issues : How many cells? Function? Power consuption? Access type? How fast are read/write operations? Semiconductor
More informationCIS 371 Computer Organization and Design
CIS 371 Computer Organization and Design Unit 13: Power & Energy Slides developed by Milo Mar0n & Amir Roth at the University of Pennsylvania with sources that included University of Wisconsin slides by
More informationLecture 8-1. Low Power Design
Lecture 8 Konstantinos Masselos Department of Electrical & Electronic Engineering Imperial College London URL: http://cas.ee.ic.ac.uk/~kostas E-mail: k.masselos@ic.ac.uk Lecture 8-1 Based on slides/material
More informationVirtual Circuits in Network-on-Chip
Virtual Circuits in Network-on-Chip Master of Science thesis nr.: 87 Technical University of Denmark Informatics and Mathematical Modelling Computer Science and Engineering Lyngby, 18th of September 2006
More informationSpecial Nodes for Interface
fi fi Special Nodes for Interface SW on processors Chip-level HW Board-level HW fi fi C code VHDL VHDL code retargetable compilation high-level synthesis SW costs HW costs partitioning (solve ILP) cluster
More informationPipelining. Traditional Execution. CS 365 Lecture 12 Prof. Yih Huang. add ld beq CS CS 365 2
Pipelining CS 365 Lecture 12 Prof. Yih Huang CS 365 1 Traditional Execution 1 2 3 4 1 2 3 4 5 1 2 3 add ld beq CS 365 2 1 Pipelined Execution 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
More informationLecture 6 Power Zhuo Feng. Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 2010
EE4800 CMOS Digital IC Design & Analysis Lecture 6 Power Zhuo Feng 6.1 Outline Power and Energy Dynamic Power Static Power 6.2 Power and Energy Power is drawn from a voltage source attached to the V DD
More informationReducing power in using different technologies using FSM architecture
Reducing power in using different technologies using FSM architecture Himani Mitta l, Dinesh Chandra 2, Sampath Kumar 3,2,3 J.S.S.Academy of Technical Education,NOIDA,U.P,INDIA himanimit@yahoo.co.in, dinesshc@gmail.com,
More informationAnalog Computation in Flash Memory for Datacenter-scale AI Inference in a Small Chip
1 Analog Computation in Flash Memory for Datacenter-scale AI Inference in a Small Chip Dave Fick, CTO/Founder Mike Henry, CEO/Founder About Mythic 2 Focused on high-performance Edge AI Full stack co-design:
More informationEvaluation and Validation
Evaluation and Validation Jian-Jia Chen (Slides are based on Peter Marwedel) TU Dortmund, Informatik 12 Germany Springer, 2010 2016 年 01 月 05 日 These slides use Microsoft clip arts. Microsoft copyright
More informationINF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2)
INF2270 Spring 2010 Philipp Häfliger Summary/Repetition (1/2) content From Scalar to Superscalar Lecture Summary and Brief Repetition Binary numbers Boolean Algebra Combinational Logic Circuits Encoder/Decoder
More informationXarxes de distribució del senyal de. interferència electromagnètica, consum, soroll de conmutació.
Xarxes de distribució del senyal de rellotge. Clock skew, jitter, interferència electromagnètica, consum, soroll de conmutació. (transparències generades a partir de la presentació de Jan M. Rabaey, Anantha
More informationEE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining
Slide 1 EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining Slide 2 Topics Clocking Clock Parameters Latch Types Requirements for reliable clocking Pipelining Optimal pipelining
More informationSerial Parallel Multiplier Design in Quantum-dot Cellular Automata
Serial Parallel Multiplier Design in Quantum-dot Cellular Automata Heumpil Cho and Earl E. Swartzlander, Jr. Application Specific Processor Group Department of Electrical and Computer Engineering The University
More informationMark Redekopp, All rights reserved. Lecture 1 Slides. Intro Number Systems Logic Functions
Lecture Slides Intro Number Systems Logic Functions EE 0 in Context EE 0 EE 20L Logic Design Fundamentals Logic Design, CAD Tools, Lab tools, Project EE 357 EE 457 Computer Architecture Using the logic
More information! Charge Leakage/Charge Sharing. " Domino Logic Design Considerations. ! Logic Comparisons. ! Memory. " Classification. " ROM Memories.
ESE 57: Digital Integrated Circuits and VLSI Fundamentals Lec 9: March 9, 8 Memory Overview, Memory Core Cells Today! Charge Leakage/ " Domino Logic Design Considerations! Logic Comparisons! Memory " Classification
More informationHow to create designs with Dynamic/Adaptive Voltage Scaling. Roy H. Liu National Semiconductor Corporation
How to create designs with Dynamic/Adaptive Voltage Scaling Roy H. Liu National Semiconductor orporation What you will learn from this session Design challenges with variable voltage level Design partitioning
More informationClock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements.
1 2 Introduction Clock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements. Defines the precise instants when the circuit is allowed to change
More informationEE115C Winter 2017 Digital Electronic Circuits. Lecture 19: Timing Analysis
EE115C Winter 2017 Digital Electronic Circuits Lecture 19: Timing Analysis Outline Timing parameters Clock nonidealities (skew and jitter) Impact of Clk skew on timing Impact of Clk jitter on timing Flip-flop-
More informationLecture 2: CMOS technology. Energy-aware computing
Energy-Aware Computing Lecture 2: CMOS technology Basic components Transistors Two types: NMOS, PMOS Wires (interconnect) Transistors as switches Gate Drain Source NMOS: When G is @ logic 1 (actually over
More informationESE 570: Digital Integrated Circuits and VLSI Fundamentals
ESE 570: Digital Integrated Circuits and VLSI Fundamentals Lec 19: March 29, 2018 Memory Overview, Memory Core Cells Today! Charge Leakage/Charge Sharing " Domino Logic Design Considerations! Logic Comparisons!
More informationBinary Decision Diagrams and Symbolic Model Checking
Binary Decision Diagrams and Symbolic Model Checking Randy Bryant Ed Clarke Ken McMillan Allen Emerson CMU CMU Cadence U Texas http://www.cs.cmu.edu/~bryant Binary Decision Diagrams Restricted Form of
More informationSemiconductor Memories
Semiconductor References: Adapted from: Digital Integrated Circuits: A Design Perspective, J. Rabaey UCB Principles of CMOS VLSI Design: A Systems Perspective, 2nd Ed., N. H. E. Weste and K. Eshraghian
More informationTestability. Shaahin Hessabi. Sharif University of Technology. Adapted from the presentation prepared by book authors.
Testability Lecture 6: Logic Simulation Shaahin Hessabi Department of Computer Engineering Sharif University of Technology Adapted from the presentation prepared by book authors Slide 1 of 27 Outline What
More informationEnergy Delay Optimization
EE M216A.:. Fall 21 Lecture 8 Energy Delay Optimization Prof. Dejan Marković ee216a@gmail.com Some Common Questions Is sizing better than V DD for energy reduction? What are the optimal values of gate
More informationArea-Time Optimal Adder with Relative Placement Generator
Area-Time Optimal Adder with Relative Placement Generator Abstract: This paper presents the design of a generator, for the production of area-time-optimal adders. A unique feature of this generator is
More information