Dynamic Voltage and Frequency Scaling Under a Precise Energy Model Considering Variable and Fixed Components of the System Power Dissipation


Kihwan Choi, Won-bok Lee, Ramakrishna Soma, Massoud Pedram
University of Southern California

Outline
- Background
- Workload decomposition
  - Execution time model
  - System energy model
- Fine-grained DVFS policy
- Experimental results
- Conclusion

Background: DVFS
- DVFS is a method through which different amounts of energy are allocated to perform a task
- Power consumption of a digital CMOS circuit is $P = \alpha C_{eff} V^2 f$, where
  - $\alpha$: switching factor
  - $C_{eff}$: effective capacitance
  - $V$: operating voltage
  - $f$: operating frequency
- Energy required to run a task during $T$ is $E = P \cdot T \propto V^2$ (assuming $f \propto V$, $T \propto f^{-1}$)
- Lowering $V$ (while simultaneously and proportionately cutting $f$) causes a quadratic reduction in $E$
- The target CPU frequency is calculated as follows:
  - Given a task with workload $W$ and latency constraint $D$
  - $f_{target}$ is hence calculated as $W/D$ (note that $T_{task} = D$)

Overview of prior DVFS works
- Most DVFS methods are concerned with CPU energy reduction only
  - More precisely, the dynamic portion of the CPU energy
- Most computing systems, however, comprise many subsystems such as memory and peripheral devices
- Lowering the CPU frequency can shorten battery lifetime because of increased energy consumption in the subsystems

[Figure: power versus time at two CPU frequencies. At $f_{cpu} = 1$: $P_{cpu} = 1$, $P_{mod} = 0.9$, $E_{sys} = 1.9$. At $f_{cpu} = 0.5$: $P_{cpu} = 0.125$, $P_{mod} = 0.9$, $E_{sys} = 2.05$, which is ~8% more energy consumption.]
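The arithmetic behind the prior-work example can be checked with a short sketch. It assumes normalized units (CPU power and run time both equal to 1 at full speed), a fully CPU-bound task whose run time scales as 1/f, and CPU dynamic power that scales with the cube of frequency because V is cut in proportion to f. The function name `normalized_system_energy` is illustrative and not part of the original slides.

```python
def normalized_system_energy(f_cpu, p_cpu_full=1.0, p_mod=0.9, t_full=1.0):
    """Normalized system energy of a CPU-bound task at relative frequency f_cpu."""
    p_cpu = p_cpu_full * f_cpu ** 3   # P = alpha*C*V^2*f with V proportional to f
    t = t_full / f_cpu                # run time stretches as the clock slows down
    return (p_cpu + p_mod) * t        # subsystem power is drawn for the whole run


e_full = normalized_system_energy(1.0)
e_half = normalized_system_energy(0.5)
print(f"E_sys at f_cpu = 1.0: {e_full:.3f}")
print(f"E_sys at f_cpu = 0.5: {e_half:.3f}")
print(f"increase: {100 * (e_half / e_full - 1):.1f}%")
```

Halving the clock cuts CPU power by a factor of eight, but the subsystem power is paid for twice as long, so the total rises by roughly 8%.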

DVFS for the minimal system energy
- Two requirements
  - Satisfy the timing constraint
  - Minimize the system energy
- Timing constraint
  - Different applications exhibit disparate execution time variation as a function of the CPU frequency change
  - Accurate modeling of the task execution time as the CPU frequency is varied is required
- Minimal system energy
  - Power consumption of each system component should be known
  - Information about each component state, i.e., active or idle, is required
- These two requirements can be satisfied by using the workload decomposition approach

Workload decomposition
- CPU-bound and memory-bound applications show different execution time variation with the CPU frequency
- The workload of a program consists of on-chip ($W^{on}$) and off-chip ($W^{off}$) workloads
  - $W^{on}$: work performed inside the CPU, e.g., an ALU operation
  - $W^{off}$: work performed outside the CPU, e.g., an off-chip memory access after a cache miss
- Program execution time $T$:
  $T = T^{on} + T^{off} = \frac{W^{on}}{f_{cpu}} + \frac{W^{off}}{f_{ext}}$
- Given a task with workloads $W^{on}$ and $W^{off}$ and latency constraint $D$:
  $f_{target} = \frac{W^{on}}{D - \frac{W^{off}}{f_{ext}}}$
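A minimal sketch of the target-frequency calculation under workload decomposition follows. It assumes workloads are expressed in clock cycles and frequencies in Hz; the function name `target_frequency` and the numbers in the usage lines are hypothetical.

```python
def target_frequency(w_on, w_off, deadline, f_ext):
    """CPU frequency (Hz) at which the task finishes exactly at the deadline.

    w_on, w_off: on-chip and off-chip workloads in clock cycles
    deadline:    latency constraint D in seconds
    f_ext:       external (memory) bus clock in Hz
    """
    t_off = w_off / f_ext          # off-chip time does not shrink with f_cpu
    slack = deadline - t_off       # time left for the on-chip work
    if slack <= 0:
        raise ValueError("off-chip time alone exceeds the deadline")
    return w_on / slack


# Hypothetical task: 2e8 on-chip cycles, 5e7 off-chip cycles, D = 1.5 s
f = target_frequency(w_on=2e8, w_off=5e7, deadline=1.5, f_ext=1e8)
print(f"{f / 1e6:.0f} MHz")
```

Using W/D with the total workload would overestimate the required CPU clock here, because the off-chip part is paced by the external bus rather than by the CPU.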

System power breakdown
- The power consumption profile fluctuates greatly due to alternate execution of $W^{on}$ and $W^{off}$
  - $W^{on}$ ($W^{off}$) requires the CPU (sub-module) power
- System power consumption can be broken into the following components:
  - Fixed: remains unchanged (DC-DC converter, PLL, leakage, PCI bridges)
  - Variable:
    - Idle: when each component is not used (CPU idle, memory is not accessed)
    - Active: when each component is used for some task (CPU active, memory is accessed)
  - Standing = idle + fixed
- These values are obtained by simple measurements or by using values in the spec

Performance monitoring unit (PMU)
- $W^{on}$ is modeled as:
  $W^{on} = \sum_{i=1}^{N} CPI^{on}_i = N \cdot CPI^{on}_{avg}$
  - $N$: number of on-chip instructions
  - $CPI^{on}$: CPU clocks per instruction
- The PMU on the PXA255 processor chip can report up to 15 different dynamic events during execution of a program
  - Cache hit/miss counts, TLB hit/miss counts, number of stall cycles, total number of instructions being executed, branch misprediction counts
- For DVFS, we use the PMU to generate statistics for
  - Total number of instructions being executed (INSTR)
  - Number of stall cycles due to on-/off-chip data dependencies (STALL)
  - Number of data cache misses (DMISS)
- We also record the number of clock cycles from the beginning of the program execution (CCNT)

Frequency settings in BitsyX
- The PXA255 can operate from 100 MHz to 400 MHz, with a core supply voltage of 0.85 V to 1.3 V
- The internal bus connects the core and other functional blocks inside the CPU
- The external bus is connected to SDRAM (64 MB)
- Nine frequency combinations ($f_{cpu}$, $f_{int}$, $f_{ext}$)

[Table: frequency sets F1 to F9 with columns $f_{cpu}$ [MHz], $V_{cpu}$ [V], $f_{int}$ [MHz], $f_{ext}$ [MHz]. The cell values did not survive transcription, apart from $V_{cpu}$ = 0.85 V for F1.]

Execution time and frequency settings
- Execution time variation over different frequency combinations for math, crc, djpeg, qsort, and gzip
  - math is CPU-bound (strongly dependent on $f_{cpu}$)
  - gzip is memory-bound ($f_{int}$ and $f_{ext}$ dependent)

[Figure: normalized execution time of math and gzip versus frequency combination F1 to F9, with $f_{cpu}$, $f_{int}$, and $f_{ext}$ in MHz]

Calculating $T^{on}$ (I)
- $T^{on}$ is calculated as $T^{on} = \frac{W^{on}}{f_{cpu}}$, with $W^{on} = N \cdot CPI^{on}_{avg}$
- We define SPI as the ratio of the number of stall cycles to the total instruction count
  - $SPI_{avg} = STALL / INSTR$, during a time quantum
- $SPI_{avg} = SPI^{on}_{avg} + SPI^{off}_{avg}$
- $CPI^{on}_{avg} = CPI^{on}_{min} + SPI^{on}_{avg}$
  - $CPI^{on}_{min}$: on-chip CPI value without any stall cycles

[Figure: $CPI_{avg}$ versus $SPI_{avg}$ for gzip]

Calculating $T^{on}$ (II)
- Based on the following observation:
  - The more D-cache miss events, the higher the probability of off-chip accesses
- We define DPI as the ratio of the number of D-cache miss events to the total instruction count
  - $DPI_{avg} = DMISS / INSTR$, during a time quantum
- $CPI^{on}_{avg} = CPI^{on}_{min} + dpi2spi(DPI)$
- $DF = \frac{CPI_{avg} - CPI^{on}_{min}}{n}$

  DPI range                      dpi2spi(DPI)
  DPI <= K_1                     CPI_avg - CPI^on_min
  K_1 < DPI <= K_2               DF*(n-1)
  ...                            ...
  K_{n-2} < DPI <= K_{n-1}       DF*2
  K_{n-1} < DPI <= K_n           DF*1
  DPI > K_n                      0

- $K_n$ is constant: $K_1 < K_2 < \dots < K_n$
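The stepwise DPI-to-stall mapping can be sketched as a table lookup. This is an illustration under stated assumptions, not the authors' implementation: it assumes the table runs from $CPI_{avg} - CPI^{on}_{min}$ in the lowest DPI bin down to zero above $K_n$, that $CPI_{avg}$ for a quantum is the cycle count divided by INSTR, and that the threshold values passed in are hypothetical. The names `dpi2spi` and `cpi_on_avg` mirror the slide notation.

```python
from bisect import bisect_left


def dpi2spi(dpi, thresholds, cpi_avg, cpi_on_min):
    """Estimate the on-chip stall cycles per instruction from the D-cache miss rate.

    thresholds: ascending constants K_1 < K_2 < ... < K_n (hypothetical values)
    Returns DF * (n - j), where j counts the thresholds strictly below dpi.
    """
    n = len(thresholds)
    df = (cpi_avg - cpi_on_min) / n        # step size DF
    j = bisect_left(thresholds, dpi)       # dpi <= K_1 gives 0, dpi > K_n gives n
    return df * (n - j)


def cpi_on_avg(instr, dmiss, cycles, thresholds, cpi_on_min):
    """On-chip CPI for one time quantum from PMU counter deltas."""
    dpi = dmiss / instr                    # DPI_avg = DMISS / INSTR
    cpi_avg = cycles / instr               # assumption: CPI_avg = cycles / INSTR
    return cpi_on_min + dpi2spi(dpi, thresholds, cpi_avg, cpi_on_min)


K = [0.01, 0.02, 0.03, 0.04]               # hypothetical thresholds
print(cpi_on_avg(instr=1000, dmiss=15, cycles=3000, thresholds=K, cpi_on_min=1.0))
```

A low miss rate attributes nearly all stall cycles to on-chip work, while a miss rate above $K_n$ attributes them all to off-chip accesses, leaving $CPI^{on}_{avg}$ at its minimum.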

Calculating $T^{off}$
- $T^{off}$ is dependent on $f_{ext}$ as well as $f_{int}$
- Example: when a D-cache miss occurs, two operations are performed:
  - Data fetch from the external memory ($f_{ext}$)
  - Data transfer to the CPU core, where the cache line and destination register are updated ($f_{int}$)
- Due to the lack of exact timing information, we have opted to model $T^{off}$ as:
  $T^{off} = T_{int} + T_{ext} = \alpha \frac{W^{off}}{f_{int}} + (1 - \alpha) \frac{W^{off}}{f_{ext}}$
- An $\alpha$ value of ~0.35 was obtained for the tested applications
  - The average error in predicting the execution time was less than 2% for all nine frequency settings

System energy modeling on BitsyX
- It is hard to get the system energy without workload decomposition
- Using workload decomposition:
  $E_{sys,F_n} = P^{std}_{sys,F_n} T_{sys,F_n} + P^{on}_{F_n} T^{on}_{F_n} + P_{int,F_n} T_{int,F_n} + P_{ext,F_n} T_{ext,F_n}$

[Figure: system power $P_{sys,F_n}(t)$ versus time, split into standing power ($P^{std}_{sys,F_n}$ over $T_{sys,F_n}$) and active power ($P^{on}_{F_n}$ during $T^{on}_{F_n}$, $P_{int,F_n}$ during $T_{int,F_n}$, $P_{ext,F_n}$ during $T_{ext,F_n}$)]
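The execution-time and energy models can be combined in a short sketch. It assumes that the standing power is drawn over the whole run, taken here as $T^{on} + T_{int} + T_{ext}$, and that each active power term applies only during its own interval. The `FreqSetting` class, the function names, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FreqSetting:
    f_cpu: float   # CPU clock, Hz
    f_int: float   # internal bus clock, Hz
    f_ext: float   # external bus clock, Hz
    p_std: float   # standing (idle + fixed) system power, W
    p_on: float    # active power during on-chip work, W
    p_int: float   # active power during internal-bus transfers, W
    p_ext: float   # active power during external-memory accesses, W


def execution_times(w_on, w_off, fs, alpha=0.35):
    """Return (T_on, T_int, T_ext) in seconds for workloads given in cycles."""
    t_on = w_on / fs.f_cpu
    t_int = alpha * w_off / fs.f_int
    t_ext = (1 - alpha) * w_off / fs.f_ext
    return t_on, t_int, t_ext


def model_energy(w_on, w_off, fs, alpha=0.35):
    """System energy in joules: standing power over the run plus active terms."""
    t_on, t_int, t_ext = execution_times(w_on, w_off, fs, alpha)
    t_sys = t_on + t_int + t_ext           # assumption: total run time
    return (fs.p_std * t_sys + fs.p_on * t_on
            + fs.p_int * t_int + fs.p_ext * t_ext)


# Hypothetical setting and workload
fs = FreqSetting(f_cpu=4e8, f_int=2e8, f_ext=1e8,
                 p_std=1.0, p_on=0.5, p_int=0.2, p_ext=0.3)
print(f"{model_energy(4e8, 1e8, fs):.3f} J")
```

Because the standing term grows with run time, a slower setting that lowers the active terms can still raise the total, which is the effect the model is meant to capture.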

Accuracy of the system energy model
- The estimated energy consumption for djpeg
  - The average error rate is less than 4%

[Table: measured parameters $P^{std}_{sys,F_n}$, $P^{on}_{F_n}$, $P_{int,F_n}$, and $P_{ext,F_n}$ (all in mW) for frequency sets F1 to F9. Most cell values did not survive transcription; the surviving entries are 675 and 1733.]

[Figure: measured versus estimated energy consumption [J] per frequency combination]

- $P_{int,F_n} \approx k_1 V_{F_n}^2 f^{cpu}_n + k_2 f^{int}_n$, with $k_1 = 0.73$ [nF] and $k_2 = 6.2$ [V^2 nF]

Determining the optimal frequency setting
- Consider the timing constraint first, followed by system energy minimization
  - For the timing constraint, we used the performance loss ($PF_{loss}$), which is defined as:
    $PF_{loss} = \frac{T_{F_n} - T_{F_{max}}}{T_{F_{max}}}$
- Pseudo code for optimal frequency selection (steps 2 to 4: timing constraint; steps 5 to 8: energy minimization)
  1. $\Psi = \{F_{min}, \dots, F_{max}\}$, $\Gamma = \{\phi\}$, and $E_{min} = \infty$
  2. for every frequency setting $F_n$ in $\Psi$
  3.   if ( $T^{i+1}_{F_n} \le (1 + PF_{loss}) \cdot T^{i+1}_{F_{max}}$ )
  4.     $\Gamma = \Gamma \cup F_n$;
  5. for every frequency setting $F_n$ in $\Gamma$
  6.   calculate system energy using the proposed model
  7.   if ( $E_{sys,F_n} \le E_{min}$ )
  8.     $E_{min} = E_{sys,F_n}$; $F^{i+1}_{opt} = F_n$;
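The two-phase selection (timing filter, then energy minimization) can be sketched as follows. This is a minimal illustration that assumes the settings are ordered from slowest to fastest and that predicted time and energy are supplied as callables; `select_frequency`, the setting labels, and the numbers are hypothetical.

```python
import math


def select_frequency(settings, pf_loss, predict_time, predict_energy):
    """Pick the lowest-energy setting whose predicted time meets the PF_loss bound.

    settings:       frequency settings ordered from F_min to F_max
    pf_loss:        allowed performance loss, e.g. 0.3 for 30%
    predict_time:   callable returning the predicted run time of a setting
    predict_energy: callable returning the predicted system energy of a setting
    """
    t_max = predict_time(settings[-1])                 # time at the fastest setting
    feasible = [f for f in settings
                if predict_time(f) <= (1 + pf_loss) * t_max]
    e_min, f_opt = math.inf, None
    for f in feasible:
        e = predict_energy(f)
        if e <= e_min:
            e_min, f_opt = e, f
    return f_opt, e_min


# Hypothetical predictions for three settings
times = {"F1": 2.0, "F2": 1.25, "F3": 1.0}
energies = {"F1": 1.0, "F2": 2.0, "F3": 2.5}
best, energy = select_frequency(["F1", "F2", "F3"], 0.3,
                                times.get, energies.get)
print(best, energy)
```

In this toy case F1 has the lowest energy but violates the timing bound, so the filter removes it before the energy comparison runs.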

The software architecture
- The software architecture comprises a proc interface module and a policy setting module tightly linked with the Linux scheduler, the PMU, and the frequency and voltage control circuitry on the BitsyX board

[Figure: the external $PF_{loss}$ input parameter enters kernel space through the proc interface module; the Linux scheduler, policy setting module, PMU access module, and DVFS module sit above the PXA255 processor]

Experimental results (I)
- Compared two DVFS techniques:
  - SE-DVFS: proposed DVFS (saving the system energy)
  - CE-DVFS: conventional DVFS (saving the CPU energy)
- Resulting performance loss factors:

[Figure: actual performance [%] versus target $PF_{loss}$ (1%, 2%, 3%, 4%, 5%), one panel for CE-DVFS and one for SE-DVFS]

Experimental results (II)
- System energy consumption of the two DVFS approaches compared to the case without any DVFS
  - CE-DVFS: always more energy consumed
  - SE-DVFS: system energy saving for some applications

[Figure: system energy saving [%] versus target $PF_{loss}$ (1%, 2%, 3%, 4%, 5%), one panel for CE-DVFS and one for SE-DVFS]

Experimental results (III)
- Actual power consumption of the two DVFS methods
- For gzip with 3% target $PF_{loss}$, SE-DVFS results in 11.4% lower total system energy than CE-DVFS

[Figure: power consumption [mW] versus time [sec] for gzip with 3% target $PF_{loss}$. CE-DVFS: average power 2619 mW, 6.531 sec, energy 17.105 J. SE-DVFS: average power 2720.3 mW, 5.568 sec; the energy value did not survive transcription.]
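As a consistency check on the gzip measurement, energy is the product of average power and run time. The sketch below uses the CE-DVFS figures reported above (2619 mW over 6.531 sec) and then applies the stated 11.4% reduction to obtain an implied SE-DVFS energy; that second value is derived here for illustration and is not a number reported on the slide. The function name `energy_joules` is illustrative.

```python
def energy_joules(avg_power_mw, duration_s):
    """Energy in joules from average power in milliwatts and duration in seconds."""
    return avg_power_mw / 1000.0 * duration_s


e_ce = energy_joules(2619, 6.531)          # CE-DVFS, from the reported figures
e_se_implied = e_ce * (1 - 0.114)          # implied by the stated 11.4% reduction
print(f"CE-DVFS energy: {e_ce:.3f} J")
print(f"SE-DVFS energy implied by 11.4% saving: {e_se_implied:.3f} J")
```

SE-DVFS draws slightly more average power than CE-DVFS in this run, so the saving comes entirely from finishing sooner.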

Experimental results (IV)
- CE-DVFS vs. SE-DVFS
  - SE-DVFS results in 2% ~ 18% higher system energy savings compared to CE-DVFS

[Figure: system energy difference [%] versus target $PF_{loss}$ (1%, 2%, 3%, 4%, 5%)]

Conclusions
- A DVFS policy for actual system energy reduction was proposed and implemented, which uses online decomposition of the application workload into on-chip and off-chip components
- Based on actual current measurements on the BitsyX platform, up to 18% more system energy saving was achieved with the proposed DVFS compared with the results of previous DVFS techniques
- For both CPU-bound and memory-bound programs, the given timing constraints were also satisfied

Dynamic Voltage and Frequency Scaling based on Workload Decomposition *

Dynamic Voltage and Frequency Scaling based on Workload Decomposition * Dynamic Voltage and Frequency Scaling based on Workload Decomposition * Kihwan Choi, Ramakrishna Soma, and Massoud Pedram Department o EE-Systems, University o Southern Caliornia, Los Angeles, CA 989 {kihwanch,

More information

Amdahl's Law. Execution time new = ((1 f) + f/s) Execution time. S. Then:

Amdahl's Law. Execution time new = ((1 f) + f/s) Execution time. S. Then: Amdahl's Law Useful for evaluating the impact of a change. (A general observation.) Insight: Improving a feature cannot improve performance beyond the use of the feature Suppose we introduce a particular

More information

Lecture 2: Metrics to Evaluate Systems

Lecture 2: Metrics to Evaluate Systems Lecture 2: Metrics to Evaluate Systems Topics: Metrics: power, reliability, cost, benchmark suites, performance equation, summarizing performance with AM, GM, HM Sign up for the class mailing list! Video

More information

Stochastic Dynamic Thermal Management: A Markovian Decision-based Approach. Hwisung Jung, Massoud Pedram

Stochastic Dynamic Thermal Management: A Markovian Decision-based Approach. Hwisung Jung, Massoud Pedram Stochastic Dynamic Thermal Management: A Markovian Decision-based Approach Hwisung Jung, Massoud Pedram Outline Introduction Background Thermal Management Framework Accuracy of Modeling Policy Representation

More information

Performance, Power & Energy. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So

Performance, Power & Energy. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So Performance, Power & Energy ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So Recall: Goal of this class Performance Reconfiguration Power/ Energy H. So, Sp10 Lecture 3 - ELEC8106/6102 2 PERFORMANCE EVALUATION

More information

Performance Metrics for Computer Systems. CASS 2018 Lavanya Ramapantulu

Performance Metrics for Computer Systems. CASS 2018 Lavanya Ramapantulu Performance Metrics for Computer Systems CASS 2018 Lavanya Ramapantulu Eight Great Ideas in Computer Architecture Design for Moore s Law Use abstraction to simplify design Make the common case fast Performance

More information

Dynamic Power Management under Uncertain Information. University of Southern California Los Angeles CA

Dynamic Power Management under Uncertain Information. University of Southern California Los Angeles CA Dynamic Power Management under Uncertain Information Hwisung Jung and Massoud Pedram University of Southern California Los Angeles CA Agenda Introduction Background Stochastic Decision-Making Framework

More information

CPU Consolidation versus Dynamic Voltage and Frequency Scaling in a Virtualized Multi-Core Server: Which is More Effective and When

CPU Consolidation versus Dynamic Voltage and Frequency Scaling in a Virtualized Multi-Core Server: Which is More Effective and When 1 CPU Consolidation versus Dynamic Voltage and Frequency Scaling in a Virtualized Multi-Core Server: Which is More Effective and When Inkwon Hwang, Student Member and Massoud Pedram, Fellow, IEEE Abstract

More information

New Exploration Frameworks for Temperature-Aware Design of MPSoCs. Prof. David Atienza

New Exploration Frameworks for Temperature-Aware Design of MPSoCs. Prof. David Atienza New Exploration Frameworks for Temperature-Aware Degn of MPSoCs Prof. David Atienza Dept Computer Architecture and Systems Engineering (DACYA) Complutense Univerty of Madrid, Spain Integrated Systems Lab

More information

L16: Power Dissipation in Digital Systems. L16: Spring 2007 Introductory Digital Systems Laboratory

L16: Power Dissipation in Digital Systems. L16: Spring 2007 Introductory Digital Systems Laboratory L16: Power Dissipation in Digital Systems 1 Problem #1: Power Dissipation/Heat Power (Watts) 100000 10000 1000 100 10 1 0.1 4004 80088080 8085 808686 386 486 Pentium proc 18KW 5KW 1.5KW 500W 1971 1974

More information

Administrivia. Course Objectives. Overview. Lecture Notes Week markem/cs333/ 2. Staff. 3. Prerequisites. 4. Grading. 1. Theory and application

Administrivia. Course Objectives. Overview. Lecture Notes Week markem/cs333/ 2. Staff. 3. Prerequisites. 4. Grading. 1. Theory and application Administrivia 1. markem/cs333/ 2. Staff 3. Prerequisites 4. Grading Course Objectives 1. Theory and application 2. Benefits 3. Labs TAs Overview 1. What is a computer system? CPU PC ALU System bus Memory

More information

Tracking board design for the SHAGARE stratospheric balloon project. Supervisor : René Beuchat Student : Joël Vallone

Tracking board design for the SHAGARE stratospheric balloon project. Supervisor : René Beuchat Student : Joël Vallone Tracking board design for the SHAGARE stratospheric balloon project Supervisor : René Beuchat Student : Joël Vallone Motivation Send & track a gamma-ray sensor in the stratosphere with a meteorological

More information

Continuous heat flow analysis. Time-variant heat sources. Embedded Systems Laboratory (ESL) Institute of EE, Faculty of Engineering

Continuous heat flow analysis. Time-variant heat sources. Embedded Systems Laboratory (ESL) Institute of EE, Faculty of Engineering Thermal Modeling, Analysis and Management of 2D Multi-Processor System-on-Chip Prof David Atienza Alonso Embedded Systems Laboratory (ESL) Institute of EE, Falty of Engineering Outline MPSoC thermal modeling

More information

ICS 233 Computer Architecture & Assembly Language

ICS 233 Computer Architecture & Assembly Language ICS 233 Computer Architecture & Assembly Language Assignment 6 Solution 1. Identify all of the RAW data dependencies in the following code. Which dependencies are data hazards that will be resolved by

More information

CSE140L: Components and Design Techniques for Digital Systems Lab. Power Consumption in Digital Circuits. Pietro Mercati

CSE140L: Components and Design Techniques for Digital Systems Lab. Power Consumption in Digital Circuits. Pietro Mercati CSE140L: Components and Design Techniques for Digital Systems Lab Power Consumption in Digital Circuits Pietro Mercati 1 About the final Friday 09/02 at 11.30am in WLH2204 ~2hrs exam including (but not

More information

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 Multi-processor vs. Multi-computer architecture µp vs. DSP RISC vs. DSP RISC Reduced-instruction-set Register-to-register operation Higher throughput by using

More information

Lecture: Pipelining Basics

Lecture: Pipelining Basics Lecture: Pipelining Basics Topics: Performance equations wrap-up, Basic pipelining implementation Video 1: What is pipelining? Video 2: Clocks and latches Video 3: An example 5-stage pipeline Video 4:

More information

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University Che-Wei Chang chewei@mail.cgu.edu.tw Department of Computer Science and Information Engineering, Chang Gung University } 2017/11/15 Midterm } 2017/11/22 Final Project Announcement 2 1. Introduction 2.

More information

MAX4636EUB -40 C to +85 C 10 µmax 1 NO1 IN1 GND. 3 x 3 THIN QFN OFF ON ON OFF SWITCHES SHOWN FOR "0" INPUT. Maxim Integrated Products 1

MAX4636EUB -40 C to +85 C 10 µmax 1 NO1 IN1 GND. 3 x 3 THIN QFN OFF ON ON OFF SWITCHES SHOWN FOR 0 INPUT. Maxim Integrated Products 1 9-79; Rev ; /3 Fast, Low-Voltage, Dual 4 SPDT General Description The are fast, dual 4 singlepole/double-throw (SPDT) analog switches that operate with supply voltages from +.8V to +.V. High switching

More information

Combine Dynamic Time-slice Scaling with DVFS for Coordinating Thermal and Fairness on CPU

Combine Dynamic Time-slice Scaling with DVFS for Coordinating Thermal and Fairness on CPU Combine Dynamic Time-slice Scaling with DVFS for Coordinating Thermal and Fairness on CPU Gangyong Jia Department of Computer Science and Technology Hangzhou Dianzi University Hangzhou, China gangyong@hdu.edu.cn

More information

Lecture 13: Sequential Circuits, FSM

Lecture 13: Sequential Circuits, FSM Lecture 13: Sequential Circuits, FSM Today s topics: Sequential circuits Finite state machines 1 Clocks A microprocessor is composed of many different circuits that are operating simultaneously if each

More information

Embedded Systems Design: Optimization Challenges. Paul Pop Embedded Systems Lab (ESLAB) Linköping University, Sweden

Embedded Systems Design: Optimization Challenges. Paul Pop Embedded Systems Lab (ESLAB) Linköping University, Sweden of /4 4 Embedded Systems Design: Optimization Challenges Paul Pop Embedded Systems Lab (ESLAB) Linköping University, Sweden Outline! Embedded systems " Example area: automotive electronics " Embedded systems

More information

CMP N 301 Computer Architecture. Appendix C

CMP N 301 Computer Architecture. Appendix C CMP N 301 Computer Architecture Appendix C Outline Introduction Pipelining Hazards Pipelining Implementation Exception Handling Advanced Issues (Dynamic Scheduling, Out of order Issue, Superscalar, etc)

More information

CSE370: Introduction to Digital Design

CSE370: Introduction to Digital Design CSE370: Introduction to Digital Design Course staff Gaetano Borriello, Brian DeRenzi, Firat Kiyak Course web www.cs.washington.edu/370/ Make sure to subscribe to class mailing list (cse370@cs) Course text

More information

Computer Architecture

Computer Architecture Lecture 2: Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture CPU Evolution What is? 2 Outline Measurements and metrics : Performance, Cost, Dependability, Power Guidelines

More information

Microprocessor Power Analysis by Labeled Simulation

Microprocessor Power Analysis by Labeled Simulation Microprocessor Power Analysis by Labeled Simulation Cheng-Ta Hsieh, Kevin Chen and Massoud Pedram University of Southern California Dept. of EE-Systems Los Angeles CA 989 Outline! Introduction! Problem

More information

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah PERFORMANCE METRICS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Jan. 17 th : Homework 1 release (due on Jan.

More information

Tradeoff between Reliability and Power Management

Tradeoff between Reliability and Power Management Tradeoff between Reliability and Power Management 9/1/2005 FORGE Lee, Kyoungwoo Contents 1. Overview of relationship between reliability and power management 2. Dakai Zhu, Rami Melhem and Daniel Moss e,

More information

CHAPTER 5 - PROCESS SCHEDULING

CHAPTER 5 - PROCESS SCHEDULING CHAPTER 5 - PROCESS SCHEDULING OBJECTIVES To introduce CPU scheduling, which is the basis for multiprogrammed operating systems To describe various CPU-scheduling algorithms To discuss evaluation criteria

More information

Digital System Clocking: High-Performance and Low-Power Aspects. Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M.

Digital System Clocking: High-Performance and Low-Power Aspects. Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M. Digital System Clocking: High-Performance and Low-Power Aspects Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M. Nedovic Wiley-Interscience and IEEE Press, January 2003 Nov. 14,

More information

Worst-Case Execution Time Analysis. LS 12, TU Dortmund

Worst-Case Execution Time Analysis. LS 12, TU Dortmund Worst-Case Execution Time Analysis Prof. Dr. Jian-Jia Chen LS 12, TU Dortmund 02, 03 May 2016 Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 1 / 53 Most Essential Assumptions for Real-Time Systems Upper

More information

Performance Metrics & Architectural Adaptivity. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So

Performance Metrics & Architectural Adaptivity. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So Performance Metrics & Architectural Adaptivity ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So What are the Options? Power Consumption Activity factor (amount of circuit switching) Load Capacitance (size

More information

DC-DC Converter-Aware Power Management for Battery-Operated Embedded Systems

DC-DC Converter-Aware Power Management for Battery-Operated Embedded Systems 53.2 Converter-Aware Power Management for Battery-Operated Embedded Systems Yongseok Choi and Naehyuck Chang School of Computer Science & Engineering Seoul National University Seoul, Korea naehyuck@snu.ac.kr

More information

Chapter 2 Logic Synthesis by Signal-Driven Decomposition

Chapter 2 Logic Synthesis by Signal-Driven Decomposition Chapter 2 Logic Synthesis by Signal-Driven Decompositi Anna Bernasci, Valentina Ciriani, Gabriella Trucco, and Tiziano Villa Abstract This chapter investigates some restructuring techniques based decompositi

More information

MS4525HRD (High Resolution Digital)

MS4525HRD (High Resolution Digital) MS4525HRD (High Resolution Digital) Integrated Digital Pressure Sensor (24-bit Σ ADC) Fast Conversion Down to 1 ms Low Power, 1 µa (standby < 0.15 µa) Supply Voltage: 1.8 to 3.6V Pressure Range: 1 to 150

More information

Worst-Case Execution Time Analysis. LS 12, TU Dortmund

Worst-Case Execution Time Analysis. LS 12, TU Dortmund Worst-Case Execution Time Analysis Prof. Dr. Jian-Jia Chen LS 12, TU Dortmund 09/10, Jan., 2018 Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 1 / 43 Most Essential Assumptions for Real-Time Systems Upper

More information

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

ESE 570: Digital Integrated Circuits and VLSI Fundamentals ESE 570: Digital Integrated Circuits and VLSI Fundamentals Lec 19: March 29, 2018 Memory Overview, Memory Core Cells Today! Charge Leakage/Charge Sharing " Domino Logic Design Considerations! Logic Comparisons!

More information

Scheduling for Reduced CPU Energy

Scheduling for Reduced CPU Energy Scheduling for Reduced CPU Energy M. Weiser, B. Welch, A. Demers and S. Shenker Appears in "Proceedings of the First Symposium on Operating Systems Design and Implementation," Usenix Association, November

More information

Chapter 8. Low-Power VLSI Design Methodology

Chapter 8. Low-Power VLSI Design Methodology VLSI Design hapter 8 Low-Power VLSI Design Methodology Jin-Fu Li hapter 8 Low-Power VLSI Design Methodology Introduction Low-Power Gate-Level Design Low-Power Architecture-Level Design Algorithmic-Level

More information

Lecture 23. Dealing with Interconnect. Impact of Interconnect Parasitics

Lecture 23. Dealing with Interconnect. Impact of Interconnect Parasitics Lecture 23 Dealing with Interconnect Impact of Interconnect Parasitics Reduce Reliability Affect Performance Classes of Parasitics Capacitive Resistive Inductive 1 INTERCONNECT Dealing with Capacitance

More information

Clock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements.

Clock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements. 1 2 Introduction Clock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements. Defines the precise instants when the circuit is allowed to change

More information

Circuit Modeling for Practical Many-core Architecture Design Exploration

Circuit Modeling for Practical Many-core Architecture Design Exploration Circuit Modeling for Practical Many-core Architecture Design Exploration Redefining design abstractions Dean Truong Bevan Baas VLSI Computation Lab University of California, Davis Outline Motivation Circuit

More information

! Charge Leakage/Charge Sharing. " Domino Logic Design Considerations. ! Logic Comparisons. ! Memory. " Classification. " ROM Memories.

! Charge Leakage/Charge Sharing.  Domino Logic Design Considerations. ! Logic Comparisons. ! Memory.  Classification.  ROM Memories. ESE 57: Digital Integrated Circuits and VLSI Fundamentals Lec 9: March 9, 8 Memory Overview, Memory Core Cells Today! Charge Leakage/ " Domino Logic Design Considerations! Logic Comparisons! Memory " Classification

More information

CMP 338: Third Class

CMP 338: Third Class CMP 338: Third Class HW 2 solution Conversion between bases The TINY processor Abstraction and separation of concerns Circuit design big picture Moore s law and chip fabrication cost Performance What does

More information

Lecture 6 Power Zhuo Feng. Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 2010

Lecture 6 Power Zhuo Feng. Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 2010 EE4800 CMOS Digital IC Design & Analysis Lecture 6 Power Zhuo Feng 6.1 Outline Power and Energy Dynamic Power Static Power 6.2 Power and Energy Power is drawn from a voltage source attached to the V DD

More information

TU Wien. Energy Efficiency. H. Kopetz 26/11/2009 Model of a Real-Time System

TU Wien. Energy Efficiency. H. Kopetz 26/11/2009 Model of a Real-Time System TU Wien 1 Energy Efficiency Outline 2 Basic Concepts Energy Estimation Hardware Scaling Power Gating Software Techniques Energy Sources 3 Why are Energy and Power Awareness Important? The widespread use

More information

MS BA Micro Altimeter Module

MS BA Micro Altimeter Module High resolution module, 20cm Fast conversion down to ms Low power, µa (standby < 0.5 µa) QFN package 5.0 x 3.0 x.0 mm 3 Supply voltage.8 to 3.6 V Integrated digital pressure sensor (24 bit Σ AC) Operating

More information

Vector Lane Threading

Vector Lane Threading Vector Lane Threading S. Rivoire, R. Schultz, T. Okuda, C. Kozyrakis Computer Systems Laboratory Stanford University Motivation Vector processors excel at data-level parallelism (DLP) What happens to program

More information

Optimal Voltage Allocation Techniques for Dynamically Variable Voltage Processors

Optimal Voltage Allocation Techniques for Dynamically Variable Voltage Processors Optimal Allocation Techniques for Dynamically Variable Processors 9.2 Woo-Cheol Kwon CAE Center Samsung Electronics Co.,Ltd. San 24, Nongseo-Ri, Kiheung-Eup, Yongin-City, Kyounggi-Do, Korea Taewhan Kim

More information

Measurement & Performance

Measurement & Performance Measurement & Performance Timers Performance measures Time-based metrics Rate-based metrics Benchmarking Amdahl s law Topics 2 Page The Nature of Time real (i.e. wall clock) time = User Time: time spent

More information

Performance, Power & Energy

Performance, Power & Energy Recall: Goal of this class Performance, Power & Energy ELE8106/ELE6102 Performance Reconfiguration Power/ Energy Spring 2010 Hayden Kwok-Hay So H. So, Sp10 Lecture 3 - ELE8106/6102 2 What is good performance?

More information

Measurement & Performance

Measurement & Performance Measurement & Performance Topics Timers Performance measures Time-based metrics Rate-based metrics Benchmarking Amdahl s law 2 The Nature of Time real (i.e. wall clock) time = User Time: time spent executing

More information

A Physical-Aware Task Migration Algorithm for Dynamic Thermal Management of SMT Multi-core Processors

A Physical-Aware Task Migration Algorithm for Dynamic Thermal Management of SMT Multi-core Processors A Physical-Aware Task Migration Algorithm for Dynamic Thermal Management of SMT Multi-core Processors Abstract - This paper presents a task migration algorithm for dynamic thermal management of SMT multi-core

More information

MS4525HRD (High Resolution Digital) Integrated Digital Pressure Sensor (24-bit Σ ADC) Fast Conversion Down to 1 ms Low Power, 1 µa (standby < 0.15 µa)

MS4525HRD (High Resolution Digital) Integrated Digital Pressure Sensor (24-bit Σ ADC) Fast Conversion Down to 1 ms Low Power, 1 µa (standby < 0.15 µa) Integrated Digital Pressure Sensor (24-bit Σ ADC) Fast Conversion Down to 1 ms Low Power, 1 µa (standby < 0.15 µa) Supply Voltage: 1.8 to 3.6V Pressure Range: 1 to 150 PSI I 2 C and SPI Interface up to

More information

UNISONIC TECHNOLOGIES CO., LTD L16B40 Preliminary CMOS IC



