L16: Power Dissipation in Digital Systems. L16: Spring 2007 Introductory Digital Systems Laboratory

Similar documents
EECS 427 Lecture 11: Power and Energy Reading: EECS 427 F09 Lecture Reminders

CSE493/593. Designing for Low Power

EE241 - Spring 2001 Advanced Digital Integrated Circuits

EE 466/586 VLSI Design. Partha Pande School of EECS Washington State University

CSE140L: Components and Design Techniques for Digital Systems Lab. Power Consumption in Digital Circuits. Pietro Mercati

Lecture 21: Packaging, Power, & Clock

Lecture 8-1. Low Power Design

EE241 - Spring 2000 Advanced Digital Integrated Circuits. Announcements

VLSI Design I; A. Milenkovic 1

CSE140L: Components and Design Techniques for Digital Systems Lab. FSMs. Instructor: Mohsen Imani. Slides from Tajana Simunic Rosing

EE115C Winter 2017 Digital Electronic Circuits. Lecture 6: Power Consumption

EEC 216 Lecture #3: Power Estimation, Interconnect, & Architecture. Rajeevan Amirtharajah University of California, Davis

ASIC FPGA Chip hip Design Pow Po e w r e Di ssipation ssipa Mahdi Shabany

CIS 371 Computer Organization and Design

Administrative Stuff

Last Lecture. Power Dissipation CMOS Scaling. EECS 141 S02 Lecture 8

Parallel Processing and Circuit Design with Nano-Electro-Mechanical Relays

Lecture 15: Scaling & Economics

Objective and Outline. Acknowledgement. Objective: Power Components. Outline: 1) Acknowledgements. Section 4: Power Components

Introduction to CMOS VLSI Design (E158) Lecture 20: Low Power Design

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Scaling of MOS Circuits. 4. International Technology Roadmap for Semiconductors (ITRS) 6. Scaling factors for device parameters

Where Does Power Go in CMOS?

Lecture 12: Energy and Power. James C. Hoe Department of ECE Carnegie Mellon University

School of EECS Seoul National University

Implications on the Design

Power Dissipation. Where Does Power Go in CMOS?

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design

Today. ESE532: System-on-a-Chip Architecture. Energy. Message. Preclass Challenge: Power. Energy Today s bottleneck What drives Efficiency of

Lecture 2: CMOS technology. Energy-aware computing

ECE321 Electronics I

BeiHang Short Course, Part 7: HW Acceleration: It s about Performance, Energy and Power

Numbering Systems. Computational Platforms. Scaling and Round-off Noise. Special Purpose. here that is dedicated architecture

Today. ESE532: System-on-a-Chip Architecture. Energy. Message. Preclass Challenge: Power. Energy Today s bottleneck What drives Efficiency of

THE INVERTER. Inverter

EE115C Winter 2017 Digital Electronic Circuits. Lecture 19: Timing Analysis

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

L15: Custom and ASIC VLSI Integration

Lecture 34: Portable Systems Technology Background Professor Randy H. Katz Computer Science 252 Fall 1995

Moore s Law Forever?

Spiral 2 7. Capacitance, Delay and Sizing. Mark Redekopp

Lecture 6 Power Zhuo Feng. Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 2010

Amdahl's Law. Execution time new = ((1 f) + f/s) Execution time. S. Then:

MODULE III PHYSICAL DESIGN ISSUES

EE241 - Spring 2003 Advanced Digital Integrated Circuits

Memory Thermal Management 101

Variations-Aware Low-Power Design with Voltage Scaling

CMPEN 411 VLSI Digital Circuits Spring Lecture 14: Designing for Low Power

Impact of Scaling on The Effectiveness of Dynamic Power Reduction Schemes

Xarxes de distribució del senyal de. interferència electromagnètica, consum, soroll de conmutació.

Lecture 23. Dealing with Interconnect. Impact of Interconnect Parasitics

University of Toronto. Final Exam

Design for Manufacturability and Power Estimation. Physical issues verification (DSM)

EEC 216 Lecture #2: Metrics and Logic Level Power Estimation. Rajeevan Amirtharajah University of California, Davis

Announcements. EE141- Fall 2002 Lecture 7. MOS Capacitances Inverter Delay Power

Vectorized 128-bit Input FP16/FP32/ FP64 Floating-Point Multiplier

Low power Architectures. Lecture #1:Introduction

Vidyalankar S.E. Sem. III [CMPN] Digital Logic Design and Analysis Prelim Question Paper Solution

! Charge Leakage/Charge Sharing. " Domino Logic Design Considerations. ! Logic Comparisons. ! Memory. " Classification. " ROM Memories.

Digital Integrated Circuits 2nd Inverter

EE241 - Spring 2005 Advanced Digital Integrated Circuits. Admin. Lecture 10: Power Intro

Performance, Power & Energy. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So

VLSI Design. [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI DEsign.1

Clock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements.

The Linear-Feedback Shift Register

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Chapter 8. Low-Power VLSI Design Methodology

Lecture 9: Clocking, Clock Skew, Clock Jitter, Clock Distribution and some FM

Energy Delay Optimization

Variation-Resistant Dynamic Power Optimization for VLSI Circuits

TU Wien. Energy Efficiency. H. Kopetz 26/11/2009 Model of a Real-Time System

Topic 4. The CMOS Inverter

CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 17: Dynamic Sequential Circuits And Timing Issues

Floating Point Representation and Digital Logic. Lecture 11 CS301

Timing Issues. Digital Integrated Circuits A Design Perspective. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolić. January 2003

Novel Devices and Circuits for Computing

Hw 6 due Thursday, Nov 3, 5pm No lab this week

Dynamic operation 20

Dynamic Combinational Circuits. Dynamic Logic

Evaluating Power. Introduction. Power Evaluation. for Altera Devices. Estimating Power Consumption

VLSI Design I; A. Milenkovic 1

Name: Answers. Mean: 83, Standard Deviation: 12 Q1 Q2 Q3 Q4 Q5 Q6 Total. ESE370 Fall 2015

Energy-Efficient Real-Time Task Scheduling in Multiprocessor DVS Systems

Lecture 25. Dealing with Interconnect and Timing. Digital Integrated Circuits Interconnect

Skew-Tolerant Circuit Design

Lecture 23. CMOS Logic Gates and Digital VLSI I

CMPEN 411 VLSI Digital Circuits Spring Lecture 21: Shifters, Decoders, Muxes

EE141Microelettronica. CMOS Logic

Digital Integrated Circuits A Design Perspective

EEC 116 Lecture #5: CMOS Logic. Rajeevan Amirtharajah Bevan Baas University of California, Davis Jeff Parkhurst Intel Corporation

Intro To Digital Logic

Digital Electronics Part II - Circuits

Chapter 1. Binary Systems 1-1. Outline. ! Introductions. ! Number Base Conversions. ! Binary Arithmetic. ! Binary Codes. ! Binary Elements 1-2

Topics. Dynamic CMOS Sequential Design Memory and Control. John A. Chandy Dept. of Electrical and Computer Engineering University of Connecticut

Dynamic Combinational Circuits. Dynamic Logic

NAME SID EE42/100 Spring 2013 Final Exam 1

Lecture 7 Circuit Delay, Area and Power

CMPEN 411 VLSI Digital Circuits Spring 2011 Lecture 07: Pass Transistor Logic

GMU, ECE 680 Physical VLSI Design 1

Low Power CMOS Dr. Lynn Fuller Webpage:

Transcription:

L16: Power Dissipation in Digital Systems 1

Problem #1: Power Dissipation/Heat Power (Watts) 100000 10000 1000 100 10 1 0.1 4004 80088080 8085 808686 386 486 Pentium proc 18KW 5KW 1.5KW 500W 1971 1974 1978 1985 199 000 004 008 Year Power Density (W/cm) 10000 1000 100 10 1 4004 8008 8080 8086 Rocket Nozzle Nuclear Reactor Hot Plate 8085 86 386 486 Sun s Surface P6 Pentium proc 1970 1980 1990 000 010 Year Courtesy Intel (S. Borkar) How do you cool these chips?? heat sink chip

Problem #: Energy Consumption Nominal Capacity (Watt-hours/lb) 50 Rechargable Lithium 40 Ni-Metal Hydride 30 0 Nickel-Cadmium 10 0 65 70 75 80 85 90 95 Year (from Jon Eager, Gates Inc., S. Watanabe, Sony Inc.) The Energy Problem 7.5 cm 3 AA battery Alkaline: ~10,000J What can One Joule of energy do? Battery (40+ lbs) Mow your lawn for 1 ms Operate a processor for ~ 7s No Moore s law for batteries Today: Understand where power goes and ways to manage it Send a 1 Megabyte file over 80.11b 3

Dynamic Energy Dissipation V DD Charging E 0 1 = C L V DD Discharging V DD i DD E cap = 1/C L V DD R P R P IN =0 E diss, RP = 1/C L V DD IN =1 E diss,rn =1/C L V DD R N C L R N C L P = C L V DD f clk 4

The Transition Activity Factor α 0 >1 Current Input 00 Next Input 00 Output Transition 1 > 1 A B Z 00 01 1 > 1 00 00 01 01 01 01 10 10 11 00 01 10 11 00 1 > 1 1 > 0 1 > 1 1 > 1 1 > 1 1 > 0 1 > 1 Assume inputs (A,B) arrive at f and are uniformly distributed What is the average power dissipation? 10 01 1 > 1 10 10 10 11 1 > 1 1 > 0 α 0 >1 = 3/16 11 00 0 > 1 11 01 0 > 1 11 11 10 11 0 > 1 0 > 0 P = α 0 >1 C L V DD f 5

Junction (Silicon) Temperature Simple Scenario Silicon T j -T a =R θja P D R θja is the thermal resistance between silicon and Ambient P D Realistic Scenario Silicon Case Sink T J T C T S T A T J R θjc T J T C P D R θja R θcs T S T A R θsa T j =T a + R θja P D T A Make this as low as possible R θca = R θcs +R θsa is minimized by facilitating heat transfer (bolt case to extended metal surface heat sink) 6

Intel Pentium 4 Thermal Guidelines Pentium 4 @ 3.06 GHz dissipates 81.8W! Maximum T C = 69 C R CA < 0.3 C/W for 50 C ambient Typical chips dissipate 0.5-1W (cheap packages without forced air cooling) Temp ( o C) Cache 70 C Execution core 10 o C Integer & FP ALUs Courtesy of Intel (Ram Krishnamurthy) 7

Power Reduction Strategies P = α 0 >1 C L V DD f Reduce Transition Activity or Switching Events Reduce Capacitance (e.g., keep wires short) Reduce Power Supply Voltage Frequency is typically fixed by the application, though this can be adjusted to control power Optimize at all levels of design hierarchy 8

Clock Gating is a Good Idea! Clock gating reduces activity and is the most common low-power technique used today Global Clock Enable_Adder Adder Off Adder Clock + Multiplier On X Enable_Multiplier Multiplier Clock 100 s of different clocks in a microprocessor Clock Gating Reduces Energy, does it reduce Power? 9

Does your GHz Processor run at a GHz? Processor Chip Activity Control Thermal Sensor Note that there is a difference between average and peak power On-chip thermal sensor (diode based), measures the silicon temperature If the silicon junction gets too hot (say 15 C), then the activity is reduced (e.g., reduce clock rate or use clock gating) Use of Thermal Feedback 10

Power Supply Resonance L board L package R grid Board decap On-die decap Courtesy of Motorola (David Blaauw) Switching currents Can write a Virus to Activate Power Supply Resonance! 00Mhz Design 11

Number Representation: Two s s Complement vs. Sign Magnitude Two s complement Sign-Magnitude -4-6 -5-7 +0 1111 0000 1110 0001 +1 1101 0010 1100 0011 + +3-3 1011 0100 +4-1010 1001-1 1000 0111 0110 0101 +6 +5-0 +7 Consider a 16 bit bus where inputs toggles between +1 and 1 (i.e., a small noise input) Which representation is more energy efficient? 1

Bus Coding to Reduce Activity D Q Majority Function Extra bit to indicated if the bus is inverted invert N Input Output Data Bus [Stan94] 13

Time Sharing is a Bad Idea Time Sharing Increases Switching Activity 14

Not just a 6-16 1 Issue: Cool Software??? MEMORY address CPU address 16 a[0] a[1] a[] a[3] 0111111100000000 0111111100000001 0111111100000010 0111111100000011 b[0] b[1] b[] b[3] 1000000000000000 1000000000000001 1000000000000010 1000000000000011 float a [56], b[56]; float pi= 3.14; for (i = 0; i < 55; i++) { a[i] = sin(pi * i /56); b[i] = cos(pi * i /56); } 51(8)++4+8+16+3+64+18+56 = 4607 bit transitions float a [56], b[56]; float pi= 3.14; for (i = 0; i < 55; i++) {a[i] = sin(pi * i /56);} for (i = 0; i < 55; i++) {b[i] = cos(pi * i /56);} (8)+(+4+8+16+3+64+18+56) = 1030 transitions 15

Glitching Transitions Chain Topology A B Tree Topology A B C D + C + + + D + + (A+B) + (C+D) (((A+B) + C)+D) Balancing paths reduces glitching transitions Structures such as multipliers have lot of glitching transitions Keeping logic depths short (e.g., pipelining) reduces glitching 16

Reduce Supply Voltage : But is it Free? V DD V DD G + V DD - t =0+ K V S ( V DD V T D ) C L IN OUT Delay = C L V i D S = k C L V DD ( V DD DD V T ) ( V DD V V T ) 1 V DD V DD from V to 1V, energy by x4, delay x 17

Transistors Are Free (What do you do with a Billion Transistors?) f=1ghz V DD =V IN f = 500Mhz V DD =1V IN IN f = 500Mhz V DD =1V X X X OUT SELECT OUT P serial = C mult f P parallel = (C mult 1 f /) = P serial /4 Trade Area for Low Power 18

Algorithmic Workload Compare Current Image... Receiver just updates...to Previous Image 19 Frequency of Occurrence 0.06 0.04 0.0 0.00 0 500 1000 1500 000 Number of IDCTs per Frame Exploit Time Varying Algorithmic Workload To Vary the Power Supply Voltage

Dynamic Voltage Scaling (DVS) Fixed Power Supply ACTIVE IDLE Variable Power Supply ACTIVE E FIXED = ½ C V DD 1.0 E VARIABLE = ½ C (V DD /) = E FIXED / 4 Normalized Energy 0.8 0.6 0.4 0. Fixed Supply Variable Supply 0 0 0. 0.4 0.6 0.8 1.0 Normalized Workload [Gutnik97] 0

DVS on a Processor Digitally adjustable DC-DC converter powers SA-1110 core 3.6V 5 Controller V out SA-1110 µos Control µos selects appropriate clock frequency based on workload and latency constraints 1

Hardware vs. Software Flexibility 0.5nJ/Op DSP 1nJ/Op Embedded Processor 0.1-1pJ/Op FPGA Direct Mapped Hardware Courtesy of R. Brodersen, J. Rabaey, TI, ARM/StrongARM Energy/Operation

Energy Efficiency of Software Processor (StrongARM-1100) [A. Sinha, DAC] FPGA (Xilinx) CLB CLB CLB CLB 45 40 35 [Montanaro, JSSC 96] I/O CLB 9% 5% [Kusse 98, UCB] Power (%) 30 5 0 15 10 Clock 1% 65% Interconnect 5 0 Cache Control GCLK EBOX I/O,PLL Software Energy Dissipation has Large Overhead 3

Trends: Leakage and Power Gating V DD Switching (computing) E = CV DD C V DD E = V DD I 0 10 0 1 C Leakage (standby) 10 -V T /S In today s 65nm CMOS Technology : 30-50% of power is leakage! Total Energy/Switching Energy Duty Cycle (%) Low V T devices are leaky - Use a High V T device is used to gate leakage current Sleep 4

Next Generation Low-Power Digital: Sub-Threshold Operation Strong Inversion Operation: fast, power-hungry V DD = 0.18V 10 0 Normalized I D 10 10 4 Subthreshold Operation: slow, minimum energy operation Data Memory Control logic 10 6 Butterfly Datapath Twiddle ROMs 10 8 0 0. 0.4 0.6 0.8 1 Normalized V GS Exploit Sub-threshold Operation (V DD < V T ) for Sensor Circuits 5

Trends: Energy Scavenging MEMS Generator Power Harvesting Shoes Jose Mur Miranda/ Jeff Lang Vibration-to-Electric Conversion ~ 10µW Joe Paradiso (Media Lab) After 3-6 steps, it provides 3 ma for 0.5 sec ~10mW 6