A Mathematical Solution to. by Utilizing Soft Edge Flip Flops

Similar documents
L4: Sequential Building Blocks (Flip-flops, Latches and Registers)

Jin-Fu Li Advanced Reliable Systems (ARES) Lab. Department of Electrical Engineering. Jungli, Taiwan

Lecture 9: Sequential Logic Circuits. Reading: CH 7

P2 (10 points): Given the circuit below, answer the following questions:

L4: Sequential Building Blocks (Flip-flops, Latches and Registers)

Reducing Delay Uncertainty in Deeply Scaled Integrated Circuits Using Interdependent Timing Constraints

Circuit A. Circuit B

Sequential vs. Combinational

MODULE 5 Chapter 7. Clocked Storage Elements

Energy Delay Optimization

Problem Set 9 Solutions

Integrated Circuits & Systems

Hold Time Illustrations

Sequential Circuit Analysis

CSE241 VLSI Digital Circuits Winter Lecture 07: Timing II

Skew-Tolerant Circuit Design

ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN. Week 7 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering

Chapter 13. Clocked Circuits SEQUENTIAL VS. COMBINATIONAL CMOS TG LATCHES, FLIP FLOPS. Baker Ch. 13 Clocked Circuits. Introduction to VLSI

Methodology to Achieve Higher Tolerance to Delay Variations in Synchronous Circuits

GMU, ECE 680 Physical VLSI Design

ELEC Digital Logic Circuits Fall 2014 Sequential Circuits (Chapter 6) Finite State Machines (Ch. 7-10)

EE141Microelettronica. CMOS Logic

EE141- Spring 2007 Digital Integrated Circuits

Topics. CMOS Design Multi-input delay analysis. John A. Chandy Dept. of Electrical and Computer Engineering University of Connecticut

Digital Integrated Circuits A Design Perspective

Digital Integrated Circuits A Design Perspective

ΗΜΥ 307 ΨΗΦΙΑΚΑ ΟΛΟΚΛΗΡΩΜΕΝΑ ΚΥΚΛΩΜΑΤΑ Εαρινό Εξάμηνο 2018

Design for Manufacturability and Power Estimation. Physical issues verification (DSM)

EE241 - Spring 2001 Advanced Digital Integrated Circuits

Topics. Dynamic CMOS Sequential Design Memory and Control. John A. Chandy Dept. of Electrical and Computer Engineering University of Connecticut

9/18/2008 GMU, ECE 680 Physical VLSI Design

Digital Integrated Circuits A Design Perspective

Designing Sequential Logic Circuits

NTE74176 Integrated Circuit TTL 35Mhz Presettable Decade Counter/Latch

Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic. November Digital Integrated Circuits 2nd Sequential Circuits

Clock Strategy. VLSI System Design NCKUEE-KJLEE

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

UNIVERSITY OF CALIFORNIA, BERKELEY College of Engineering Department of Electrical Engineering and Computer Sciences

Fundamentals of Computer Systems

Chapter 7 Sequential Logic

An Analytical Model for Interdependent Setup/Hold-Time Characterization of Flip-flops

NTE74177 Integrated Circuit TTL 35Mhz Presettable Binary Counter/Latch

The Linear-Feedback Shift Register

EEC 216 Lecture #3: Power Estimation, Interconnect, & Architecture. Rajeevan Amirtharajah University of California, Davis

EE115C Winter 2017 Digital Electronic Circuits. Lecture 6: Power Consumption

CMPEN 411 VLSI Digital Circuits Spring Lecture 14: Designing for Low Power

EE371 - Advanced VLSI Circuit Design

CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 17: Dynamic Sequential Circuits And Timing Issues

Lecture 7: Logic design. Combinational logic circuits

ELE2120 Digital Circuits and Systems. Tutorial Note 9

ELEC516 Digital VLSI System Design and Design Automation (spring, 2010) Assignment 4 Reference solution

Where Does Power Go in CMOS?

Sequential Logic Circuits

UMBC. At the system level, DFT includes boundary scan and analog test bus. The DFT techniques discussed focus on improving testability of SAFs.

EE241 - Spring 2003 Advanced Digital Integrated Circuits

LOGIC CIRCUITS. Basic Experiment and Design of Electronics. Ho Kyung Kim, Ph.D.

CSE 140 Midterm 2 Tajana Simunic Rosing. Spring 2008

Timing Issues. Digital Integrated Circuits A Design Perspective. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolić. January 2003

ALU, Latches and Flip-Flops

GMU, ECE 680 Physical VLSI Design 1

School of EECS Seoul National University

Last Lecture. Power Dissipation CMOS Scaling. EECS 141 S02 Lecture 8

INTEGRATED CIRCUITS. For a complete data sheet, please also download:

EE241 - Spring 2000 Advanced Digital Integrated Circuits. Announcements

LOGIC CIRCUITS. Basic Experiment and Design of Electronics

Lecture Outline. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. Total Power. Energy and Power Optimization. Worksheet Problem 1

Skew Management of NBTI Impacted Gated Clock Trees

Chapter 14 Sequential logic, Latches and Flip-Flops

Introduction. Simulation methodology. Simulation results

Power Dissipation. Where Does Power Go in CMOS?

ELCT201: DIGITAL LOGIC DESIGN

Chapter 4. Sequential Logic Circuits

Lecture 10: Sequential Networks: Timing and Retiming

Chapter #6: Sequential Logic Design

Clocked Synchronous State-machine Analysis

Characterization and Design of Sequential Circuit Elements to Combat Soft Error 1

Synchronous Sequential Logic

Timing Analysis with Clock Skew

6 Synchronous State Machine Design

Next, we check the race condition to see if the circuit will work properly. Note that the minimum logic delay is a single sum.

UNIVERSITY OF CALIFORNIA

5. Sequential Logic x Computation Structures Part 1 Digital Circuits. Copyright 2015 MIT EECS

Memory Elements I. CS31 Pascal Van Hentenryck. CS031 Lecture 6 Page 1

Lecture 9: Clocking, Clock Skew, Clock Jitter, Clock Distribution and some FM

ALU A functional unit

Digital Design. Sequential Logic

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Lecture 10: Synchronous Sequential Circuits Design

Lecture 24. CMOS Logic Gates and Digital VLSI II

CPE100: Digital Logic Design I

Memory, Latches, & Registers

CSE493/593. Designing for Low Power

Sequential Logic Worksheet

EECS 427 Lecture 15: Timing, Latches, and Registers Reading: Chapter 7. EECS 427 F09 Lecture Reminders

Homework #4. CSE 140 Summer Session Instructor: Mohsen Imani. Only a subset of questions will be graded

ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN. Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering

Motivation for CDR: Deserializer (1)

Time Allowed 3:00 hrs. April, pages

Sequential Circuit Timing. Young Won Lim 11/6/15

Parallel Processing and Circuit Design with Nano-Electro-Mechanical Relays

Transcription:

A Mathematical Solution to Power Optimal Pipeline Design by Utilizing Soft Edge Flip Flops M. Ghasemazar, B. Amelifard, M. Pedram University of Southern California Department of Electrical Engineering August 11, 2008 ISLPED 2008 1

Outline Soft-Edge Flip Flops Power Optimal Pipeline Design Problem Formulation SEFF Modeling Experimental Results Conclusion 2

Soft Edge Flip Flop Key idea: Allow the data to pass through a flip flop during a transparency window, instead of on a triggering clock edge Key advantage: Enable slack passing between adjacent pipeline stages which separated by (master-slave) flipflops oware Circuit implementation: Delay the clock of the master latch to create a window during which both the master and slave latches are ON D Setup Time D Q SEFF CLK Tr ransparency Windo Hold Time DATA 3

SEFF Implementation Conventional (Hard Edge) D Q Master-Slave FF Soft Edge D D Q Master-Slave FF D D D Delay D D 4

SEFF Characteristics Setup and hold times, and clock-to-q delay of a soft-edge flip-flop are all functions of the transparency window width, w Simulations show a linear dependency on w t ( w ) = a w + a thi, ( wi ) = b1 wi + b0 tcq i( wi ) = c wi + c si, i 1 i 0, 1 0 Time (ps) Setup/Hold 100 80 60 40 20 0 Setup Time y = 0.921x - 30.45 40-20 60 80 100 120 140 Hold -40 Time y = -0.651x + 33.54-60 Window size (ps) 5

SEFF Characteristics cont d Power consumption of a SEFF is monotonically increasing with its window size (w). This is due to: Higher switching activities in the internal nodes in the transparency window Higher dynamic and leakage power consumption in the additional delay generation circuitry Experimental evaluation of total power consumption: 2 FF, i = 2 i + 1 i + 0 P d w d w d Power Dissipa ation (uw) 350 300 250 200 150 100 50 0 0 40 80 120 160 Transparency window (ps) 6

Pipeline Basics CLK D Q D Q D Q FF0 C1 FF1 C2 FF2 t cq,i d i t s,i Timing constraints for a linear pipeline d + t + t T i N δ i s, i cq, i 1 clk 1 + t t i N i cq, i 1 hi, 1 Substitute FFs with SEFFs (1) (2) First and Last FF s remain hard-edge ones This is needed to avoid imposing constraints on the sender/receiver of data Intermediate stage FF s may be substituted by SEFFs d T t ( w ) t ( w ) 1 i N i clk s, i i cq, i 1 i 1 δi hi, ( i ) cq, i 1 ( i 1 ) 1 t w t w i N 7

Power Optimal Pipeline Main Idea: Passing available slack of some stages to more timing critical stages to provide them with more freedom in power optimization through voltage scaling For example, let T clk =T clk,min =560ps and t s =t h =t cq =30ps If FF1 is replaced with a SEFF with a window size of 50ps the first stage borrows 50ps from the second stage the circuit can be powered with a lower supply voltage level Ideally, 10% V dd reduction ->19% power saving C1 C2 C3 D Q D Q D Q D Q d1=500ps d2=400ps d3=450ps FF0 FF1 FF2 FF3 CLK 8

PSLP Problem Statement Power-optimal Soft Linear Pipeline Design Goal: Minimize the total power consumption of an N-stage linear pipeline circuit Variables: Optimal supply voltage level (1 variable) Transparency windows size of the individual soft-edge FF-sets (N-1) Delay elements to avoid hold time violations (N) Constraints: N N 1 N = total + Comb, i + FF, i i DE, i Setup/hold times i= 1 i= 1 i= 1 Window size limits st..() I d () v T t ( w, v) t ( w, v);1 i N i clk s, i i cq, i 1 i 1 Single supply voltage Min. P P ( v) P ( w, v) P ( z, v) ( II) δ ( v) + z t ( w, v) t ( w, v);1 i N i i hi, i cq, i 1 i 1 ( III) w w w ;1 i N 1 min i 0 1 max ( IV) v V, V..., Vm { } 1 i 9

SEFF Modeling Setup time, hold time, clock-to-q delay, and power dissipation i are functions of both voltage and ( ) transparency window size Voltage-dependent coefficients are determined from SPICE simulations tsi, ( wi, v) = a1( v) wi + a0( v) thi, wi, v = b1( v) wi + b0( v) tcq, i ( wi, v) = c1( v) wi + c0( v) 2 FF, i = 2 i + 1 i + 0 P d v w d v w d v ( ) ( ) ( ) 0 100 200 Setup Tim me (ps) -10-20 -30-40 -50-60 Vdd=0.9V Vdd=1.0V Vdd=1.1V Vdd=1.2V Hold Tim me (ps) 80 60 40 20 Vdd=0.9V Vdd=1.0V Vdd=1.1V Vdd=1.2V -to-q de elay (ps) 180 160 140 120 100 Vdd=0.9V Vdd=1.0V Vdd=1.1V Vdd=1.2V -70 40 60 80 100 120 140 Transparency window (ps) 0 40 60 80 100 120 140 Transparency window (ps) 80 40 60 80 100 120 140 Transparency window (ps) 10

Combinational Circuit Modeling Total power consumption at voltage level, v: 2 3 v v P () v = P + P V V comb, i dyn, i leak, i 0 0 Max and Min combinational logic cell delays (calculated from the alpha power law): V0 V t di() v = di( V0 ) v V δ V V α 0 t i () v = δ i ( V 0 ) v V t t V α Power dissipation overhead of a delay element: DE (, ) ( ) P z v = k v z 11

Solving the PSLP To solve PSLP Enumerate all possible values for v PSLP with fixed voltage (PSLP-FV)) P comb,i terms drop out of the cost function Voltage constraint (IV) disappears All other timing and power parameters become only dependent on w i and z i variables For each fixed v, a quadratic program is set up and solved We must minimize a quadratic cost function subject to linear inequality constraints PSLP-FV can be solved optimally in polynomial time 12

Experimental Setup Hspice simulations were used to extract parameters that are needed for the problem formulation 65nm Predictive Technology Model (PTM) Nominal supply voltage 1.2V Die temperature 100 o C The SIS optimization package was used to synthesize a set of linear pipelines as test-bench circuits The MOSEK toolbox used to solve the mathematical optimization problem All results were collected on a 2.4GHz Pentium 4PC with 2GB memory 13

Benchmark Spec Testbench (max, min) stage delays at nominal Clock (# of stages) voltage (ps) freq. (GHz) TB1 (4) (320,140), (332,150), (308,150), 2.0 (320,170) TB2 (5) (320,140), (332,150), (308,150), (280,145), (320,170) 2.0 TB3 (3) (325, 150), (310,155), (219,160) 2.0 TB4 (5) (275,40), (235,40), (245,60), (275,50), 50) (275,70) 70) TB5 (4) (310,100), (245,40), (245,50), (245,60) 2.5 2.5 14

Experimental Results Using slack passing to minimize power without degrading performance TB Power Red. (%) Optimum Vdd (V) Optimum Window size (ps) TB1 32.1 1.0 40, 49, 22 TB2 33.8 1.0 40, 49, 46, 21 TB3 48.1 095 0.95 43,52 TB4 16.3 1.10 36, 35, 35, 20 TB5 25.4 1.05 60, 41, 36 - Area overhead: Utilizing slack passing to improve performance Negligible compared Testbench Performance to size of the rest of Improvement (%) the pipeline circuit TB1 14% - Runtime for all TB2 15% benchmarks: Less TB3 20% than one second TB4 5% TB5 10% 15

A Case Study: 34-bit Adder Problem: How to partition a 34-bit adder into 4 stages of pipeline to achieve maximum performance? D X Q D Q Y D Q bits FF0 FF1 bits FF2 Z bits D Q FF3 T bits D Q FF4 CLK Maximum Performance X+Y+Z+T=34 D Q C3 C4 C1 D Q C2 D Q D Q D Q 10 8 810 8 8 FF0 FF1 FF2 FF3 FF4 CLK C1 C2 C3 D Q D Q D Q D Q 98 89 FF0 9 FF1 FF2 FF3 C4 8 D Q FF4 CLK 16

A Case Study: 34-bit Adder Maximum Performance Configuration Vdd (V) Min Clock Period (ps) Power Consumption (mw) 10 8 8 8 1.2 450 6.42 8 10 8 8 1.2 472 6.50 8 8 10 8 1.2 472 6.51 8 8 8 10 1.2 486 6.55 9 9 8 8 9 8 1.2 455 6.42 9 8 9 8 1.2 433 6.51 17

A Case Study: 34-bit Adder Problem: How to partition a 34-bit adder into 4 stages of pipeline to achieve minimum power at target performance level? Minimum Power @ 2.0GHz Configuration Vdd (V) Power Consumption (MW) 10 8 8 8 8 8 105 1.05 49 4.9 8 10 8 8 1.15 5.1 9 9 8 8 1.05 4.9 9 8 8 9 1.10 4.9 18

Conclusion We presented a new technique to minimize the total power consumption of a linear pipeline circuit by utilizing soft-edge flip-flops and choosing the optimal supply voltage level for the pipeline We formulated the problem as a mathematical program and solved it efficiently Our experimental results demonstrate that this technique is quite effective in reducing the power consumption of a pipeline circuit under a performance constraint Future work will focus on problem of minimizing the energy cost of throughput in a linear pipeline p circuit with dynamic error detection and correction capability 19