Clock-Gating and Its Application to Low Power Design of Sequential Circuits


Qing WU
Department of Electrical Engineering-Systems, University of Southern California
Los Angeles, CA 989, USA, Phone: (23)74-448

Massoud PEDRAM
Department of Electrical Engineering-Systems, University of Southern California
Los Angeles, CA 989, USA, Phone: (23)74-4458

Xunwei WU
Department of Electronic Engineering, Hangzhou University
Hangzhou, Zhejiang 328, CHINA

ABSTRACT

This paper models the clock behavior in a sequential circuit by a quaternary variable and uses this representation to propose and analyze two clock-gating techniques. It then uses the covering relationship between the triggering transition of the clock and the active cycles of various flip-flops to generate a derived clock for each flip-flop in the circuit. Design examples using gated clocks are provided next. Experimental results show that these designs have ideal logic functionality with lower power dissipation compared to traditional designs.

I. INTRODUCTION

The sequential circuits in a system are considered major contributors to the power dissipation, since one input of sequential circuits is the clock, which is the only signal that switches all the time. In addition, the clock signal tends to be highly loaded. To distribute the clock and control the clock skew, one needs to construct a clock network (often a clock tree) with clock buffers. All of this adds to the capacitance of the clock net. Recent studies indicate that the clock signals in digital computers consume a large (15% - 45%) percentage of the system power (1). Thus, the circuit power can be greatly reduced by reducing the clock power dissipation.

Most efforts for clock power reduction have focused on issues such as reduced voltage swings, buffer insertion and clock routing (2). In many cases switching of the clock causes a lot of unnecessary gate activity. For that reason, circuits are being developed with controllable clocks. This means that from the master clock other clocks are derived which, based on certain conditions, can be slowed down or stopped completely with respect to the master clock. This scheme results in power savings due to the following factors:

1) Load on the master clock is reduced and the number of required buffers in the clock tree is decreased. Therefore, the power dissipation of the clock tree can be reduced.
2) The flip-flop receiving the derived clock is not triggered in idle cycles. The corresponding dynamic power dissipation is thus saved.
3) The excitation function of the flip-flop triggered by the derived clock may be simplified, since it has a don't care condition in the cycles when the flip-flop is not triggered by the derived clock.

In (3) the authors presented a technique for saving power in the clock tree by stopping the clock fed into idle modules. However, a number of engineering issues related to the design of the clock tree were not addressed and hence, the proposed approach has not been adopted in practice.
This paper investigates various issues in deriving a gated clock from a master clock. In Section II, a quaternary variable is used to model the clock behavior and to discuss its triggering action on flip-flops. Based on this analysis, two clock-gating schemes are proposed. In Section III, we use the covering relation between the clock and the transition behaviors of the triggered flip-flops to derive conditions for gating the master clock. Two common sequential circuits, i.e. the 8421 BCD code up-counter and the excess-three code up-counter, are then described to illustrate the procedure for finding a derived clock. In Section IV, a new technique for clock-gating is presented which generates a clock synchronous with the master clock. This eliminates the additional skew between the master clock and the derived clock. Thus, the designed sequential circuit is a synchronous one. Finally, we present circuit simulation results to prove the quality of the derived clock and its ability to reduce power dissipation in the circuit.

II. DESCRIPTION FOR CLOCK BEHAVIOR AND CLOCK-GATING

In a synchronous system, a flip-flop is triggered by a certain directional transition of a clock signal. For the clock to be another signal rather than the master clock, it must offer the same directional transition to trigger the flip-flop, and it must be in step with the master clock.

For the clock signal $clk$ in a circuit, if we denote its logic values before and after a transition as $clk(t)$ and $clk(t^{+})$ respectively, four combinations can be used to express different behaviors of the clock as shown in Table 1, where a special quaternary variable $\tilde{clk}$ denotes the corresponding behavior. The four values are $(0, \alpha, \beta, 1)$, where $\alpha$, $\beta$ represent two kinds of transition behaviors and $0$, $1$ represent two kinds of holding behaviors. (Note that although they have the same forms as the signal values 0 and 1, their meanings are different.)

Table 1. QUATERNARY REPRESENTATION FOR BEHAVIORS OF A SIGNAL

clk(t)   clk(t+)   behavior value   Behavior
  0         0            0          0-holding
  0         1            α          α-transition (rising)
  1         0            β          β-transition (falling)
  1         1            1          1-holding

In addition, we can also define a literal operation to identify the behavior of a clock:

${}^{b}clk^{b} = \begin{cases} 1 & \text{if } \tilde{clk} = b \\ 0 & \text{if } \tilde{clk} \neq b \end{cases}$   (1)

where $b \in \{0, \alpha, \beta, 1\}$. Thus, the rising transition ${}^{\alpha}clk^{\alpha}$ and the falling transition ${}^{\beta}clk^{\beta}$ of a clock are binary variables and can serve as arguments of Boolean operations. For example, from Table 1 we have

${}^{0}clk^{0} = \overline{clk(t)} \cdot \overline{clk(t^{+})}$, ${}^{\alpha}clk^{\alpha} = \overline{clk(t)} \cdot clk(t^{+})$, ${}^{\beta}clk^{\beta} = clk(t) \cdot \overline{clk(t^{+})}$ and ${}^{1}clk^{1} = clk(t) \cdot clk(t^{+})$.

Assume that there are $n$ flip-flops in a sequential circuit and that their outputs and clock inputs are denoted by $Q_i$ and $clk_i$, $i = 0, 1, \ldots, n-1$, respectively. For a synchronous sequential circuit, we have $clk_i = clk$, namely all flip-flops are triggered by the same master clock signal $clk$. However, if a flip-flop $Q_i$ is to be disconnected from the master clock during some (idle) cycles, then we have to use a derived clock $clk_i$ for $Q_i$. Notice that this derived clock should be in step with the master clock for the circuit to remain synchronous.

Generally, we consider that the derived clock is obtained from the master clock and the outputs of other flip-flops $Q_0, Q_1, \ldots, Q_{n-1}$ (which make transitions following the triggering transition of their respective clocks). Since both AND gating and OR gating can be used for controlling the master clock, we have the following two clock-gating forms:

$clk_i = g_i + p_i \cdot clk$,   (2)

$clk_i = g_i \cdot (p_i + clk)$,   (3)

where $g_i$ and $p_i$ are functions of the flip-flop outputs $Q_0, Q_1, \ldots, Q_{n-1}$.

Consider a flip-flop triggered by the falling clock transition as an example (i.e. a negative edge-triggered flip-flop). The timing relationships of $clk$, $p_i$, $p_i \cdot clk$ and $p_i + clk$ are shown in Fig. 1. Note that $p_i$ exhibits a delay with respect to the falling transition of clock $clk$, may have glitches (represented by vertical grid lines), and has its final stable value in the zone where $clk = 0$. Because $clk = 0$ holds $p_i \cdot clk$ at 0 in this zone, the AND form masks these glitches. In contrast, $p_i + clk$ cannot restrain the glitches, and may even lead to an extra glitch. Therefore, (2) is suitable for the negative edge-triggered flip-flop while (3) is not. Note that $g_i$ in (2) must be glitch-free when $clk = 0$.

The above discussion shows that the falling transition of $clk_i$ in (2) occurs in the following two cases:

(1) When $g_i = 0$ and $p_i = 1$, a falling transition of $clk$ leads to a falling transition of the derived clock. Therefore, $p_i$ may be named the transition propagate term.
(2) When $clk = 0$ and $g_i$ makes a falling transition, the derived clock makes a falling transition since $clk$ (and $p_i \cdot clk$) is 0 at that time instance. Therefore, $g_i$ may be named the transition generate term.

Figure 1. Timing relationship of $clk$, $p_i$ ($g_i$), $p_i \cdot clk$ and $p_i + clk$

From this analysis, we obtain

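${}^{\beta}clk_i^{\beta} = {}^{\beta}g_i^{\beta} + \bar{g}_i \, p_i \, {}^{\beta}clk^{\beta}$.   (4)

Similarly, we can find that the derived clock signal in (3) is suitable for the flip-flops triggered by the rising clock transition. Here $g_i$ in (3) must be glitch-free when $clk = 1$. The rising transition of $clk_i$ can be expressed as

${}^{\alpha}clk_i^{\alpha} = {}^{\alpha}g_i^{\alpha} + g_i \, \bar{p}_i \, {}^{\alpha}clk^{\alpha}$.   (5)

It should be pointed out that the circuitry attached for generating the derived clock should be simple, to avoid excessive power dissipation due to this overhead. Therefore $g_i$ and $p_i$ in (2) and (3) should be relatively simple functions. In particular, we require $g_i$ to be simple to avoid dangerous glitches. Note that if $g_i = 0$, $p_i = 1$ in (2) or $g_i = 1$, $p_i = 0$ in (3), we return to the condition of applying the master clock in a synchronous sequential circuit.

III. DESIGN OF SEQUENTIAL CIRCUITS BASED ON DERIVED CLOCK

Assume that the derived clock for the flip-flop $Q_i$ is $clk_i$. Falling transitions of $clk_i$ have to cover all cycles when the flip-flop makes transitions, ${}^{\alpha}Q_i^{\alpha}$ and ${}^{\beta}Q_i^{\beta}$. The covering relation can be expressed as:

${}^{\beta}clk_i^{\beta} \ge ({}^{\alpha}Q_i^{\alpha} + {}^{\beta}Q_i^{\beta})$.   (6)

Since the AND operation and the OR operation can be interpreted as the minimum operation and the maximum operation on Boolean variables, i.e. $x \cdot y = \min(x, y)$ and $x + y = \max(x, y)$, we can get the following equations from (6):

${}^{\beta}clk_i^{\beta} \cdot ({}^{\alpha}Q_i^{\alpha} + {}^{\beta}Q_i^{\beta}) = ({}^{\alpha}Q_i^{\alpha} + {}^{\beta}Q_i^{\beta})$,   (7)

${}^{\beta}clk_i^{\beta} + ({}^{\alpha}Q_i^{\alpha} + {}^{\beta}Q_i^{\beta}) = {}^{\beta}clk_i^{\beta}$.   (8)

Therefore, we should obtain $({}^{\alpha}Q_i^{\alpha} + {}^{\beta}Q_i^{\beta})$ first. Then we generate the derived clock $clk_i$ for flip-flop $Q_i$. We will show the procedure by using design examples.

Example 1. Design of an 8421 BCD code up-counter

The next states and state behaviors of an 8421 BCD code up-counter are shown in Table 2, where the behavior of each flip-flop $Q_i$ is denoted by $\tilde{Q}_i$.

Table 2. NEXT STATES AND STATE BEHAVIORS OF AN 8421 BCD CODE UP-COUNTER

Present state    Next state       Behavior
Q3 Q2 Q1 Q0      Q3 Q2 Q1 Q0      Q3 Q2 Q1 Q0
 0  0  0  0       0  0  0  1       0  0  0  α
 0  0  0  1       0  0  1  0       0  0  α  β
 0  0  1  0       0  0  1  1       0  0  1  α
 0  0  1  1       0  1  0  0       0  α  β  β
 0  1  0  0       0  1  0  1       0  1  0  α
 0  1  0  1       0  1  1  0       0  1  α  β
 0  1  1  0       0  1  1  1       0  1  1  α
 0  1  1  1       1  0  0  0       α  β  β  β
 1  0  0  0       1  0  0  1       1  0  0  α
 1  0  0  1       0  0  0  0       β  0  0  β

From Table 2, the corresponding next state Karnaugh maps and behavior Karnaugh maps may be obtained, as shown in Fig. 2(a) and 2(b). In these maps an empty box represents the don't care condition. The two transition functions for each flip-flop can be derived from their corresponding behavior Karnaugh maps as below:

${}^{\alpha}Q_3^{\alpha} = Q_2 Q_1 Q_0$,  ${}^{\beta}Q_3^{\beta} = Q_3 Q_0$;   (9)

${}^{\alpha}Q_2^{\alpha} = \bar{Q}_2 Q_1 Q_0$,  ${}^{\beta}Q_2^{\beta} = Q_2 Q_1 Q_0$;   (10)

${}^{\alpha}Q_1^{\alpha} = \bar{Q}_3 \bar{Q}_1 Q_0$,  ${}^{\beta}Q_1^{\beta} = \bar{Q}_3 Q_1 Q_0$;   (11)

${}^{\alpha}Q_0^{\alpha} = \bar{Q}_0$,  ${}^{\beta}Q_0^{\beta} = Q_0$.   (12)

Therefore, we have

${}^{\alpha}Q_3^{\alpha} + {}^{\beta}Q_3^{\beta} = (Q_3 + Q_2 Q_1) Q_0$,   (13)

${}^{\alpha}Q_2^{\alpha} + {}^{\beta}Q_2^{\beta} = Q_1 Q_0$,   (14)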

${}^{\alpha}Q_1^{\alpha} + {}^{\beta}Q_1^{\beta} = \bar{Q}_3 Q_0$,   (15)

${}^{\alpha}Q_0^{\alpha} + {}^{\beta}Q_0^{\beta} = 1$.   (16)

From (13)-(15) we find that ${}^{\beta}Q_0^{\beta} \ge ({}^{\alpha}Q_i^{\alpha} + {}^{\beta}Q_i^{\beta})$, $(i = 1, 2, 3)$. According to (12), the falling transition of $Q_0$, ${}^{\beta}Q_0^{\beta} = Q_0$, can serve as the needed falling transition trigger for flip-flops $Q_1$, $Q_2$ and $Q_3$, namely $clk_1 = clk_2 = clk_3 = Q_0$. Comparing these with (4), we get $g_i = Q_0$, $p_i = 0$ and ${}^{\beta}clk_i^{\beta} = {}^{\beta}Q_0^{\beta}$ $(i = 1, 2, 3)$. As for $Q_0$, (16) indicates that the clock for $Q_0$ is no other than the master clock.

Since we only need to take care of the excitation input when the flip-flop receives a triggering falling clock transition (i.e. the entries with $Q_0 = 1$ in the map), we don't care what the excitation inputs are in other conditions. Therefore the next state Karnaugh maps for flip-flops $Q_1$, $Q_2$ and $Q_3$ in Fig. 2(a) can be simplified to those shown in Fig. 2(c).

Figure 2. (a) Next state Karnaugh maps, (b) behavior Karnaugh maps, (c) simplified next state Karnaugh maps

From Fig. 2(a) and (c) we can get the corresponding synchronous and asynchronous designs, as shown in Fig. 3. (We say asynchronous because now not all flip-flops are triggered at the same time.) The combinational circuits of the asynchronous design are simpler. Besides, since the three flip-flops $Q_3$, $Q_2$, $Q_1$ have no dynamic power dissipation half of the time, when there is no clock triggering, and because the simpler combinational circuits have lower node capacitance, the asynchronous design saves power.

Figure 3. Circuit realizations of BCD code up-counter: (a) synchronous design, (b) asynchronous design

Example 2. Design of an excess-three code up-counter

The next states and state behaviors of an excess-three code up-counter are shown in Table 3. Transition functions for each flip-flop can be derived as below:

${}^{\alpha}Q_3^{\alpha} = Q_2 Q_1 Q_0$,  ${}^{\beta}Q_3^{\beta} = Q_3 Q_2$;   (17)

${}^{\alpha}Q_2^{\alpha} = \bar{Q}_2 Q_1 Q_0$,  ${}^{\beta}Q_2^{\beta} = Q_3 Q_2 + Q_2 Q_1 Q_0$;   (18)

${}^{\alpha}Q_1^{\alpha} = Q_3 Q_2 + \bar{Q}_1 Q_0$,  ${}^{\beta}Q_1^{\beta} = Q_1 Q_0$;   (19)

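${}^{\alpha}Q_0^{\alpha} = \bar{Q}_0$,  ${}^{\beta}Q_0^{\beta} = Q_0$.   (20)

Table 3. THE NEXT STATES AND STATE BEHAVIORS OF AN EXCESS-THREE CODE UP-COUNTER

Present state    Next state       Behavior
Q3 Q2 Q1 Q0      Q3 Q2 Q1 Q0      Q3 Q2 Q1 Q0
 0  0  1  1       0  1  0  0       0  α  β  β
 0  1  0  0       0  1  0  1       0  1  0  α
 0  1  0  1       0  1  1  0       0  1  α  β
 0  1  1  0       0  1  1  1       0  1  1  α
 0  1  1  1       1  0  0  0       α  β  β  β
 1  0  0  0       1  0  0  1       1  0  0  α
 1  0  0  1       1  0  1  0       1  0  α  β
 1  0  1  0       1  0  1  1       1  0  1  α
 1  0  1  1       1  1  0  0       1  α  β  β
 1  1  0  0       0  0  1  1       β  β  α  α

Therefore, we have

${}^{\alpha}Q_3^{\alpha} + {}^{\beta}Q_3^{\beta} = Q_3 Q_2 + Q_2 Q_1 Q_0 = {}^{\beta}Q_2^{\beta}$,   (21)

${}^{\alpha}Q_2^{\alpha} + {}^{\beta}Q_2^{\beta} = Q_1 Q_0 + Q_3 Q_2 = {}^{\beta}Q_1^{\beta} + (Q_3 Q_2) \, {}^{\beta}clk^{\beta}$,   (22)

${}^{\alpha}Q_1^{\alpha} + {}^{\beta}Q_1^{\beta} = Q_0 + Q_3 Q_2 = {}^{\beta}Q_0^{\beta} + (Q_3 Q_2) \, {}^{\beta}clk^{\beta}$,   (23)

${}^{\alpha}Q_0^{\alpha} + {}^{\beta}Q_0^{\beta} = 1$.   (24)

Here the master clock makes a falling transition in every cycle, so ${}^{\beta}clk^{\beta} = 1$. Based on (2) and (4), (22) and (23) can be re-expressed as follows; the factor $\bar{g}_i$ required by (4) is absorbed by the don't care states with $Q_3 Q_2 = 1$:

${}^{\alpha}Q_2^{\alpha} + {}^{\beta}Q_2^{\beta} = {}^{\beta}[\, Q_1 + (Q_3 Q_2) \cdot clk \,]^{\beta}$,   (25)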

${}^{\alpha}Q_1^{\alpha} + {}^{\beta}Q_1^{\beta} = {}^{\beta}[\, Q_0 + (Q_3 Q_2) \cdot clk \,]^{\beta}$.   (26)

If we take $clk_3 = Q_2$, $clk_2 = [\, Q_1 + (Q_3 Q_2) \cdot clk \,]$, $clk_1 = [\, Q_0 + (Q_3 Q_2) \cdot clk \,]$ and $clk_0 = clk$, the covering relation will set the excitation functions of all the four flip-flops as $D_i = \bar{Q}_i$ $(i = 0, 1, 2, 3)$. On the other hand, if we use the master clock for triggering all four flip-flops, we obtain the following complicated excitation functions:

$D_3 = Q_3 \bar{Q}_2 + Q_2 Q_1 Q_0$,
$D_2 = \bar{Q}_2 Q_1 Q_0 + \bar{Q}_3 \bar{Q}_1 + \bar{Q}_3 \bar{Q}_0$,
$D_1 = Q_1 \bar{Q}_0 + \bar{Q}_1 Q_0 + Q_3 Q_2$,
$D_0 = \bar{Q}_0$.

Since the above $D_3$, $D_2$ and $D_1$ have complicated forms, their corresponding synchronous circuit realization will have a complicated combinational circuit with more node capacitance and hence higher power dissipation. On the other hand, the corresponding asynchronous circuit realization with $D_i = \bar{Q}_i$ is much simpler. There is also a power saving since the flip-flops are isolated from the triggering clock in the idle cycles.

IV. SYNCHRONOUS DERIVED CLOCK AND ITS APPLICATION

In Example 1 of the last section we take $clk_i = Q_0$ $(i = 1, 2, 3)$. From (12) we can also write ${}^{\beta}clk_i^{\beta}$ as ${}^{\beta}clk_i^{\beta} = Q_0 \cdot {}^{\beta}clk^{\beta}$ $(i = 1, 2, 3)$. Comparing this with (4), we have $g_i = 0$, $p_i = Q_0$ and $clk_i = Q_0 \cdot clk$. According to this form of the derived clock we get another asynchronous design, as shown in Fig. 4(a). At first glance, the circuit has one AND gate more than the design in Fig. 3(b). Besides, it appears that the derived clock $clk_{1-3}$ may have an increased phase delay. However, the timing relation shown in Fig. 1 indicates that the transition delay of $clk_{1-3}$ is independent of the delay of the output $Q_0$. The delay between $clk$ and $clk_{1-3}$ is only $2t_g$ ($t_g$ is the average delay of a gate), which is less than the delay of the flip-flop output.

Figure 4. BCD code up-counter by gating clock: (a) asynchronous design, (b) synchronous design

Based on the above discussion, we can rewrite $clk_{1-3} = Q_0 \cdot clk$ as $clk^{*}_{1-3} = \overline{\bar{Q}_0 + \overline{clk}}$. Besides, we take $\overline{clk}$ from the previous stage of the clock tree. Thus, we obtain a new design, as shown in Fig. 4(b). If we consider the delays of the inverter and the NOR gate to be roughly the same, the falling transitions of $clk$ and $clk^{*}_{1-3}$ in the circuit will occur simultaneously. This design is synchronous in the sense that all flip-flops are triggered in synchrony with the global clock.

We simulated the new design in Fig. 4(b) with SPICE 3f3 using 2µ CMOS technology, which confirmed that the new design has ideal logic operation. We also measured the power dissipation of the two synchronous designs in Fig. 3(a) and Fig. 4(b). The power dissipation diagrams are shown in Fig. 5, and show that the new design reduces the power dissipation by 22%.

Figure 5. Power dissipation diagram

V. CONCLUSION

The behavioral description of a clock is the basis for analyzing its triggering action on flip-flops. Based on it, two types of clock-gating were introduced to form a derived clock. We showed that the procedure for designing a derived clock can be systematized so as to isolate the triggered flip-flop from the master clock in its idle cycles. The achieved power saving can be significant. However, the additional clock skew may lower the maximum operation frequency. Based on analyzing the timing relation in clock-gating, we then presented a new technique for generating the derived clock, which is synchronous with the master clock. Circuit simulation proved the quality of the new derived clock and its capability to reduce power dissipation. The engineering issues mentioned in (3) have thus been resolved for practical application, opening the path for widespread adoption of the clock-gating technique in low power design of custom ICs.
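The design procedure of Section III can be checked mechanically. The following is a minimal sketch, not the authors' tool: it rebuilds the behavior tables of the two counters from their count sequences, evaluates the falling transitions of each derived clock with (4), and tests the covering relation (6). The names `behaviors`, `clock_falls`, `check`, `BCD_CLOCKS` and `XS3_CLOCKS` are hypothetical, `"a"` and `"b"` stand for α and β, and the model assumes that the master clock falls once per cycle and that the propagate term is evaluated on the present state.

```python
def behaviors(seq, width=4):
    """Return {state: behavior list}; entry i is the Table 1 symbol of Q_i."""
    symbol = {(0, 0): "0", (0, 1): "a", (1, 0): "b", (1, 1): "1"}  # a = alpha, b = beta
    table = {}
    for now, nxt in zip(seq, seq[1:] + seq[:1]):  # the count sequence wraps around
        table[now] = [symbol[((now >> i) & 1, (nxt >> i) & 1)] for i in range(width)]
    return table


def clock_falls(g, p, state, beh):
    """Falling transition of clk_i = g + p*clk, following (4).

    g is the index of the flip-flop used as generate term (None means g = 0),
    p is a predicate on the present state (propagate term). Assumption: the
    master clock falls once in every cycle.
    """
    g_falls = g is not None and beh[g] == "b"
    g_now = g is not None and ((state >> g) & 1) == 1
    return g_falls or (not g_now and bool(p(state)))


def check(seq, clocks):
    """For each flip-flop i return (covers, exact, number_of_triggering_edges)."""
    table = behaviors(seq)
    report = {}
    for i, (g, p) in clocks.items():
        fall = {s for s, beh in table.items() if clock_falls(g, p, s, beh)}
        move = {s for s, beh in table.items() if beh[i] in ("a", "b")}
        report[i] = (move <= fall, move == fall, len(fall))
    return report


def never(state):
    return False          # propagate term p = 0


def always(state):
    return True           # propagate term p = 1


def q3q2(state):
    return ((state >> 2) & 0b11) == 0b11  # propagate term Q3*Q2


MASTER = (None, always)   # g = 0, p = 1: the master clock itself

# Example 1: clk1 = clk2 = clk3 = Q0 (g = Q0, p = 0), clk0 = clk.
BCD_CLOCKS = {0: MASTER, 1: (0, never), 2: (0, never), 3: (0, never)}
# Example 2: clk3 = Q2, clk2 = Q1 + Q3*Q2*clk, clk1 = Q0 + Q3*Q2*clk, clk0 = clk.
XS3_CLOCKS = {0: MASTER, 1: (0, q3q2), 2: (1, q3q2), 3: (2, never)}

bcd = check(list(range(10)), BCD_CLOCKS)
xs3 = check(list(range(3, 13)), XS3_CLOCKS)
for name, report in (("8421 BCD", bcd), ("excess-three", xs3)):
    for i in sorted(report):
        print(name, f"Q{i}", report[i])
```

Under these assumptions every derived clock covers the transitions of its flip-flop. In Example 2 the match is exact, which is why $D_i = \bar{Q}_i$ suffices there. In Example 1 the clocks of $Q_1$, $Q_2$ and $Q_3$ fall in 5 of the 10 cycles, some of which are holding cycles, so those flip-flops still need the excitation logic of Fig. 2(c).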

REFERENCES

1. M. Pedram, "Power minimization in IC design: principles and applications," ACM Transactions on Design Automation, vol. 1, no. 1, pp. 3-56, Jan. 1996.
2. G. Friedman, "Clock distribution design in VLSI circuits: an overview," in Proc. IEEE ISCAS, San Jose, pp. 475-478, May 1994.
3. E. Tellez, A. Farrahi and M. Sarrafzadeh, "Activity-driven clock design for low power circuits," in Proc. IEEE ICCAD, San Jose, pp. 62-65, Nov. 1995.

Fig. 1. Timing relationship of $clk$, $p_i$ ($g_i$), $p_i \cdot clk$ and $p_i + clk$ (waveforms; the extra glitch is marked on the last trace)

Fig. 2. (a) Next state Karnaugh maps, (b) behavior Karnaugh maps, (c) simplified next state Karnaugh maps

Fig. 3. Circuit realizations of BCD code up-counter: (a) synchronous design, (b) asynchronous design

Fig. 4. BCD code up-counter by gating clock: (a) asynchronous design, (b) synchronous design

Fig. 5. Power dissipation diagram (energy dissipation versus time; curves for Fig. 3(a) and Fig. 4(b))