Coarse-Grain MTCMOS Sleep


Coarse-Grain MTCMOS Sleep Transistor Sizing Using Delay Budgeting
Ehsan Pakbaznia and Massoud Pedram
University of Southern California, Dept. of Electrical Engineering
DATE-08, Munich, Germany

Leakage in CMOS Technology
$V_{dd}$ is reduced with CMOS technology scaling.
$V_{th}$ must be lowered to recover the transistor switching speed.
The subthreshold leakage current increases exponentially with decreasing $V_{th}$.
The MTCMOS technique has proven to be a highly effective leakage control mechanism.

Overview of MTCMOS
A high-$V_{th}$ transistor is used to disconnect low-$V_{th}$ transistors from the ground or the supply rails.
[Figure: header-switch and footer-switch MTCMOS circuits. Labels: in, out, $V_{dd}$, SLEEP, high-threshold sleep transistor, low-threshold logic, virtual supply, virtual ground.]

Coarse-Grain MTCMOS
Coarse-grain vs. fine-grain: smaller sleep transistor area; lower leakage; a regular standard cell library can be used (no need to characterize new cells).
[Figure: coarse-grain MTCMOS example with labels SLEEPB, TVSS, VVSS and elements numbered 1 to 4.]

Sleep Transistor Layout
[Figure: layouts of three sleep transistor cells: a single-transistor footer switch, a single-transistor header switch, and a double-transistor (mother/daughter) footer switch. Pin labels: SLEEP, SLEEPB, SLEEPB1, SLEEPB2, TVSS, VVSS, VSS.]

Sleep Transistor Placement
Symmetric placement styles are preferred due to lower routing complexity for T/TVSS and SLEEP/SLEEPB signals.
[Figure: two placement styles, column-based and staggered. Labels: TVSS, VVSS.]

Notion of Module
$M_{(r,i)}$ denotes the module that is formed around the $i$-th sleep transistor in the $r$-th row of the standard cell layout.
The cells belonging to $M_{(r,i)}$ are those that are in the $r$-th row and are closest in distance to the $i$-th sleep transistor in that row.
[Figure: one standard cell row showing $M_{(1,1)}$ (module 1 of row 1) and $M_{(1,2)}$ (module 2 of row 1). Labels: VVSS, TVSS.]

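Time-dependent Current Source Model for Modules
VVSS rail resistance between the cells inside each module is ignored.
$r_{VSS(r,i)}$ denotes the VVSS resistance between modules $M_{(r,i)}$ and $M_{(r,i+1)}$.
$I_{M(r,i)}(t)$ and $I_{st(r,i)}(t)$ denote the discharging current of module $M_{(r,i)}$ and the current of its sleep transistor.
[Figure: modules $M_{(r,i-1)}$, $M_{(r,i)}$, $M_{(r,i+1)}$ drawn as current sources $I_{M(r,i-1)}(t)$, $I_{M(r,i)}(t)$, $I_{M(r,i+1)}(t)$, with sleep transistors of widths $W_{st(r,i-1)}$, $W_{st(r,i)}$, $W_{st(r,i+1)}$ carrying $I_{st(r,i-1)}(t)$, $I_{st(r,i)}(t)$, $I_{st(r,i+1)}(t)$, and rail resistances $r_{VSS(r,i-1)}$, $r_{VSS(r,i)}$ between neighbors.]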

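Motivational Example
Circuit: FO4 inverter chain. Modules: M1 and M2. Sleep transistors: replaced by their linear resistive models, $R_1$ and $R_2$. CMOS ($R_1 = R_2 = 0$) delay: 103 ps.
[Figure: inverter chain from $V_{IN}$ to $V_{OUT}$ with stage sizes 1, 4, 16, 64, intermediate node $V_A$, load $C_L$ = FO4, and sleep resistances $R_1$ and $R_2$ under modules M1 and M2.]

Module   Module delay (ps)   Module peak current (mA)
M1       46                  0.3
M2       57                  4.65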

Effect of Slack Distribution on Total Sleep Transistor Size

Circuit          Module delay (ps)              Total delay (ps)   Sleep Tx resistance (Ω)   Σ 1/R (Ω⁻¹)
CMOS             T_M1 = 46,   T_M2 = 57         103                R1 = 0,   R2 = 0          n/a
MTCMOS, case 1   T_M1 = 50.6, T_M2 = 62.7       113.3              R1 = 250, R2 = 9          0.1151
MTCMOS, case 2   T_M1 = 52,   T_M2 = 61.3       113.3              R1 = 330, R2 = 2          0.5030
MTCMOS, case 3   T_M1 = 48,   T_M2 = 65.3       113.3              R1 = 110, R2 = 25         0.0491

Total available slack: 10.3 ps (10% delay penalty).
Case 1: uniformly distributed slack (medium).
Case 2: 80% for M1 and 20% for M2 (worst).
Case 3: 20% for M1 and 80% for M2 (best).
Current-aware optimization: modules with larger discharge current must be slowed down more.
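Since sleep transistor width is inversely proportional to its linear resistance, the Σ 1/R column can be checked by direct arithmetic. The following is a minimal sketch of that check; the dictionary keys and the helper name `total_conductance` are illustrative, not from the slides.

```python
# Sleep transistor resistances (ohms) for the three slack distributions above.
cases = {
    "case 1 (uniform)": (250.0, 9.0),
    "case 2 (80% to M1)": (330.0, 2.0),
    "case 3 (80% to M2)": (110.0, 25.0),
}


def total_conductance(resistances):
    """Sum of 1/R over all sleep transistors, a proxy for total width."""
    return sum(1.0 / r for r in resistances)


totals = {name: total_conductance(rs) for name, rs in cases.items()}
for name, g in totals.items():
    print(f"{name}: sum(1/R) = {g:.4f} 1/ohm")

# The distribution with the smallest sum needs the least total width.
best = min(totals, key=totals.get)
print("smallest total width:", best)
```

The smallest sum belongs to the case that gives most of the slack to M2, the module with the larger peak current.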

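Delay-Budgeting Constraints for Sizing
Delay-budgeting constraints: non-negative slack for all nodes,
$$ s'_n = \underbrace{\min\{ r'_{\text{fanouts of } C_n} \} - d'_n}_{\text{required time for node } n} - \underbrace{\left( \max\{ a'_{\text{fanins of } C_n} \} + d'_n \right)}_{\text{arrival time at node } n} \ge 0 $$
where $s'_n$ is the slack of node $n$.
$d'_n$ is the delay for cell $C_n \in M_i$ with VVSS voltage $v_i$. We can show:
$$ d'_n = d_n + \frac{v_i}{V_{DD} - V_{tL}} \, d_n $$
where the second term is the delay increase due to MTCMOS.
To simplify the constraints we only consider the timing-critical paths, so we need to define the notion of path delay.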

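Path Delay in MTCMOS
The delay increase for path $\pi_k$ is the summation of the delay increases for all the gates in $\pi_k$:
$$ \Delta d_{\pi_k} = \sum_{C_n \in \pi_k} \Delta d_n = \sum_{C_n \in \pi_k} d_n \, \frac{R_{st_{\theta(C_n)}} \; I^{\max}_{st_{\theta(C_n)}}\!\left[ t^{C_n}_{\min}, t^{C_n}_{\max} \right]}{V_{DD} - V_{tL}} $$
$\theta(C_n)$ is the index of the module that cell $C_n$ belongs to.
$R_{st_i}$ is the linear resistance value for the $i$-th sleep transistor; $R_{st_i}$ is inversely proportional to $W_{st_i}$ (width).
$I^{\max}_{st_i}\!\left[ t^{C_n}_{\min}, t^{C_n}_{\max} \right]$ is the maximum current flowing through the $i$-th sleep transistor during the time window $\left[ t^{C_n}_{\min}, t^{C_n}_{\max} \right]$ when cell $C_n$ is switching.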
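To make the path-delay sum concrete, here is a small sketch that evaluates it for one path. The supply and threshold voltages, cell delays, resistances and peak currents are all hypothetical values chosen for illustration, and `path_delay_increase` is not a name from the slides.

```python
V_DD = 1.0  # supply voltage in volts (assumed value)
V_TL = 0.3  # low threshold voltage in volts (assumed value)


def path_delay_increase(path_cells, r_st, i_st_max):
    """Sum d_n * R_st * I_max / (V_DD - V_TL) over the cells of one path.

    path_cells: list of (cell name, nominal delay d_n in ps, module index)
    r_st:       module index -> sleep transistor resistance in ohms
    i_st_max:   cell name -> peak sleep transistor current (A) of the cell's
                module during that cell's switching window
    """
    total = 0.0
    for name, d_n, theta in path_cells:
        v_drop = r_st[theta] * i_st_max[name]  # virtual ground voltage seen by the cell
        total += d_n * v_drop / (V_DD - V_TL)
    return total


# Hypothetical two-cell path crossing two modules.
path = [("c1", 40.0, 0), ("c2", 60.0, 1)]
r_st = {0: 100.0, 1: 10.0}
i_st_max = {"c1": 0.7e-3, "c2": 3.5e-3}

delta = path_delay_increase(path, r_st, i_st_max)
print(f"delay increase: {delta:.2f} ps")
```

Each cell's contribution scales with both the resistance of its own module's sleep transistor and the current that transistor carries while the cell switches, which is why the time windows matter.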

Module Current Example
The module current is the time-indexed summation of the expected currents for all the cells inside the module.
[Figure: current (mA) versus time (psec) profile for a module with 3 cells and time windows C1: [40, 60], C2: [60, 80], C3: [50, 70].]
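The time-indexed summation can be sketched as follows, using the three time windows from the example. The flat (rectangular) current inside each window and the current magnitudes are assumptions made only for illustration; the slide's plot shows a different, unspecified pulse shape.

```python
def module_current(cells, times):
    """Time-indexed sum of the expected currents of all cells in a module.

    cells: list of (t_start, t_end, current); the current is assumed flat
           inside the half-open window [t_start, t_end)
    times: sample times in ps
    """
    return [sum(i for (t0, t1, i) in cells if t0 <= t < t1) for t in times]


# Windows from the example; the mA values are hypothetical.
cells = [(40, 60, 0.15), (60, 80, 0.10), (50, 70, 0.20)]
times = list(range(20, 100, 10))
profile = module_current(cells, times)

for t, i in zip(times, profile):
    print(f"t = {t:2d} ps  I_M = {i:.2f} mA")
```

The profile peaks where windows overlap, so two cells that never switch at the same time do not add to each other's peak.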

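Delay-Budgeting (DB) Sizing Problem
The clock cycle is divided into $N$ equal time intervals. $t_j$ is the beginning time of the $j$-th interval. $I_{M_i}(t_j)$ is the switching current of module $M_i$ at time $t_j$.
$$ \text{Minimize} \quad \sum_{i=1}^{M} \frac{1}{R_{st_i}} $$
subject to:
1. Delay-budgeting constraints:
$$ \Delta d_{\pi_k} = \sum_{C_n \in \pi_k} d_n \, \frac{R_{st_{\theta(C_n)}} \; I^{\max}_{st_{\theta(C_n)}}\!\left[ t^{C_n}_{\min}, t^{C_n}_{\max} \right]}{V_{DD} - V_{tL}} \le \text{DDR\_MAX} \cdot d_{\max}; \quad 1 \le k \le K $$
2. Static NM constraints:
$$ R_{st_i} \, I_{st_i}(t_j) \le \text{VVSS\_MAX}; \quad 1 \le i \le M, \; 1 \le j \le N $$
where, for all $i, j$: $I_{st_0}(t_j) = I_{st_{N+1}}(t_j) = 0$ and
$$ I_{st_i}(t_j) = I_{M_i}(t_j) + \frac{R_{st_{i-1}} I_{st_{i-1}}(t_j) - R_{st_i} I_{st_i}(t_j)}{r^{\,i-1}_{VSS}} + \frac{R_{st_{i+1}} I_{st_{i+1}}(t_j) - R_{st_i} I_{st_i}(t_j)}{r^{\,i}_{VSS}} $$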
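The sleep transistor currents in the static NM constraints follow from Kirchhoff's current law on the virtual ground rail. The sketch below solves that network for one row at a single time step, in plain Python with a tridiagonal (Thomas) elimination. It assumes a uniform rail resistance between neighbors and no rail connection beyond the first and last module; the function name `solve_virtual_ground` and all numeric values are hypothetical.

```python
def solve_virtual_ground(r_st, i_mod, r_vss):
    """Solve KCL on one row of the virtual ground rail at one time step.

    r_st:  sleep transistor resistances, one per module (ohms)
    i_mod: module discharge currents (A)
    r_vss: rail resistance between adjacent modules (ohms), assumed uniform
    Returns (v, i_st): node voltages and sleep transistor currents.
    """
    m = len(r_st)
    g = 1.0 / r_vss
    # Node i: (1/R_i + g * neighbors) * v[i] - g*v[i-1] - g*v[i+1] = i_mod[i]
    diag = [1.0 / r_st[i] + g * ((i > 0) + (i < m - 1)) for i in range(m)]
    rhs = list(i_mod)
    # Forward elimination; every off-diagonal entry equals -g.
    for i in range(1, m):
        w = -g / diag[i - 1]
        diag[i] -= w * (-g)
        rhs[i] -= w * rhs[i - 1]
    # Back substitution.
    v = [0.0] * m
    v[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        v[i] = (rhs[i] + g * v[i + 1]) / diag[i]
    i_st = [v[i] / r_st[i] for i in range(m)]
    return v, i_st


# Three modules; the middle one draws the most current (hypothetical values).
v, i_st = solve_virtual_ground([10.0, 10.0, 10.0], [1e-3, 5e-3, 1e-3], 0.1)
VVSS_MAX = 0.05  # assumed noise-margin bound in volts
print("node voltages:", v)
print("static NM constraint met:", all(x <= VVSS_MAX for x in v))
```

Because the rail couples neighboring nodes, part of a busy module's current leaves through adjacent sleep transistors, which is what allows those transistors to share the load.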

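BCM and MCM
The delay-budgeting constraints can be written as:
$$ \sum_{i=1}^{M} a_{ki} \, R_{st_i} \le \text{DDR\_MAX} \cdot d_{\max}; \quad 1 \le k \le K $$
where $a_{ki}$ collects, for the cells of path $\pi_k$ that belong to module $M_i$, the factors multiplying $R_{st_i}$ in the path delay increase.
Definition 1: At any given step of the sizing algorithm, the most critical module (MCM) is the module with the maximum delay contribution in the $K$ most critical paths:
$$ \text{MCM} = \arg\max_{1 \le i \le M} \; \sum_{k=1}^{K} a_{ki} \, R_{st_i} $$
Definition 2: At any given step of the sizing algorithm, the best candidate module (BCM) is the module whose sleep transistor upsizing by a certain percentage will result in the largest delay improvement for unsatisfied paths. One can show:
$$ \text{BCM} = \text{MCM}\left( \left\{ \pi_k \mid 1 \le k \le K, \; \Delta d_{\pi_k} > \text{DDR\_MAX} \cdot d_{\max} \right\} \right) $$
that is, the BCM is the MCM evaluated over the unsatisfied paths only.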

Current-Aware Optimization
Definition 3: The least-cost BCM (LBCM) is the BCM whose sleep transistor upsizing will result in the minimum increase in the objective function.
Lemma: LBCM can be calculated as:
$$ \text{LBCM} = \arg\min_{M_i = \text{BCM}} \; \sum_{\substack{1 \le k \le K \\ \Delta d_{\pi_k} > \text{DDR\_MAX} \cdot d_{\max}}} a_{ki} $$
Applied at each step, this lemma makes the proposed algorithm a current-aware optimization algorithm.
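The three selection rules can be sketched in a few lines. This is one reading of the definitions, not the authors' implementation: `a[k][i]` stands for the coefficient of sleep resistance `i` on path `k`, `budget` stands in for DDR_MAX times d_max, ties for the best candidate are detected with a small relative tolerance, and all function names and numbers are hypothetical.

```python
def contribution(a, r_st, i, paths):
    """Delay contributed by module i, summed over the given paths."""
    return sum(a[k][i] * r_st[i] for k in paths)


def unsatisfied_paths(a, r_st, budget):
    """Indices of paths whose delay increase exceeds the budget."""
    n = len(r_st)
    return [k for k in range(len(a))
            if sum(a[k][i] * r_st[i] for i in range(n)) > budget]


def mcm(a, r_st, paths):
    """Most critical module over the given paths (first one on ties)."""
    return max(range(len(r_st)), key=lambda i: contribution(a, r_st, i, paths))


def lbcm(a, r_st, budget, rel_tol=1e-9):
    """Among modules tied for the largest contribution on unsatisfied paths,
    pick the one with the smallest coefficient sum. None if all paths pass."""
    bad = unsatisfied_paths(a, r_st, budget)
    if not bad:
        return None
    scores = [contribution(a, r_st, i, bad) for i in range(len(r_st))]
    top = max(scores)
    tied = [i for i, s in enumerate(scores) if top - s <= rel_tol * top]
    return min(tied, key=lambda i: sum(a[k][i] for k in bad))


# Hypothetical coefficients (ps per ohm) and resistances (ohms).
a = [[0.4, 0.1], [0.0, 0.15]]
r_st = [10.0, 20.0]
budget = 5.0

print("MCM over all paths:", mcm(a, r_st, range(len(a))))
print("LBCM over unsatisfied paths:", lbcm(a, r_st, budget))
```

In this example the most critical module over all paths differs from the one chosen over unsatisfied paths, because the second path already meets its budget and is ignored.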

Algorithm (step 1)
Step 1: Initialization (NM constraints)
Algorithm: Slp_Initialize(I_M_i(t), VVSS_MAX)
    /* Initializing variables */
    for i = 1 to M do
        R_st_i = R_MAX;
    end for
    calculate I_st_i(t_j) and v_i(t_j) = R_st_i · I_st_i(t_j) for all i, j;
    while (v_i(t_j) > VVSS_MAX for some i or j) do
        M_m = FindMinModule{VVSS_MAX − v_i(t_j)};
        R_st_m = VVSS_MAX / I_st_m(t_j) for all j;
        update I_st_i(t_j) and v_i(t_j) = R_st_i · I_st_i(t_j) for all i, j;
    end while
    return R_st_i for all i;

Algorithm (step 2)
Step 2: Optimization (DB constraints)
Algorithm: Slp_Sizing(R_st_i-initial, I_M_i(t), VVSS_MAX)
    calculate I_st_i(t_j) and v_i(t_j) = R_st_i-initial · I_st_i(t_j) for all i, j;
    while (min_slack < 0)
        find LBCM and m = LBCM;
        R_st_m = R_st_m − α · R_st_m;
        update I_st_i(t_j) and v_i(t_j) = R_st_i · I_st_i(t_j) for all i, j;
        min_slack = ∞;
        for k = 1 to K, j = 1 to N
            if (DDR_MAX · d_max − Δd_π_k < min_slack)
                min_slack = DDR_MAX · d_max − Δd_π_k;
            end if
        end for
    end while
    return R_st_i for all i;
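A simplified sketch of the optimization loop is shown below. It keeps the coefficients `a[k][i]` fixed, which means it ignores the per-iteration current update on the virtual ground rail; that simplification, the function name `size_sleep_transistors`, the iteration cap and the numeric inputs are all assumptions made for illustration.

```python
def size_sleep_transistors(a, r_init, budget, alpha=0.1, max_iter=10_000):
    """Shrink sleep resistances until every path meets its delay budget.

    a:      a[k][i], coefficient of resistance i in the delay increase of path k
    r_init: initial resistances (for example from the NM initialization)
    budget: allowed delay increase, standing in for DDR_MAX * d_max
    alpha:  resistance decrement factor
    """
    r_st = list(r_init)
    modules = range(len(r_st))
    for _ in range(max_iter):
        delays = [sum(a_k[i] * r_st[i] for i in modules) for a_k in a]
        min_slack = min(budget - d for d in delays)
        if min_slack >= 0:
            return r_st
        bad = [k for k, d in enumerate(delays) if d > budget]
        # Largest delay contribution on unsatisfied paths wins; ties go to
        # the module with the smaller coefficient sum (least-cost rule).
        m = max(modules, key=lambda i: (sum(a[k][i] * r_st[i] for k in bad),
                                        -sum(a[k][i] for k in bad)))
        r_st[m] -= alpha * r_st[m]
    raise RuntimeError("no feasible sizing found within max_iter steps")


# Hypothetical single path through two modules; module 1 draws more current.
a = [[0.02, 0.5]]
r_init = [500.0, 50.0]
budget = 10.3

r_final = size_sleep_transistors(a, r_init, budget)
final_delay = sum(a[0][i] * r_final[i] for i in range(2))
print("final resistances (ohm):", [round(r, 1) for r in r_final])
print("objective sum(1/R):", round(sum(1.0 / r for r in r_final), 4))
```

Each step lowers only one resistance by the factor alpha, so the result overshoots the budget by at most one decrement of the last module touched.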

Simulation Approach
Max delay degradation ratio: DDR_MAX = 10%
Virtual rail resistance: r_VSS = 0.1 Ω
Max number of critical paths: K = 100
Resistance decrement factor: α = 0.1

Total sleep TX width is given in λ. [X] = [Chiou-DAC 06], [Y] = [Chiou-DAC 07].

Circuit   # of cells   # of footers   Width [X]   Width [Y]   Width proposed   Proposed vs. [X] (%)   Proposed vs. [Y] (%)
C17       7            2              53          44          16               70                     64
9sym      276          30             786         715         312              60                     56
C432      214          30             811         665         343              57                     48
C880      467          55             1290        1173        579              55                     51
C1355     546          60             1437        1597        727              49                     54
C3540     1307         280            3920        3469        1679             57                     52
C5315     1783         320            5799        5631        3372             42                     40
Avg. (normalized width)               2.0         1.89        1                55                     52
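The percentage columns can be recomputed from the width columns as one minus the ratio of proposed width to baseline width. The sketch below does that; the variable names are illustrative, and the computed values agree with the reported ones to within one percentage point.

```python
# circuit: (width [X], width [Y], width proposed, reported vs [X] %, reported vs [Y] %)
results = {
    "C17":   (53, 44, 16, 70, 64),
    "9sym":  (786, 715, 312, 60, 56),
    "C432":  (811, 665, 343, 57, 48),
    "C880":  (1290, 1173, 579, 55, 51),
    "C1355": (1437, 1597, 727, 49, 54),
    "C3540": (3920, 3469, 1679, 57, 52),
    "C5315": (5799, 5631, 3372, 42, 40),
}


def saving(baseline, proposed):
    """Width reduction of the proposed sizing relative to a baseline, in percent."""
    return 100.0 * (1.0 - proposed / baseline)


computed = {
    name: (saving(x, p), saving(y, p))
    for name, (x, y, p, _, _) in results.items()
}
for name, (sx, sy) in computed.items():
    print(f"{name:6s} vs [X]: {sx:5.1f}%   vs [Y]: {sy:5.1f}%")

smallest = min(min(pair) for pair in computed.values())
print(f"smallest saving: {smallest:.1f}%")
```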

Conclusion
A new sleep transistor sizing approach is proposed.
The algorithm takes a maximum circuit slowdown factor and produces the sizes of the various sleep transistors while considering the DC parasitics of the virtual ground.
The problem can be formulated as a sizing problem with delay budgeting and solved efficiently using a heuristic sizing algorithm.
The algorithm approaches the optimum solution by slowing down the modules with larger discharging current more than the ones with smaller discharging current (current-aware optimization).
The proposed technique uses at least 40% less total sleep transistor width compared to other approaches.