Coarse-Grain MTCMOS Sleep Transistor Sizing Using Delay Budgeting
Ehsan Pakbaznia and Massoud Pedram
University of Southern California, Dept. of Electrical Engineering
DATE-08, Munich, Germany

Leakage in CMOS Technology
- V_dd is reduced with CMOS technology scaling.
- V_th must be lowered to recover the transistor switching speed.
- The subthreshold leakage current increases exponentially with decreasing V_th.
- MTCMOS has proven to be a highly effective leakage control mechanism.

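Overview of MTCMOS
- A high-V_th transistor is used to disconnect low-V_th transistors from the ground or the supply rails.
[Figure: header switch (high-threshold PMOS between V_dd and the virtual supply) and footer switch (high-threshold NMOS between the virtual ground and ground), each controlled by a SLEEP signal and gating low-threshold logic.]
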
Coarse-Grain MTCMOS
Coarse-grain vs. fine-grain:
- Smaller sleep transistor area
- Lower leakage
- A regular standard cell library can be used (no need to characterize new cells)
[Figure: coarse-grain MTCMOS structure; labels: SLEEPB, TVSS, VVSS, blocks 1 to 4.]

Sleep Transistor Layout
[Figure: layouts of a single-transistor footer switch, a single-transistor header switch, and a double-transistor (mother/daughter) footer switch; signals shown include TVSS, VVSS, SLEEP, SLEEPB, SLEEPB1, and SLEEPB2.]

Sleep Transistor Placement
- Symmetric placement styles are preferred because they lower the routing complexity of the TVDD/TVSS and SLEEP/SLEEPB signals.
[Figure: two placement styles, column-based and staggered, showing TVSS and VVSS rails.]

Notion of Module
- M(r,i) denotes the module that is formed around the i-th sleep transistor in the r-th row of the standard cell layout.
- The cells belonging to M(r,i) are those that are in the r-th row and are closest in distance to the i-th sleep transistor in that row.
[Figure: one standard cell row with TVSS and VVSS rails, showing M(1,1), module 1 of row 1, and M(1,2), module 2 of row 1.]

Time-Dependent Current Source Model for Modules
- The VVSS rail resistance between the cells inside each module is ignored.
- r_VSS(r,i) denotes the VVSS resistance between modules M(r,i) and M(r,i+1).
- I_M(r,i)(t) and I_st(r,i)(t) denote the discharging current of module M(r,i) and the current of its sleep transistor, respectively.
[Figure: modules M(r,i-1), M(r,i), and M(r,i+1) modeled as current sources I_M(r,i-1)(t), I_M(r,i)(t), and I_M(r,i+1)(t) on the VVSS rail, joined by rail resistances r_VSS(r,i-1) and r_VSS(r,i); each node connects to ground through a sleep transistor of width W_st(r,i) carrying I_st(r,i)(t).]

Motivational Example
- Circuit: FO4 inverter chain
- Modules: M1 and M2
- Sleep transistors: replaced by their linear resistive models, R1 and R2
- CMOS (R1 = R2 = 0) delay: 103 ps
[Figure: inverter chain from V_IN to V_OUT with relative sizes 1, 4, 16, 64, internal node V_A, load C_L = FO4; module M1 is grounded through R1 and module M2 through R2.]

Module   Module delay (ps)   Module peak current (mA)
M1       46                  0.3
M2       57                  4.65

Effect of Slack Distribution on Total Sleep Transistor Size

Circuit          T_M1 (ps)   T_M2 (ps)   Total delay (ps)   R1 (Ω)   R2 (Ω)   Σ R^-1 (Ω^-1)
CMOS             46          57          103                0        0        -
MTCMOS, case 1   50.6        62.7        113.3              250      9        0.1151
MTCMOS, case 2   52          61.3        113.3              330      2        0.5030
MTCMOS, case 3   48          65.3        113.3              110      25       0.0491

- Total available slack: 10.3 ps (10% delay penalty)
- Case 1: uniformly distributed slack (medium)
- Case 2: 80% for M1 and 20% for M2 (worst)
- Case 3: 20% for M1 and 80% for M2 (best)
- Current-aware optimization: modules with a larger discharge current must be slowed down more.

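The Σ R^-1 column can be reproduced directly from the resistances in the table. The short sketch below assumes that sleep transistor width is inversely proportional to its linear resistance, so that the sum of 1/R serves as a proxy for total sleep transistor width; the names `cases` and `total_inverse_resistance` are illustrative only.

```python
# Sleep transistor resistances (ohms) for the three slack distributions in the table.
cases = {
    "case 1 (uniform)": (250.0, 9.0),
    "case 2 (worst)": (330.0, 2.0),
    "case 3 (best)": (110.0, 25.0),
}


def total_inverse_resistance(resistances):
    """Sum of 1/R over all sleep transistors (a proxy for total width)."""
    return sum(1.0 / r for r in resistances)


for name, resistances in cases.items():
    print(f"{name}: {total_inverse_resistance(resistances):.4f} 1/ohm")
```

Giving most of the slack to M2, the module with the larger peak current, yields the smallest sum.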
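Delay-Budgeting Constraints for Sizing
- Delay-budgeting constraints: non-negative slack for all nodes

  $s'_n = \left( \min\{ r'_{\text{fanouts of } C_n} \} - d'_n \right) - \left( \max\{ a'_{\text{fanins of } C_n} \} + d'_n \right) \ge 0$

  Here $s'_n$ is the slack of node n, the first term is the required time for node n, and the second term is the arrival time at node n.
- $d'_n$ is the delay of cell $C_n \in M_i$ with VVSS voltage $v_i$. We can show:

  $d'_n = d_n + \frac{v_i}{V_{DD} - V_{tL}} \, d_n$

  where the second term is the delay increase due to MTCMOS.
- To simplify the constraints, only the timing-critical paths are considered, which requires defining the notion of path delay.
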
Path Delay in MTCMOS
- The delay increase for path $\pi_k$ is the summation of the delay increases of all the gates in $\pi_k$:

  $\Delta d_{\pi_k} = \sum_{C_n \in \pi_k} \Delta d_n = \sum_{C_n \in \pi_k} \frac{R_{st}^{\theta(C_n)} \, I_{st}^{\theta(C_n),\max}(t_{\min}^{C_n}, t_{\max}^{C_n})}{V_{DD} - V_{tL}} \, d_n$

- $\theta(C_n)$ is the index of the module that cell $C_n$ belongs to.
- $R_{st}^{i}$ is the linear resistance value of the i-th sleep transistor; it is inversely proportional to the transistor width $W_{st}^{i}$.
- $I_{st}^{i,\max}(t_{\min}^{C_n}, t_{\max}^{C_n})$ is the maximum current flowing through $R_{st}^{i}$ during the time window $[t_{\min}^{C_n}, t_{\max}^{C_n}]$ in which cell $C_n$ is switching.

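As a rough illustration of the path delay formula, the sketch below sums the per-cell delay increases along one path. All numbers (cell delays, currents, resistances, supply and threshold voltages) are hypothetical, and the `Cell` type and `path_delay_increase` function are names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    delay_ps: float   # d_n: cell delay without sleep transistors
    module: int       # theta(C_n): index of the module containing the cell
    i_max_ma: float   # max sleep transistor current (mA) in the cell's switching window


def path_delay_increase(path, r_st_ohm, vdd, vtl):
    """Sum of (R_st * I_st_max / (V_DD - V_tL)) * d_n over the cells of one path."""
    total = 0.0
    for cell in path:
        v_virtual = r_st_ohm[cell.module] * cell.i_max_ma * 1e-3  # VVSS voltage in volts
        total += v_virtual / (vdd - vtl) * cell.delay_ps
    return total


# Hypothetical two-cell path crossing two modules.
path = [Cell(delay_ps=20.0, module=0, i_max_ma=1.0),
        Cell(delay_ps=30.0, module=1, i_max_ma=2.0)]
print(path_delay_increase(path, r_st_ohm=[50.0, 25.0], vdd=1.0, vtl=0.2))
```

Because each term is linear in the sleep transistor resistance, the whole path constraint is linear in the resistances as well.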
Module Current Example
- The module current is the time-indexed summation of the expected currents of all the cells inside the module.
[Figure: current profile for a module with 3 cells and time windows C1: [40, 60], C2: [60, 80], C3: [50, 70]; x-axis: Time (ps), 20 to 100; y-axis: Current (mA), 0 to 0.45.]

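The time-indexed summation can be sketched as follows. The time windows are those of the three-cell example; the rectangular pulse shape, the 150 µA amplitude per cell, the half-open window convention, and the 10 ps time step are assumptions made here for illustration and are not taken from the plotted profile.

```python
def module_current(cells, times):
    """Sum the expected cell currents at each time point.

    cells: list of (start_ps, end_ps, current_uA); a cell is assumed to draw a
    constant current inside its half-open window [start_ps, end_ps).
    """
    return [sum(i for (t0, t1, i) in cells if t0 <= t < t1) for t in times]


# Windows C1: [40, 60], C2: [60, 80], C3: [50, 70]; amplitudes are hypothetical.
cells = [(40, 60, 150), (60, 80, 150), (50, 70, 150)]
times = list(range(20, 100, 10))

profile = module_current(cells, times)
for t, i in zip(times, profile):
    print(f"t = {t} ps: {i} uA")
```

The module current peaks wherever the switching windows overlap, which is why the windows, and not only the per-cell peak currents, matter for sizing.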
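Delay-Budgeting (DB) Sizing Problem
- The clock cycle is divided into N equal time intervals; $t_j$ is the beginning time of the j-th interval.
- $I_{M_i}(t_j)$ is the switching current of module $M_i$ at time $t_j$.

  $\text{Minimize} \; \sum_{i=1}^{M} \frac{1}{R_{st}^{i}}$

  subject to:

  1. Delay-budgeting constraints:
     $\Delta d_{\pi_k} = \sum_{C_n \in \pi_k} \frac{R_{st}^{\theta(C_n)} \, I_{st}^{\theta(C_n),\max}(t_{\min}^{C_n}, t_{\max}^{C_n})}{V_{DD} - V_{tL}} \, d_n \le \text{DDR\_MAX} \cdot d_{\max}; \quad 1 \le k \le K$

  2. Static NM constraints:
     $R_{st}^{i} \, I_{st}^{i}(t_j) \le \text{VVSS\_MAX}; \quad 1 \le i \le M, \; 1 \le j \le N$

  where, for all j:

  $I_{st}^{i}(t_j) = I_{M_i}(t_j) + \frac{R_{st}^{i-1} I_{st}^{i-1}(t_j) - R_{st}^{i} I_{st}^{i}(t_j)}{r_{VSS}^{i-1}} + \frac{R_{st}^{i+1} I_{st}^{i+1}(t_j) - R_{st}^{i} I_{st}^{i}(t_j)}{r_{VSS}^{i}}; \quad 1 \le i \le M$

  and $I_{st}^{0}(t_j) = I_{st}^{M+1}(t_j) = 0$.
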
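Evaluating either constraint requires the sleep transistor currents for a given set of resistances. The sketch below solves the node equations of one row at a single time step with a tridiagonal (Thomas) elimination. It is a minimal sketch under assumptions that are not from the slides: the rail is treated as open at both ends (no current flows past the first or last module), a single r_VSS value is used between all neighbors, and the function name and example values are illustrative.

```python
def solve_virtual_ground(r_st, i_mod, r_vss):
    """Return (v, i_st): VVSS node voltages and sleep transistor currents.

    Node equation (Kirchhoff's current law at node i, open-ended rail assumed):
        v_i / r_st[i] + (v_i - v_{i-1}) / r_vss + (v_i - v_{i+1}) / r_vss = i_mod[i]
    """
    m = len(r_st)
    g = 1.0 / r_vss
    # Diagonal of the tridiagonal system; end nodes have only one neighbor.
    diag = [1.0 / r_st[i] + g * ((i > 0) + (i < m - 1)) for i in range(m)]
    rhs = list(i_mod)
    # Forward elimination (off-diagonal entries are all -g).
    for i in range(1, m):
        w = -g / diag[i - 1]
        diag[i] -= w * (-g)
        rhs[i] -= w * rhs[i - 1]
    # Back substitution.
    v = [0.0] * m
    v[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        v[i] = (rhs[i] + g * v[i + 1]) / diag[i]
    i_st = [v[i] / r_st[i] for i in range(m)]
    return v, i_st


# Hypothetical two-module row: resistances in ohms, currents in amperes.
v, i_st = solve_virtual_ground(r_st=[250.0, 9.0], i_mod=[0.3e-3, 4.65e-3], r_vss=0.1)
print(v, i_st)
```

With a small rail resistance, neighboring sleep transistors share the discharge current, so the current through a sleep transistor can differ noticeably from the current of its own module.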
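BCM and MCM
- The delay-budgeting constraints can be written as:

  $\sum_{i=1}^{M} a_{ki} \, R_{st}^{i} \le \text{DDR\_MAX} \cdot d_{\max}; \quad 1 \le k \le K$

  where $a_{ki}$ collects the current and delay terms of the cells of path $\pi_k$ that belong to module $M_i$.
- Definition 1: At any given step of the sizing algorithm, the most critical module (MCM) is the module with the maximum delay contribution in the K most critical paths:

  $MCM = \arg\max_{M_i} \sum_{k=1}^{K} a_{ki} \, R_{st}^{i}$

- Definition 2: At any given step of the sizing algorithm, the best candidate module (BCM) is the module whose sleep transistor upsizing by a certain percentage results in the largest delay improvement for the unsatisfied paths. One can show:

  $BCM = MCM\left( \{ \pi_k \mid 1 \le k \le K, \; \Delta d_{\pi_k} > \text{DDR\_MAX} \cdot d_{\max} \} \right)$

  that is, the BCM is the MCM evaluated over the unsatisfied paths only.
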
Current-Aware Optimization
- Definition 3: The least-cost BCM (LBCM) is the BCM whose sleep transistor upsizing results in the minimum increase in the objective function.
- Lemma: The LBCM can be calculated as:

  $LBCM = \arg\min_{M_i = BCM} \sum_{k=1, \; \Delta d_{\pi_k} > \text{DDR\_MAX} \cdot d_{\max}}^{K} a_{ki}$

- Applying this lemma at each step makes the proposed algorithm a current-aware optimization algorithm.

Algorithm (step 1)
Step 1: Initialization (NM constraints)

Algorithm: Slp_Initialize(I_M(t), VVSS_MAX)
    /* Initializing variables */
    for i = 1 to M do
        R_st^i = R_MAX;
    end for
    calculate I_st^i(t_j) and v_i(t_j) = R_st^i * I_st^i(t_j) for all i, j;
    while (v_i(t_j) > VVSS_MAX for some i or j) do
        M_m = FindMinModule{VVSS_MAX - v_i(t_j)};
        R_st^m = VVSS_MAX / I_st^m(t_j) for all j;
        update I_st^i(t_j) and v_i(t_j) = R_st^i * I_st^i(t_j) for all i, j;
    end while
    return R_st^i for all i;

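A much simplified version of this initialization can be written in closed form if the coupling through the virtual ground rail is ignored, so that each sleep transistor carries exactly its own module current. That simplification is an assumption made here for brevity and is not what Slp_Initialize does; the function name and the example values are hypothetical.

```python
def initialize_resistances(i_mod, vvss_max, r_max):
    """Largest resistance per module that keeps R * I <= vvss_max at every time step.

    i_mod[i][j] is the current (A) of module i at time step j. Rail coupling is
    ignored, so the sleep transistor current equals the module current.
    """
    r_st = []
    for currents in i_mod:
        peak = max(currents)
        limit = vvss_max / peak if peak > 0 else r_max
        r_st.append(min(r_max, limit))
    return r_st


# Hypothetical module currents (A) over three time steps.
i_mod = [[0.0, 0.3e-3, 0.1e-3],
         [1.0e-3, 4.65e-3, 2.0e-3]]
r_init = initialize_resistances(i_mod, vvss_max=0.05, r_max=1000.0)
print(r_init)
```

The module with the larger peak current receives the smaller starting resistance, that is, the wider sleep transistor.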
Algorithm (step 2)
Step 2: Optimization (DB constraints)

Algorithm: Slp_Sizing(R_st-initial, I_M(t), VVSS_MAX)
    calculate I_st^i(t_j) and v_i(t_j) = R_st-initial^i * I_st^i(t_j) for all i, j;
    while (min_slack < 0)
        find LBCM and m = LBCM;
        R_st^m = R_st^m - α * R_st^m;
        update I_st^i(t_j) and v_i(t_j) = R_st^i * I_st^i(t_j) for all i, j;
        min_slack = ∞;
        for k = 1 to K, j = 1 to N
            if (DDR_MAX * d_max - Δd_πk < min_slack)
                min_slack = DDR_MAX * d_max - Δd_πk;
            end if
        end for
    end while
    return R_st^i for all i;

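The loop can be sketched in a few lines once the delay-budgeting constraints are written in their linear form, sum over i of a_ki * R_st^i <= budget. The sketch below is one possible reading of the procedure, not the authors' implementation: it keeps the coefficients a_ki fixed (the sleep transistor currents are not recomputed after each upsizing), picks the module with the largest delay contribution over the violated paths, and breaks ties in favor of the smallest coefficient sum. All names and numbers are hypothetical.

```python
def size_sleep_transistors(a, r_init, budget_ps, alpha=0.1, max_iters=10000):
    """Greedy resistance reduction until every path meets its delay budget.

    a[k][i]: delay increase of path k per ohm of sleep transistor i (ps/ohm),
    assumed constant here.
    """
    r = list(r_init)
    modules = range(len(r))
    for _ in range(max_iters):
        violated = [row for row in a
                    if sum(row[i] * r[i] for i in modules) > budget_ps]
        if not violated:
            return r
        # Coefficient sum and delay contribution of each module over violated paths.
        weight = [sum(row[i] for row in violated) for i in modules]
        contrib = [weight[i] * r[i] for i in modules]
        top = max(contrib)
        ties = [i for i in modules if contrib[i] >= top * (1.0 - 1e-9)]
        m = min(ties, key=lambda i: weight[i])
        r[m] -= alpha * r[m]  # upsizing the transistor lowers its resistance
    raise RuntimeError("no feasible sizing found within max_iters")


# Hypothetical coefficients for two paths and two modules.
a = [[0.02, 0.5],
     [0.01, 0.3]]
sized = size_sleep_transistors(a, r_init=[300.0, 300.0], budget_ps=10.3)
print(sized)
```

Recomputing the currents after every step, as in Slp_Sizing, would change the coefficients from one iteration to the next; the fixed-coefficient version only shows the shape of the greedy loop.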
Simulation Approach
- Max delay degradation ratio: DDR_MAX = 10%
- Virtual rail resistance: r_VSS = 0.1 Ω
- Max number of critical paths: K = 100
- Resistance decrement factor: α = 0.1

Total sleep TX width is given in λ. [X] = [Chiou-DAC 06], [Y] = [Chiou-DAC 07].

Circuit   # of cells   # of footers   Width [X]   Width [Y]   Width proposed   Proposed vs. [X] (%)   Proposed vs. [Y] (%)
C17       7            2              53          44          16               70                     64
9sym      276          30             786         715         312              60                     56
C432      214          30             811         665         343              57                     48
C880      467          55             1290        1173        579              55                     51
C1355     546          60             1437        1597        727              49                     54
C3540     1307         280            3920        3469        1679             57                     52
C5315     1783         320            5799        5631        3372             42                     40
Avg.                                  2.0         1.89        1                55                     52

The Avg. row lists widths normalized to the proposed approach.

Conclusion
- A new sleep transistor sizing approach is proposed.
- The algorithm takes a maximum circuit slowdown factor and produces the sizes of the various sleep transistors while considering the DC parasitics of the virtual ground.
- The problem can be formulated as a sizing problem with delay budgeting and solved efficiently using a heuristic sizing algorithm.
- The algorithm approaches the optimum solution through current-aware optimization: modules with a larger discharging current are slowed down more than modules with a smaller discharging current.
- The proposed technique uses at least 40% less total sleep transistor width than the other approaches.