DKDT: A Performance Aware Dual Dielectric Assignment for Tunneling Current Reduction
Saraju P. Mohanty, Dept of Computer Science and Engineering, University of North Texas
smohanty@cs.unt.edu, http://www.cs.unt.edu/~smohanty/
2/4/2005
Outline of the Talk
- Introduction
- Why Dual-K and Dual-T
- Related Work
- DKDT Assignment Algorithm
- Cell Characterization for DKDT
- Conclusions
Why Low-Power?
- Motivation: extending battery life
Source: Power Integrations Inc
Why Low-Power? (continued)
- Battery lifetime
- Cooling and energy costs
- Environmental concerns
- System reliability
Power Dissipation in CMOS
- Total power dissipation
  - Static dissipation: sub-threshold current, tunneling current, reverse-biased diode leakage, contention current
  - Dynamic dissipation: capacitive switching, short circuit
Source: Weste and Harris 2005
Leakages in Nanometer CMOS
- I1: reverse-bias pn junction leakage (both ON and OFF)
- I2: subthreshold leakage (OFF)
- I3: oxide tunneling current (both ON and OFF)
- I4: gate current due to hot carrier injection (both ON and OFF)
- I5: gate induced drain leakage (OFF)
- I6: channel punch-through current (OFF)
[Figure: NMOS cross-section (source, gate, drain, N+ regions, P-substrate) marking the paths of I1 to I6]
Source: Roy 2003
Tunneling Paths in an Inverter
- Low input: the input supply feeds the tunneling current.
- High input: the gate supply feeds the tunneling current.
[Figure: two inverters. Left: V_in = V_Low, PMOS ON, NMOS OFF, V_out = V_High. Right: V_in = V_High, PMOS OFF, NMOS ON, V_out = V_Low.]
Power Dissipation Trend
[Figure: normalized power dissipation versus physical gate length (nm) and year, with curves for dynamic, sub-threshold, and tunneling power, plus the tunneling trajectory if high-K dielectrics are used]
Source: Hansen 2004
Why Dual-K and Dual-T?
- Gate oxide tunneling current I_gate [Kim2003, Chandrakasan2001], where K and α are experimentally derived factors:
  $I_{gate} = K \, W_{gate} \left( V_{dd} / T_{gate} \right)^{2} \exp\left( -\alpha \, T_{gate} / V_{dd} \right)$
- Options for reducing the tunneling current:
  - Decrease the supply voltage V_dd (will play its role)
  - Increase the gate SiO2 thickness T_gate (opposed to the technology trend)
  - Decrease the gate width W_gate (I_gate depends on it only linearly)
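A minimal sketch of the tunneling current expression above, to show how strongly I_gate responds to T_gate compared with W_gate. The function name and the default values of `k` and `alpha` are placeholders chosen for illustration, not fitted values from the talk; only the ratio between two evaluations is meaningful here.

```python
import math


def gate_tunneling_current(w_gate, t_gate, v_dd, k=1.0, alpha=1.0):
    """I_gate = K * W_gate * (V_dd / T_gate)^2 * exp(-alpha * T_gate / V_dd).

    k and alpha stand in for the experimentally derived factors; the
    defaults are placeholders, not measured values.
    """
    return k * w_gate * (v_dd / t_gate) ** 2 * math.exp(-alpha * t_gate / v_dd)


# Relative effect of thickening the dielectric from 1.4 nm to 1.7 nm at V_dd = 0.7 V.
thin = gate_tunneling_current(w_gate=1.0, t_gate=1.4, v_dd=0.7)
thick = gate_tunneling_current(w_gate=1.0, t_gate=1.7, v_dd=0.7)
ratio = thick / thin  # below 1: the thicker dielectric leaks less

# Doubling the width only doubles the current (linear dependence).
wide = gate_tunneling_current(w_gate=2.0, t_gate=1.4, v_dd=0.7)
```

With real fitted factors the absolute numbers change, but the exponential dependence on T_gate and the linear dependence on W_gate stay the same.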
Why Dual-K and Dual-T?
- We believe that using multiple dielectrics (dielectric constant denoted K_gate) of multiple thicknesses (denoted T_gate) will reduce the gate tunneling current significantly while maintaining performance.
Why Dual-K and Dual-T? (Low K_gate vs. High K_gate)
- Low K_gate: larger I_gate, smaller delay
- High K_gate: smaller I_gate, larger delay
[Figure: two NMOS cross-sections, one with a low-K and one with a high-K gate dielectric]
Why Dual-K and Dual-T? (Low T_gate vs. High T_gate)
- Low T_gate: larger I_gate, smaller delay
- High T_gate: smaller I_gate, larger delay
[Figure: two NMOS cross-sections, one with a thin and one with a thick gate dielectric]
Why Dual-K and Dual-T? (Four Combinations of K_gate and T_gate)
[Figure: four transistor cross-sections, (1) K1T1, (2) K1T2, (3) K2T1, (4) K2T2, with arrows labeled Tunneling Current and Delay]
Why Dual-K and Dual-T? (Example: Four Types of Inverter)
- Assumption: all transistors of a logic gate have the same K_gate and the same T_gate.
[Figure: four inverters, (1) K1T1, (2) K1T2, (3) K2T1, (4) K2T2]
Dielectrics for Replacement of SiO2
- Silicon oxynitride (SiOxNy) (K = 5.7 for SiON)
- Silicon nitride (Si3N4) (K = 7)
- Oxides of aluminum (Al), titanium (Ti), zirconium (Zr), hafnium (Hf), lanthanum (La), yttrium (Y), praseodymium (Pr), and their mixed oxides with SiO2 and Al2O3
NOTE: I_gate still depends on T_gate irrespective of the dielectric material.
Related Works (Tunneling Current Reduction)
- Inukai et al. in CICC 2000: Boosted Gate MOS (BGMOS) device using dual T_ox and dual V_Th to reduce both gate and subthreshold standby leakage.
- Rao et al. in ESSCIRC 2003: sleep state assignment for MTCMOS circuits to reduce both gate and subthreshold leakage.
Related Works (Tunneling Current Reduction)
- Lee et al. in DAC 2003 and TVLSI Feb. 2004: pin reordering to minimize gate leakage during standby positions of NOR and NAND gates.
- Sultania et al. in DAC 2004 and ICCD 2004: heuristic for dual T_ox assignment for tunneling current and delay tradeoff.
Related Works
- They develop methods that use oxides of different thicknesses for tunneling reduction.
- They do not handle the emerging dielectrics that will replace SiO2 to reduce the tunneling current.
- They consider either the ON or the OFF state, but do not account for both.
- The dual-thickness approach degrades performance.
Key Contributions of this Work
- Introduces a new approach, dual dielectric assignment, for tunneling current reduction.
- Considers the dual thickness approach for both dielectrics.
- Explores a combined approach called DKDT (Dual-K of Dual Thickness) and proposes an assignment algorithm.
- Accounts for the tunneling current in both the ON and OFF states.
- Presents a methodology for characterizing logic gates for worst-case tunneling, considering non-SiO2 dielectrics for low-end nano-technology.
DKDT Based Logic Synthesis
- Flow: Input Circuit -> Technology Independent Optimization -> Intermediate Circuit -> Technology Mapping -> Mapped Circuit -> DKDT Assignment and Optimization -> Tunneling Optimized Circuit -> Placement and Routing -> Placement Legalization and ECO Routing -> Final Circuit Layout
- Cell library (4 types): K1T1, K1T2, K2T1, K2T2
DKDT Assignment : Basis
- Observation: the tunneling current of logic gates increases and the propagation delay decreases in the order K2T2, K2T1, K1T2, K1T1 (where K1 < K2 and T1 < T2).
- Strategy: assign a higher-order K and T to the logic gate under consideration to reduce tunneling current, provided the increase in path delay does not violate the target delay.
DKDT Assignment : Algorithm
- Step 1: Represent the network as a directed acyclic graph G(V, E).
- Step 2: Initialize each vertex v ∈ G(V, E) with the values of tunneling current and delay for the K1T1 assignment.
- Step 3: Find the set of all paths P{Π_in} from every vertex in the set of primary inputs (Π_in) to the primary outputs Π_out.
- Step 4: Compute the delay D_P for each path p ∈ P{Π_in}.
DKDT Assignment : Algorithm
- Step 5: Find the critical path delay D_CP for the K1T1 assignment.
- Step 6: Mark the critical path(s) P_CP, where P_CP ⊆ P{Π_in}.
- Step 7: Assign the target delay D_T = D_CP.
- Step 8: Traverse each node in the network and attempt to assign K-T in the order K2T2, K2T1, K1T2, K1T1 to reduce tunneling while maintaining performance.
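A sketch of Steps 1 to 7 under one stated substitution: the slides enumerate all input-to-output paths, whereas this example propagates arrival times in topological order, which yields the same critical path delay D_CP without listing every path. The function name, the `preds` fan-in dictionary, and the six-gate chain are hypothetical; only the 0.44 node delay is taken from the demo slides.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def critical_path_delay(preds, delay):
    """Longest input-to-output delay of a DAG.

    preds maps each node to the list of its fan-in nodes;
    delay maps each node to its gate delay.
    """
    arrival = {}
    for node in TopologicalSorter(preds).static_order():
        latest_input = max((arrival[p] for p in preds.get(node, ())), default=0.0)
        arrival[node] = latest_input + delay[node]
    return max(arrival.values())


# Hypothetical chain of six gates, each with the K1T1 delay of 0.44.
chain = {"g2": ["g1"], "g3": ["g2"], "g4": ["g3"], "g5": ["g4"], "g6": ["g5"]}
node_delay = {f"g{i}": 0.44 for i in range(1, 7)}
d_cp = critical_path_delay(chain, node_delay)
d_target = d_cp  # Step 7: the target delay equals the K1T1 critical delay
```

Explicit path enumeration can grow exponentially with circuit depth, which is the reason for the substitution; it is a convenience of this sketch, not a claim about the original C implementation.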
DKDT Assignment : Algorithm
FOR each vertex v ∈ G(V, E)
{
    Determine all paths P_v to which node v belongs ;
    Assign K2T2 to v ;
    Calculate new critical delay D_CP ;
    Calculate slack in delay as ΔD = D_T − D_CP ;
    IF (ΔD < 0) then
    {
        Assign K2T1 to v ; Calculate D_CP ; Calculate ΔD ;
        IF (ΔD < 0) then
        {
            Assign K1T2 to v ; Calculate D_CP ; Calculate ΔD ;
            IF (ΔD < 0) then
                reassign K1T1 to v ;
        } // end IF
    } // end IF
} // end FOR
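The heuristic loop can be sketched in Python as follows. This is an illustration under assumptions, not the author's C code: the critical delay is recomputed by arrival-time propagation rather than path enumeration, the five-node network is hypothetical, and the per-option delays (0.70, 0.64, 0.60, 0.44) are the node delays shown in the demo slides. A small tolerance `eps` is an added detail to guard the slack test against floating-point noise.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Options in the order tried: lowest tunneling / highest delay first.
ORDER = ["K2T2", "K2T1", "K1T2", "K1T1"]


def critical_delay(preds, assign, cell_delay):
    """Critical path delay D_CP for the current assignment."""
    arrival = {}
    for v in TopologicalSorter(preds).static_order():
        fan_in = max((arrival[p] for p in preds.get(v, ())), default=0.0)
        arrival[v] = fan_in + cell_delay[assign[v]]
    return max(arrival.values())


def dkdt_assign(preds, cell_delay, eps=1e-9):
    """Greedy DKDT assignment that keeps D_CP within the K1T1 target delay."""
    nodes = list(TopologicalSorter(preds).static_order())
    assign = {v: "K1T1" for v in nodes}
    d_target = critical_delay(preds, assign, cell_delay)
    for v in nodes:
        for option in ORDER:
            assign[v] = option
            slack = d_target - critical_delay(preds, assign, cell_delay)
            if slack >= -eps:  # no timing violation: keep this option
                break
    return assign, d_target


# Hypothetical network: chain a -> b -> c -> d plus a short side input s -> d.
preds = {"b": ["a"], "c": ["b"], "d": ["c", "s"]}
cell_delay = {"K2T2": 0.70, "K2T1": 0.64, "K1T2": 0.60, "K1T1": 0.44}
assignment, d_target = dkdt_assign(preds, cell_delay)
```

In this example only the off-critical node `s` has slack, so it is the only node moved to K2T2; the four gates on the critical path stay at K1T1. The last option in `ORDER` always restores a feasible state, so the inner loop always terminates with a valid choice.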
DKDT Assignment Algorithm (Time Complexity)
- Assume the original network representing the circuit has n gates.
- Steps 1 to 3 take Θ(n^2) time in the worst case.
- Steps 4 to 7 run in Θ(n^2) time.
- The heuristic loop that assigns DKDT takes Θ(n^3) time.
- Thus, the overall worst-case time complexity of the DKDT assignment algorithm is Θ(n^3).
DKDT Algorithm : Demo
[Figure: original network, nodes numbered 1 to 24]
DKDT Algorithm : Demo
[Figure: NAND network, nodes numbered 1 to 16]
DKDT Algorithm : Demo
[Figure: NAND network with node delays; every node has delay 0.44]
DKDT Algorithm : Demo
[Figure: NAND network with cumulative path delays; the critical path delay is 2.64, so the target delay is fixed at D_T = 2.64]
DKDT Algorithm : Demo
- Node 1: K2T2? The node delay becomes 0.70 and the cumulative delay along the path through Node 1 reaches 2.46.
- D_CP < D_T: yes, K2T2 for Node 1.
DKDT Algorithm : Demo
- Node 2: K2T2? The node delay becomes 0.70 and the path delay reaches 2.90.
- D_CP > D_T: no for K2T2.
DKDT Algorithm : Demo
- Node 2: K2T1? The node delay becomes 0.64 and the path delay reaches 2.84.
- D_CP > D_T: no for K2T1.
DKDT Algorithm : Demo
- Node 2: K1T2? The node delay becomes 0.6 and the path delay reaches 2.80.
- D_CP > D_T: no for K1T2.
DKDT Algorithm : Demo
- Node 2: reassign K1T1. The path delay returns to 2.64.
DKDT Algorithm : Demo
[Figure: network with the final assignment; critical path delay 2.64]
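The Node 2 trials in the demo can be checked with a few lines of arithmetic. The sketch below assumes that Node 2 lies on a critical path of six gates, which is inferred from 2.64 / 0.44 = 6 and is not stated explicitly on the slides; the names are illustrative.

```python
D_T = 2.64       # target delay from the demo
BASE = 0.44      # K1T1 node delay
PATH_GATES = 6   # assumed number of gates on Node 2's critical path

# Node 2 delay for each trial, as shown on the demo slides.
trial_delay = {"K2T2": 0.70, "K2T1": 0.64, "K1T2": 0.60, "K1T1": 0.44}


def path_delay_with(option):
    """Critical path delay when Node 2 alone uses `option`."""
    return (PATH_GATES - 1) * BASE + trial_delay[option]


def accepted(option, eps=1e-9):
    """True when the trial leaves non-negative slack (within tolerance)."""
    return D_T - path_delay_with(option) >= -eps


for option in trial_delay:
    print(option, round(path_delay_with(option), 2), accepted(option))
```

The three higher-order options give 2.90, 2.84, and 2.80, all above D_T = 2.64, so each is rejected and Node 2 falls back to K1T1.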
Logic Cell Characterization : Load
- The Berkeley Predictive Technology Model (BPTM) has been used.
- The first step in the characterization was the selection of an appropriate capacitive load (C_Load = 10 × C_ggpmos used).
- The supply voltage is held at V_DD = 0.7 V.
- We define the delay as the time difference between the 50% levels of the input and the output.
Logic Cell Characterization : t_r
- For worst-case scenarios in the development of the algorithm, we chose the maximum delay time, i.e. max(t_pdr, t_pdf).
- The effect of the switching pulse rise time t_r on the delay characteristics was examined first.
- To eliminate an explicit dependence of the algorithm results on t_r, we chose a value that is realistic yet does not affect the delay significantly.
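A small sketch of the delay definition used in the characterization: the time between the 50% points of the input and output waveforms, with the worst case taken as max(t_pdr, t_pdf). The sampled waveforms, the linear interpolation, and all function names are assumptions made for illustration; the actual measurements in the talk come from circuit simulation.

```python
def crossing_time(times, values, level):
    """First time the sampled waveform crosses `level` (linear interpolation)."""
    for i in range(len(times) - 1):
        v0, v1 = values[i], values[i + 1]
        if v0 != v1 and (v0 - level) * (v1 - level) <= 0:
            return times[i] + (level - v0) * (times[i + 1] - times[i]) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(times, v_in, v_out, v_dd=0.7):
    """Delay between the 50% points of the input and output waveforms."""
    half = 0.5 * v_dd
    return crossing_time(times, v_out, half) - crossing_time(times, v_in, half)


def worst_case_delay(t_pdr, t_pdf):
    """Worst-case cell delay: the larger of the rising and falling delays."""
    return max(t_pdr, t_pdf)


# Hypothetical inverter waveforms sampled every 10 ps (illustrative values only).
times = [0.0, 10.0, 20.0, 30.0, 40.0]
v_in = [0.0, 0.7, 0.7, 0.7, 0.7]   # rising input
v_out = [0.7, 0.7, 0.7, 0.0, 0.0]  # falling output
t_pdf = propagation_delay(times, v_in, v_out)
```

Here the input crosses 0.35 V midway through its first interval and the output midway through its third, so the measured delay is the gap between those two crossings.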
Logic Cell Characterization : t_r
[Figure: Delay versus Rise Time. Y axis: delay (ps), 0 to 300. X axis: rise time on a log10 scale from 1 ps to 1 ns. Series: PMOS, NMOS. Selected value: t_r = 10 ps.]
Logic Cell Characterization : I_gate
[Figure: BSIM4 model gate current components I_gs, I_gcs, I_gd, I_gcd, I_gb]
- Direct tunneling current is calculated by evaluating both the source and drain components.
- For the logic gate, I_gate = Σ_MOS (I_gs + I_gd).
- This accounts for tunneling current contributions from devices in both the ON and OFF states.
Cell Characterization : K_gate Modeling
- The effect of varying the dielectric material was modeled by calculating an equivalent oxide thickness (T*_ox) according to the formula:
  $T^{*}_{ox} = \left( K_{gate} / K_{ox} \right) T_{gate}$
- Here, K_gate is the dielectric constant of the gate dielectric material other than SiO2 (of thickness T_gate), while K_ox is the dielectric constant of SiO2.
Cell Characterization : T_gate Modeling
- The effect of varying the oxide thickness T_ox was incorporated by varying TOXE in the SPICE model.
- The length of the device is changed proportionately to minimize the impact of higher dielectric thickness on device performance:
  $L^{*} = \left( T^{*}_{ox} / T_{ox} \right) L$
- Length and width of the transistors are chosen to maintain a (W:L) ratio of (4:1) for NMOS and (8:1) for PMOS.
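A worked sketch of the two scaling formulas, using K values and thicknesses quoted elsewhere in the talk (K_ox = 3.9 for SiO2, K = 5.7 for SiON, T = 1.4 nm). The function names, the 45 nm base length, and the use of 1.4 nm as the reference T_ox in the length scaling are assumptions for illustration only.

```python
K_OX = 3.9  # dielectric constant of SiO2


def equivalent_thickness(k_gate, t_gate, k_ox=K_OX):
    """T*_ox = (K_gate / K_ox) * T_gate, as on the K_gate modeling slide."""
    return (k_gate / k_ox) * t_gate


def scaled_length(length, t_star, t_ox):
    """L* = (T*_ox / T_ox) * L, as on the T_gate modeling slide."""
    return (t_star / t_ox) * length


def transistor_widths(length):
    """Widths that keep W:L at 4:1 for NMOS and 8:1 for PMOS."""
    return {"nmos": 4.0 * length, "pmos": 8.0 * length}


# SiON (K = 5.7) at a thickness of 1.4 nm.
t_star = equivalent_thickness(k_gate=5.7, t_gate=1.4)       # about 2.05 nm
# Assumed base device: L = 45 nm with reference T_ox = 1.4 nm.
l_star = scaled_length(length=45.0, t_star=t_star, t_ox=1.4)
widths = transistor_widths(l_star)
```

For SiO2 itself (K_gate = K_ox) the equivalent thickness equals T_gate and the length is unchanged, which is a quick sanity check on both formulas.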
Cell Characterization : I_gate vs. K_gate
[Figure: Tunneling Current vs. Dielectric Constant. Y axis: tunneling current I_gate (nA), 0 to 12. X axis: gate dielectric constant K_gate, 3.9 to 5.4. Series: Inverter, NAND.]
Cell Characterization : I_gate vs. T_gate
[Figure: Tunneling Current vs. Dielectric Thickness. Y axis: tunneling current I_gate (nA), 0 to 12. X axis: gate dielectric thickness T_gate (nm), 1.4 to 1.7. Series: Inverter, NAND.]
Cell Characterization : T_pd vs. K_gate
[Figure: Propagation Delay vs. Dielectric Constant. Y axis: propagation delay T_pd (ps), 0 to 500. X axis: gate dielectric constant K_gate, 3.9 to 5.4. Series: Inverter, NAND.]
Cell Characterization : T_pd vs. T_gate
[Figure: Propagation Delay vs. Dielectric Thickness. Y axis: propagation delay T_pd (ps), 100 to 400. X axis: gate dielectric thickness T_gate (nm), 1.4 to 1.7. Series: Inverter, NAND.]
Experimental Results : Setup
- The DKDT algorithm was implemented in C, used along with SIS, and tested on the ISCAS'85 benchmarks.
- The library of gates, consisting of four types of NAND and four types of inverters, was characterized for the 45 nm technology using the SPECTRE tool.
- We used K1 = 3.9 (for SiO2), K2 = 5.7 (for SiON), T1 = 1.4 nm, and T2 = 1.7 nm in our experiments.
- T1 is the default value from the BSIM4.4.0 model card; T2 was chosen intuitively based on the characterization process.
Experimental Results : Table
Circuit  Gates  Critical Delay (ps)  Current for K1T1 (nA)  Current for DKDT (nA)  % Reduction
C17         24                 2.22                  187.2                  93.66        49.96
C432       160                 6.67                 4071.6                 281.40        93.08
C499       202                 3.56                 5885.1                 656.06        88.85
C880       383                10.68                 6739.2                 375.38        94.42
C1355      546                 3.56                 5885.1                 305.16        94.81
C1908      880                11.57                10015.2                 319.69        96.80
C2670     1193                42.71                18415.8                1734.08        90.58
C3540     1669                31.59                35708.4                2461.93        93.10
C5315     2406                40.04                29027.7                1220.89        95.79
C6288     2406                43.15                29355.3                 413.96        98.59
C7552     3512                45.82                34947.9                 695.38        98.01
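The % reduction column follows directly from the two current columns as 100 × (I_K1T1 − I_DKDT) / I_K1T1. The sketch below recomputes it for a few rows copied from the table; the dictionary layout and function name are illustrative. For C2670 the two currents give about 90.58, which matches the value listed for DKDT in the comparative table.

```python
# (K1T1 current, DKDT current) in nA, copied from the results table.
currents = {
    "C17": (187.2, 93.66),
    "C432": (4071.6, 281.4),
    "C2670": (18415.8, 1734.08),
    "C6288": (29355.3, 413.96),
    "C7552": (34947.9, 695.38),
}


def percent_reduction(i_base, i_dkdt):
    """Tunneling current reduction relative to the all-K1T1 baseline."""
    return 100.0 * (i_base - i_dkdt) / i_base


reductions = {ckt: percent_reduction(*pair) for ckt, pair in currents.items()}
for ckt, value in reductions.items():
    print(f"{ckt}: {value:.2f}%")
```

Small differences in the last digit against the slide values are consistent with the table truncating rather than rounding.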
DKDT Algorithm Results
[Figure: Comparison of K1T1 vs. DKDT assignment with % reduction. Left Y axis: tunneling current (μA), 0 to 40. Right Y axis: % reduction, 82 to 100. X axis: benchmark circuits C432 to C7552. Series: K1T1, DKDT, % Reduction.]
Comparative View : Table
           Sultania (100 nm)         Lee (100 nm)              DKDT (45 nm)
Circuit    % Reduction  % Penalty    % Reduction  % Penalty    % Reduction  % Penalty
C432              83.8       24.6          59.52         NA          93.08          0
C499              70.2       25.4          28.25         NA          88.85          0
C880              92.3       25.5          45.43         NA          94.42          0
C1355             67.9       25.0          33.50         NA          94.81          0
C1908             82.3       25.2          27.76         NA          96.80          0
C2670             92.6       25.3          33.80         NA          90.58          0
C3540             91.4       25.1          36.40         NA          93.10          0
C5315             91.7       26.1          34.34         NA          95.79          0
C6288             62.7       25.7          45.86         NA          98.59          0
C7552             91.6       25.3          28.10         NA          98.01          0
Comparative View : Chart
[Figure: % reduction (0 to 100) per benchmark circuit, C432 to C7552, for Lee 2004, Sultania 2004, and DKDT]
NOTE: DKDT has no time penalty.
Conclusions and Future Works
- A new approach, DKDT, for tunneling current reduction that accounts for both the ON and OFF states.
- A polynomial-time heuristic algorithm carries out the DKDT assignment for benchmark circuits in a reasonable amount of time.
- Experiments show significant reductions in tunneling current without a performance penalty.
Conclusions and Future Works
- Modeling of other high-K dielectrics is in progress.
- Development of an optimal assignment algorithm can be considered.
- The tradeoff among tunneling, area, and performance needs to be explored.
- DKDT based design may need more masks for the lithographic process during fabrication.
The Latest Chip Designed
(Claim: lowest power consuming image watermarking chip available at present)
The Latest Chip : Statistics
- Technology: TSMC 0.25 µm
- Total area: 16.2 sq mm
- Dual clocks: 280 MHz and 70 MHz
- Dual voltages: 2.5 V and 1.5 V
- No. of transistors: 1.4 million
- Power consumption: 0.3 mW
Thank You