CPE/EE 427, CPE 527 VLSI Design I
L3: Wires, Design for Speed
Department of Electrical and Computer Engineering
University of Alabama in Huntsville
Aleksandar Milenkovic (www.ece.uah.edu/~milenka)
www.ece.uah.edu/~milenka/cpe527-05f

Course Administration
- Instructor: Aleksandar Milenkovic
  milenka@ece.uah.edu
  www.ece.uah.edu/~milenka
  E 7-L
  Mon. 5:30 PM - 6:30 PM, Wed. :30 - 3:30 PM
- URL: http://www.ece.uah.edu/~milenka/cpe527-05f
- TA: Joel Wilder
- Labs: Lab#4: due /4/5; Lab#5: //5
- HWs: Solutions in secure directory /scr (cpe427fall05, ?)
- Project: Proposals due as on //5
- Test I: /7/5
- Text: CMOS VLSI Design, 3rd ed., Weste, Harris
- Review: Chapters 1, 2, 3, 4; Today: Wires, Design for Speed (meet M in the Lab tonight)
//5 VLSI Design I; A. Milenkovic

Outline
- Introduction
- Wire Resistance
- Wire Capacitance
- Wire RC Delay
- Crosstalk
- Wire Engineering
- Repeaters

Introduction
- Chips are mostly made of wires called interconnect.
  - In a stick diagram, wires set the size.
  - Transistors are little things under the wires.
  - There are many layers of wires.
- Wires are as important as transistors:
  - Speed
  - Power
  - Noise
- Alternating layers run orthogonally.

Wire Geometry
- Pitch = w + s
- Aspect ratio: AR = t/w
  - Old processes had AR << 1.
  - Modern processes have AR ≈ 2.
    - Pack in many skinny wires.
[Figure: wire cross-section showing width w, spacing s, thickness t, height h above the layer below, and length l]

Layer Stack
- The AMI 0.6 µm process has 3 metal layers.
- Modern processes use 6-10+ metal layers.
- Example: Intel 180 nm process
  - M1: thin, narrow (< 3λ), for high-density cells
  - M2-M4: thicker, for longer wires
  - M5-M6: thickest, for VDD, GND, clk

Layer   T (nm)   W (nm)   S (nm)   AR
6       1720     860      860      2.0
5       1600     800      800      2.0
4       1080     540      540      2.0
3       700      320      320      2.2
2       700      320      320      2.2
1       480      250      250      1.9
Substrate
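The two geometry definitions, Pitch = w + s and AR = t/w, can be checked against the Layer Stack table with a few lines of Python. This is a minimal sketch; `LAYERS_180NM` is a hypothetical name, and its entries are the Intel 180 nm dimensions transcribed from the table, in nanometers.

```python
# Pitch and aspect ratio for each metal layer of the 180 nm example stack.
# Dimensions in nm, transcribed from the Layer Stack table as (t, w, s).
LAYERS_180NM = {
    1: (480, 250, 250),
    2: (700, 320, 320),
    3: (700, 320, 320),
    4: (1080, 540, 540),
    5: (1600, 800, 800),
    6: (1720, 860, 860),
}


def pitch(w: float, s: float) -> float:
    """Wire pitch: width plus spacing."""
    return w + s


def aspect_ratio(t: float, w: float) -> float:
    """Aspect ratio: thickness over width."""
    return t / w


if __name__ == "__main__":
    for layer, (t, w, s) in LAYERS_180NM.items():
        print(f"M{layer}: pitch = {pitch(w, s)} nm, AR = {aspect_ratio(t, w):.1f}")
```

Every layer comes out near AR ≈ 2, while the pitch grows from 500 nm on M1 to 1720 nm on M6.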
Wire Resistance
- ρ = resistivity (Ω·m)
  $R = \frac{\rho}{t}\frac{l}{w} = R_\square \frac{l}{w}$
- R□ = sheet resistance (Ω/□)
  - □ is a dimensionless unit(!)
- Count the number of squares:
  - R = R□ × (# of squares)
[Figure: 1 rectangular block of length L and width W: R = R□ (L/W) Ω; 4 rectangular blocks forming a 2L × 2W rectangle: R = R□ (2L/2W) Ω = R□ (L/W) Ω]

Choice of Metals
- Until the 180 nm generation, most wires were aluminum.
- Modern processes often use copper.
  - Cu atoms diffuse into silicon and damage FETs.
  - Must be surrounded by a diffusion barrier.

Metal              Bulk resistivity (µΩ·cm)
Silver (Ag)        1.6
Copper (Cu)        1.7
Gold (Au)          2.2
Aluminum (Al)      2.8
Tungsten (W)       5.3
Molybdenum (Mo)    5.3

Sheet Resistance
- Typical sheet resistances in a 180 nm process:

Layer                        Sheet Resistance (Ω/□)
Diffusion (silicided)        3-10
Diffusion (no silicide)      50-200
Polysilicon (silicided)      3-10
Polysilicon (no silicide)    50-400
Metal1                       0.08
Metal2                       0.05
Metal3                       0.05
Metal4                       0.03
Metal5                       0.02
Metal6                       0.02

Contacts Resistance
- Contacts and vias also have 2-20 Ω.
- Use many contacts for lower R.
  - Many small contacts, because current crowds around the periphery.

Wire Capacitance
- A wire has capacitance per unit length:
  - To neighbors
  - To layers above and below
- $C_{total} = C_{top} + C_{bot} + 2C_{adj}$
[Figure: wire on layer n between layers n+1 and n-1, with width w, spacing s, thickness t, heights h1 and h2, and capacitances C_top, C_bot, and C_adj]

Capacitance Trends
- Parallel plate equation: $C = \varepsilon A/d$
  - Wires are not parallel plates, but obey the same trends.
  - Increasing area (W, t) increases capacitance.
  - Increasing distance (s, h) decreases capacitance.
- Dielectric constant: $\varepsilon = k\varepsilon_0$
  - $\varepsilon_0 = 8.85 \times 10^{-14}$ F/cm
  - k = 3.9 for SiO2
- Processes are starting to use low-k dielectrics.
  - k ≈ 3 (or less) as dielectrics use air pockets
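Counting squares and the parallel-plate trend are easy to sketch. This is a minimal Python version that assumes lengths are given in consistent units. The function names are hypothetical, and `parallel_plate_cap` models only the first-order trend, since real wires are not parallel plates.

```python
# Wire resistance by counting squares, and the parallel-plate capacitance trend.
EPS0_F_PER_CM = 8.85e-14  # permittivity of free space, F/cm
K_SIO2 = 3.9              # relative dielectric constant of SiO2


def num_squares(length: float, width: float) -> float:
    """Number of squares along a wire (length and width in the same units)."""
    return length / width


def wire_resistance(r_sheet: float, length: float, width: float) -> float:
    """R = R_sheet * (# of squares); ohms when r_sheet is in ohms/square."""
    return r_sheet * num_squares(length, width)


def sheet_resistance(rho_ohm_cm: float, thickness_cm: float) -> float:
    """R_sheet = rho / t, in ohms/square."""
    return rho_ohm_cm / thickness_cm


def parallel_plate_cap(area_cm2: float, dist_cm: float, k: float = K_SIO2) -> float:
    """C = k * eps0 * A / d in farads; only a first-order trend for real wires."""
    return k * EPS0_F_PER_CM * area_cm2 / dist_cm


if __name__ == "__main__":
    # Metal2 example: 0.05 ohm/sq, 5 mm (5000 um) long, 0.32 um wide
    print(f"R = {wire_resistance(0.05, 5000, 0.32):.0f} ohm")
```

Doubling both L and W leaves the resistance unchanged, which is the four-block case above. Doubling the plate separation halves the capacitance.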
M2 Capacitance Data
- Typical wires have ~0.2 fF/µm.
  - Compare to 2 fF/µm for gate capacitance.
[Figure: C_total (aF/µm) versus wire width w (nm) for M2 between M1 and M3 planes and for an isolated M2 wire, at spacings s = 320, 480, 640 nm and s = ∞]

Diffusion & Polysilicon
- Diffusion capacitance is very high (about 2 fF/µm).
  - Comparable to gate capacitance
  - Diffusion also has high resistance.
  - Avoid using diffusion runners for wires!
- Polysilicon has lower C but high R.
  - Use for transistor gates.
  - Occasionally used for very short wires between gates.

Lumped Element Models
- Wires are a distributed system.
  - Approximate with lumped element models.
[Figure: a wire with total R and C split into N segments of R/N and C/N; L-model (R, C), π-model (R with C/2 at each end), and T-model (R/2, R/2 with C in the middle)]
- A 3-segment π-model is accurate to 3% in simulation.
- The L-model needs 100 segments for the same accuracy!
- Use a single-segment π-model for Elmore delay.

Example
- Metal2 wire in a 180 nm process
  - 5 mm long
  - 0.32 µm wide
- Construct a 3-segment π-model.
  - R□ = 0.05 Ω/□ ⇒ R = 0.05 Ω/□ × (5 mm / 0.32 µm) = 0.05 × 15,625 ≈ 781 Ω
  - C_permicron = 0.2 fF/µm ⇒ C = 0.2 fF/µm × 5000 µm = 1 pF
[Figure: 3-segment π-model: three 260 Ω resistors, with 167 fF at each end of each segment]

Wire RC Delay
- Estimate the delay of a 10x inverter driving a 2x inverter at the end of the 5 mm wire from the previous example.
  - R = 2.5 kΩ·µm for gates
  - Unit inverter: 0.36 µm nMOS, 0.72 µm pMOS
- t_pd = ?
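The estimate asked for above can be reproduced with a small Elmore-delay helper. This is a minimal Python sketch under the example's assumptions: a driver resistance of 2.5 kΩ·µm / (10 × 0.36 µm), a load equal to the gate capacitance of a 2x inverter at 2 fF/µm of gate width, and the 781 Ω / 1 pF Metal2 wire. `elmore_ladder` and `driven_pi_wire` are hypothetical helper names.

```python
# Elmore delay of a driver + N-segment pi-model wire + load capacitance.


def elmore_ladder(resistances, caps):
    """Elmore delay of an RC ladder: sum over nodes of C_i times the total
    resistance from the source to node i. resistances[i] feeds node i."""
    delay, r_upstream = 0.0, 0.0
    for r, c in zip(resistances, caps):
        r_upstream += r
        delay += r_upstream * c
    return delay


def driven_pi_wire(r_drv, r_wire, c_wire, c_load, n_segments=1):
    """Ladder for a driver resistance, an n-segment pi-model wire, and a load.
    Each segment is R/N with C/(2N) at each end, so interior nodes get C/N."""
    rs = [r_drv]
    cs = [c_wire / (2 * n_segments)]
    for k in range(n_segments):
        rs.append(r_wire / n_segments)
        last = k == n_segments - 1
        cs.append(c_wire / (2 * n_segments) if last else c_wire / n_segments)
    cs[-1] += c_load
    return rs, cs


if __name__ == "__main__":
    r_drv = 2.5e3 / (10 * 0.36)          # 10x inverter driver, ohms
    c_load = 2 * (0.36 + 0.72) * 2e-15   # 2x inverter gate capacitance, farads
    for n in (1, 3):
        rs, cs = driven_pi_wire(r_drv, 781.0, 1e-12, c_load, n)
        print(f"{n}-segment pi-model: t_pd = {elmore_ladder(rs, cs) * 1e9:.2f} ns")
```

Both the 1-segment and the 3-segment models give about 1.1 ns. The Elmore delay of a π-ladder does not depend on the number of segments: the wire always contributes R_w(C_w/2 + C_load). That is why a single segment is enough for Elmore estimates. Extra segments matter for simulation accuracy.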
Wire RC Delay (cont.)
- Driver: R = 2.5 kΩ·µm / (10 × 0.36 µm) ≈ 690 Ω
- Wire: single-segment π-model, 781 Ω with 500 fF at each end
- Load: 2x inverter, about 4 fF
[Figure: driver 690 Ω; wire 781 Ω with 500 fF at each end; load 4 fF]
$t_{pd} = (690\,\Omega)(500\,\text{fF}) + (690\,\Omega + 781\,\Omega)(500\,\text{fF} + 4\,\text{fF}) \approx 1.1\ \text{ns}$

Simulated Wire Delays
[Figure: simulated voltage (V) versus time (ns), 0 to 5 ns, at the driver output, at several points along the wire, and at V_out]

Wire Delay Models
- Ideal wire
  - The same voltage is present at every segment of the wire at every point in time, so the wire is equipotential.
  - Only holds for very short wires, i.e., interconnects between very nearest-neighbor gates.
- Lumped C model
  - When only a single parasitic component (C, R, or L) is dominant, the different fractions are lumped into a single circuit element.
  - When the resistive component is small and the switching frequency is low to medium, only C needs to be considered. The wire itself does not introduce any delay; the only impact on performance comes from the wire capacitance.
[Figure: driver driving V_out through a wire modeled as a single lumped capacitance, C_lumped = L · c_wire, where c_wire is the capacitance per unit length]

Wire Delay Models, cont'd
- Lumped RC model
  - The total wire resistance is lumped into a single R and the total capacitance into a single C.
  - Good for short wires; pessimistic and inaccurate for long wires.
- Distributed RC model
  - The circuit parasitics are distributed along the length, L, of the wire.
  - c and r are the capacitance and resistance per unit length.
- Delay is determined using the Elmore delay equation:
  $\tau_{Di} = \sum_{k=1}^{N} c_k r_{ik}$, where r_ik is the resistance shared by the paths from the source to nodes i and k.
[Figure: distributed RC line from V_in to V_N built from segments rΔL and cΔL]

Chain Network Elmore Delay
[Figure: RC chain from the source to node N, with series resistors r_1 ... r_N and node capacitances c_1 ... c_N]
- $\tau_{D1} = c_1 r_1$
- $\tau_{D2} = c_1 r_1 + c_2 (r_1 + r_2)$
- $\tau_{Di} = c_1 r_1 + c_2 (r_1 + r_2) + \dots + c_i (r_1 + r_2 + \dots + r_i)$
- Elmore delay equation:
  $\tau_{DN} = \sum_{i=1}^{N} c_i r_{ii} = \sum_{i=1}^{N} c_i \sum_{j=1}^{i} r_j$
- If all resistances are equal (r_eq):
  $\tau_{Di} = c_1 r_{eq} + 2 c_2 r_{eq} + 3 c_3 r_{eq} + \dots + i\, c_i r_{eq}$
Distributed RC Model for Simple Wires
- A wire of length L can be modeled by N segments of length L/N.
  - The resistance and capacitance of each segment are r L/N and c L/N.
  - $\tau_{DN} = (L/N)^2 (cr + 2cr + \dots + Ncr) = (crL^2)\frac{N(N+1)}{2N^2} = RC\frac{N+1}{2N}$, where R (= rL) and C (= cL) are the total lumped resistance and capacitance of the wire.
  - For large N: $\tau_{DN} = RC/2 = rcL^2/2$
- The delay of a wire is a quadratic function of its length, L.
- The delay is 1/2 of that predicted by the lumped model.

Putting It All Together
[Figure: driver with resistance R_Driver driving V_out through a distributed wire (r_w, c_w, L)]
- Total propagation delay, considering both the driver and the wire:
  $\tau_D = R_{Driver} C_w + \frac{R_w C_w}{2} = R_{Driver} C_w + 0.5\, r_w c_w L^2$
  and
  $t_p = 0.69 R_{Driver} C_w + 0.38 R_w C_w$, where $R_w = r_w L$ and $C_w = c_w L$.
- The delay introduced by wire resistance becomes dominant when $(R_w C_w)/2 \ge R_{Driver} C_w$, i.e., when $L \ge 2 R_{Driver}/r_w$.
- For R_Driver = 1 kΩ driving a 1 µm wide Al wire, L_crit is 2.67 cm.

Design Rules of Thumb
- RC delays should be considered when t_pRC > t_pgate of the driving gate:
  $L_{crit} > \sqrt{t_{pgate} / (0.38\, rc)}$
  - The actual L_crit depends upon the size of the driving gate and the interconnect material.
- RC delays should be considered when the rise (fall) time at the line input is smaller than RC, the rise (fall) time of the line:
  $t_{rise} < RC$
  - When this is not met, the change in the signal is slower than the propagation delay of the wire, so a lumped C model suffices.

Delay with Long Interconnects
- When gates are farther apart, wire capacitance and resistance can no longer be ignored.
[Figure: driver with internal capacitance C_int driving a distributed wire (r_w, c_w, L) that ends in a fan-out load C_fan at V_out]
$t_p = 0.69 R_{dr} C_{int} + (0.69 R_{dr} + 0.38 R_w) C_w + 0.69 (R_{dr} + R_w) C_{fan}$, where $R_{dr} = (R_{eqn} + R_{eqp})/2$
$\quad = 0.69 R_{dr} (C_{int} + C_{fan}) + 0.69 (R_{dr} c_w + r_w C_{fan}) L + 0.38\, r_w c_w L^2$
- Wire delay rapidly becomes the dominant factor in the delay budget for longer wires, due to the quadratic term.

Crosstalk
- A capacitor does not like to change its voltage instantaneously.
- A wire has high capacitance to its neighbor.
  - When the neighbor switches from 1→0 or 0→1, the wire tends to switch too.
  - This is called capacitive coupling or crosstalk.
- Crosstalk effects:
  - Noise on nonswitching wires
  - Increased delay on switching wires
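The distributed-RC sum and the driver/wire formulas from the Distributed RC Model and Delay with Long Interconnects slides translate directly into code. This is a minimal Python sketch. The per-unit-length resistance of 0.075 Ω/µm used for the 1 µm wide Al wire is a hypothetical value, chosen only because it is consistent with the 2.67 cm critical length quoted for a 1 kΩ driver.

```python
# Distributed RC wire delay and the driver/wire critical length.


def distributed_elmore(r, c, length, n):
    """Elmore delay of a wire split into n equal RC segments (no driver):
    (L/N)^2 * r*c * (1 + 2 + ... + N) = r*c*L^2 * (N + 1) / (2N)."""
    seg_r, seg_c = r * length / n, c * length / n
    return sum(seg_c * seg_r * i for i in range(1, n + 1))


def t_p_driver_wire(r_drv, r, c, length):
    """t_p = 0.69 R_driver C_w + 0.38 R_w C_w."""
    r_w, c_w = r * length, c * length
    return 0.69 * r_drv * c_w + 0.38 * r_w * c_w


def critical_length(r_drv, r):
    """Length at which R_w C_w / 2 equals R_driver C_w: L = 2 R_driver / r."""
    return 2.0 * r_drv / r


def t_p_long_interconnect(r_dr, c_int, c_fan, r, c, length):
    """0.69 R_dr (C_int + C_fan) + 0.69 (R_dr c + r C_fan) L + 0.38 r c L^2."""
    return (0.69 * r_dr * (c_int + c_fan)
            + 0.69 * (r_dr * c + r * c_fan) * length
            + 0.38 * r * c * length ** 2)


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, distributed_elmore(1.0, 1.0, 1.0, n))  # approaches RC/2 = 0.5
    # Hypothetical r = 0.075 ohm/um for a 1 um wide Al wire, 1 kOhm driver
    print(f"L_crit = {critical_length(1e3, 0.075) / 1e4:.2f} cm")
```

With C_int = C_fan = 0, `t_p_long_interconnect` reduces to `t_p_driver_wire`, which is a quick consistency check between the two slides.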
Crosstalk Delay
- Assume the layers above and below are, on average, quiet.
  - The second terminal of those capacitors can be ignored.
  - Model as C_gnd = C_top + C_bot.
- The effective C_adj depends on the behavior of the neighbors.
  - Miller effect
[Figure: wire A with capacitance C_gnd to ground and C_adj to neighbor wire B]

B                       ΔV      C_eff(A)            MCF
Constant                VDD     C_gnd + C_adj       1
Switching with A        0       C_gnd               0
Switching opposite A    2VDD    C_gnd + 2 C_adj     2

Crosstalk Noise
- Crosstalk causes noise on nonswitching wires.
- If the victim is floating, model it as a capacitive voltage divider:
  $\Delta V_{victim} = \frac{C_{adj}}{C_{gnd-v} + C_{adj}} \Delta V_{aggressor}$
[Figure: aggressor coupled to the victim through C_adj; the victim has C_gnd-v to ground]

Driven Victims
- Usually the victim is driven by a gate that fights the noise.
  - Noise depends on relative resistances.
  - The victim driver is in the linear region; the aggressor driver is in saturation.
  - If sizes are the same, R_aggressor = 2-4 × R_victim.
$\Delta V_{victim} = \frac{C_{adj}}{C_{gnd-v} + C_{adj}} \frac{1}{1+k} \Delta V_{aggressor}$, where
$k = \frac{\tau_{aggressor}}{\tau_{victim}} = \frac{R_{aggressor}(C_{gnd-a} + C_{adj})}{R_{victim}(C_{gnd-v} + C_{adj})}$
[Figure: aggressor driver R_aggressor with C_gnd-a and victim driver R_victim with C_gnd-v, coupled by C_adj]

Coupling Waveforms
- Simulated coupling for C_adj = C_victim
[Figure: aggressor and victim voltages versus t (ps), 0 to 2000 ps, 0 to 1.8 V]
- Victim (undriven): 50%
- Victim (half-size driver): 16%
- Victim (equal-size driver): 8%
- Victim (double-size driver): 4%

Noise Implications
- So what if we have noise?
- If the noise is less than the noise margin, nothing happens.
- Static CMOS logic will eventually settle to the correct output even if disturbed by large noise spikes.
  - But glitches cause extra delay.
  - They also cause extra power from false transitions.
- Dynamic logic never recovers from glitches.
- Memories and other sensitive circuits also can produce the wrong answer.
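The Miller-factor table and the two noise expressions can be sketched as small functions. This is a minimal Python version. The 1.8 V swing and the 3x resistance ratio in the demo are hypothetical values; the ratio sits inside the 2-4x range above. Capacitances may be in any consistent unit.

```python
# Crosstalk: Miller coupling factor and capacitive-divider noise estimates.

MCF = {"constant": 1.0, "same": 0.0, "opposite": 2.0}  # behavior of neighbor B


def effective_cap(c_gnd, c_adj, neighbor):
    """C_eff(A) = C_gnd + MCF * C_adj, with MCF from the Crosstalk Delay table."""
    return c_gnd + MCF[neighbor] * c_adj


def victim_noise(dv_aggr, c_adj, c_gnd_v, r_aggr=None, r_victim=None, c_gnd_a=0.0):
    """Noise on a quiet victim. Floating victim: capacitive divider.
    Driven victim: the divider result is reduced by 1 / (1 + k),
    with k = R_aggr (C_gnd_a + C_adj) / (R_victim (C_gnd_v + C_adj))."""
    dv = dv_aggr * c_adj / (c_gnd_v + c_adj)
    if r_aggr is None or r_victim is None:
        return dv
    k = (r_aggr * (c_gnd_a + c_adj)) / (r_victim * (c_gnd_v + c_adj))
    return dv / (1.0 + k)


if __name__ == "__main__":
    # Floating victim with C_adj = C_gnd-v sees half of the aggressor swing.
    print(victim_noise(1.8, c_adj=1.0, c_gnd_v=1.0))
    # Hypothetical driven case: equal caps, aggressor resistance 3x the victim's.
    print(victim_noise(1.8, c_adj=1.0, c_gnd_v=1.0, r_aggr=3.0, r_victim=1.0, c_gnd_a=1.0))
```

The driven-victim expression is a first-order estimate. The percentages on the Coupling Waveforms slide come from circuit simulation, not from this formula.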
Wire Engineering
- Goal: achieve delay, area, and power goals with acceptable noise.
- Degrees of freedom:
  - Width
  - Spacing
  - Layer
  - Shielding
[Figure: delay (ns), RC/2, and coupling, C_adj/(C_adj + C_gnd), versus pitch (nm) for wire spacings of 320, 480, and 640 nm]

Repeaters
- R and C are proportional to l.
- RC delay is proportional to l^2.
  - Unacceptably great for long wires
- Break long wires into N shorter segments.
  - Drive each one with an inverter or buffer.
[Figure: driver, wire of length l, and receiver; the same wire split into N segments of length l/N with repeaters between the driver and the receiver]

Repeater Design
- How many repeaters should we use?
- How large should each one be?
- Equivalent circuit:
  - Wire length l/N
    - Wire capacitance C_w·l/N, resistance R_w·l/N
  - Inverter width W (nMOS = W, pMOS = 2W)
    - Gate capacitance C'·W, resistance R/W
[Figure: one segment: an inverter with resistance R/W driving a π-model wire of resistance R_w·l/N and capacitance C_w·l/N into the next inverter's gate capacitance C'·W]
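The Elmore-delay setup from the Repeater Design slide can be written out, and its optimum computed in closed form by setting the derivatives with respect to W and N to zero. This is a minimal Python sketch. It assumes the segment model in the docstring, with driver diffusion capacitance ignored, and uses illustrative 180 nm numbers: R = 2.5 kΩ·µm from the Wire RC Delay example, C' = 6 fF/µm, and the Metal2 wire parameters from the earlier example. The 6 fF/µm figure is an assumption, taken as 2 fF/µm of gate capacitance times the 3 units of gate width in an inverter with pMOS = 2W.

```python
import math

# Repeater sizing: Elmore delay of N repeated segments and its closed-form optimum.


def total_delay(l, n, w, r, c_gate, r_w, c_w):
    """Elmore delay of n identical segments. Each segment is an inverter of
    width w (resistance r/w, input cap c_gate*w) driving a pi-model wire of
    length l/n into the next inverter's gate. Diffusion capacitance is ignored."""
    seg = l / n
    r_drv = r / w
    c_in = c_gate * w
    per_seg = r_drv * (c_w * seg + c_in) + r_w * seg * (c_w * seg / 2 + c_in)
    return n * per_seg


def optimal_repeaters(r, c_gate, r_w, c_w):
    """Closed-form optimum from d/dW = 0 and d/dN = 0.
    Returns (segment length l/N, width W, delay per unit length t_pd/l)."""
    seg_len = math.sqrt(2 * r * c_gate / (r_w * c_w))
    width = math.sqrt(r * c_w / (r_w * c_gate))
    delay_per_len = (2 + math.sqrt(2)) * math.sqrt(r * c_gate * r_w * c_w)
    return seg_len, width, delay_per_len


# Assumed 180 nm values, lengths in um: R = 2.5 kOhm*um, C' = 6 fF/um,
# Metal2 r_w = 0.05 / 0.32 Ohm/um, c_w = 0.2 fF/um.
R, C_GATE, R_W, C_W = 2.5e3, 6e-15, 0.05 / 0.32, 0.2e-15

if __name__ == "__main__":
    seg, w, dpl = optimal_repeaters(R, C_GATE, R_W, C_W)
    # dpl is in s/um; 1e15 converts it to ps/mm
    print(f"l/N = {seg / 1000:.2f} mm, W = {w:.1f} um, t_pd/l = {dpl * 1e15:.0f} ps/mm")
```

With these assumed inputs the sketch gives about 74 ps/mm, with repeaters roughly 1 mm apart. That falls inside the ~60-80 ps/mm range quoted for the 180 nm process. Evaluating `total_delay` at nearby values of W and N shows how flat the minimum is.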
Repeater Results
- Write the equation for Elmore delay.
  - Differentiate with respect to W and N.
  - Set equal to 0 and solve:
$\frac{l}{N} = \sqrt{\frac{2 R C'}{R_w C_w}}$
$\frac{t_{pd}}{l} = \left(2 + \sqrt{2}\right)\sqrt{R C' R_w C_w}$
$W = \sqrt{\frac{R C_w}{R_w C'}}$
- ~60-80 ps/mm in a 180 nm process

Designing for Speed

Review: CMOS Inverter: Dynamic
[Figure: inverter with V_in = V_DD discharging C_L through the nMOS on-resistance R_on to V_out]
- $t_{pHL} = f(R_{on}, C_L)$
- $t_{pHL} = 0.69 R_{eqn} C_L$
- $t_{pHL} = 0.69 \left(\frac{3}{4}\frac{C_L V_{DD}}{I_{DSATn}}\right) \approx 0.52 \frac{C_L}{(W/L)_n k'_n V_{DSATn}}$

Review: Designing Inverters for Performance
- Reduce C_L
  - Internal diffusion capacitance of the gate itself
  - Interconnect capacitance
  - Fanout
- Increase the W/L ratio of the transistors
  - The most powerful and effective performance optimization tool in the hands of the designer
  - Watch out for self-loading!
- Increase V_DD
  - Only minimal improvement in performance, at the cost of increased energy dissipation
- Slope engineering: keep signal rise and fall times smaller than or equal to the gate propagation delays, and of approximately equal values
  - Good for performance
  - Good for power consumption

Switch Delay Model
[Figure: switch-level delay models of an inverter, a 2-input NAND, and a 2-input NOR, built from equivalent resistances R_eq, internal capacitance C_int, and load C_L]
Input Pattern Effects on Delay
[Figure: 2-input NAND switch model with inputs A and B, pull-up resistances R_p, pull-down resistances R_n, internal capacitance C_int, and load C_L]
- Delay is dependent on the pattern of inputs.
- Low-to-high transition
  - Both inputs go low: delay is 0.69 (R_p/2) C_L, since two p-resistors are on in parallel.
  - One input goes low: delay is 0.69 R_p C_L.
- High-to-low transition
  - Both inputs go high: delay is 0.69 (2R_n) C_L.
- Adding transistors in series (without sizing) slows down the circuit.

Delay Dependence on Input Patterns
[Figure: simulated output voltage (V) versus time (ps) for three of the input patterns below]
- 2-input NAND with NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, C_L = 10 fF

Input Data Pattern     Delay (ps)
A = B = 0→1            69
A = 1, B = 0→1         62
A = 0→1, B = 1         50
A = B = 1→0            35
A = 1, B = 1→0         76
A = 1→0, B = 1         57

Transistor Sizing
[Figure: transistor sizing of complex gates, with internal capacitance C_int and load C_L]

Fan-In Considerations
[Figure: 4-input NAND pull-down stack with inputs A, B, C, D, internal node capacitances C_1, C_2, C_3, and load C_L]
- Distributed RC model (Elmore delay):
  $t_{pHL} = 0.69 R_{eqn} (C_1 + 2C_2 + 3C_3 + 4C_L)$
- Propagation delay deteriorates rapidly as a function of fan-in: quadratically in the worst case.

t_p as a Function of Fan-In
[Figure: t_p (ps) versus fan-in, 2 to 16, showing t_p, t_pHL, and t_pLH]
- t_pHL: quadratic function of fan-in
- t_pLH: linear function of fan-in
- Gates with a fan-in greater than 4 should be avoided.

Fast Complex Gates: Design Technique 1
- Transistor sizing
  - As long as fan-out capacitance dominates
- Progressive sizing
[Figure: pull-down chain with inputs In_1 ... In_N on transistors M1 ... MN, node capacitances C_1, C_2, C_3, and load C_L, viewed as a distributed RC line]
  - M1 > M2 > M3 > ... > MN: the FET closest to the output should be the smallest.
  - Can reduce delay by more than 20%; the gains decrease as technology shrinks.
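The fan-in expression is the Elmore chain formula applied to the pull-down stack. This is a minimal Python sketch in normalized units. It assumes every transistor keeps the same resistance, and every internal node the same capacitance, as fan-in grows. That isolates the quadratic term rather than modeling a real sized gate. Both function names are hypothetical.

```python
# Elmore delay of a series pull-down stack (high-to-low transition).


def stack_discharge_delay(resistances, caps):
    """t_pHL = 0.69 * sum_i C_i * (R_1 + ... + R_i).
    resistances[0] is the transistor nearest ground; caps[i] is the capacitance
    of the node just above transistor i, so caps[-1] is the load C_L."""
    total, r_path = 0.0, 0.0
    for r, c in zip(resistances, caps):
        r_path += r
        total += r_path * c
    return 0.69 * total


def nand_pulldown_delay(fan_in, r_eqn, c_int, c_load):
    """Unsized fan_in-input NAND: equal transistors, equal internal node caps."""
    caps = [c_int] * (fan_in - 1) + [c_load]
    return stack_discharge_delay([r_eqn] * fan_in, caps)


if __name__ == "__main__":
    for n in range(2, 9):
        print(n, round(nand_pulldown_delay(n, 1.0, 1.0, 1.0), 2))
```

Because node i is weighted by i series resistances, the delay grows roughly with the square of fan-in. Making the transistors nearer ground wider (progressive sizing) lowers the resistances that the most nodes share. It also adds capacitance, which this sketch does not model.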
Fast Complex Gates: Design Technique 2
- Input re-ordering when not all inputs arrive at the same time
[Figure: two orderings of a 3-input pull-down stack (M1 nearest ground, M3 nearest the output; internal nodes C_1, C_2; load C_L), with the critical, late-arriving input In_1 placed either on M1 or on M3]
- Critical input on M1 (nearest ground): C_1, C_2, and C_L are all still charged, so the delay is determined by the time to discharge C_L, C_1, and C_2.
- Critical input on M3 (nearest the output): C_1 and C_2 are already discharged, so the delay is determined by the time to discharge C_L only.

Sizing and Ordering Effects
[Figure: 4-input NAND with inputs A, B, C, D, comparing uniform transistor sizes (3, 3, 3, 3) with progressive sizes (4, 5, 6, 7)]
- Progressive sizing in the pull-down chain gives up to a 23% improvement.
- Input ordering saves 5%.
  - Critical path A: 23%
  - Critical path D: 17%

Fast Complex Gates: Design Technique 3
- Alternative logic structures
[Figure: alternative implementations of the 8-input AND function F = ABCDEFGH]

Fast Complex Gates: Design Technique 4
- Isolating fan-in from fan-out using buffer insertion
[Figure: a large fan-in gate driving C_L directly, versus driving it through an inserted buffer]

Logical Effort: Design Technique 5
- Logical effort generalizes to multistage networks.
- Path logical effort: $G = \prod g_i$
- Path electrical effort: $H = C_{out\text{-}path} / C_{in\text{-}path}$
- Path effort: $F = \prod f_i = \prod g_i h_i$
- The real lesson is that optimizing the propagation delay of a gate in isolation is misguided.
[Figure: four-stage path with internal input capacitances x, y, z: g_1 = 1, h_1 = x/10; g_2 = 5/3, h_2 = y/x; g_3 = 4/3, h_3 = z/y; g_4 = 1, h_4 = 20/z]
Branching Effort
- Introduce branching effort
  - Accounts for branching between stages in the path:
  $b = \frac{C_{on\text{-}path} + C_{off\text{-}path}}{C_{on\text{-}path}}$, $B = \prod b_i$
  - Note: $\prod h_i = BH$
- Now we compute the path effort: $F = GBH$

Multistage Delays
- Path effort delay: $D_F = \sum f_i$
- Path parasitic delay: $P = \sum p_i$
- Path delay: $D = \sum d_i = D_F + P$

Designing Fast Circuits
- $D = \sum d_i = D_F + P$
- Delay is smallest when each stage bears the same effort:
  $\hat{f} = g_i h_i = F^{1/N}$
- Thus the minimum delay of an N-stage path is $D = N F^{1/N} + P$.
- This is a key result of logical effort.
  - Find the fastest possible delay.
  - Doesn't require calculating gate sizes.

Gate Sizes
- How wide should the gates be for least delay?
  $\hat{f} = gh = g \frac{C_{out}}{C_{in}} \Rightarrow C_{in_i} = \frac{g_i C_{out_i}}{\hat{f}}$
- Working backward, apply the capacitance transformation to find the input capacitance of each gate given the load it drives.
- Check your work by verifying that the input capacitance spec is met.

Best Number of Stages
- How many stages should a path use?
  - Minimizing the number of stages is not always fastest.
- Example: drive a 64-bit datapath with a unit inverter.
  $D = N F^{1/N} + P = N (64)^{1/N} + N$
[Figure: chains of 1, 2, 3, and 4 inverters from the unit-size initial driver to the datapath load of 64]

N    f      D
1    64     65
2    8      18
3    4      15    (fastest)
4    2.8    15.3
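The path-effort, minimum-delay, and gate-sizing rules fit in a few functions. This is a minimal Python sketch and needs Python 3.8+ for `math.prod`. It reuses the four-stage example with g = 1, 5/3, 4/3, 1, assuming a path input capacitance of 10 units, a load of 20 units, and no branching. It also covers the 64-bit datapath example with F = 64 and P = N. The function names are hypothetical.

```python
import math

# Logical effort: path effort, minimum path delay, and backward gate sizing.


def path_effort(g, b, h_path):
    """F = G * B * H, with G = prod(g_i) and B = prod(b_i)."""
    return math.prod(g) * math.prod(b) * h_path


def min_path_delay(f_path, n_stages, p_path):
    """D = N * F^(1/N) + P, reached when every stage bears effort F^(1/N)."""
    return n_stages * f_path ** (1.0 / n_stages) + p_path


def gate_input_caps(g, c_load, f_path):
    """Work backward from the load: C_in_i = g_i * C_out_i / f_hat.
    Assumes no branching, so each gate's C_out is the next gate's C_in."""
    f_hat = f_path ** (1.0 / len(g))
    caps, c_out = [], c_load
    for g_i in reversed(g):
        c_out = g_i * c_out / f_hat
        caps.append(c_out)
    return caps[::-1]


if __name__ == "__main__":
    g = [1, 5 / 3, 4 / 3, 1]            # four-stage example path
    F = path_effort(g, [1, 1, 1, 1], 20 / 10)
    print(f"F = {F:.2f}, input caps = {[round(c, 2) for c in gate_input_caps(g, 20, F)]}")
    for n in range(1, 5):               # 64-bit datapath: F = 64, P = N
        print(n, round(64 ** (1 / n), 1), round(min_path_delay(64, n, n), 1))
```

Sizing backward from the load recovers an input capacitance of 10 at the first gate, which is the "check your work" step from the Gate Sizes slide. The datapath loop reproduces the N, f, D table, with three stages fastest.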
Derivation
- Consider adding inverters to the end of a path.
  - How many give the least delay?
[Figure: logic block of n_1 stages with path effort F, followed by N - n_1 extra inverters]
$D = N F^{1/N} + \sum_{i=1}^{n_1} p_i + (N - n_1) p_{inv}$
$\frac{\partial D}{\partial N} = -F^{1/N} \ln F^{1/N} + F^{1/N} + p_{inv} = 0$
- Define the best stage effort $\rho = F^{1/N}$:
$p_{inv} + \rho (1 - \ln \rho) = 0$

Best Stage Effort
- $p_{inv} + \rho(1 - \ln\rho) = 0$ has no closed-form solution.
- Neglecting parasitics (p_inv = 0), we find ρ = 2.718 (e).
- For p_inv = 1, solve numerically for ρ = 3.59.

Sensitivity Analysis
- How sensitive is delay to using exactly the best number of stages?
[Figure: D(N)/D(N̂) versus N/N̂, with the points for ρ = 2.4 and ρ = 6 marked]
- 2.4 < ρ < 6 gives delay within 15% of optimal.
  - We can be sloppy!
  - I like ρ = 4.
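The best-stage-effort equation has no closed form, but bisection finds its root quickly. This is a minimal Python sketch. `normalized_delay` treats N as continuous and models only the inverter chain, dropping the logic block's parasitics. That is enough to reproduce ρ = e for p_inv = 0, ρ ≈ 3.59 for p_inv = 1, and the flatness of the optimum.

```python
import math

# Best stage effort: solve p_inv + rho * (1 - ln rho) = 0 numerically.


def best_stage_effort(p_inv: float, tol: float = 1e-12) -> float:
    """Bisection on f(rho) = p_inv + rho * (1 - ln rho) for p_inv >= 0.
    f(e) = p_inv >= 0 and f decreases for rho > 1, so the root lies at or above e."""
    def f(rho: float) -> float:
        return p_inv + rho * (1.0 - math.log(rho))

    lo, hi = math.e, 2 * math.e
    while f(hi) > 0:              # widen until f(hi) < 0 brackets the root
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def normalized_delay(rho: float, p_inv: float) -> float:
    """Delay of an inverter chain with stage effort rho, divided by ln F:
    D = N (rho + p_inv) with N = ln F / ln rho."""
    return (rho + p_inv) / math.log(rho)


if __name__ == "__main__":
    print(best_stage_effort(0.0), best_stage_effort(1.0))
    best = normalized_delay(best_stage_effort(1.0), 1.0)
    for rho in (2.4, 4.0, 6.0):
        print(rho, round(normalized_delay(rho, 1.0) / best, 3))
```

With N treated as continuous, ρ = 2.4 and ρ = 6 both come out less than 10% slower than the optimum, consistent with the "within 15%" rule above. ρ = 4 is within about 0.5%.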