ECE260B CSE241A Winter 2007 Interconnects Website: http://vlsicad.ucsd.edu/courses/ece260b-w07 ECE 260B CSE 241A Interconnects 1
Outline
- Interconnect Scaling and Power
- Resistance
- Capacitance and Inductance
- Delay
SEMATECH Prototype BEOL stack, 2000
[Figure: BEOL cross-section. Wiring tiers: Global (up to 5), Intermediate (up to 4), Local (2). Labels: Via, Wire, Passivation, Dielectric, Etch Stop Layer, Dielectric Capping Layer, Copper Conductor with Barrier/Nucleation Layer, Pre Metal Dielectric, Tungsten Contact Plug]
- What are some implications of reverse-scaled global interconnects?
Slide courtesy of Chris Case, BOC Edwards
Intel 130nm BEOL Stack
- Intel 6LM 130nm process with vias shown (connecting layers)
- Aspect ratio = thickness / minimum width
Damascene and Dual-Damascene Process
- Damascene process named after the ancient Middle Eastern technique for inlaying metal in ceramic or wood for decoration
- Single Damascene: ILD Deposition, Oxide Trench Etch, Metal Fill, Metal CMP
- Dual Damascene: ILD Deposition, Oxide Trench / Via Etch, Metal Fill, Metal CMP
Cu Dual-Damascene Process
- Cu Damascene Process steps: Bulk copper removal, Barrier removal, Oxide over-polish
- Polishing pad touches both up and down area after step height
- Different polish rates on different materials
- Dishing and erosion arise from different polish rates for copper and oxide
[Figure labels: Oxide erosion, Copper dishing]
Area Fill & Metal Slot for Copper CMP
[Figure labels: Copper, Oxide, Area Fill, Metal Slot]
- Dishing can thin the wire or pad, causing higher-resistance wires or lower-reliability bond pads
- Erosion can also result in a sub-planar dip on the wafer surface, causing short-circuits between adjacent wires on next layer
- Oxide erosion and copper dishing can be controlled by area filling and metal slotting
Evolution of Interconnect Modeling Needs
- Before 1990, wires were thick and wide while devices were big and slow
  - Large wiring capacitances and device resistances
  - Wiring resistance << device resistance
  - Model wires as capacitances only
- In the 1990s, scaling (by scale factor S) led to smaller and faster devices and smaller, more resistive wires
  - Reverse scaling of properties of wires
  - RC models became necessary
- Interconnects dominate VLSI system performance
- In the 2000s, frequencies can be high enough that inductance has become a major component of total impedance
- Approaches
  - Hierarchical time-budgeting
  - Top-level chip-integration
Interconnect Scaling
- w: width of interconnect
- s: spacing between interconnects on same layer
- h: dielectric thickness (spacing between interconnects in two vertically adjacent layers)
- l: length of interconnect
- t: thickness of interconnect
Slide courtesy of Sherief Reda, Brown
Constant thickness scaling versus reduced thickness scaling
- Reduced thickness scaling: w becomes w/S, t becomes t/S, l becomes l/S
- Constant thickness scaling: w becomes w/S, t stays t, l becomes l/S
Slide courtesy of Sherief Reda, Brown
Implications of Ideal Interconnect Scaling
Slide courtesy of Sherief Reda, Brown
Interconnect delay is dominating gate delay
[Figure label: bottleneck]
Slide courtesy of Sherief Reda, Brown
ITRS predictions imply global wires will most likely be buffered to reduce their delay
[Figure: delay trends; curves: gate delay, local (scaled), global wires (no repeaters), global wires (repeaters); label: bottleneck]
- Delay of local interconnects scales relatively well; global wires are a problem
Slide courtesy of Sherief Reda, Brown
With scaling, the reachable radius of a buffer decreases, so we need more and more buffers
[Figure: repeaters required to buffer Itanium global interconnects; label: bottleneck]
- A corner-to-corner (BL-UR) wire in Itanium (180nm) requires 6 repeaters to span the die
- Repeaters consume chip area, consume power, and add vias
Slide courtesy of Sherief Reda, Brown
It takes an increasing number of clock cycles to span a die [Matzke, TI 97]
- Wires need to be pipelined (repeaters with states) to maintain synchronization in the face of latency variations
- Use networks that route packets instead of global wires (network-on-a-chip, NoC)
Slide courtesy of Sherief Reda, Brown
Interconnect Statistics
- Local Interconnect: $S_{Local} = S_{Technology}$
- Global Interconnect: $S_{Global} = S_{Die}$
- What are some implications?
Case Study
- Low-power, state-of-the-art µ-processor
- Dynamic switching power analysis
- Interconnect attributes:
  - Length
  - Capacitance
  - Fanout (FO)
  - Hierarchy data
  - Net type
  - Activity factors (AF)
  - Miscellaneous
Slide courtesy of Magen/Kolodny/Shamir, Intel
Interconnect Length Distribution
Source: Shekhar Y. Borkar, CRL - Intel
Slide courtesy of Magen/Kolodny/Shamir, Intel
Interconnect Length Distribution
- Log-log scale
- Exponential decrease with length
- Global clock not included
Slide courtesy of Magen/Kolodny/Shamir, Intel
Total Dynamic Power
[Figure: Normalized Dynamic Power versus Length [um]; global clock not included]
- Local nets = 66%
- Global nets = 34%
Slide courtesy of Magen/Kolodny/Shamir, Intel
Total Dynamic Power Breakdown
- Global clock included
Slide courtesy of Magen/Kolodny/Shamir, Intel
Reducing Dynamic Capacitive (Switching) Power
$P_{dyn} = C_L V_{DD}^2 P_{0 \to 1} f$
- Capacitance ($C_L$): function of fan-out, wire length, transistor sizes
- Supply voltage ($V_{DD}$): has been dropping with successive generations
- Activity factor ($P_{0 \to 1}$): how often, on average, do wires switch?
- Clock frequency ($f$): increasing
Slide courtesy of Mary Jane Irwin, PSU
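As a rough illustration of the switching-power expression on this slide, the sketch below evaluates C_L * V_DD^2 * P_0->1 * f. The function name and the numeric values are illustrative assumptions, not data from the case study.

```python
def dynamic_power(c_load_f, v_dd, p_switch, f_clk_hz):
    """Switching power in watts: C_L * V_DD^2 * P_0->1 * f."""
    return c_load_f * v_dd ** 2 * p_switch * f_clk_hz


# Hypothetical net: 100 fF load, 1.2 V supply, activity factor 0.1, 1 GHz clock.
p_nominal = dynamic_power(100e-15, 1.2, 0.1, 1e9)
# Halving the supply voltage cuts the switching power by a factor of four.
p_half_vdd = dynamic_power(100e-15, 0.6, 0.1, 1e9)
print(f"nominal: {p_nominal * 1e6:.1f} uW, half V_DD: {p_half_vdd * 1e6:.1f} uW")
```

The quadratic dependence on V_DD is why supply scaling is the strongest of the four knobs listed on the slide.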
Power Breakdown by Net Types
- Global clock included
[Figure panels: Interconnect Power (Interconnect Only); Total Power (Gate, Diffusion and Interconnect)]
Slide courtesy of Magen/Kolodny/Shamir, Intel
Outline
- Interconnect Scaling and Power
- Resistance
- Capacitance and Inductance
- Delay
Resistance & Sheet Resistance
- $R = \rho L / (T W)$, where $L$ is length, $W$ is width, $T$ is thickness
- Sheet resistance: $R_\square = \rho / T$, so $R = R_\square \cdot L / W$
- $R_1 = R_2$: resistance seen by current going from left to right is the same in each (square) block
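A minimal sketch of the sheet-resistance calculation, assuming bulk copper resistivity of about 1.7e-8 ohm-m and hypothetical wire dimensions (the function names are not from the slides). It also checks the slide's point that any square of the same film has the same resistance.

```python
RHO_CU = 1.7e-8  # ohm-m, approximate bulk copper resistivity (assumed value)


def sheet_resistance(rho_ohm_m, thickness_m):
    """Resistance of one square of film, in ohms per square."""
    return rho_ohm_m / thickness_m


def wire_resistance(rho_ohm_m, length_m, width_m, thickness_m):
    """R = rho * L / (T * W), computed as sheet resistance times squares."""
    squares = length_m / width_m
    return sheet_resistance(rho_ohm_m, thickness_m) * squares


# Hypothetical wire: 1 mm long, 0.5 um wide, 0.5 um thick.
r_wire = wire_resistance(RHO_CU, 1e-3, 0.5e-6, 0.5e-6)
# Two squares of different size but equal thickness have equal resistance.
r_small_square = wire_resistance(RHO_CU, 1e-6, 1e-6, 0.5e-6)
r_large_square = wire_resistance(RHO_CU, 4e-6, 4e-6, 0.5e-6)
print(r_wire, r_small_square, r_large_square)
```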
Bulk Resistivity
- Aluminum dominant until ~2000
- Copper has taken over in past 4-5 years
- Copper is as good as it gets
Interconnect Resistance
- Resistance scales badly
  - True scaling would reduce width and thickness by S each node
  - $R \propto S^2$ for a fixed line length and material
- Reverse scaling: wires get smaller and slower, devices get smaller and faster
- At higher frequencies, current crowds to the edges of the conductor (thickness of conduction = skin depth), which increases R
Copper Resistivity: The Ugly Reality
- Conductor resistivity increases expected to appear around 100 nm linewidth
  - will impact intermediate wiring first
  - ~2006
[Figure: Cu Resistivity vs. Linewidth. y-axis: Resistivity (µohm-cm), 1.5 to 2.5; x-axis: Line Width (µm), 0 to 1. Curves: WITHOUT Cu Barrier; 100nm ITRS Requirement WITH Cu Barrier; 70nm ITRS Requirement WITH Cu Barrier. Courtesy of SEMATECH]
Slide courtesy of Chris Case, BOC Edwards
Outline
- Interconnects
- Resistance
- Capacitance and Inductance
- Delay
Capacitance: Parallel Plate Model
- ILD = interlevel (or, interlayer) dielectric
[Figure labels: L, W, T, H_ILD, SiO2, Substrate]
- Bottom plate of cap can be another metal layer
- $C_{int} = \epsilon_{ox} \cdot (W L / t_{ox})$, where $t_{ox}$ is the dielectric thickness (labeled $H_{ILD}$ in the figure)
Insulator Permittivities
- Huge effort to develop low-k dielectrics ($\epsilon_r < 4.0$) for metal
- Reducing capacitance helps both delay and power
- Materials have been identified, but process integration has been difficult at best
Capacitance Values for Different Configurations
- Parallel-plate model substantially underestimates capacitance as line width drops below the order of ILD height
- Why?
Line Dimensions and Fringing Capacitance
[Figure labels: Lateral cap, W, S]
- Line dimensions: W, S, T, H
- Sometimes H is called T in the literature, which can be confusing
Interwire (Coupling) Capacitance
[Figure labels: Level2, Insulator, Level1, SiO2, Substrate]
- Coupling effects among neighboring wires
Interwire Capacitance
Layer                                    Poly  M1  M2  M3  M4  M5
Capacitance (aF/µm) at minimum spacing   40    95  85  85  85  115
- Example: Two M3 lines run parallel to each other for 1 mm. The capacitance between them is 85 aF/µm * 1000 µm = 85,000 aF = 85 fF
- Interwire capacitance today reaches ~80% of total wire capacitance
[Figure: M1 over substrate, Past versus Present / Future]
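The M3 example above can be reproduced with a small lookup, sketched here. The per-layer values are the ones in the slide's table; the dictionary and function names are illustrative.

```python
# Coupling capacitance at minimum spacing, in aF per um of parallel run
# (values taken from the table on this slide).
COUPLING_AF_PER_UM = {"Poly": 40, "M1": 95, "M2": 85, "M3": 85, "M4": 85, "M5": 115}


def coupling_cap_ff(layer, parallel_run_um):
    """Interwire capacitance in fF for two minimum-spaced lines on `layer`."""
    return COUPLING_AF_PER_UM[layer] * parallel_run_um / 1000.0


m3_example_ff = coupling_cap_ff("M3", 1000)  # two M3 lines, 1 mm parallel run
print(m3_example_ff)
```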
Capacitance Estimation
- Empirical capacitance models are easiest and fastest
  - Handle limited configurations (e.g., range of T/H ratio)
  - Some limiting assumptions (e.g., no neighboring wires)
  - Capacitance per unit length: $C_{wire} = \epsilon_{ox} \left[ \frac{W}{H_{ILD}} + 0.77 + 1.06 \left( \frac{W}{H_{ILD}} \right)^{0.25} + 1.06 \left( \frac{T_{wire}}{H_{ILD}} \right)^{0.5} \right]$
- Rules of thumb: e.g., 0.2 fF/µm for most wire widths < 2 µm
  - Cf. MOSFET gate capacitance ~ 1 fF/µm width
- Pattern-matching approaches applied to multilayer cross-sections
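To make the gap between the parallel-plate model and the empirical formula concrete, here is a hedged sketch that evaluates both per unit length. It assumes an SiO2 dielectric with relative permittivity 3.9; the function names and the example dimensions are illustrative only.

```python
EPS0 = 8.854e-12      # F/m, vacuum permittivity
EPS_OX = 3.9 * EPS0   # F/m, assuming SiO2 with relative permittivity 3.9


def c_parallel_plate(w, h_ild):
    """Parallel-plate capacitance per unit length (F/m): eps * W / H."""
    return EPS_OX * w / h_ild


def c_empirical(w, h_ild, t_wire):
    """Empirical capacitance per unit length (F/m), isolated wire over a plane."""
    ratio = w / h_ild
    return EPS_OX * (ratio + 0.77 + 1.06 * ratio ** 0.25
                     + 1.06 * (t_wire / h_ild) ** 0.5)


# Hypothetical wire with W = H = T = 0.5 um: fringing terms dominate.
w = h = t = 0.5e-6
underestimate = c_empirical(w, h, t) / c_parallel_plate(w, h)
# For a very wide wire the two models converge.
wide_ratio = c_empirical(100 * h, h, t) / c_parallel_plate(100 * h, h)
print(underestimate, wide_ratio)
```

With equal width, height, and thickness the empirical value is nearly four times the parallel-plate value, which is one way to see why the parallel-plate model fails once line width drops to the order of the ILD height.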
Capacitive Crosstalk Noise
[Figure: two coupled lines, cross-section view. Labels: W, S, T, H, coupling capacitances $C_c$, ground capacitances $C_a$ (aggressors) and $C_v$ (victim), Ground Plane]
- Interwire capacitance allows neighboring wires to interact
- Charge injected across $C_c$ results in a temporary (in static logic) glitch in voltage from the supply rail at the victim
Crosstalk From Capacitive Coupling
- Charge delivered to the coupling capacitor: $Q = C_c (\Delta V_V - \Delta V_A) = C_{CV} \Delta V_V$ (see Weste/Harris, 4.5.4.1)
- Equivalent grounded capacitance: $C_{CV} = C_c (\Delta V_V - \Delta V_A) / \Delta V_V$; the ratio $C_{CV} / C_c$ is the Miller Coupling Factor (MCF)
- Equal rise/fall times (or both step inputs), with $V_A$ and $V_V$ making opposite transitions: $C_{CV} = 2 C_c$ (assume $R_A = R_V$ and $C_A = C_V$, so $dV_A/dt = -dV_V/dt$)
- Current through $C_c$ is $C_c (dV_V/dt - dV_A/dt)$. This should be the same as the current through the equivalent capacitor, which is $C_{CV} \, dV_V/dt$, so $C_{CV} = C_c \left( 1 - \frac{dV_A/dt}{dV_V/dt} \right)$
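A small sketch of the Miller Coupling Factor, using signed voltage swings (positive for rising, negative for falling) so that opposite transitions give the factor of 2 quoted on the slide. The function names are illustrative, and the sign convention is an assumption of this sketch.

```python
def miller_coupling_factor(dv_aggressor, dv_victim):
    """MCF = (dV_V - dV_A) / dV_V, with signed swings on both wires."""
    if dv_victim == 0:
        raise ValueError("victim must be switching to define an MCF")
    return 1.0 - dv_aggressor / dv_victim


def effective_coupling_cap(c_c, dv_aggressor, dv_victim):
    """Equivalent grounded capacitance seen by the victim driver."""
    return c_c * miller_coupling_factor(dv_aggressor, dv_victim)


mcf_opposite = miller_coupling_factor(-1.0, 1.0)  # aggressor falls, victim rises
mcf_quiet = miller_coupling_factor(0.0, 1.0)      # aggressor holds its value
mcf_same = miller_coupling_factor(1.0, 1.0)       # both rise together
print(mcf_opposite, mcf_quiet, mcf_same)
```

These three cases span the range of 0 to 2 used when decoupling by Miller factors.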
Crosstalk From Capacitive Coupling
- Glitches caused by capacitive coupling between wires
  - An aggressor wire switches
  - A victim wire is charged or discharged by the coupling capacitance (cf. charge-sharing analysis)
- An otherwise quiet victim may look like it has temporarily switched
- This is bad if:
  - The victim is a clock or asynchronous reset
  - The victim is a signal whose value is being latched at that moment
- What are some fixes?
[Figure: Aggressor and Victim waveforms]
Slide courtesy of Paul Rodman, ReShape
Crosstalk: Timing Pull-In
- A switching victim is aided (sped up) by coupled charge
- This is bad if your path now violates hold time checks
- Fixes include adding delay elements to your path
[Figure: Aggressor and Victim waveforms]
Slide courtesy of Paul Rodman, ReShape
Crosstalk: Timing Push-Out
- A switching victim is hindered (slowed down) by coupled charge
- This is bad if your path now violates setup time checks
- Fixes include spacing the wires, using strong drivers, ...
[Figure: Aggressor and Victim waveforms]
Slide courtesy of Paul Rodman, ReShape
Crosstalk Delay Calculation: Levels of Accuracy
- Discard coupling capacitances
- De-coupling by replacing coupling caps by double ground caps (conservative, MCF = 2)
- De-coupling by Miller factors (MCF between 0 and 2)
- Simulating multi-input multi-output (MIMO) networks
[Figure labels: Input 1, Output 1, Input 2, Output 2]
Worst Case Aggressor Scenario
- Stimuli vector
  - For RC interconnects
    - Aggressors take opposite transition: max delay
    - Aggressors take identical transition: min delay
  - For RLC interconnects: ?
- Aggressor alignment
  - For (linear) interconnects
    - Aggressors are aligned with each other to make max crosstalk noise peak
    - Align the noise peak to make max delay variation
  - For worst case gate delay: ?
[Figure labels: Aggressor 1, Aggressor 2, Noise, Δ delay, alignment]
Calculation Flow
- Timing window overlaps enable crosstalk delay variation
- Chicken-egg dilemma: delay vs. crosstalk
- Iteration
  - Start with the assumption that all timing windows overlap (pessimistic about the unknowns)
  - Refine calculation by reducing pessimism
[Figure: loop between Timing window assumptions and Crosstalk delay calculation (Δ delay); Aggressor and Victim windows with overlap]
Scaling of Delay Uncertainty
- Relatively greater coupling noise due to line dimension scaling
- Tighter timing budgets to achieve fast circuit speed (all paths critical)
- Train wreck?
- Timing analysis can be guardbanded by scaling the coupling capacitance by a Miller Coupling Factor to account for pull-in or push-out.
[Figure: Delay Uncertainty ΔT_d / T_d (%), 25 to 85, versus Technology Generation (μm), 0.35 down to 0.10. Waveform labels: Aggressor, Victim, Noise, Nominal Delay, Delay Uncertainty]
Slide courtesy of Kevin Cao, Berkeley
Inductance
- When signal is coupled to a ground plane, the current loop has an inductance.
  - More apparent for upper layer metals and longer lines
- Simple lumped model: gives interconnect transmission-line qualities
  - Propagates signal energy, with delay; sharper rise times; ringing
- Magnetic flux couples to many signals: a computational challenge
  - Not just coupled to immediately adjacent signals (unlike capacitors)
  - Coupling over a larger distance
- Bigger lumped model: matrix of coupling coefficients is not sparse
Slide courtesy of Ken Yang, UCLA
Inductance is Important
- Inductance matters when $\omega L$ becomes comparable to or larger than $R$, where $\omega = 2 \pi f = 2 \pi \cdot \frac{1}{\pi t_r}$
- Copper interconnects: R is reduced
- Faster clock speeds
- Thick, low-resistance (reverse-scaled) global lines
- Chips are getting larger: long lines, large current loops
- Frequency of interest is determined by signal rise time, not clock frequency
Slide courtesy of Massoud/Sylvester/Kawa, Synopsys
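As a quick numerical check of the rise-time-based frequency of interest, the sketch below computes omega = 2*pi*(1/(pi*t_r)) = 2/t_r and the ratio omega*L/R for one wire. The wire values are hypothetical, and reading a ratio near or above 1 as "inductance matters" is an assumption of this sketch, since the exact comparison on the slide did not survive extraction.

```python
import math


def significant_frequency_hz(t_rise_s):
    """Frequency of interest set by the rise time: f = 1 / (pi * t_r)."""
    return 1.0 / (math.pi * t_rise_s)


def significant_omega(t_rise_s):
    """Angular frequency of interest: omega = 2*pi*f = 2 / t_r."""
    return 2.0 / t_rise_s


def inductive_ratio(r_ohm, l_henry, t_rise_s):
    """omega * L / R; values near or above 1 suggest inductance is not negligible."""
    return significant_omega(t_rise_s) * l_henry / r_ohm


# Hypothetical global line: R = 20 ohm, L = 2 nH, 100 ps rise time.
omega = significant_omega(100e-12)
ratio = inductive_ratio(20.0, 2e-9, 100e-12)
print(omega, ratio)
```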
Inductance
- $V = L \, dI/dt$
- $V_2 = M_{12} \, dI_1/dt$
- Faraday's law: $V = N \, d(B A)/dt$, with $B = \mu (N / l) I$, so $L = \mu N^2 A / l$
- V = voltage; I = current; L = inductance; $M_{12}$ = mutual inductance associated with conductors 1, 2; N = number of turns of the coil; B = magnetic flux density; A = area of magnetic field circled by the coil; l = height of the coil; t = time
- At high frequencies, inductance can be a significant portion of total impedance: $Z = R + j \omega L$ ($\omega = 2 \pi f$ = angular frequency)
On-Chip Inductance
- Inductance is a loop quantity
- Knowledge of return path is required, but hard to determine
  - For example, the return path depends on the frequency
[Figure labels: Signal Line, Return Path]
Slide courtesy of Massoud/Sylvester/Kawa, Synopsys
Frequency-Dependent Return Path
- At low frequency ($R \gg \omega L$), current tries to minimize impedance ($R + j \omega L$) by minimizing resistance
  - Use as many returns as possible (parallel resistances)
- At high frequency ($R \ll \omega L$), current tries to minimize impedance ($R + j \omega L$) by minimizing inductance
  - Use smallest possible loop (closest return path)
  - L dominates, current returns collapse
- Power and ground lines always available as low-impedance current returns
[Figures: Signal line between Gnd lines (Gnd Gnd Gnd Signal Gnd Gnd Gnd), low-frequency and high-frequency cases]
Slide courtesy of Massoud/Sylvester/Kawa, Synopsys
Inductance Trends
- Inductance = weak (log) function of conductor dimensions
- Inductance = strong function of distance to current return path (e.g., power grid)
  - Want nearby ground line to provide a small current loop (cf. Alpha 21164)
- Inductance most significant in long, low-R, fast-switching nets
  - Clocks are most susceptible
Inductance vs. Capacitance
- Capacitance
  - Locality problem is easy: electric field lines suck up to nearest neighbor conductors
  - Local calculation is hard: all the effort is in accuracy
- Inductance
  - Locality problem is hard: magnetic field lines are not local; current returns can be complex
  - Local calculation is easy: no strong geometry dependence; analytic formulae work very well
- Intuitions for design
  - Seesaw effect between inductance and capacitance
  - Minimize variations in L and C rather than absolutes
    - E.g., would techniques used to minimize variation in capacitive coupling also benefit inductive coupling?
Slide courtesy of Sylvester/Shepard
Outline
- Interconnects
- Capacitance and Inductance
- Resistance
- Delay
Interconnect: Distributing the Capacitance
- The resistance and capacitance of an interconnect are distributed.
  - Model by using R and C. The Π model is the best
- Distributed model uses N segments.
  - More accurate but computationally expensive
  - Number of nodes blows up.
- Lumped model uses 1 segment of Π.
  - Sufficient for most nets (point to point)
[Figure: single wire modeled as multiple lumped Π segments]
Slide courtesy of Ken Yang, UCLA
Transition Degradation - Propagating Wavefront
- Transition degradation leads to increased downstream (gate and interconnect) delays
[Figure: step response of a distributed RC wire as function of location along wire and time]
RC Line Models and Step Response
- $T_{th} = \ln \left( \frac{1}{1 - Th} \right) \cdot T_{ED}$ (e.g., $T_{0.9} = 2.3 \, T_{ED}$; $T_{0.632} = T_{ED}$)
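The two example values on this slide follow directly from the threshold formula, as the short sketch below shows. It treats the response as a single exponential with time constant T_ED; the function name is illustrative.

```python
import math


def threshold_delay(th, t_ed):
    """Time for a single-pole step response to reach fraction `th` of its swing."""
    if not 0.0 <= th < 1.0:
        raise ValueError("threshold must be in [0, 1)")
    return math.log(1.0 / (1.0 - th)) * t_ed


t_90 = threshold_delay(0.9, 1.0)     # about 2.3 * T_ED
t_63 = threshold_delay(0.632, 1.0)   # about 1.0 * T_ED
t_50 = threshold_delay(0.5, 1.0)     # ln(2) * T_ED, about 0.69 * T_ED
print(t_90, t_63, t_50)
```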
Elmore Delay = First Moment of Transfer Function
- $H(t)$ = step input response
- $h(t)$ = impulse response $= dH(t)/dt$ = transfer function in time domain
- $T_{50\%}$ = median of $h(t)$: $\int_0^{T_{50\%}} h(t) \, dt = H(T_{50\%}) = 0.5$
- $T_{ED}$ = mean of $h(t)$: $T_{ED} = \int_0^{\infty} t \, h(t) \, dt$
- $T_{ED}$ = first moment of $h(t)$: $H(s) = \int_0^{\infty} e^{-st} h(t) \, dt = m_0 + m_1 s + m_2 s^2 + \dots$, with $m_i = \frac{(-1)^i}{i!} \int_0^{\infty} t^i h(t) \, dt$ (so $T_{ED} = -m_1$)
Elmore Delay = Simple Delay Metric
- Upper bound on 50% delay for RC trees
  - $T_{ED} = T_{50\%}$ if $h(t)$ is symmetric
  - $T_{ED} > T_{50\%}$ for monotonic waveforms
  - $T_{ED}$ approaches $T_{50\%}$ with increased transition time
  - $T_{ED} = T_{50\%} / \ln 2$ for an RC load driven by a step input
- Simple (linear time) computation
- Incremental: facilitates ECO (Engineering Change Order)
[Figure: $h(t)$ versus $t$ with $T_{ED}$ marked; single R-C circuit]
Elmore Delay for RC Network
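The content of this slide did not survive extraction, so the following is only a hedged sketch of the standard textbook form for an RC ladder, not necessarily the network shown in lecture: each resistance is multiplied by the total capacitance downstream of it. The function names and the unit-valued example are assumptions.

```python
def elmore_delay_ladder(resistances, capacitances):
    """Elmore delay to the far end of an RC ladder.

    resistances[i] connects node i-1 (the source for i = 0) to node i;
    capacitances[i] connects node i to ground.
    """
    delay = 0.0
    downstream_c = sum(capacitances)
    for r, c in zip(resistances, capacitances):
        delay += r * downstream_c   # this resistor charges everything after it
        downstream_c -= c
    return delay


def uniform_line_delay(r_total, c_total, n_segments):
    """Split a wire with total R and C into n equal ladder segments."""
    n = n_segments
    return elmore_delay_ladder([r_total / n] * n, [c_total / n] * n)


lumped = uniform_line_delay(1.0, 1.0, 1)         # one lump: R * C
distributed = uniform_line_delay(1.0, 1.0, 1000)  # approaches R * C / 2
print(lumped, distributed)
```

For N equal segments the result is RC(N+1)/(2N), which shows why a single lump overestimates the delay of a distributed wire by about a factor of two. The single pass over the ladder is the linear-time computation referred to on the previous slide.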
Driving Large Capacitances
- $t_{pHL} = \frac{C_L \, V_{swing} / 2}{I_{av}}$
- Transistor Sizing
[Figure: inverter with $V_{DD}$, $V_{in}$, $V_{out}$ and load $C_L$]
Driving Large Capacitances: Inverter As Buffer
[Figure: inverter of size A (input capacitance $C_{in}$) followed by a buffer of size $\alpha A$ driving $C_L = X \cdot C_{in}$]
- Total propagation delay = $t_p$(inv) + $t_p$(buffer)
- $t_{p0}$ = delay of min-size inverter with single min-size inverter as fanout load
- Minimize $t_p = \alpha \, t_{p0} + (X / \alpha) \, t_{p0}$
  - $\alpha_{opt} = \sqrt{X}$; $t_{p,opt} = 2 \, t_{p0} \sqrt{X}$
- Use only if combined delay is less than unbuffered case
Slide courtesy of Mary Jane Irwin, PSU
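A minimal sketch of the single-buffer sizing rule, with delays normalized to t_p0. The unbuffered delay is taken as X * t_p0 (a minimum inverter driving X times its own input capacitance, ignoring self-loading), which is an assumption of this sketch rather than something stated on the slide.

```python
import math


def buffered_delay(alpha, x, t_p0=1.0):
    """t_p = alpha * t_p0 + (X / alpha) * t_p0 for a buffer of relative size alpha."""
    return (alpha + x / alpha) * t_p0


def optimal_buffer(x, t_p0=1.0):
    """Return (alpha_opt, t_p_opt) = (sqrt(X), 2 * t_p0 * sqrt(X))."""
    alpha = math.sqrt(x)
    return alpha, buffered_delay(alpha, x, t_p0)


def unbuffered_delay(x, t_p0=1.0):
    """Assumed delay of the minimum inverter driving the load directly."""
    return x * t_p0


alpha_opt, t_opt = optimal_buffer(100.0)
worth_buffering = t_opt < unbuffered_delay(100.0)
print(alpha_opt, t_opt, worth_buffering)
```

Under this assumption the buffer only pays off when 2*sqrt(X) < X, that is, for X greater than 4.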
Delay Reduction With Cascaded Buffers
- $C_L = x \, C_{in} = u^N C_{in}$
[Figure: chain of buffers of size 1, $u$, $u^2$, ..., $u^{N-1}$ from in to out, with node capacitances $C_{in}$, $C_1$, $C_2$, ..., $C_L$]
- Cascade of buffers with increasing sizes (u = tapering factor) can reduce delay
- If load is driven by a large transistor (which is driven by a smaller transistor) then its turn-on time dominates overall delay
- Each buffer charges the input capacitance of the next buffer in the chain and speeds up charging, reducing total delay
- Cascaded buffers are useful when $R_{int} < R_{tr}$
Slide courtesy of Mary Jane Irwin, PSU
Line Propagation Delay $t_p$ as Function of u and x
[Figure: u/ln(u) versus u (1.0 to 7.0), vertical axis 0.0 to 60.0, curves for x = 10, 100, 1000, 10,000]
- Total line delay as function of driver size, load capacitance
Slide courtesy of Mary Jane Irwin, PSU
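The u/ln(u) shape plotted here can be reproduced with a short sketch. It assumes each stage contributes a delay of u * t_p0 (self-loading ignored) and treats the stage count N = ln(x)/ln(u) as continuous; both are simplifying assumptions, and the function names are illustrative.

```python
import math


def num_stages(x, u):
    """N such that u**N = x, treated as a continuous quantity."""
    return math.log(x) / math.log(u)


def cascade_delay(x, u, t_p0=1.0):
    """Total delay N * u * t_p0 = ln(x) * (u / ln(u)) * t_p0."""
    return num_stages(x, u) * u * t_p0


# Scan tapering factors for a load 1000 times the input capacitance.
candidates = [1.5 + 0.01 * i for i in range(551)]  # u from 1.5 to 7.0
best_u = min(candidates, key=lambda u: cascade_delay(1000.0, u))
delay_at_e = cascade_delay(1000.0, math.e)
print(best_u, delay_at_e)
```

Under these assumptions the minimum sits at u = e for any x, and the curve is shallow around it, so nearby tapering factors cost little extra delay.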
Reducing RC Delay With Repeaters
- RC delay is quadratic in length, so we must reduce length
- $T_{50} = 0.4 \, R_{int} C_{int} + 0.7 \, (R_{tr} C_{int} + R_{tr} C_L + R_{int} C_L)$
- Observation: $2^2 = 4$ and $1 + 1 = 2$, but $1^2 + 1^2 = 2$
[Figure: driver and receiver on a line of L = 2 units, with and without a repeater at the midpoint]
- Repeater = strong driver (usually inverter or pair of inverters for non-inversion) that is placed along a long RC line to break up the line and reduce delay
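To see the effect of splitting a line, the sketch below applies the T_50 expression to each of k equal segments and sums the results. It assumes every repeater has the same output resistance R_tr as the original driver and presents the same input capacitance C_L as the receiver; those assumptions and the numeric values are illustrative only.

```python
def t50(r_int, c_int, r_tr, c_l):
    """T_50 = 0.4*R_int*C_int + 0.7*(R_tr*C_int + R_tr*C_L + R_int*C_L)."""
    return 0.4 * r_int * c_int + 0.7 * (r_tr * c_int + r_tr * c_l + r_int * c_l)


def repeated_line_delay(r_int, c_int, r_tr, c_l, k):
    """Delay of a line split into k equal segments, each with its own driver."""
    return k * t50(r_int / k, c_int / k, r_tr, c_l)


# Hypothetical line: 1 kohm and 1 pF total, 100 ohm drivers, 10 fF gate loads.
delays = {k: repeated_line_delay(1000.0, 1e-12, 100.0, 10e-15, k) for k in range(1, 9)}
best_k = min(delays, key=delays.get)
print(delays[1], delays[2], best_k)
```

The wire term falls as 1/k while the driver and load terms grow with k, so the total has a minimum at a finite number of repeaters.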
Optimum Number and Size of Repeaters
Repeaters vs. Cascaded Buffers
- Repeaters are used to drive long RC lines
  - Breaking up the quadratic dependence of delay on line length is the goal
  - Typically sized identically
- Cascaded buffers are used to drive large capacitive loads, where there is no parasitic resistance
  - We put all buffers at the beginning of the load
  - This would be pointless for a long RC wire since the wire RC delay would be unaffected and would dominate the total delay
Slide courtesy of D. Sylvester, U. Michigan
Gate Delay
- Gate delay is measured from an input transition to the resulting output transition.
- May have different delays for different input-to-output paths.
- Different for an upward or downward transition.
  - $t_{pLH}$: propagation delay from LOW-to-HIGH (of the output)
- A transition is defined as the time at which a signal crosses a logical threshold voltage, $V_{THL}$.
  - Digital abstraction for 1 and 0
  - Often use $V_{DD}/2$.
[Figure: Logic Gate with Inputs and Outputs]
Slide courtesy of Ken Yang, UCLA
Static CMOS Gate Delay
- Output of a gate drives the inputs to other gates (and wires).
  - Only pull-up or pull-down, not both.
  - Capacitive loads.
- Delay is due to the charging and discharging of a capacitor and the length of time it takes.
[Figure: in and out waveforms with $t_{pHL}$ measured at $V_{THL}$; gate driving $C_{LOAD}$]
- The delay of EACH gate is treated as separately calculable: $t_{PD} = t_{PD1} + t_{PD2}$
Slide courtesy of Ken Yang, UCLA
RC Model
- We can model a transistor with a resistor
  - (Take into account the different regions of operation?)
  - (Use a realistic transition time to model an input switching?)
- We can take the average capacitance of a transistor as well
- The model we will primarily use: Delay = $R_{DRV} C_{LOAD}$ (the time constant)
  - R proportional to L/W
    - Wider device (stronger drive)
    - Smaller $R_{DRV}$, shorter delay.
[Figure: Inverter Model with $R_{DRVP}$ and $R_{DRVN}$ between in and out]
Slide courtesy of Ken Yang, UCLA
CΔV/I Model
- Another common expression for delay is CΔV/I.
  - Based on the capacitance charging and discharging
  - ΔV is the voltage to the transition ($V_{DD}/2$)
  - I = average drive current
- Very similar model except we are breaking R into 2 components, V/I
- This helps understand what determines R
  - I is proportional to mobility and W/L
  - I is proportional to $V^2$ (V is proportional to $V_{DD}$)
  - For example, we can anticipate what might happen if $V_{DD}$ drops.
Slide courtesy of Ken Yang, UCLA
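A short sketch of the CΔV/I estimate with ΔV = V_DD/2, alongside the equivalent resistance ΔV/I that links it back to the RC model. The function names and the example values (10 fF load, 1.0 V supply, 100 uA average current) are hypothetical.

```python
def cdv_over_i_delay(c_load_f, v_dd, i_avg_a):
    """Delay estimate C * dV / I, with dV taken as V_DD / 2."""
    delta_v = v_dd / 2.0
    return c_load_f * delta_v / i_avg_a


def equivalent_resistance(v_dd, i_avg_a):
    """The R in the RC model, expressed as dV / I."""
    return (v_dd / 2.0) / i_avg_a


delay = cdv_over_i_delay(10e-15, 1.0, 100e-6)
r_eq = equivalent_resistance(1.0, 100e-6)
print(delay, r_eq, r_eq * 10e-15)
```

Writing R as ΔV/I makes the supply dependence explicit: if the average current falls faster than V_DD does, the delay goes up when the supply is lowered.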
Reading Assignment, Friday 1/12
- Logic Synthesis: Weste/Harris Section 8.4
- Verification: Weste/Harris Section 9.1
Homework, Friday 1/12
- (1) Exercise 4.28 of the textbook
- (2) Exercise 4.32 of the textbook
- (3) Exercise 4.34 of the textbook
- (4) Find and give a statement of Rent's Rule
- REMINDER, HOMEWORK DEADLINE: Homeworks from Wednesday and Friday of Week N are due in class on Wednesday of Week N+1
- REMINDER, LAB ACCESS: Students (especially ECE students) must see Robin Vespertino in EBU 3B Room 2248, preferably between 1-5pm M-F, to scan their ID cards