Lecture 25 Dealing with Interconnect and Timing
Administrivia
Projects will be graded by next week
Project phase 3 will be announced next Tuesday
  » Will be homework-like
  » Report will be combined with the poster
Today's Lecture
Dealing with resistive interconnect
Some thoughts about inductance
Introduction to timing
INTERCONNECT: Dealing with Resistance
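RI-Introduced Noise
[Figure: circuit schematic showing a current I flowing through a supply-rail resistance R; recoverable labels: V_DD, I, φ_pre, R, node X, and the voltage drops across the rail resistances]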
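The noise mechanism named on this slide is Ohm's law applied to the supply rail: a current I through a rail resistance R lowers the supply seen by the logic. The sketch below illustrates the arithmetic only; the function names and the example values (2.5 V supply, 100 mA, 0.5 Ω) are assumptions, not figures from the lecture.

```python
def ir_drop(current_a, rail_resistance_ohm):
    """Voltage lost across a resistive supply rail (V = I * R)."""
    return current_a * rail_resistance_ohm


def local_supply(v_dd, current_a, rail_resistance_ohm):
    """Supply voltage actually seen by the logic behind the rail."""
    return v_dd - ir_drop(current_a, rail_resistance_ohm)


# Hypothetical example values, chosen only for illustration.
v_local = local_supply(v_dd=2.5, current_a=0.1, rail_resistance_ohm=0.5)
print(f"local supply: {v_local:.3f} V")
```

The same drop appears with opposite sign on the ground rail, so the logic swing can shrink from both ends.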
Power and Ground Distribution
[Figure: V_DD and GND rails feeding logic blocks. (a) Finger-shaped network; (b) Network with multiple supply pins]
Resistance and the Power Distribution Problem
[Figure: "Before" and "After" views of the power distribution. Source: Simplex]
Requires fast and accurate peak current prediction
Heavily influenced by packaging technology
Electromigration (1)
Limits dc-current to 1 mA/µm
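The 1 mA/µm limit translates directly into a minimum wire width for a given dc current. A minimal sketch, assuming the limit applies per micron of wire width; the function name and the 5 mA example are hypothetical.

```python
EM_LIMIT_MA_PER_UM = 1.0  # dc-current limit quoted on the slide


def min_wire_width_um(dc_current_ma, limit_ma_per_um=EM_LIMIT_MA_PER_UM):
    """Smallest wire width (in µm) that keeps the dc current within the limit."""
    if dc_current_ma < 0:
        raise ValueError("dc current must be non-negative")
    return dc_current_ma / limit_ma_per_um


# Hypothetical example: a supply strap carrying 5 mA of dc current.
print(min_wire_width_um(5.0), "µm")
```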
Electromigration (2)
The Impact of Resistivity
The distributed rc-line
[Figure: rc ladder driven by V_in, with series resistors R_1, R_2, ..., R_{N-1}, R_N and shunt capacitors C_1, C_2, ..., C_{N-1}, C_N]
Diffused signal propagation
Delay ~ $L^2$
[Figure: voltage (V) versus time (nsec) along the line, with curves at x = L/10, x = L/4, x = L/2, and x = L]
The Global Wire Problem
$T_d = 0.377 R_w C_w + 0.693 (R_d C_{out} + R_d C_w + R_w C_{out})$
Challenges
No further improvements to be expected after the introduction of Copper (superconducting, optical?)
Design solutions
  » Use of fat wires
  » Insert repeaters, but might become prohibitive (power, area)
  » Efficient chip floorplanning
Towards communication-based design
  » How to deal with latency?
  » Is synchronicity an absolute necessity?
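The delay expression on this slide reads as T_d = 0.377 R_w C_w + 0.693 (R_d C_out + R_d C_w + R_w C_out), where R_w and C_w are the total wire resistance and capacitance, R_d is the driver resistance, and C_out is the load. The sketch below evaluates it and shows why long wires are the problem: the R_w C_w term grows with the square of the length. The per-micron values, driver resistance, and load are hypothetical.

```python
def wire_delay(length_um, r_per_um, c_per_um, r_driver, c_out):
    """Delay of a driver, a distributed RC wire, and a lumped load (seconds)."""
    r_w = r_per_um * length_um  # total wire resistance, ohms
    c_w = c_per_um * length_um  # total wire capacitance, farads
    return (0.377 * r_w * c_w
            + 0.693 * (r_driver * c_out + r_driver * c_w + r_w * c_out))


# Hypothetical wire and driver parameters, for illustration only.
R_PER_UM = 0.1       # ohm per µm of wire
C_PER_UM = 0.2e-15   # farad per µm of wire
for length in (1_000, 2_000, 4_000, 8_000):
    t = wire_delay(length, R_PER_UM, C_PER_UM, r_driver=500.0, c_out=20e-15)
    print(f"L = {length:5d} µm  ->  T_d = {t * 1e12:8.1f} ps")
```

With the driver and load terms set to zero, doubling the length quadruples the delay, which is the "Delay ~ L²" behavior of the distributed line.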
Reducing RC-delay Repeater
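A rough sketch of why repeaters help: splitting a wire into n equal segments cuts the quadratic wire term by a factor of n, at the cost of n repeater delays. This is a simplified model, not the lecture's derivation. It assumes each repeater adds a fixed delay `t_repeater` and ignores the repeaters' drive resistance and input capacitance; all names and values are hypothetical.

```python
import math


def repeated_wire_delay(r_wire, c_wire, n_segments, t_repeater):
    """Delay of a wire cut into n equal segments, each driven by one repeater."""
    # Each segment has R/n and C/n, so n segments give 0.377*R*C/n in total.
    return 0.377 * r_wire * c_wire / n_segments + n_segments * t_repeater


def best_segment_count(r_wire, c_wire, t_repeater):
    """Integer segment count that minimizes the simplified delay model."""
    k = math.sqrt(0.377 * r_wire * c_wire / t_repeater)  # continuous optimum
    candidates = {max(1, math.floor(k)), max(1, math.ceil(k))}
    return min(candidates,
               key=lambda n: repeated_wire_delay(r_wire, c_wire, n, t_repeater))


# Hypothetical wire: 1 kΩ and 1 pF in total, 50 ps per repeater.
n = best_segment_count(1000.0, 1e-12, 50e-12)
print(n, repeated_wire_delay(1000.0, 1e-12, n, 50e-12))
```

Because the repeater term grows linearly with n, adding repeaters beyond the optimum makes the wire slower again, besides costing power and area.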
Architecture Must Evolve to Fit the Landscape
Global operations
  » Low bandwidth
  » High latency & High power
Local, parallel operations
  » High bandwidth
  » Low latency & Low power
[Figure: chip landscape annotated "20 Clocks" and "90,000 tracks". Source: Bill Dally, Stanford]
Interconnect: # of Wiring Layers
# of metal layers is steadily increasing due to:
Increasing die size and device count: we need more wires and longer wires to connect everything
Rising need for a hierarchical wiring network; local wires with high density and global wires with low RC
[Figure: 0.25 µm wiring stack (substrate, poly, M1 to M6) with dimensions T_ins, W, S, H; ρ = 2.2 µΩ-cm]
[Figure: Minimum Widths (Relative) and Minimum Spacing (Relative) for Poly and M1 to M5 across the 1.0µ, 0.8µ, 0.6µ, 0.35µ, and 0.25µ generations]
Interconnect Projections: Copper
Copper is planned in full sub-0.25 µm process flows and large-scale designs (IBM, Motorola, IEDM97)
With cladding and other effects, Cu ~2.2 µΩ-cm vs. 3.5 for Al(Cu): 40% reduction in resistance
Electromigration improvement; 100X longer lifetime (IBM, IEDM97)
  » Electromigration is a limiting factor beyond 0.18 µm if Al is used (HP, IEDM95)
Vias
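As a quick check of the resistance claim, the two resistivities quoted on this slide can be compared directly. For the same wire geometry, resistance scales with resistivity, so the ratio below is also the ratio of wire resistances. The variable names are illustrative.

```python
RHO_CU = 2.2      # µΩ-cm, copper with cladding (from the slide)
RHO_AL_CU = 3.5   # µΩ-cm, Al(Cu) (from the slide)

# Fractional reduction in resistance for identical wire dimensions.
reduction = 1.0 - RHO_CU / RHO_AL_CU
print(f"resistance reduction: {reduction:.1%}")
```

The result is a little over 37%, which the slide rounds to 40%.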
ISSUES IN TIMING
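The Clock Skew Problem
Clock Rates as High as 2 GHz in CMOS!
[Figure: pipeline In → CL1 → R1 → CL2 → R2 → CL3 → R3 → Out, with the clock φ arriving at each register at its own local time t_φ; labeled delays: t_i, t_{l,min}, t_{l,max}, t_{r,min}, t_{r,max}]
Clock Edge Timing Depends upon Position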
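Delay of Clock Wire
[Figure: clock driver with source resistance R_S, a distributed wire with resistance r and capacitance c, and load C_L]
r = 0.07 Ω/□, c = 0.04 fF/µm² (Tungsten wire)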
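The sheet resistance and area capacitance on this slide are enough to estimate the intrinsic delay of a clock wire. The sketch below assumes a hypothetical 1 cm wire, uses the 0.377 R C coefficient for a distributed line, and ignores the source resistance R_S, the load C_L, and fringing capacitance. The function names are illustrative.

```python
R_SHEET = 0.07       # ohm per square (from the slide)
C_AREA = 0.04e-15    # farad per µm² (from the slide)


def clock_wire_rc(length_um, width_um):
    """Total resistance (ohm) and capacitance (F) of a rectangular wire."""
    r_wire = R_SHEET * length_um / width_um   # number of squares * sheet R
    c_wire = C_AREA * length_um * width_um    # wire area * area capacitance
    return r_wire, c_wire


def distributed_delay(r_wire, c_wire):
    """Delay of a distributed RC line, wire term only (seconds)."""
    return 0.377 * r_wire * c_wire


# Hypothetical 1 cm wire at two different widths.
for width in (1.0, 2.0):
    r, c = clock_wire_rc(10_000, width)
    print(f"W = {width} µm: R = {r:.0f} Ω, C = {c * 1e12:.2f} pF, "
          f"delay = {distributed_delay(r, c) * 1e12:.1f} ps")
```

In this area-capacitance-only model the width cancels out of the R C product, so widening the wire does not change the intrinsic delay; it only helps once the driver and load terms are included.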
Clock Skew
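Constraints on Skew
[Figure: registers R1 and R2 with data flowing from R1 to R2; the clock φ reaches R2 a skew δ after it reaches R1]
(a) Race between clock and data. Data launched from R1 at $t_\phi'$ can reach R2 as early as $t_\phi' + t_{r,min} + t_{l,min} + t_i$, while R2 is clocked at $t_\phi'' = t_\phi' + \delta$.
(b) Data should be stable before clock pulse is applied. Data launched from R1 at $t_\phi'$ can reach R2 as late as $t_\phi' + t_{r,max} + t_{l,max} + t_i$, while the next clock edge reaches R2 at $t_\phi'' + T = t_\phi' + T + \delta$.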
Clock Constraints in Edge-Triggered Logic
$\delta \le t_{r,min} + t_i + t_{l,min}$
$T \ge t_{r,max} + t_i + t_{l,max} - \delta$
Maximum Clock Skew Determined by Minimum Delay between Latches
Minimum Clock Period Determined by Maximum Delay between Latches
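The two statements on this slide can be written as a small timing check. The sketch reads them as δ ≤ t_r,min + t_i + t_l,min (race) and T ≥ t_r,max + t_i + t_l,max − δ (period). It assumes t_r, t_l, and t_i denote register, logic, and interconnect delays; the function names and example delays (in ns) are hypothetical.

```python
def max_allowed_skew(t_r_min, t_l_min, t_i):
    """Largest skew that still avoids a race: set by the minimum path delay."""
    return t_r_min + t_i + t_l_min


def min_clock_period(t_r_max, t_l_max, t_i, skew):
    """Shortest clock period: set by the maximum path delay minus the skew."""
    return t_r_max + t_i + t_l_max - skew


def timing_ok(period, skew, t_r_min, t_r_max, t_l_min, t_l_max, t_i):
    """True when both the race and the period constraints are met."""
    return (skew <= max_allowed_skew(t_r_min, t_l_min, t_i)
            and period >= min_clock_period(t_r_max, t_l_max, t_i, skew))


# Hypothetical delays in ns.
print(max_allowed_skew(0.2, 0.3, 0.1))
print(min_clock_period(0.4, 2.0, 0.1, skew=0.2))
print(timing_ok(2.5, 0.2, 0.2, 0.4, 0.3, 2.0, 0.1))
```

Under this reading, a positive skew shortens the minimum period but tightens the race condition, which is the trade-off the following slides explore.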
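Positive and Negative Skew
[Figure: pipeline of combinational logic (CL) and registers (R) showing the data path and the routing of the clock φ. (a) Positive skew; (b) Negative skew]
How skew can help
Clock Skew in Master-Slave Two Phase Design
[Figure: pipeline In → CL1 → CL2 → CL3 with master and slave latches M1/S1, M2/S2, M3/S3 clocked by φ1 and φ2]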
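Clock Skew in 2-phase design
[Figure: timing diagram of φ1, φ2, and a copy of φ1 skewed by δ, marking the intervals T_φ1, T_φ12, T_φ2, T_φ21, the clock period T, the clock overlap T_φ12, the instant new data is applied to CL2, and the instant previous data is latched into M2]
$t_{min} > \delta - T_{\phi 12}$
$t_{max} < T + \delta - T_{\phi 12}$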
How to counter Clock Skew?
[Figure: registers (REG) and logic between In and Out, with the clock φ routed against the data direction (Negative Skew) and along it (Positive Skew)]
Clock Distribution
Data and Clock Routing
Clock Distribution
[Figure: H-Tree Network driven from CLOCK]
Observe: Only Relative Skew is Important
Clock Network with Distributed Buffering
[Figure: main clock driver fed by CLOCK, driving secondary clock drivers that serve the modules in each local area]
Reduces absolute delay, and makes Power-Down easier
Sensitive to variations in Buffer Delay
Example: DEC Alpha 21164
Clock Frequency: 300 MHz, 9.3 Million Transistors
Total Clock Load: 3.75 nF
Power in Clock Distribution network: 20 W (out of 50)
Uses Two Level Clock Distribution:
  » Single 6-stage driver at center of chip
  » Secondary buffers drive left and right side clock grid in Metal3 and Metal4
Total driver size: 58 cm!
21164 Clocking
t_cycle = 3.3 ns, t_rise = 0.35 ns, t_skew = 150 ps
[Figure: clock waveform; final drivers and pre-driver; location of clock driver on die]
2 phase single wire clock, distributed globally
2 distributed driver channels
  » Reduced RC delay/skew
  » Improved thermal distribution
  » 3.75 nF clock load
  » 58 cm final driver width
Local inverters for latching
Conditional clocks in caches to reduce power
More complex race checking
Device variation
Clock Drivers
Clock Skew in Alpha Processor
EV6 Clocking
t_cycle = 1.67 ns, t_rise = 0.35 ns, t_skew = 50 ps
[Figure: global clock waveform; PLL]
2 Phase, with multiple conditional buffered clocks
  » 2.8 nF clock load
  » 40 cm final driver width
Local clocks can be gated off to save power
Reduced load/skew
Reduced thermal issues
Multiple clocks complicate race checking
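One way to compare the two Alpha generations is the share of each clock cycle consumed by skew, using the t_cycle and t_skew values quoted on the 21164 and EV6 clocking slides. The sketch below is plain arithmetic on those numbers; the dictionary layout and function name are illustrative.

```python
# (t_cycle, t_skew) in seconds, taken from the 21164 and EV6 clocking slides.
CLOCKS = {
    "21164": (3.3e-9, 150e-12),
    "EV6": (1.67e-9, 50e-12),
}


def skew_fraction(t_cycle, t_skew):
    """Fraction of the clock period lost to skew."""
    return t_skew / t_cycle


for name, (t_cycle, t_skew) in CLOCKS.items():
    freq_mhz = 1.0 / t_cycle / 1e6
    print(f"{name}: {freq_mhz:.0f} MHz, "
          f"skew = {skew_fraction(t_cycle, t_skew):.1%} of the cycle")
```

Although the cycle time roughly halves from the 21164 to the EV6, the skew drops by a factor of three, so the fraction of the cycle lost to skew falls from about 4.5% to about 3%.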
EV6 Clock Results
[Figure: GCLK Skew (at Vdd/2 Crossings), scale 5 to 50 ps]
[Figure: GCLK Rise Times (20% to 80% Extrapolated to 0% to 100%), scale 300 to 345 ps]
EV7 Clock Hierarchy
Active Skew Management and Multiple Clock Domains
[Figure: clock hierarchy with SYSCLK, a PLL, three DLLs, and the domains NCLK (Mem Ctrl), L2L_CLK (L2 Cache), GCLK (CPU Core), and L2R_CLK (L2 Cache)]
+ widely dispersed drivers
+ DLLs compensate static and low-frequency variation
+ divides design and verification effort
- DLL design and verification is added work
+ tailored clocks