ECE680: Physical VLSI Design Chapter VII Timing Issues in Digital Circuits (chapter 10 in textbook) GMU, ECE 680 Physical VLSI Design 1
Synchronous Timing (Fig. 10 1) CLK In R Combinational 1 R Logic 2 C in C out Out T > t c q + t logic + t su t hold < t c q, cd + t logic, cd hold c q, cd logic, cd GMU, ECE 680 Physical VLSI Design 2
Timing Definitions Latch Parameters D Q Clk Clk D PW m t hold T t su Q t c-q t d-q Delays can be different for rising and falling data transitions GMU, ECE 680 Physical VLSI Design 3
Register Parameters D Q Clk Clk T D t hold t su Q t c-q Delays can be different for rising and falling data transitions GMU, ECE 680 Physical VLSI Design 4
Clock Uncertainties Devices 4 Power Supply 3 Interconnect 2 6 Capacitive Load 1 Clock Generation 5 Temperature 7 Coupling to Adjacent Lines Sources of clock uncertainty GMU, ECE 680 Physical VLSI Design 5
Clock Nonidealities Clock skew Spatial variation in temporally equivalent clock edges; deterministic + random, t SK Clock jitter Temporal variations in consecutive edges of the clock signal; modulation + random noise Cycle to cycle (short term) t JS Long term t JL Variation of the pulse width Important for level sensitive clocking GMU, ECE 680 Physical VLSI Design 6
Clock Skew and Jitter Clk t SK Clk t JS Both skew and jitter affect the effective cycle time Only skew affects the race margin T clk + δ 2t jitter > t c q + t logic + t su δ + 2t jitter + t hold < t c q, cd + t logic, cd Page 501 GMU, ECE 680 Physical VLSI Design 7
Clock Skew # of registers Earliest occurrence of Clk edge Nominal /2 Latest occurrence of Clk edge Nominal + /2 Insertion delay Max Clk skew Clk delay GMU, ECE 680 Physical VLSI Design 8
Positive and Negative Skew In R1 R2 R3 Combinational Combinational D Q D Q Logic D Q Logic CLK t CLK1 t CLK2 t CLK3 delay (a) Positive skew delay In R1 D Q Combinational Logic R2 D Q Combinational Logic R3 D Q t CLK1 t CLK2 t CLK3 delay delay CLK (b) Negative skew GMU, ECE 680 Physical VLSI Design 9
Positive Skew T CLK CLK1 T CLK 1 3 CLK2 2 4 t h Launching edge arrives before the receiving edge GMU, ECE 680 Physical VLSI Design 10
Negative Skew T CLK + CLK1 T CLK 1 3 CLK2 2 4 Receiving edge arrives before the launching edge GMU, ECE 680 Physical VLSI Design 11
Timing Constraints In R1 D Q Combinational Logic R2 D Q CLK t CLK1 t CLK2 t c q t logic t c q, cd t su, t hold t logic, cd Minimum cycle time: T = t c q + t su + t logic Worst case is when receiving edge arrives early (positive ) GMU, ECE 680 Physical VLSI Design 12
Timing Constraints In D R1 Q Combinational Logic R2 D Q CLK t CLK1 t CLK2 t c q t c q, cd t logic t logic, cd t su, t hold Hold time constraint: t (c q, cd) + t (logic, cd) > t hold + Worst case is when receiving edge arrives late Race between data and clock GMU, ECE 680 Physical VLSI Design 13
Impact of Jitter T CLK CLK -t ji tte r t jitter t ji tte r In REGS CLK t c-q, t c-q, cd t su, t hold t jitter Combinational Logic t t logic log ic, cd GMU, ECE 680 Physical VLSI Design 14
Longest Logic Path in Edge Triggered Systems Clk T Clk-Q T T LM T SU T JI + Latest point of launching Earliest arrival of next cycle GMU, ECE 680 Physical VLSI Design 15
Clock Constraints in Edge Triggered Systems If launching edge is late and receiving edge is early, the data will not be too late if: T c-q + T LM + T SU < T T JI,1 T JI,2 - Minimum cycle time is determined by the maximum delays through the logic T c-q + T LM + T SU + + 2 T JI < T Skew can be either positive or negative GMU, ECE 680 Physical VLSI Design 16
Shortest Path Earliest point of launching Clk T Clk-Q T Lm Clk T H Nominal clock edge Data must not arrive before this time Example 10.1 GMU, ECE 680 Physical VLSI Design 17
Clock Constraints in Edge Triggered Systems If launching edge is early and receiving edge is late: T c-q + T LM T JI,1 < T H + T JI,2 + Minimum logic delay T c-q + T LM < T H + 2T JI + T clk + δ 2t jitter > t c q + t logic + t su δ + 2t jitter + t hold < t c q, cd + t logic, cd GMU, ECE 680 Physical VLSI Design 18
How to counter Clock Skew? Negative Skew REG RE EG. RE EG log Out In REG Positive Skew Clock Distribution Data and Clock Routing GMU, ECE 680 Physical VLSI Design 19
Flip Flop Flop Based Timing Logic delay Skew Flip-flop delay Flip -flop Logic T SU T Clk-Q Representation after M. Horowitz, VLSI Circuits i 1996. T clk + δ 2t jitter > t c q + t logic + t su δ + 2t jitter + t hold < t c q, cd + t logic, cd GMU, ECE 680 Physical VLSI Design 20
Flip Flops Flops and Dynamic Logic Logic delay T SU T SU T Clk-Q T Clk-Q Precharge Evaluate Logic delay Evaluate Precharge Flip flops p are used only with static logic GMU, ECE 680 Physical VLSI Design 21
Latch timing t D-Q When data arrives to transparent latch D Q Latch is a soft barrier Clk t Clk-Q When data arrives to closed latch Data has to be re-launched GMU, ECE 680 Physical VLSI Design 22
Single Phase Clock with Latches Latch Logic T skl T skl T skt T skt Clk PW P GMU, ECE 680 Physical VLSI Design 23
Latch Based Design L1 latch is transparent L2 latch is transparent when =0 when =1 L1 Lth Latch Logic L2 Lth Latch Logic GMU, ECE 680 Physical VLSI Design 24
Slack borrowing In L1 L2 L1 B D Q CLB_A CLB_B D Q D Q a b c d e t pd,a t pd,b CLK1 CLK2 CLK1 T CLK CLK1 CLK2 slack passed to next stage t pd,a t DQ t pd,b t DQ a valid b valid c valid e valid d valid GMU, ECE 680 Physical VLSI Design 25
Latch Based Timing Static logic Skew L1 Latch Logic L2 Latch L2 latch L1 latch Logic Long path Can tolerate skew! Short path GMU, ECE 680 Physical VLSI Design 26
Clock Distribution H-tree CLK Clock is distributed in a tree-like fashion GMU, ECE 680 Physical VLSI Design 27
More realistic H tree [Restle98] GMU, ECE 680 Physical VLSI Design 28
The Grid System Driver GCLK GCLK Driver Driver GCLK No rc-matching Large power Driver GCLK GMU, ECE 680 Physical VLSI Design 29
Example: DEC Alpha 21164 Clock Frequency: 300 MHz - 9.3 Million Transistors Total Clock Load: 3.75 nf Power in Clock Distribution network : 20 W (out of 50) Uses Two Level Clock Distribution: Single 6-stage driver at center of chip Secondary buffers drive left and right side clockgridinmetal3andmetal4 in and Metal4 Total driver size: 58 cm! GMU, ECE 680 Physical VLSI Design 30
21164 Clocking t rise = 0.35ns t cycle = 3.3ns Clock waveform final drivers pre driver Location of clock driver on die t skew = 150ps 2 phase single wire clock, distributed ib d globally ll 2 distributed driver channels Reduced RC delay/skew Improved thermal distribution 3.75nF clock load 58 cm final driver width Local inverters for latching Conditional clocks in caches to reduce power More complex race checking Device variation GMU, ECE 680 Physical VLSI Design 31
Clock Drivers GMU, ECE 680 Physical VLSI Design 32
Clock Skew in Alpha Processor GMU, ECE 680 Physical VLSI Design 33
EV6 (Alpha 21264) Clocking 600 MHz 0.35 micron CMOS t cycle = 1.67ns t rise = 0.35ns Global clock waveform PLL t skew = 50ps 2 Phase, with multiple conditional buffered clocks 2.8 nf clock load 40 cm final driver width Local clocks can be gated off to save power Reduced load/skew Reduced thermal issues Multiple clocks complicate race checking GMU, ECE 680 Physical VLSI Design 34
21264 Clocking GMU, ECE 680 Physical VLSI Design 35
EV6 Clock Results ps ps 5 300 10 305 15 310 20 315 25 320 30 325 35 330 40 335 45 340 50 345 GCLK Skew GCLK Rise Times (at Vdd/2 Crossings) (20% to 80% Extrapolated to 0% to 100%) GMU, ECE 680 Physical VLSI Design 36
EV7 Clock Hierarchy Active Skew Management and Multiple Clock Domains DLL NCLK (Mem Ctrl) DLL DLL + widely dispersed drivers + DLLs compensate static and low frequency variation + divides design and verification effort L2L CLK (L2 Cache) GCLK (CPU Core) PLL L2R CLK (L2 Cache) - DLL design and verification is added work + tailored clocks SYSCLK 37
Self-timed and Asynchronous Design Functions of clock in synchronous design 1) Acts as completion signal 2) Ensures the correct ordering of events Truly asynchronous design 1) Completion is ensured by careful timing analysis 2) Ordering of events is implicit in logic Self timed design 1) Completion ensured by completion signal 2) Ordering imposed by handshaking protocol GMU, ECE 680 Physical VLSI Design 38
Synchronous Pipelined Datapath In R1 D Q Logic Block #1 R2 D Q Logic Block #2 R3 D Q Logic Block #3 R4 D Q CLK t pd,reg t pd1 t pd2 t pd3 Advantages: disadvantages: Easy to integrate Frequency assume the worst case of logic GMU, ECE 680 Physical VLSI Design 39
Self Timed Pipelined Datapath Req Req Req Req Ack HS Ack HS Ack HS ACK Startt Done Startt Done Startt Done In R1 F1 R2 F2 R3 F3 Out t pf1 t pf2 t pf3 GMU, ECE 680 Physical VLSI Design 40
Completion Signal Generation In LOGIC NETWORK Out Start DELAY MODULE Done Using Delay Element (e.g. in memories) GMU, ECE 680 Physical VLSI Design 41
Completion Signal Generation Using Redundant Signal Encoding GMU, ECE 680 Physical VLSI Design 42
Completion Signal in DCVSL V DD V DD Start B0 B1 Done B0 B1 In1 In1 In2 In2 PDN PDN Start GMU, ECE 680 Physical VLSI Design 43
Self Timed Adder Start V DD P P P P P 0 P 1 P 2 P 3 Start V DD Done C 0 C 0 C 1 C 2 C 3 C 4 C 4 C 4 G 0 G 1 G 2 G 3 C 3 C 4 C 3 Start Startt P 0 P 1 V DD P 2 P 3 C 2 C 2 C 1 Start C 0 C 1 C 2 C 3 C 4 (b) Completion signal C 4 C 1 C 0 K 0 K 1 K 2 K 3 Start (a) Differential carry generation GMU, ECE 680 Physical VLSI Design 44
Completion Signal Using Current Sensing Inputs Start Register Input V DD Static CMOS Logic GND sense Current Sensor A Output Start A B t delay t overlap Min Delay Generator B Done Done Output t MDG t pd-nor valid GMU, ECE 680 Physical VLSI Design 45
Hand Shaking Protocol Req Ack Req 2 SENDER Data RECEIVER Ack 3 (a) Sender-receiver configuration Data 1 1 Two Phase Handshake cycle 1 cycle 2 Sender s action Receiver s action (b) Timing i diagram GMU, ECE 680 Physical VLSI Design 46
Event Logic The Muller C Element A B A B F n 1 0 0 0 C F 0 1 F n 1 0 F n 1 1 1 (a) Schematic (b) Truth table V DD V DD V DD A B S R Q F A B B B F A F (a) Logic A B B (b) Majority Function (c) Dynamic GMU, ECE 680 Physical VLSI Design 47
2-Phase Handshake Protocol Sender logic Data ready Data Receiverer logic Data accepted C Req Ack Handshake logic Advantage : FAST - minimal # of signaling events (important for global interconnect) Disadvantage : edge - sensitive, has state GMU, ECE 680 Physical VLSI Design 48
Example: Self-timed FIFO In R1 R2 R3 Out En Done Req i C C C Req 0 Ack i Ack o All 1s or 0s -> pipeline empty Alternating 1s and 0s -> pipeline full GMU, ECE 680 Physical VLSI Design 49
2 Phase Protocol GMU, ECE 680 Physical VLSI Design 50
Example From [Horowitz] GMU, ECE 680 Physical VLSI Design 51
Example GMU, ECE 680 Physical VLSI Design 52
Example GMU, ECE 680 Physical VLSI Design 53
Example GMU, ECE 680 Physical VLSI Design 54
4-Phase Handshake Protocol Req 2 4 Sender s action Receiver s action Ack 3 5 Data 1 1 Cycle 1 Cycle 2 Also known as RTZ Slower, but unambiguous GMU, ECE 680 Physical VLSI Design 55
4-Phase Handshake Protocol Implementation using Muller C elements Sender logic Data Receiver logic Data ready Data accepted C C S Req Ack Handshake logic GMU, ECE 680 Physical VLSI Design 56
Self Resetting Logic completion detection (L1) completion detection (L2) Precharged Precharged Precharged Logic Block Logic Block Logic Block (L1) (L2) (L3) completion detection (L3) V DD A B C int out Post charge logic GMU, ECE 680 Physical VLSI Design 57
Clock Delayed Domino GND CLK1 CLK2 (to next stage) V DD Q1 (also D2) D1 Pulldown Network GMU, ECE 680 Physical VLSI Design 58
Asynchronous Synchronous y Interface Asynchronous system f in Synchronous system f CLK Synchronization GMU, ECE 680 Physical VLSI Design 59
Synchronizers and Arbiters Arbiter: Circuit to decide which of 2 events occurred first Synchronizer: Arbiter with clock as one of the inputs Problem: Circuit HAS to make a decision in limited time which decision is not important Caveat: It is impossible to ensure correct operation But, we can decrease the error probability at the expense of delay GMU, ECE 680 Physical VLSI Design 60
A Simple Synchronizer CLK D int I 1 Q CLK I 2 Data sampled on rising edge of the clock Latch willeventually resolve the signal value, but... this might take infinite time! GMU, ECE 680 Physical VLSI Design 61
Synchronizer: Output Trajectories 2.0 V out 10 1.0 0.0 0 100 200 300 time [ps] Single-pole l model for a flip-flop GMU, ECE 680 Physical VLSI Design 62
Mean Time to Failure GMU, ECE 680 Physical VLSI Design 63
Example T f = 10 nsec = T T signal = 50 nsec t r = 1 nsec t=310psec V IH - V IL = 1 V (V DD = 5 V) N(T) = 3.9 10-9 errors/sec MTF (T) = 2.6 10 8 sec = 8.3 years MTF (0) = 2.5 sec GMU, ECE 680 Physical VLSI Design 64
Influence of Noise p(v) Uniform distribution around VM T logarithmic reduction 0 V IL V IH Still Uniform Initial Distribution Low amplitude noise does not influence synchronization behavior GMU, ECE 680 Physical VLSI Design 65
Typical Synchronizers 2 phase clocking circuit 2 Q 1 Q 2 1 Using delay line GMU, ECE 680 Physical VLSI Design 66
Cascaded Synchronizers Reduce MTF In O 1 O 2 Out Sync Sync Sync GMU, ECE 680 Physical VLSI Design 67
Arbiters Req1 Req2 Arbiter Ack1 Ack2 Req1 A B Ack2 (a) Schematic symbol Req2 Ack1 Req11 (b) Implementation Req2 A V T gap (c) () Timing diagram B Ack1 metastable t GMU, ECE 680 Physical VLSI Design 68
PLL Based Synchronization Chip 1 Chip 2 Digital System Data Digital System f system = N x f crystal PLL Divider reference clock PLL Clock Buffer Crystal Oscillator f crystal 200<Mhz GMU, ECE 680 Physical VLSI Design 69
PLL Block Diagram Reference clock Phase detector Up Charge pump Loop filter v cont VCO Local clock Down Divide by N System Clock GMU, ECE 680 Physical VLSI Design 70
Phase Detector Output before filtering ref local clock (a) Output ref local clock Output Output (Low pass filtered) (b) V DD Transfer characteristic -180-90 90 180 phase error (deg) (c) GMU, ECE 680 Physical VLSI Design 71
Phase Frequency Detector Rst D Q UP B B A A Rst D Q DN B (a) schematic UP = 0 DN = 1 A UP = 0 DN = 0 B (b) state transition diagram UP = 1 DN = 0 A A A B UP DN B UP DN (c) Timing waveforms GMU, ECE 680 Physical VLSI Design 72
PFD Response to Frequency A B UP DN GMU, ECE 680 Physical VLSI Design 73
PFD Phase Transfer Characteristic Average (UP-DN) V DD 2 2 phase error (deg) GMU, ECE 680 Physical VLSI Design 74
Charge Pump V DD UP To VCO Control Input DN GMU, ECE 680 Physical VLSI Design 75
PLL Simulation Contro ol Voltage (V V) 1.0 0.8 0.6 04 0.4 0.2 ref div vco ref 0.0 0 1 2 3 4 5 Time ( s) div vco GMU, ECE 680 Physical VLSI Design 76
Clock Generation using DLLs Delay Locked Loop (Delay Line Based) f REF Phase Det U D Charge Pump Filter DL f O Phase Locked Loop (VCO Based) f REF U N PD D CP VCO Filter f O GMU, ECE 680 Physical VLSI Design 77
Delay Locked Loop F REF PH Phase detect U D Charge pump C V CTRL VCDL (a) F O REF OUT UP DN (b) PH V CTRL Delay (c) GMU, ECE 680 Physical VLSI Design 78
DLL Based Clock Distribution VCDL 晻 Digital Circuitit CP/LF Phase Detector GLOBAL CLK VCDL 晻 Digital Circuit CP/LF Phase Detector GMU, ECE 680 Physical VLSI Design 79