EE241 - Spring 2006 Advanced Digital Integrated Circuits Lecture 20: Asynchronous & Synchronization Self-timed and Asynchronous Design Functions of clock in synchronous design 1) Acts as completion signal 2) Ensures the correct ordering of events Truly asynchronous design 1) Completion is ensured by careful timing analysis 2) Ordering of events is implicit in logic Self-timed design 1) Completion ensured by completion signal 2) Ordering imposed by handshaking protocol 2 1
Synchronous Datapath In R1 D Q Logic Block #1 R2 D Q Logic Block #2 R3 D Q Logic Block #3 R4 D Q CLK t pd,reg t pd1 t pd2 t pd3 3 Self-Timed Pipelined Datapath Req Ack HS Req Ack HS Req Ack HS Req Ack Start Done Start Done Start Done In R1 F1 R2 F2 R3 F3 Out t pf1 t pf2 t pf3 4 2
Completion Signal Generation In LOGIC NETWORK Out Start DELAY MODULE Done Using Delay Element (e.g. in memories) 5 Completion Signal Generation Using Redundant Signal Encoding 6 3
Completion Signal in DCVSL V DD V DD Start B0 In1 In1 In2 In2 PDN PDN B1 B0 B1 Done Start 7 Self-Timed Adder Start V DD P 0 C 0 P 1 C 1 P 2 C 2 P 3 C 3 C 4 C 4 Start C 4 V DD Done C 4 C 0 G 0 G 1 G 2 G 3 C 3 C 3 Start C 2 C 2 Start V DD P 0 P 1 P 2 P 3 C 0 C 1 C 2 C 3 C 4 C 4 C 1 C 1 Start (b) Completion signal C 0 K 0 K 1 K 2 K 3 Start (a) Differential carry generation 8 4
Hand-Shaking Protocol Req Ack Req 2 SENDER Data RECEIVER Ack 3 (a) Sender-receiver configuration Data 1 1 Two Phase Handshake cycle 1 cycle 2 Sender s action Receiver s action (b) Timing diagram 9 Event Logic The Muller-C Element A B C F A B F n+1 0 0 1 1 0 1 0 1 0 F n F n 1 (a) Schematic (b) Truth table V DD V DD V DD A B S R Q F A B B B F A F (a) Logic A B B (b) Majority Function (c) Dynamic 10 5
2-Phase Handshake Protocol Sender logic Data ready Data Receiver logic Data accepted C Req Ack Handshake logic Advantage : FAST - minimal # of signaling events (important for global interconnect) Disadvantage : edge - sensitive, has state 11 Example: Self-timed FIFO In R1 R2 R3 Out En Done Req i C C C Req 0 Ack i Ack o All 1s or 0s -> pipeline empty Alternating 1s and 0s -> pipeline full 12 6
2-Phase Protocol 13 Example From [Horowitz] 14 7
Example 15 Example 16 8
Example 17 4-Phase Handshake Protocol Req 2 4 Sender s action Receiver s action Ack 3 5 Data 1 1 Cycle 1 Cycle 2 Also known as RTZ Slower, but unambiguous 18 9
4-Phase Handshake Protocol Implementation using Muller-C elements Sender logic Data Receiver logic Data ready Data accepted C C S Req Ack Handshake logic 19 Self-Resetting Logic Precharged Logic Block (L1) completion detection (L1) Precharged Logic Block (L2) completion detection (L2) Precharged Logic Block (L3) completion detection (L3) V DD int out Post-charge logic A B C 20 10
Asynchronous-Synchronous Interface Asynchronous system f in Synchronous system f CLK Synchronization 21 Synchronizers and Arbiters Arbiter: Circuit to decide which of 2 events occurred first Synchronizer: Arbiter with clock φ as one of the inputs Problem: Circuit HAS to make a decision in limited time - which decision is not important Caveat: It is impossible to ensure correct operation But, we can decrease the error probability at the expense of delay 22 11
A Simple Synchronizer CLK D int I 1 Q CLK I 2 Data sampled on rising edge of the clock Latch will eventually resolve the signal value, but... this might take infinite time! 23 Synchronizer: Output Trajectories 2.0 V out 1.0 0.0 0 100 200 300 time [ps] Single-pole model for a flip-flop 24 12
Mean Time to Failure 25 Example T f = 10 nsec = T T signal = 50 nsec t r = 1 nsec t = 310 psec V IH - V IL = 1 V (V DD = 5 V) N(T) = 3.9 10-9 errors/sec MTF (T) = 2.6 10 8 sec = 8.3 years MTF (0) = 2.5 μsec 26 13
Influence of Noise p(v) Uniform distribution around VM T logarithmic reduction 0 V IL V IH Initial Distribution Still Uniform Low amplitude noise does not influence synchronization behavior 27 Typical Synchronizers 2 phase clocking circuit Q φ2 φ1 Q φ2 φ1 Using delay line 28 14
Cascaded Synchronizers Reduce MTF In O 1 O 2 Out Sync Sync Sync φ 29 Arbiters Req1 Req2 Arbiter Ack1 Ack2 Req1 A B Ack2 (a) Schematic symbol Req2 Ack1 Req1 (b) Implementation Req2 V A T gap B metastable Ack1 t (c) Timing diagram 30 15
PLL-Based Synchronization Digital System Chip 1 Data Chip 2 Digital System f system = N x f crystal PLL Divider reference clock PLL Clock Buffer Crystal Oscillator f crystal, 200<Mhz 31 Clock Distribution Goal: Minimization of uncertainty Clock skew (spatial uncertainty) Systematic Clock jitter (temporal uncertainty) Random cycle-to-cycle changes 32 16
Reading Chapter 13, (Chandrakasan et al), Clock Distribution by Bailey Chapter 12, (Chandrakasan et al), PLLs and DLLs by Maneatis Chapter 10, Rabaey et al. 33 Clock Distribution Tree Common, e.g. IBM S/390 Clock grid» DEC Alpha Length-matched Serpentines» Intel P6 34 17
Clock Distribution CLOCK Example: PowerPC 603 Gerosa, JSSC 12/94 H-Tree Network Observe: Only Relative Skew is Important 35 Clock Network with Distributed Buffering Local Area Module Module secondary clock drivers Module Module Module Module main clock driver CLOCK Reduces absolute delay, and makes Power-Down easier Sensitive to variations in Buffer Delay 36 18
Predriver Binary tree H - tree X - tree Arbitrary matched tree 37 Example IBM S/390 Clock skew Webb, JSSC 11/97 38 19
Clock Tree Delays Restle, VLSI 98 39 Impact of clock network sizing 40 20
Impact of clock network sizing 41 Final Stage: Tree vs. Grid RC-matched Tree Grid Courtesy of IEEE Press, New York. 2000 42 21
IR Emission Images Central buffer Clock repeaters Sector buffers Local clocks Sanda, ISSCC 99 43 DEC Alpha Evolution Clock driver placements 21064 21164 21164 Gronowski, JSSC 5/98 44 22
Clock Skews 21064 21164 21264 45 Example: DEC Alpha 21164 Clock Drivers 46 23
Clock Skew in Alpha Processor 47 Hybrid Grid DEC Alpha 21264 Bailey JSSC 11/98 48 24
Alpha 21264 49 Alpha 21264 Grids Global clock Major clock grids 50 25
Data-Dependent Gate Loading 51 Multi-GHz Clock Networks Phillip Restle, IBM Research IEEE SSCTC Workshop on Design for Multi-GigaHertz Processors, San Fransico, Feb. 7, 2000 http://www.research.ibm.com/people/r/restle/mghz.html http://www.research.ibm.com/people/r/restle/animations/dac01top.html 52 26
Clock Generation Delay-Locked Loop (Delay Line Based) f REF Phase Det U D Charge Pump Filter DL f O Phase-Locked Loop (VCO-Based) f REF U N PD D CP VCO Filter f O 53 Phase-Locked Loop Based Clock Generator Up Down Reference clock Phase detector Up Charge pump Loop filter V contr VCO Local clock Down Clock decode & buffer Divide by N φ 1 φ 2... Acts also as Clock Multiplier 54 27
Loop Components Phase Comparator Produces UP/DN pulses corresponding to phase difference Charge Pump Sources/sinks current for duration of UP/DN pulses Loop Filter Integrates current to produce control voltage Voltage-Controlled Delay Line Changes delay proportionally to voltage Voltage-Controlled Oscillator Generates frequency proportional to control voltage 55 PLL Jitter 56 28
DLL Locking Courtesy of IEEE Press, New York. 2000 57 Clock Deskewing Two clock spines, two DLLs, and a PD that controls them Geannopoulos, ISSCC 98 58 29
Clock Ring Clocks routed in parallel, opposite directions LCG aligns to the middle Shibayama, ISSCC 98 59 Synchronous Distributed Oscillators VCOs # of nearest neighbors Mizuno, ISSCC 98 60 30
Distributed PLLs Gutnik, ISSCC 2000 61 Intel Itanium TM Rusu, ISSCC 2000 62 31
Intel Itanium TM 63 32