ECE152B TC
Issues on Timing and Clocking
[Figure: synchronous circuit with input X, combinational logic, output Z, and a clock; clock waveform with the clock period marked]
Latch and Flip-Flop
[Figure: a latch L with clock CK, and a flip-flop built from two latches L1 and L2 with clock CK]

Clocking
[Figure: synchronous circuit with input X, combinational logic, output Z, and a clock; clock waveform with the clock period marked]
For correct operation of a synchronous circuit:
   The clock period must be longer than the delay of the longest path in the combinational logic.
   The width of the clock pulse must be long enough to allow the flip-flops to change state.
Flip-Flop Setup and Hold Times
Flip-flop setup time T_su: the time the data input signal value must be held stable prior to the arrival of the clock pulse.
Flip-flop hold time T_h: the time the data input signal value must be held stable after the arrival of the clock pulse.
[Figure: data input and clk waveforms, with setup time and hold time measured between their 50% points]

Flip-Flop Timing Parameters
[Figure: Clk, data input, and output waveforms showing the clock period T, t_su, t_hold, and t_c-q]
Delays can be different for rising and falling data transitions.
Example: For the circuit shown below, assume the delay through the register (t_pd) is 0.6 and the delay through each logic block is indicated inside the box. Assume that the registers, which are positive edge-triggered, have a setup time T_su of 0.4. What is the minimum clock period?
[Figure: two registers clocked from Clock θ, with a clock arrival time t_θ at each register, connected through logic blocks with t_pd = 5, 2, 5, 2, and 3]

A Simple RC Model for Logic Gates
[Figure: a buffer/gate G between nodes A and B, and its equivalent circuit with input capacitance C_input at A and output resistance R_out driving B]
Interconnect Models
[Figure: a driver connected through interconnect to loads C_1, C_2, and C_3]
1. Ignoring interconnects
2. Lumped capacitance model
3. RC tree model
4. RLC tree model
5. Transmission line models (RC, LC, RLC)
6. RC / RLC network model

Interconnect Models as a Capacitor
1. Ignoring interconnects: the driver resistance R_d charges C_output + C_1 + C_2 + C_3, so
   $\tau = R_d (C_{output} + C_1 + C_2 + C_3)$
2. Lumped capacitance model: the wire adds C_wire to the load, so
   $\tau = R_d (C_{output} + C_1 + C_2 + C_3 + C_{wire})$
Output waveform:
   $V_{out}(t) = V_0 (1 - e^{-t/\tau})$
   $V_{out}(\tau) = 0.632\, V_0$
   $V_{out}^{-1}(0.5\, V_0) = (\ln 2)\, \tau = 0.693\, \tau$
[Figure: V_out versus time, with 0.5 V_0 and 1.0 V_0 marked on the vertical axis and τ and 2τ on the time axis]
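The two capacitor models above differ only in whether C_wire is added to the load. A minimal Python sketch of that comparison follows; the component values (1 kΩ driver, femtofarad loads) and the function names are hypothetical, chosen only for illustration.

```python
import math

def time_constant(r_d, c_output, c_loads, c_wire=0.0):
    """tau = R_d * (C_output + sum of load capacitances + C_wire)."""
    return r_d * (c_output + sum(c_loads) + c_wire)

def v_out(t, v0, tau):
    """Step response V_out(t) = V_0 * (1 - exp(-t / tau))."""
    return v0 * (1.0 - math.exp(-t / tau))

def delay_50(tau):
    """Time for V_out to reach 0.5 * V_0, i.e. (ln 2) * tau."""
    return math.log(2.0) * tau

# Hypothetical values, not taken from the slides.
R_D = 1e3                          # driver resistance, ohms
C_OUTPUT = 10e-15                  # driver output capacitance, farads
C_LOADS = [5e-15, 5e-15, 5e-15]    # C_1, C_2, C_3, farads
C_WIRE = 25e-15                    # lumped wire capacitance, farads

tau_ignored = time_constant(R_D, C_OUTPUT, C_LOADS)          # model 1
tau_lumped = time_constant(R_D, C_OUTPUT, C_LOADS, C_WIRE)   # model 2

print(f"tau ignoring interconnect: {tau_ignored * 1e12:.1f} ps")
print(f"tau with lumped C_wire:    {tau_lumped * 1e12:.1f} ps")
print(f"50% delay with C_wire:     {delay_50(tau_lumped) * 1e12:.1f} ps")
```

With these assumed values the wire capacitance equals the gate load, so including it doubles the time constant and therefore the 50% delay.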
Analysis of Simple RC Circuit
[Figure: voltage source v_T(t) driving resistor R in series with capacitor C; loop current i(t), capacitor voltage v(t)]
Input waveform: v_T(t). State variable: v(t).
   $R\, i(t) + v(t) = v_T(t)$
   $i(t) = \frac{d(C v(t))}{dt} = C \frac{dv(t)}{dt}$
   $RC \frac{dv(t)}{dt} + v(t) = v_T(t)$
This is a first-order linear differential equation with constant coefficients.

Zero-input response (natural response):
   $RC \frac{dv(t)}{dt} + v(t) = 0$
   $\frac{1}{v(t)} \frac{dv(t)}{dt} = -\frac{1}{RC}$
   $v_N(t) = K e^{-t/RC}$
Step-input response, for the input $v_0 u(t)$:
   $RC \frac{dv(t)}{dt} + v(t) = v_0 u(t)$
   $v_F(t) = v_0 u(t)$
   $v(t) = K e^{-t/RC} + v_0 u(t)$
Match the initial state:
   $v(0) = 0 \Rightarrow K + v_0 u(t) = 0$
Output response for the step input:
   $v(t) = v_0 (1 - e^{-t/RC})\, u(t)$
You can get the same result by Laplace transform.
Delays of Simple RC Circuit
Waveform under the step input $v_0 u(t)$: $v(t) = v_0 (1 - e^{-t/RC})$
   v(t) = 0.5 v_0 at t = 0.7 RC, i.e., delay = 0.7 RC (50% delay)
   v(t) = 0.1 v_0 at t = 0.1 RC
   v(t) = 0.9 v_0 at t = 2.3 RC, i.e., rise time = 2.2 RC
Rise time (fall time): time for a waveform to rise from 10% to 90% (fall from 90% to 10%) of its steady-state value.
[Figure: waveform swinging between V_OL and V_OH with rise time and fall time marked]

Interconnect Models as a Tree
3. RC tree model
4. RLC tree model
5. Transmission line models (RC, LC, RLC)
[Figure: driver with loads C_2 and C_3; each wire segment drawn as an L-type, π-type, or T-type section, or as a transmission line]
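The 0.7 RC, 0.1 RC, 2.3 RC, and 2.2 RC figures quoted for the simple RC circuit all come from inverting the step response. A short Python check, with RC normalized to 1 (the function name is illustrative):

```python
import math

def crossing_time(fraction, rc):
    """Time at which v(t) = fraction * v_0 for v(t) = v_0 * (1 - exp(-t / RC))."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -rc * math.log(1.0 - fraction)

RC = 1.0  # normalized, so the results read as multiples of RC
t10 = crossing_time(0.1, RC)
t50 = crossing_time(0.5, RC)
t90 = crossing_time(0.9, RC)
rise_time = t90 - t10  # 10% to 90%, equal to ln(9) * RC

print(f"t10 = {t10:.3f} RC, t50 = {t50:.3f} RC, t90 = {t90:.3f} RC")
print(f"rise time = {rise_time:.3f} RC")
```

Rounded to one decimal place these reproduce the values on the slide.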
Determining Which Model to Use
Some rules of thumb:
   Need to consider C: if the interconnect C is comparable to the C of the gates driven
   Need to consider R: if the interconnect R is comparable to the R of the driver
   Need to consider L: if ωL is comparable to the R of the interconnect

Model Interconnects as RC Trees
Each wire may be segmented into several edges.
Each edge E is modeled as a π-type or L-type circuit:
   r_E = unit res. × length(E)
   c_E = unit cap. × length(E)
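As a rough illustration of the edge model, the sketch below segments one wire into equal edges and builds a π-type section for each. The split of c_E into two equal halves at the ends of the edge is the usual π-model convention and is an assumption here, as are the unit values and the function name.

```python
import math

def pi_segments(length, unit_res, unit_cap, n_edges):
    """Split a wire into n_edges equal edges, each modeled as a pi-type section.

    For each edge E: r_E = unit_res * length(E), c_E = unit_cap * length(E),
    with c_E / 2 placed at each end of the resistor (assumed pi convention).
    """
    if n_edges < 1:
        raise ValueError("n_edges must be at least 1")
    edge_length = length / n_edges
    r_e = unit_res * edge_length
    c_e = unit_cap * edge_length
    return [{"r": r_e, "c_near": c_e / 2, "c_far": c_e / 2} for _ in range(n_edges)]

# Hypothetical wire: 100 um long, 0.1 ohm/um, 0.2 fF/um, four edges.
segments = pi_segments(length=100.0, unit_res=0.1, unit_cap=0.2, n_edges=4)
total_r = sum(s["r"] for s in segments)
total_c = sum(s["c_near"] + s["c_far"] for s in segments)
print(f"{len(segments)} edges, total R = {total_r:.1f} ohm, total C = {total_c:.1f} fF")
```

Segmenting does not change the total resistance or capacitance of the wire; it only distributes them along its length.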
Interconnect Delay: Putting All Models Together
[Figure: gate G1 driving gate G2 through an interconnect; equivalent circuit with source V_in, R_out, R_int, C_in, and C_L, and output V_out]
   R = R_out + R_int
   C = C_in + C_L
   V_out(t) = v_0 (1 - e^{-t/RC})
   Rise/fall time ≈ 2.2 RC
=> Delay is proportional to the loading capacitance C_L.
Driving more gates results in a longer rise/fall time.
Longer interconnects (larger R_int and C_in) also result in a longer rise/fall time.

Clock Tree
If the number of flip-flops driven by the clock line is large, the clock rise time (its slew) will be unacceptably long.
Solution: use a clock power-up tree (add buffers to the clock tree).
Solution: Adding Buffers and Limiting Number of Their Fanouts
Limit the number of fanouts of each buffer to N.
Create a buffer tree to drive all flip-flops while satisfying the fanout constraint.
The load seen by the clock source is significantly reduced.
The same idea can be used to reduce the delay of logic signals that drive a large number of gates and are on timing-critical paths.

An Example
Assume 64 flip-flops are to be driven by a single clock source.
   Buffer delay (with zero load): 0.2 ns
   Interconnect delay: 0.2 ns with a single fanout (either to a flip-flop or to a buffer)
   Additional 0.1 ns delay for each additional fanout
[Figure: clock source driving one buffer that fans out to all 64 flip-flops]
Total delay from clock source to clock ports of flip-flops: 6.9 ns
   0.2 ns + 0.2 ns + (0.2 ns + 63 × 0.1 ns) = 6.9 ns
Example Clock Tree
Assume each buffer has four fanouts.
[Figure: clock source driving a three-level buffer tree]
Total delay from clock source to clock ports of flip-flops: 2.3 ns
   0.2 ns {0th-level wire delay}
   + 0.2 ns {1st-level buffer delay} + (0.2 ns + 3 × 0.1 ns) {1st-level wire delay}
   + 0.2 ns {2nd-level buffer delay} + (0.2 ns + 3 × 0.1 ns) {2nd-level wire delay}
   + 0.2 ns {3rd-level buffer delay} + (0.2 ns + 3 × 0.1 ns) {3rd-level wire delay}

Techniques for Improving Speed
1. Keep the logic gate depth shallow between flip-flops.
2. Avoid circuit designs that have highly loaded gates in the critical path. A gate delay will increase as the capacitive load on the output of the gate is increased. The primary sources of load capacitance are routing capacitance and the input capacitance of the driven gates.
3. Duplicate logic to reduce fanouts (similar idea to clock tree buffering).
4. Avoid long interconnects.
5. Gate sizing.
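The two clock-distribution totals above (6.9 ns with one buffer driving all 64 flip-flops, 2.3 ns with a fanout-of-four tree) follow from the same delay rule. A small Python sketch of that arithmetic, using the delay figures stated in the example; the function names are illustrative:

```python
import math

BUFFER_DELAY = 0.2      # ns, buffer delay with zero load
WIRE_BASE = 0.2         # ns, interconnect delay with a single fanout
WIRE_PER_EXTRA = 0.1    # ns, added for each additional fanout

def wire_delay(fanout):
    """Interconnect delay for a net with the given number of fanouts."""
    return WIRE_BASE + (fanout - 1) * WIRE_PER_EXTRA

def clock_delay(fanout_per_level):
    """Delay from the clock source to the flip-flop clock ports.

    fanout_per_level lists the fanout of the buffers at each level,
    from the level nearest the source to the level driving the flip-flops.
    The source itself drives a single buffer (the 0th-level wire).
    """
    total = wire_delay(1)
    for fanout in fanout_per_level:
        total += BUFFER_DELAY + wire_delay(fanout)
    return total

single_buffer = clock_delay([64])       # one buffer driving 64 flip-flops
buffer_tree = clock_delay([4, 4, 4])    # three levels, 4 * 4 * 4 = 64 flip-flops

print(f"single buffer: {single_buffer:.1f} ns")
print(f"buffer tree:   {buffer_tree:.1f} ns")
```

The tree adds two more buffer delays but replaces the 63 extra fanouts on one net with 3 extra fanouts on each of three nets, which is where the saving comes from.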
Gate Sizing: Making the Driving Gate Larger (or Smaller)
The larger the driving gate, the greater its driving current (so a sharper V_in), but the larger its C_input too (which slows down the signal coming into G1).
[Figure: G1 driving G2, with the equivalent circuit R_out, C_in, and C_L and the step response v_0 (1 - e^{-t/RC})]

Static Timing Analysis 101
[Figure: timing specs and reports (courtesy P. Joshi, IBM)]
Static Timing Analysis
[Figure: netlist with primary inputs PI1 to PI3, primary outputs PO1 to PO3, and a delay for each gate]
[Figure: the same netlist annotated with arrival times; along the PI1 path, through gates with delays 1, 4, 6, and 5, the arrival times are 1, 7, 13, and 18]

Static Timing Analysis
[Figure: the same netlist annotated with arrival time/required time; along the PI1 path the pairs are 0/4, 1/5, 7/9, 13/15, and 18/22]
slack = required time - arrival time
[Figure: the same netlist annotated with slacks; along the PI1 path the slacks are 4, 4, 2, 2, and 4]
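The arrival-time, required-time, and slack passes can be sketched in a few lines of Python. The netlist below is a small hypothetical one (inputs a and b, gates g1 to g4), not the netlist drawn on the slides, and the required time at the outputs is assumed to be the latest output arrival, as the slide annotations suggest.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def static_timing(delay, fanin, outputs):
    """Return (arrival, required, slack) for every node of a gate-level DAG.

    delay:   gate name -> gate delay (primary inputs have delay 0)
    fanin:   gate name -> list of driving nodes
    outputs: nodes treated as primary outputs
    """
    order = list(TopologicalSorter(fanin).static_order())  # drivers first

    # Forward pass: arrival = latest fanin arrival + own delay.
    arrival = {}
    for node in order:
        latest = max((arrival[p] for p in fanin.get(node, ())), default=0)
        arrival[node] = latest + delay.get(node, 0)

    # Backward pass: required = earliest (fanout required - fanout delay).
    deadline = max(arrival[o] for o in outputs)  # assumed output requirement
    required = {node: float("inf") for node in order}
    for o in outputs:
        required[o] = deadline
    for node in reversed(order):
        for p in fanin.get(node, ()):
            required[p] = min(required[p], required[node] - delay.get(node, 0))

    slack = {node: required[node] - arrival[node] for node in order}
    return arrival, required, slack

# Hypothetical netlist for illustration only.
DELAY = {"g1": 1, "g2": 3, "g3": 4, "g4": 6}
FANIN = {"g1": ["a"], "g2": ["b"], "g3": ["g1", "g2"], "g4": ["g2"]}
arrival, required, slack = static_timing(DELAY, FANIN, outputs=["g3", "g4"])

for node in sorted(slack):
    print(f"{node}: arrival/required = {arrival[node]}/{required[node]}, slack = {slack[node]}")
```

Nodes with zero slack (here b, g2, and g4) lie on the critical path; every other node can be delayed by its slack without lengthening it.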
Timing Analysis with Interconnect Delay
[Figure: latch-to-latch path annotated with gate and interconnect delays]

Clock Non-idealities
Clock skew
   Spatial variation in temporally equivalent clock edges; deterministic + random, t_SK
Clock jitter
   Temporal variations in consecutive edges of the clock signal; modulation + random noise
   Cycle-to-cycle (short-term) t_JS
   Long-term t_JL
Variation of the pulse width
   Important for level-sensitive clocking
Clock Skew
The delays from the clock source to the clock inputs of different flip-flops are different.
[Figure: clock driver reaching flip-flops A and B with different delays]

Clock Skew and Jitter
[Figure: Clk waveforms showing skew t_SK and jitter t_JS]
Both skew and jitter affect the effective cycle time.
Only skew affects the race margin.
Clock Uncertainties
Sources of clock uncertainty (skew and jitter):
1. Clock generation
2. Devices
3. Interconnect
4. Power supply
5. Temperature
6. Capacitive load
7. Coupling to adjacent lines

The clock skew problem: the race problem
[Figure: two cascaded flip-flops, with inputs D1, D2 and outputs Q1, Q2, whose clock inputs see different delays]
Correct operation: D1 -> Q1, D2 -> Q2
Due to clock skew: D1 -> Q2 (error!)
Minimizing clock skew: distribute the clock signal in such a way that the interconnections from the clock source to the flip-flop clock inputs are of equal length.
Positive and Negative Skew
[Figure: registers R1, R2, R3 with combinational logic between them, clocked at times t_1, t_2, t_3 through delay elements in the clock path: (a) positive skew, (b) negative skew]

Positive Skew
[Figure: clock edges 1 to 4 at the launching and receiving registers, with the intervals T, δ, T + δ, and δ + t_h marked]
Launching edge arrives before the receiving edge.
Negative Skew
[Figure: clock edges 1 to 4 at the launching and receiving registers, with the intervals T, δ, and T + δ marked]
Receiving edge arrives before the launching edge.

Useful Clock Skew
Clock skew is not always bad!
Example:
[Figure: register A feeding register B through two logic paths with delays 3 and 10; the clock arrives at A at time t_θA and at B at time t_θB]
Assume the propagation delay (clock-to-Q delay) t_c-q = 0.6, the setup time t_su = 0.4, and the hold time t_hd = 0.5.
If t_θB - t_θA = 0 (no clock skew), minimum clock period = 0.6 + 10 + 0.4 = 11.
If t_θB - t_θA = δ = 1 (t_θA and t_θB are the clock arrival times at registers A and B):
[Figure: clock edges at A and B separated by T + δ; the longest path from A to B has delay 10]
For proper operation, the time between positive edges at registers A and B must be greater than or equal to 11:
   clock period + clock skew >= 11
   minimum clock period = 10

[Figure: the data output of A changes 0.6 after the clock edge at A; the data input of B changes at 3.6 through the short path]
   t_θB - t_θA < (3.6 - 0.5)
On the other hand, the clock skew cannot exceed 3.1 ns. Otherwise the data latched into register A may propagate through the short path and reach the data input of register B before the rising edge of the clock pulse of the same cycle reaches register B.
Effect of Negative Clock Skew
If t_θA - t_θB = 1 (the clock reaches register B one time unit before it reaches register A):
[Figure: clock edges at A and B separated by T - (t_θA - t_θB); the longest path from A to B has delay 10]
   T - (t_θA - t_θB) >= 11
For proper operation, the time between positive edges at registers A and B must be greater than or equal to 11.
If we define t_θB - t_θA = δ, δ would be negative for negative clock skew.
For this example, δ = -1; T + δ >= 11, thus the minimum clock period T_min = 12.

Effect of Negative Clock Skew
[Figure: the same circuit, with the clock edge at B arriving before the clock edge at A]
Register B will never get the chance to latch the wrong data, no matter how large the negative clock skew is.
The race problem is not a concern for negative clock skew.
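The numbers in the useful-skew example can be reproduced with two one-line formulas. The sketch below assumes the example's values as read from the slides (t_c-q = 0.6, t_su = 0.4, t_hd = 0.5, longest path 10, shortest path 3) and takes δ as the clock arrival time at B minus the arrival time at A; the function names are illustrative.

```python
import math

T_CQ, T_SU, T_HOLD = 0.6, 0.4, 0.5
LONGEST, SHORTEST = 10.0, 3.0   # path delays from register A to register B

def min_clock_period(skew):
    """Smallest T satisfying T + skew >= t_cq + longest + t_su."""
    return T_CQ + LONGEST + T_SU - skew

def max_positive_skew():
    """Largest skew before data on the short path races through register B."""
    return T_CQ + SHORTEST - T_HOLD

no_skew = min_clock_period(0.0)
positive = min_clock_period(1.0)
negative = min_clock_period(-1.0)
skew_limit = max_positive_skew()

print(f"no skew:   T_min = {no_skew:.1f}")
print(f"skew = +1: T_min = {positive:.1f}")
print(f"skew = -1: T_min = {negative:.1f}")
print(f"largest tolerable positive skew = {skew_limit:.1f}")
```

Positive skew buys clock period at the cost of race margin, while negative skew costs clock period but cannot cause a race.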
Summary
Clock period: T
Longest delay from Reg. A to Reg. B: L_A→B
Shortest delay from Reg. A to Reg. B: S_A→B
Propagation delay, setup time, hold time: t_c-q, t_su, t_hold
1. $T + \delta \ge t_{c-q} + L_{A \to B} + t_{su}$
2. $\delta \le t_{c-q} + S_{A \to B} - t_{hold}$
where δ = (t_θB - t_θA), the clock arrival time at Reg. B minus the clock arrival time at Reg. A.
Requirement (1) has to be satisfied for the datapath between any pair of registers, where δ may be either positive or negative.
Requirement (2) has to be satisfied for the datapath between any pair of registers with a positive clock skew.

Example: Assume t_c-q is 0.6, t_su is 0.4, and t_hold is 0.5.
[Figure: two registers clocked from Clock θ, with a clock arrival time t_θ at each register, connected through logic blocks with t_pd = 5, 2, 2, 5, and 3]
(a) Determine the minimum clock period assuming a positive clock skew: δ = 1.
(b) Repeat part (a), factoring in a positive clock skew: δ = 3.
(c) Repeat part (a), factoring in a negative clock skew: δ = -2.
(d) Derive the maximum positive clock skew (i.e., the clock reaches the receiving register later than the launching register) that can be tolerated before the circuit fails.
(e) Derive the maximum negative clock skew (i.e., the clock reaches the receiving register earlier than the launching register) that can be tolerated before the circuit fails.
Impact of Jitter
[Figure: clock edges 1 to 6 over a period T, each edge displaced by between -t_jitter and +t_jitter; registers (t_c-q, t_c-q,cd, t_su, t_hold, t_jitter) feeding combinational logic (t_logic, t_logic,cd)]

Be Very Careful About Gated Clock
[Figure: a controller output A gated with clk to produce clock B; waveforms of clk, A, and B]
An undesirable glitch or spike will result and cause an additional trigger of the clock.
An alternative design:
[Figure: the controller drives the select of a 0/1 multiplexer at the register's data input, and the register is clocked directly by clk]
This design causes fewer timing problems but consumes more power.

Clock Gating
Typically 40-50% of active power is in the IC clock trees.
Clock gating allows some of this power to be eliminated in active modes.
Root and branch clock gating can be significant.
Resumption of clocking is very fast, so clock-gated modules can return to run mode without loss of service.
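Fine-grained Clock Gating
   always @(posedge CLK) begin
     if (EN) Q <= D;
   end
[Figure: RTL synthesis implements EN as an enable on the register's data input; low-power RTL synthesis instead gates the register's clock with EN through a CG cell]

Clock Gating Synthesis
   always @(posedge CLK) begin
     q0 <= D0;
     q1 <= D1;
     en <= SEL;
   end
   assign Out = en ? q0 : q1;
[Figure: RTL synthesis or low-power RTL synthesis of the code above: registers q0, q1, and en, with SEL registered into en and en selecting Out]

   always @(posedge clk) begin
     if (sel == 1'b1)
       q0 <= d0;
     if (sel == 1'b0)
       q1 <= d1;
     en <= sel;
   end
   assign out = en ? q0 : q1;
[Figure: low-power RTL synthesis of the code above: CG cells controlled by sel gate the clocks of q0 and q1. Ref: Calypto]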
Sequential Clock Gating
[Figure: original RTL with inputs din_1, vld_1, din_2, vld_2, register stages f_1, f_2 and g_1, g_2, and output dout; power-optimized RTL with clock-gating (CG) cells on the f and g registers, identified by combinational analysis and sequential analysis. Source: Calypto]

Advanced Clock Gating (Example)
[Figure: inputs din and vld, registers d_1, d_2, vld_1, vld_2, and output dout; CG cells inserted by combinational clock gating and by sequential clock gating. Source: Calypto]
Improving Clock Gating Efficiency
Clock gating efficiency = average percentage of time each register is gated for a given testbench.
[Figure: clock-gated duration (0 to 100%) per register, R_0 to R_n, for the original design and the optimized design]
   Clock gating efficiency                                         Block 1   Block 2
   Original design                                                 38.8%     29%
   Optimized design (after identifying more gating opportunities)  54.9%     47%
Source: Calypto

Dealing with Asynchronous Inputs: Synchronizer
Asynchronous signals coming into a synchronous system must be synchronized with the rest of the system.
[Figure: asynchronous input, synchronizer, synchronized signal; a metastable state and the possible outputs]
A multiple-stage synchronizer reduces the chance of synchronization failure.
[Figure: a signal from outside the system passing through several stages clocked by the system clock]

A Timing Optimization Technique - Pipelining
Pipelining: a technique to break up a timing-critical data path into a series of small data paths by placing registers between sections.
[Figure: input -> F() -> output, pipelined as input -> Fa() -> Fb() -> Fc() -> output]
Another Timing Optimization Technique - Retiming
Retiming: a technique to transform a given synchronous circuit into a faster circuit.
An example: a digital correlator.
The correlator takes a stream of bits x_0, x_1, ..., x_k as input and compares it with a fixed-length pattern a_0, a_1, ..., a_k. After receiving each input x_i, the correlator produces as output the number of matches, i.e.
   $y_i = \sum_{j=0}^{k} \sigma(x_{i-j}, a_j)$
where σ(x, y) = 1 if x = y, and 0 otherwise.

An implementation for k = 3
[Figure: input x_i passes through a chain of registers; four comparators δ compare the delayed inputs with a_0, a_1, a_2, a_3; a chain of three adders sums the comparator outputs into y_i. Legend: box = register; + = adder, output P + Q; δ = comparator, output σ(P, Q)]
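A behavioral reference for the correlator sum can be written directly from the definition of y_i. This Python sketch is not a model of the hardware; it assumes outputs are produced only once k earlier inputs are available (i >= k), which the slides do not specify.

```python
def sigma(x, y):
    """Comparator: 1 if the two bits match, 0 otherwise."""
    return 1 if x == y else 0

def correlate(xs, pattern):
    """Return y_i = sum over j of sigma(x[i-j], a[j]) for every i >= k."""
    k = len(pattern) - 1
    return [
        sum(sigma(xs[i - j], pattern[j]) for j in range(k + 1))
        for i in range(k, len(xs))
    ]

# a_0 is compared with the newest input, a_k with the oldest.
print(correlate([1, 0, 1, 1], [1, 1]))          # k = 1
print(correlate([1, 0, 1, 1], [1, 1, 0, 1]))    # k = 3
```

Because a_j is paired with x_{i-j}, a pattern matches completely when it equals the most recent k + 1 inputs read newest first.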
Original design
[Figure: the k = 3 correlator with input x_i, output y_i, comparators against a_0 to a_3, and a chain of three adders; points A and B are marked]
Suppose each adder has a propagation delay of 7 ns and each comparator 3 ns.
The longest propagation delay is 24 ns, so the clock period must be >= 24 ns.

A better design:
[Figure: the retimed correlator, with points A and B marked and a boxed portion]
These two designs are functionally equivalent:
   All input signals to the boxed portion arrive one clock tick earlier; thus, the boxed portion performs the same sequence of computations as in the first design, but one clock tick earlier.
   Since the output from the boxed portion is delayed one clock tick by the new register at B, the remainder of the circuit sees the same behavior as in the first design.
The longest propagation delay is reduced to 17 ns.
The elements in the boxed portion lead by one clock tick.
Retiming: the technique of inserting and deleting registers to speed up the design while preserving its function.
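The 24 ns and 17 ns figures can be checked on a graph model of the correlator. The sketch below is a reconstruction under assumptions: the edge list, the zero-delay host vertex standing for the outside world, and the choice of which three elements lead by one tick (cmp2, cmp3, add3) are inferred from the description, not copied from the figures. Each edge carries a register count, and a retiming with lag r changes the count on edge u -> v to w + r(v) - r(u).

```python
def clock_period(delay, edges):
    """Longest total delay along any path that crosses no register.

    Assumes every cycle in the graph contains at least one register.
    """
    zero_fanin = {v: [] for v in delay}
    for (u, v), registers in edges.items():
        if registers < 0:
            raise ValueError(f"edge {u}->{v} has a negative register count")
        if registers == 0:
            zero_fanin[v].append(u)

    memo = {}
    def path_delay(v):
        if v not in memo:
            memo[v] = delay[v] + max((path_delay(u) for u in zero_fanin[v]), default=0)
        return memo[v]

    return max(path_delay(v) for v in delay)

def retime(edges, lag):
    """Apply lag r: registers on u->v become w + r(v) - r(u)."""
    return {(u, v): w + lag.get(v, 0) - lag.get(u, 0) for (u, v), w in edges.items()}

# Assumed model: comparators 3 ns, adders 7 ns, host 0 ns.
DELAY = {"host": 0, "cmp0": 3, "cmp1": 3, "cmp2": 3, "cmp3": 3,
         "add1": 7, "add2": 7, "add3": 7}
EDGES = {
    ("host", "cmp0"): 1, ("cmp0", "cmp1"): 1, ("cmp1", "cmp2"): 1, ("cmp2", "cmp3"): 1,
    ("cmp3", "add3"): 0, ("cmp2", "add3"): 0,
    ("add3", "add2"): 0, ("cmp1", "add2"): 0,
    ("add2", "add1"): 0, ("cmp0", "add1"): 0,
    ("add1", "host"): 0,
}
LAG = {"cmp2": -1, "cmp3": -1, "add3": -1}   # these elements lead by one tick

original_period = clock_period(DELAY, EDGES)
retimed_edges = retime(EDGES, LAG)
retimed_period = clock_period(DELAY, retimed_edges)
print(f"original: {original_period} ns, retimed: {retimed_period} ns")
```

In this model the retiming removes the register on the edge into cmp2 and adds one on the edge out of add3, so the total register count stays at four while the longest register-free path shrinks from comparator plus three adders to comparator plus two adders.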
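Retiming Transformation
[Figure: registers moved across a node, with signals a, b, c, d, and x labeled before and after the move]

Example:
[Figure: blocks A, B, and C with delays 5, 4, and 2, before and after retiming]
Comparison: pipelining
[Figure: the same blocks A, B, and C with pipeline registers inserted]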
Retiming
Edge label: number of registers
[Figure: graph with vertices V_1 to V_4 (delay 3 each) and V_5 to V_7 (delay 7 each); four edges carry the label 1]

Retiming
Edge label: number of registers
[Figure: the same graph after retiming, with the registers redistributed across its edges]