Issues on Timing and Clocking


ECE152B TC

[Figure: synchronous circuit with input X, combinational logic, output Z, and a clock input; the clock waveform is annotated with the clock period.]

Latch and Flip-Flop

[Figure: a latch L with clock input CK, and a flip-flop built from two cascaded latches, L1 and L2, clocked by CK.]

Clocking

[Figure: input X, combinational logic, output Z, and the clock waveform with the clock period marked.]

For correct operation of a synchronous circuit:
- The clock period must be longer than the delay of the longest path in the combinational logic.
- The clock pulse must be wide enough to allow the flip-flops to change state.

Flip-Flop Setup and Hold Times

- Flip-flop setup time $T_{su}$: the time for which the data input must be held stable before the arrival of the clock pulse.
- Flip-flop hold time $T_h$: the time for which the data input must be held stable after the arrival of the clock pulse.

[Figure: data and clk waveforms, with the setup time and the hold time measured between the 50% points of the data and clock transitions.]

Flip-Flop Timing Parameters

[Figure: D, Clk, and Q waveforms over one clock period T, marking $t_{hold}$, $t_{su}$, and $t_{c-q}$.]

Delays can be different for rising and falling data transitions.

Example: For the circuit shown below, assume the delay through the register ($t_{pd}$) is 0.6 and the delay through each logic block is indicated inside the box. Assume that the registers, which are positive edge-triggered, have a setup time $T_{su}$ of 0.4. What is the minimum clock period?

[Figure: two registers driven from Clock θ, with clock arrival times $t_\theta$ and $t_{\theta'}$, connected through logic blocks with $t_{pd}$ = 5, 2, 5, 2, and 3.]

A Simple RC Model for Logic Gates

[Figure: a buffer/gate G between nodes A and B, and its equivalent circuit: an input capacitance $C_{input}$ at A and an output resistance $R_{out}$ driving B.]

Interconnect Models

[Figure: a driver connected through interconnect to the loads $C_1$, $C_2$, and $C_3$.]

1. Ignoring interconnects
2. Lumped capacitance model
3. RC tree model
4. RLC tree model
5. Transmission line models (RC, LC, RLC)
6. RC / RLC network model

Interconnect Models as a Capacitor

1. Ignoring interconnects: the driver resistance $R_d$ charges $C_{output} + C_1 + C_2 + C_3$, so $T = R_d (C_{output} + C_1 + C_2 + C_3)$.
2. Lumped capacitance model: the wire capacitance is added to the load, so $T = R_d (C_{output} + C_1 + C_2 + C_3 + C_{wire})$.

In both cases the output is $V_{out}(t) = V_0 (1 - e^{-t/T})$, which gives:
- $V_{out}(T) = 0.632\, V_0$
- $V_{out}^{-1}(0.5\, V_0) = (\ln 2)\, T = 0.693\, T$

[Figure: $V_{out}$ versus time, rising toward $1.0 \times V_0$ and crossing $0.5 \times V_0$; the time axis is marked at T and 2T.]
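The two capacitor models differ only in whether the wire capacitance is counted in the load. The sketch below evaluates the time constant and the 50% delay for both; the component values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def time_constant(r_d, c_output, loads, c_wire=0.0):
    """T = Rd * (C_output + sum of load capacitances + C_wire)."""
    return r_d * (c_output + sum(loads) + c_wire)

def delay_50(tau):
    """Time for V_out to reach half of its final value: (ln 2) * T."""
    return math.log(2) * tau

# Hypothetical values (not from the slides): ohms and farads.
R_D = 1e3
C_OUTPUT = 5e-15
LOADS = [10e-15, 10e-15, 10e-15]   # C1, C2, C3
C_WIRE = 20e-15

tau_ignore_wire = time_constant(R_D, C_OUTPUT, LOADS)
tau_lumped = time_constant(R_D, C_OUTPUT, LOADS, C_WIRE)

print(f"ignoring interconnect: T = {tau_ignore_wire * 1e12:.1f} ps, "
      f"50% delay = {delay_50(tau_ignore_wire) * 1e12:.2f} ps")
print(f"lumped capacitance:    T = {tau_lumped * 1e12:.1f} ps, "
      f"50% delay = {delay_50(tau_lumped) * 1e12:.2f} ps")
```

With these assumed values the wire capacitance raises the time constant from 35 ps to 55 ps, and the 50% delay scales by the same ratio.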

Analysis of Simple RC Circuit

[Figure: an input source $v_T(t)$ driving a series resistor R and a capacitor C; $i(t)$ is the loop current and the capacitor voltage $v(t)$ is the state variable.]

$R\, i(t) + v(t) = v_T(t)$

$i(t) = \frac{d(C v(t))}{dt} = C \frac{dv(t)}{dt}$

$RC \frac{dv(t)}{dt} + v(t) = v_T(t)$

This is a first-order linear differential equation with constant coefficients; $v_T(t)$ is the input waveform.

Analysis of Simple RC Circuit (continued)

Zero-input response (natural response):

$RC \frac{dv(t)}{dt} + v(t) = 0$

$\frac{1}{v(t)} \frac{dv(t)}{dt} = -\frac{1}{RC}$

$v_N(t) = K e^{-t/RC}$

Step-input response, with input $v_0 u(t)$:

$RC \frac{dv(t)}{dt} + v(t) = v_0 u(t)$

$v_F(t) = v_0 u(t)$

$v(t) = v_N(t) + v_F(t) = K e^{-t/RC} + v_0 u(t)$

Match the initial state: $v(0) = 0 \Rightarrow K + v_0 = 0$

Output response for step input:

$v(t) = v_0 (1 - e^{-t/RC})\, u(t)$

You can get the same result using the Laplace transform.
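As a sanity check on the derivation, the differential equation can be integrated numerically and compared with the closed-form step response. This is only a sketch: forward Euler is used for simplicity, and the values of R, C, and the step amplitude are hypothetical.

```python
import math

def simulate_step(r, c, v0, t_end, steps):
    """Forward-Euler integration of RC dv/dt + v = v0, starting from v(0) = 0."""
    dt = t_end / steps
    v = 0.0
    for _ in range(steps):
        v += dt * (v0 - v) / (r * c)
    return v

def closed_form(r, c, v0, t):
    """Step response derived above: v(t) = v0 * (1 - exp(-t / RC))."""
    return v0 * (1.0 - math.exp(-t / (r * c)))

# Hypothetical values: 1 kOhm, 1 pF, 1 V step, evaluated at t = 2 RC.
R, C, V0 = 1e3, 1e-12, 1.0
t_eval = 2 * R * C

numeric = simulate_step(R, C, V0, t_eval, 100_000)
exact = closed_form(R, C, V0, t_eval)
print(f"numeric = {numeric:.6f} V, closed form = {exact:.6f} V")
```

The two values agree to within the discretisation error of the Euler step, which shrinks as the number of steps grows.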

Delays of Simple RC Circuit

$v(t) = v_0 (1 - e^{-t/RC})$ is the waveform under the step input $v_0 u(t)$.

- $v(t) = 0.5\, v_0$ at $t = 0.7RC$, i.e., delay = 0.7RC (50% delay)
- $v(t) = 0.1\, v_0$ at $t = 0.1RC$
- $v(t) = 0.9\, v_0$ at $t = 2.3RC$, i.e., rise time = 2.2RC

Rise time (fall time): the time for a waveform to rise from 10% to 90% (fall from 90% to 10%) of its steady-state value.

[Figure: rising and falling waveforms between $V_{OL}$ and $V_{OH}$, with the rise time and fall time marked.]

Interconnect Models as a Tree

3. RC tree model
4. RLC tree model
5. Transmission line models (RC, LC, RLC)

[Figure: a driver feeding a tree of wire segments with loads $C_2$ and $C_3$; each segment is modeled as an L-type, π-type, or T-type circuit, or as a transmission line.]
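Returning to the single RC stage: the 50% delay and 10%-90% rise-time figures follow from inverting the step response, $t = -RC \ln(1 - f)$ for a fraction $f$ of the final value. A minimal sketch, with RC normalised to 1 so the results read directly as multiples of RC:

```python
import math

def crossing_time(fraction, rc=1.0):
    """Time at which v(t) = v0 * (1 - exp(-t/RC)) reaches fraction * v0."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -rc * math.log(1.0 - fraction)

t10 = crossing_time(0.1)
t50 = crossing_time(0.5)
t90 = crossing_time(0.9)
rise_time = t90 - t10   # 10% to 90%

print(f"t10 = {t10:.3f} RC, t50 = {t50:.3f} RC, t90 = {t90:.3f} RC")
print(f"rise time = {rise_time:.3f} RC")
```

Rounded to one decimal place these are the 0.1RC, 0.7RC, 2.3RC, and 2.2RC values quoted for the simple RC circuit.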

Determining Which Model to Use

Some rules of thumb:
- Need to consider C: if the interconnect C is comparable to the C of the gates driven
- Need to consider R: if the interconnect R is comparable to the R of the driver
- Need to consider L: if $\omega L$ is comparable to the R of the interconnect

Model Interconnects as RC Trees

- Each wire may be segmented into several edges.
- Each edge E is modeled as a π-type or L-type circuit, with
  $r_E$ = unit res. × length(E)
  $c_E$ = unit cap. × length(E)

Interconnect Delay: Putting All Models Together

[Figure: gate G1 driving gate G2 through an interconnect, and the equivalent circuit with $R = R_{out} + R_{int}$ and $C = C_{in} + C_L$. The input $V_{in}$ is a step of height $v_0$ and the output is $V_{out} = v_0 (1 - e^{-t/RC})$.]

Rise/fall time = 2.2 RC

=> Delay is proportional to the loading capacitance $C_L$.
- Driving more gates results in a longer rise/fall time.
- Longer interconnects (larger $R_{int}$ and $C_{in}$) also result in a longer rise/fall time.

Clock Tree

If the number of flip-flops driven by the clock line is large, the clock rise time (also called the slew) will be unacceptably long.

Solution: use a clock power-up tree (add buffers into the clock tree).

[Figure: a clock line fanning out through buffers to groups of flip-flops.]

Solution: Adding Buffers and Limiting Number of Their Fanouts

- Limit the number of fanouts of each buffer to N.
- Create a buffer tree to drive all flip-flops while satisfying the fanout-count constraint.
- The load seen by the clock source is significantly reduced.
- The same idea can be used to reduce the delay of logic signals that drive a large number of gates and are on timing-critical paths.

An Example

Assume 64 flip-flops are to be driven by a single clock source.
- Buffer delay (with zero load): 0.2 ns
- Interconnect delay: 0.2 ns with a single fanout (either to a flip-flop or to a buffer)
- An additional 0.1 ns delay for each additional fanout

[Figure: the Clk source driving all the flip-flops.]

Total delay from the clock source to the clock ports of the flip-flops: 6.9 ns

0.2 ns + 0.2 ns + (0.2 ns + 63 × 0.1 ns) = 6.9 ns

Example Clock Tree

Assume each buffer has four fanouts.

[Figure: the Clk source driving a three-level tree of buffers, each with four fanouts.]

Total delay from the clock source to the clock ports of the flip-flops: 2.3 ns

  0.2 ns {0th-level wire delay}
+ 0.2 ns {1st-level buffer delay}
+ (0.2 ns + 3 × 0.1 ns) {1st-level wire delay}
+ 0.2 ns {2nd-level buffer delay}
+ (0.2 ns + 3 × 0.1 ns) {2nd-level wire delay}
+ 0.2 ns {3rd-level buffer delay}
+ (0.2 ns + 3 × 0.1 ns) {3rd-level wire delay}

Techniques for Improving Speed

1. Keep the logic gate depth shallow between flip-flops.
2. Avoid circuit designs that have highly loaded gates in the critical path. A gate's delay increases as the capacitive load on its output increases. The primary sources of load capacitance are routing capacitance and the input capacitance of the driven gates.
3. Duplicate logic to reduce fanouts (similar idea to clock tree buffering).
4. Avoid long interconnects.
5. Gate sizing.
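The two clock-distribution totals (6.9 ns with a single buffer driving all 64 flip-flops, 2.3 ns with a fanout-of-four buffer tree) can be reproduced with a short calculation. The sketch below assumes a full tree in which every buffer drives exactly `max_fanout` loads and one wire runs from the clock source to the first buffer; the function names are illustrative, not from the slides.

```python
BUFFER_DELAY = 0.2            # ns, buffer delay with zero load
WIRE_BASE = 0.2               # ns, wire delay with a single fanout
WIRE_PER_EXTRA_FANOUT = 0.1   # ns, added per additional fanout

def wire_delay(fanout):
    """Interconnect delay of one net driving `fanout` loads."""
    return WIRE_BASE + WIRE_PER_EXTRA_FANOUT * (fanout - 1)

def levels_needed(num_loads, max_fanout):
    """Number of buffer levels so that max_fanout**levels >= num_loads."""
    levels, reach = 1, max_fanout
    while reach < num_loads:
        reach *= max_fanout
        levels += 1
    return levels

def clock_tree_delay(num_loads, max_fanout):
    """Source-to-flip-flop delay, assuming every buffer drives max_fanout loads."""
    levels = levels_needed(num_loads, max_fanout)
    source_wire = wire_delay(1)   # clock source to the first buffer
    per_level = BUFFER_DELAY + wire_delay(max_fanout)
    return source_wire + levels * per_level

single_buffer = clock_tree_delay(64, 64)
fanout_of_four = clock_tree_delay(64, 4)
print(f"one buffer, fanout 64:  {single_buffer:.1f} ns")
print(f"buffer tree, fanout 4:  {fanout_of_four:.1f} ns")
```

Changing `max_fanout` shows the trade-off directly: fewer fanouts per buffer shorten each wire delay but add buffer levels.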

Gate Sizing: Making the Driving Gate Larger (or Smaller)

The larger the driving gate, the greater the driving current (so sharper $V_{in}$), but the larger $C_{input}$ too (which slows down the signal coming into G1).

[Figure: gate G1 driving gate G2, with the equivalent circuit ($R_{out}$, $C_{in}$, $C_L$, and the gate model A, $R_{out}$, B, $C_{input}$) and the response $v_0 (1 - e^{-t/RC})$ versus time.]

Static Timing Analysis 101

[Figure labels: Timing Specs, Reports (courtesy P. Joshi, IBM).]

Static Timing Analysis

[Figure: a netlist with primary inputs PI1, PI2, PI3 and primary outputs PO1, PO2, PO3, with a delay given for each gate; a second copy is annotated with the arrival time at each node.]

Static Timing Analysis (continued)

[Figure: the same netlist annotated with arrival time / required time at each node, and a second copy annotated with the slack at each node.]

slack = required time - arrival time
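The arrival-time, required-time, and slack computation can be sketched on a small example. The netlist below is hypothetical (it is not the one in the figure, whose annotations did not survive extraction), arrival times at primary inputs are assumed to be 0, and the required time at each output is an assumed constraint.

```python
from collections import defaultdict

# Hypothetical netlist: gate name -> (gate delay, list of fan-in nodes).
GATES = {
    "g1": (1, ["PI1", "PI2"]),
    "g2": (3, ["PI2", "PI3"]),
    "g3": (4, ["g1", "g2"]),   # drives an output
    "g4": (6, ["g2"]),         # drives an output
}
PRIMARY_INPUTS = ["PI1", "PI2", "PI3"]
OUTPUT_REQUIRED = {"g3": 10, "g4": 10}   # assumed required times at outputs

def arrival_times(gates, inputs):
    """Latest arrival at each node: gate delay plus the latest fan-in arrival."""
    arrival = {pi: 0 for pi in inputs}
    def visit(node):
        if node not in arrival:
            delay, fanins = gates[node]
            arrival[node] = delay + max(visit(f) for f in fanins)
        return arrival[node]
    for gate in gates:
        visit(gate)
    return arrival

def required_times(gates, inputs, output_required):
    """Earliest required time at each node, propagated backward from outputs."""
    fanouts = defaultdict(list)
    for gate, (_, fanins) in gates.items():
        for f in fanins:
            fanouts[f].append(gate)
    required = {}
    def visit(node):
        if node not in required:
            candidates = [visit(g) - gates[g][0] for g in fanouts[node]]
            if node in output_required:
                candidates.append(output_required[node])
            required[node] = min(candidates)
        return required[node]
    for node in list(inputs) + list(gates):
        visit(node)
    return required

arrival = arrival_times(GATES, PRIMARY_INPUTS)
required = required_times(GATES, PRIMARY_INPUTS, OUTPUT_REQUIRED)
slack = {node: required[node] - arrival[node] for node in arrival}
for node in sorted(slack, key=slack.get):
    print(f"{node}: arrival={arrival[node]} required={required[node]} slack={slack[node]}")
```

In this assumed netlist the nodes with the smallest slack (PI2, PI3, g2, g4) trace out the critical path.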

Timing Analysis with Interconnect Delay

[Figure: a latch-to-latch path with gate delays and interconnect delays annotated on each stage.]

Clock Non-idealities

- Clock skew: spatial variation in temporally equivalent clock edges; deterministic + random, $t_{SK}$
- Clock jitter: temporal variations in consecutive edges of the clock signal; modulation + random noise
  - Cycle-to-cycle (short-term): $t_{JS}$
  - Long-term: $t_{JL}$
- Variation of the pulse width: important for level-sensitive clocking

Clock Skew

The delays from the clock source to the clock inputs of different flip-flops are different.

[Figure: a CLOCK DRIVER feeding flip-flops A and B.]

Clock Skew and Jitter

[Figure: Clk waveforms with the skew $t_{SK}$ and the jitter $t_{JS}$ marked.]

- Both skew and jitter affect the effective cycle time.
- Only skew affects the race margin.

Clock Uncertainties: Sources of Skew and Jitter

Sources of clock uncertainty:
1. Clock generation
2. Devices
3. Interconnect
4. Power supply
5. Temperature
6. Capacitive load
7. Coupling to adjacent lines

The clock skew problem: the race problem

[Figure: two flip-flops (D1/Q1 and D2/Q2), with delays of 1 ns, 0 ns, and 2 ns marked.]

- Correct operation: D1 -> Q1, D2 -> Q2
- Due to clock skew: D1 -> Q2 (error!)

Minimizing clock skew: distribute the clock signal in such a way that the interconnections from the clock source to the flip-flops' clock inputs are of equal length.


Positive and Negative Skew

[Figure: a pipeline In -> R1 -> combinational logic -> R2 -> combinational logic -> R3, with clock arrival times $t_1$, $t_2$, $t_3$ set by delay elements in the clock line. Panel (a): positive skew. Panel (b): negative skew.]

Positive Skew

[Figure: the clock waveforms at the launching and receiving registers, with edges numbered 1 to 4, the interval $T + \delta$, and the margin $\delta + t_h$.]

The launching edge arrives before the receiving edge.

Negative Skew

[Figure: the clock waveforms at the launching and receiving registers, with edges numbered 1 to 4 and the interval $T + \delta$.]

The receiving edge arrives before the launching edge.

Useful Clock Skew

Clock skew is not always bad!

Example:

[Figure: register A feeds register B through two paths with delays 3 and 10; the clocks arrive at A at $t_\theta$ and at B at $t_{\theta'}$.]

Assume the propagation delay (clock-to-Q delay) $t_{c-q}$ = 0.6, the setup time $t_{su}$ = 0.4, and the hold time $t_{hd}$ = 0.5.

If $t_{\theta'} - t_\theta = 0$ (no clock skew), the minimum clock period = 11.

If $t_{\theta'} - t_\theta = \delta = 1$:

[Figure: the clock waveforms at A ($t_\theta$) and B ($t_{\theta'}$); the interval between the launching edge at A and the next capturing edge at B is $T + \delta \ge 11$.]

For proper operation, the time between positive edges at registers A and B must be greater than or equal to 11:

clock period + clock skew >= 11, so the minimum clock period = 10.

On the other hand, the clock skew cannot exceed 3.1 ns:

$t_{\theta'} - t_\theta < (3.6 - 0.5)$

Otherwise the data latched into register A may propagate through the short path and reach the data input of register B before the rising edge of the clock pulse of the same cycle reaches θ'.

[Figure: the data output of A changes 0.6 after the clock edge at A; the data input of B changes at 3.6.]

Effect of Negative Clock Skew

If $t_\theta - t_{\theta'} = 1$:

[Figure: the clock waveforms at A ($t_\theta$) and B ($t_{\theta'}$); the interval between the launching edge at A and the next capturing edge at B is $T - (t_\theta - t_{\theta'}) \ge 11$.]

For proper operation, the time between positive edges at registers A and B must be greater than or equal to 11.

If we define $t_{\theta'} - t_\theta = \delta$, then $\delta$ is negative for negative clock skew. For this example, $\delta = -1$ and $T + \delta \ge 11$; thus the minimum clock period is $T_{min} = 12$.

Effect of Negative Clock Skew (continued)

Register B will never get the chance to latch the wrong data, no matter how large the negative clock skew is. The race problem is not a concern for negative clock skew.

Summary

- Clock period: $T$
- Longest delay from Reg. A to Reg. B: $L_{A \to B}$
- Shortest delay from Reg. A to Reg. B: $S_{A \to B}$
- Propagation delay, setup time, hold time: $t_{c-q}$, $t_{su}$, $t_{hold}$

1. $T + \delta \ge t_{c-q} + L_{A \to B} + t_{su}$
2. $\delta \le t_{c-q} + S_{A \to B} - t_{hold}$

where $\delta = t_{\theta'} - t_\theta$.

Requirement (1) has to be satisfied for the datapath between any pair of registers, where $\delta$ may be either positive or negative.

Requirement (2) has to be satisfied for the datapath between any pair of registers with a positive clock skew.

Example: Assume $t_{c-q}$ is 0.6, $t_{su}$ is 0.4, and $t_{hold}$ is 0.5.

[Figure: two registers driven from Clock θ, with clock arrival times $t_\theta$ and $t_{\theta'}$, connected through logic blocks with $t_{pd}$ = 5, 2, 2, 5, and 3.]

(a) Determine the minimum clock period assuming a positive clock skew: $\delta = (t_{\theta'} - t_\theta) = 1$.
(b) Repeat part (a), factoring in a positive clock skew: $\delta = 3$.
(c) Repeat part (a), factoring in a negative clock skew: $\delta = -2$.
(d) Derive the maximum positive clock skew (i.e. $t_{\theta'} > t_\theta$) that can be tolerated before the circuit fails.
(e) Derive the maximum negative clock skew (i.e. $t_{\theta'} < t_\theta$) that can be tolerated before the circuit fails.
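The two summary requirements translate directly into two small helper functions. The sketch below applies them to the earlier two-path example (longest path 10, shortest path 3, clock-to-Q delay 0.6, setup 0.4, hold 0.5), not to the exercise above, whose path structure is only given in the figure. The function and variable names are illustrative.

```python
def min_clock_period(t_cq, longest, t_su, skew):
    """Requirement (1): T + skew >= t_cq + longest + t_su, solved for T."""
    return t_cq + longest + t_su - skew

def max_positive_skew(t_cq, shortest, t_hold):
    """Requirement (2): skew <= t_cq + shortest - t_hold."""
    return t_cq + shortest - t_hold

# Values from the two-path example with registers A and B.
T_CQ, T_SU, T_HOLD = 0.6, 0.4, 0.5
LONGEST, SHORTEST = 10, 3

for skew in (0, 1, -1):
    period = min_clock_period(T_CQ, LONGEST, T_SU, skew)
    print(f"skew = {skew:+d}: minimum clock period = {period:.1f}")

limit = max_positive_skew(T_CQ, SHORTEST, T_HOLD)
print(f"maximum tolerable positive skew = {limit:.1f}")
```

The loop reproduces the periods 11, 10, and 12 for skews of 0, +1, and -1, and the hold-side bound comes out at 3.1, in line with the worked slides.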

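Impact of Jitter

[Figure: a clock waveform of period T whose edges (numbered 1 to 6) can move between $-t_{jitter}$ and $+t_{jitter}$; registers with parameters $t_{c-q}$, $t_{c-q,cd}$, $t_{su}$, $t_{hold}$, $t_{jitter}$ drive combinational logic with delays $t_{logic}$ and $t_{logic,cd}$.]

Be Very Careful About Gated Clock

[Figure: a controller output gated with clk to produce the clock of a downstream register; waveforms A, B, and clk.]

An undesirable glitch or spike can result and cause an additional trigger of the clock.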

An alternative design:

[Figure: the controller selects the register's data input instead of gating its clock; the register is clocked directly by clk.]

This design causes fewer timing problems but consumes more power.

Clock Gating

- Typically 40-50% of active power is in the IC clock trees.
- Clock gating allows some of this power to be eliminated in active modes.
- Root and branch clock gating can be significant.
- Resumption of clocking is very fast, so clock-gated modules can return to run mode without loss of services.

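Fine-grained Clock Gating

    always @(posedge CLK) begin
      if (EN) Q <= D;
    end

[Figure: RTL synthesis maps EN to an enable on the register's data path; low-power RTL synthesis maps EN to a clock-gating (CG) cell on the register's clock.]

Clock Gating Synthesis

    always @(posedge CLK) begin
      q  <= D0;
      q1 <= D1;
      en <= SEL;
    end
    assign Out = en ? q : q1;

[Figure: RTL synthesis or low-power RTL synthesis of the code above: registers q, q1, and en, with SEL registered into en and en selecting between q and q1 to drive Out.]

    always @(posedge clk) begin
      if (sel == 1'b1)
        q <= d0;
      if (sel == 1'b0)
        q1 <= d1;
      en <= sel;
    end
    assign out = en ? q : q1;

[Figure: low-power RTL synthesis of the second version: CG cells controlled by sel gate the clocks of q and q1.]

Ref: Calypto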

Sequential Clock Gating

[Figure: original RTL with two pipelines, din_1 -> f_1 -> f_2 and din_2 -> g_1 -> g_2, whose results are selected into dout under the valid signals vld_1 and vld_2. In the power-optimized RTL, clock-gating (CG) cells are added to the pipeline registers; some are found by combinational analysis and others by sequential analysis.]

Source: Calypto

Advanced Clock Gating (Example)

[Figure: a pipeline din -> d_1 -> d_2 -> dout with valid signals vld, vld_1, vld_2. Combinational analysis yields combinational clock gating; sequential analysis yields sequential clock gating, with CG cells on the pipeline registers.]

Source: Calypto

Improving Clock Gating Efficiency

Clock gating efficiency = the average percentage of time each register is gated for a given testbench.

[Figure: for the original and the optimized design, the fraction of the run duration (up to 100%) for which each register R_0 ... R_n is clock gated.]

Clock gating efficiency                          Block 1   Block 2
Original design                                  38.8%     29%
After identifying more gating opportunities      54.9%     47%

Source: Calypto

Dealing with Asynchronous Inputs: Synchronizer

Asynchronous signals incoming to a synchronous system must be synchronized with the rest of the system.

[Figure: ASYNCHRONOUS INPUT -> SYNCHRONIZER -> SYNCHRONIZED SIGNAL; the waveform shows a metastable state followed by the possible outputs.]

A multiple-stage synchronizer reduces the chance of synchronization failure.

[Figure: a signal from outside the system passes through cascaded synchronizer stages clocked by the SYSTEM CLOCK before entering the system.]

A Timing Optimization Technique - Pipelining

Pipelining: a technique to break up a timing-critical data path into a series of small data paths by placing registers between sections.

[Figure: input -> F() -> output is replaced by input -> Fa() -> Fb() -> Fc() -> output, with registers between the sections.]

Another Timing Optimization Technique - Retiming

Retiming: a technique to transform a given synchronous circuit into a faster circuit.

An example: a digital correlator. The correlator takes a stream of bits $x_0, x_1, \ldots, x_k$ as input and compares it with a fixed-length pattern $a_0, a_1, \ldots, a_k$. After receiving each input $x_i$, the correlator produces as output the number of matches, i.e.

$y_i = \sum_{j=0}^{k} \sigma(x_{i-j}, a_j)$

where $\sigma(x, y) = 1$ if $x = y$, and $0$ otherwise.

An implementation for k=3

[Figure: $x_i$ feeds a chain of four comparators, one for each of $a_0$, $a_1$, $a_2$, $a_3$, with registers along the chain; a chain of three adders sums the comparator outputs to produce $y_i$. The legend identifies the symbols for register, adder, and comparator.]
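The correlator's input-output behaviour (independent of how it is implemented in hardware) can be written out directly from the sum. In this sketch the stream and pattern are hypothetical, and outputs are produced only once $k+1$ samples have arrived, which is an assumption about how the start-up is handled.

```python
def sigma(x, y):
    """Match indicator: 1 if x equals y, otherwise 0."""
    return 1 if x == y else 0

def correlate(stream, pattern):
    """y_i = sum over j = 0..k of sigma(x[i-j], a[j]), for every i >= k."""
    k = len(pattern) - 1
    outputs = []
    for i in range(k, len(stream)):
        matches = sum(sigma(stream[i - j], pattern[j]) for j in range(k + 1))
        outputs.append(matches)
    return outputs

# Hypothetical data with k = 3 (pattern a0..a3).
stream = [1, 0, 1, 1, 0, 1]
pattern = [1, 1, 0, 1]
result = correlate(stream, pattern)
print(result)
```

For this data the first output is 4, because the newest four samples read backward (x3, x2, x1, x0) equal the pattern exactly.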

Original design

[Figure: the k=3 correlator, with points A and B marked.]

Suppose each adder has a propagation delay of 7 ns and each comparator 3 ns. The longest propagation delay is 24 ns, so the clock period must be >= 24 ns.

A better design:

[Figure: the same correlator with a portion of the circuit between A and B boxed and a new register at B.]

These two designs are functionally equivalent:
- All input signals to the boxed portion arrive one clock tick earlier. Thus, the boxed portion performs the same sequence of computations as in the first design, but one clock tick earlier.
- Since the output from the boxed portion is delayed one clock tick by the new register at B, the remainder of the circuit sees the same behavior as in the first design.

The longest propagation delay is reduced to 17 ns. The elements in the boxed portion lead by one clock tick.

Retiming: the technique of inserting and deleting registers to speed up the design while preserving its function.
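The 24 ns and 17 ns figures can be checked on a graph model of the correlator. Everything about the model below is an assumption made for illustration: a zero-delay `host` vertex stands for the environment, comparator `cj` sees the input through j+1 registers, and the adders form a chain `a1 -> a2 -> a3`. Retiming is expressed with the standard lag formulation, in which an edge u -> v with w registers ends up with w + lag(v) - lag(u) registers; that notation is not taken from the slides.

```python
# Assumed graph model of the k=3 correlator (not read off the figure).
DELAY = {"host": 0, "c0": 3, "c1": 3, "c2": 3, "c3": 3, "a1": 7, "a2": 7, "a3": 7}
EDGES = {   # (from, to) -> number of registers on the connection
    ("host", "c0"): 1, ("host", "c1"): 2, ("host", "c2"): 3, ("host", "c3"): 4,
    ("c3", "a1"): 0, ("c2", "a1"): 0,
    ("a1", "a2"): 0, ("c1", "a2"): 0,
    ("a2", "a3"): 0, ("c0", "a3"): 0,
    ("a3", "host"): 0,
}

def retime(edges, lag):
    """Apply w_r(u -> v) = w(u -> v) + lag(v) - lag(u); reject negative counts."""
    retimed = {}
    for (u, v), w in edges.items():
        w_r = w + lag.get(v, 0) - lag.get(u, 0)
        if w_r < 0:
            raise ValueError(f"illegal retiming on edge {u}->{v}")
        retimed[(u, v)] = w_r
    return retimed

def clock_period(edges, delay):
    """Largest total delay along any path that crosses no register."""
    zero_successors = {v: [] for v in delay}
    for (u, v), w in edges.items():
        if w == 0:
            zero_successors[u].append(v)
    memo = {}
    def longest_from(u):
        if u not in memo:
            tail = max((longest_from(v) for v in zero_successors[u]), default=0)
            memo[u] = delay[u] + tail
        return memo[u]
    return max(longest_from(v) for v in delay)

# The boxed portion (c2, c3, a1) leads by one tick: lag = -1.
LAG = {"c2": -1, "c3": -1, "a1": -1}
original_period = clock_period(EDGES, DELAY)
retimed_edges = retime(EDGES, LAG)
retimed_period = clock_period(retimed_edges, DELAY)
print(original_period, retimed_period)
```

With this assumed graph the script prints 24 for the original circuit and 17 after retiming: one register disappears from each input of the boxed portion and one appears on its output `a1 -> a2`, the role played by the new register at B.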

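Retiming Transformation

[Figure: a node with inputs a, b and outputs c, d, shown before and after registers are moved across it.]

Example:

[Figure: a circuit with blocks A, B, and C and delays 5, 4, and 2, shown before and after retiming.]

Comparison: pipelining

[Figure: the same circuit with blocks A, B, and C after pipelining.]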

Retiming

[Figure: the correlator as a graph with vertices V1 to V4 (delay 3 each) and V5 to V7 (delay 7 each). Each edge label gives the number of registers on that edge. The graph is shown before and after retiming.]