Skew-Tolerant Circuit Design

Similar documents
Timing Analysis with Clock Skew

MODULE 5 Chapter 7. Clocked Storage Elements

The Linear-Feedback Shift Register

Logical Effort: Designing for Speed on the Back of an Envelope David Harris Harvey Mudd College Claremont, CA

Digital Integrated Circuits A Design Perspective

EE115C Winter 2017 Digital Electronic Circuits. Lecture 19: Timing Analysis

Lecture 9: Sequential Logic Circuits. Reading: CH 7

Jin-Fu Li Advanced Reliable Systems (ARES) Lab. Department of Electrical Engineering. Jungli, Taiwan

Timing Issues. Digital Integrated Circuits A Design Perspective. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolić. January 2003

GMU, ECE 680 Physical VLSI Design 1

Lecture 16: Circuit Pitfalls

UNIVERSITY OF CALIFORNIA, BERKELEY College of Engineering Department of Electrical Engineering and Computer Sciences

ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN. Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering

Sequential vs. Combinational

Dynamic Combinational Circuits. Dynamic Logic

Xarxes de distribució del senyal de. interferència electromagnètica, consum, soroll de conmutació.

Digital Integrated Circuits Designing Combinational Logic Circuits. Fuyuzhuo

Lecture 14: Circuit Families

Dynamic Combinational Circuits. Dynamic Logic

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Lecture 9: Clocking, Clock Skew, Clock Jitter, Clock Distribution and some FM

CSE241 VLSI Digital Circuits Winter Lecture 07: Timing II

GMU, ECE 680 Physical VLSI Design

9/18/2008 GMU, ECE 680 Physical VLSI Design

Digital Integrated Circuits A Design Perspective

Chapter 5 CMOS Logic Gate Design

CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 17: Dynamic Sequential Circuits And Timing Issues

Designing Sequential Logic Circuits

Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic. November Digital Integrated Circuits 2nd Sequential Circuits

Topics. Dynamic CMOS Sequential Design Memory and Control. John A. Chandy Dept. of Electrical and Computer Engineering University of Connecticut

L4: Sequential Building Blocks (Flip-flops, Latches and Registers)

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Digital Integrated Circuits A Design Perspective

Clock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements.

Integrated Circuits & Systems

Lecture 1: Circuits & Layout

Digital Integrated Circuits A Design Perspective

Clock Strategy. VLSI System Design NCKUEE-KJLEE

For smaller NRE cost For faster time to market For smaller high-volume manufacturing cost For higher performance

EE141Microelettronica. CMOS Logic

! Charge Leakage/Charge Sharing. " Domino Logic Design Considerations. ! Logic Comparisons. ! Memory. " Classification. " ROM Memories.

EECS 427 Lecture 15: Timing, Latches, and Registers Reading: Chapter 7. EECS 427 F09 Lecture Reminders

Chapter 3. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 3 <1>

L4: Sequential Building Blocks (Flip-flops, Latches and Registers)

UNIVERSITY OF CALIFORNIA

CPE/EE 427, CPE 527 VLSI Design I L18: Circuit Families. Outline

C.K. Ken Yang UCLA Courtesy of MAH EE 215B

EE371 - Advanced VLSI Circuit Design

Lecture 25. Dealing with Interconnect and Timing. Digital Integrated Circuits Interconnect

Final presentations May 8, 1-5pm, BWRC Final reports due May 7, 8pm Final exam, Monday, May :30pm, 241 Cory

Topics. CMOS Design Multi-input delay analysis. John A. Chandy Dept. of Electrical and Computer Engineering University of Connecticut

Unit 7 Sequential Circuits (Flip Flop, Registers)

ALU A functional unit

EE241 - Spring 2007 Advanced Digital Integrated Circuits. Announcements

EECS 427 Lecture 14: Timing Readings: EECS 427 F09 Lecture Reminders

Lecture 6: Circuit design part 1

Advanced Testing. EE5375 ADD II Prof. MacDonald

CMPEN 411. Spring Lecture 18: Static Sequential Circuits

ΗΜΥ 307 ΨΗΦΙΑΚΑ ΟΛΟΚΛΗΡΩΜΕΝΑ ΚΥΚΛΩΜΑΤΑ Εαρινό Εξάμηνο 2018

Digital System Clocking: High-Performance and Low-Power Aspects. Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M.

Lecture 27: Latches. Final presentations May 8, 1-5pm, BWRC Final reports due May 7 Final exam, Monday, May :30pm, 241 Cory

Lecture 23. Dealing with Interconnect. Impact of Interconnect Parasitics

CPE/EE 422/522. Chapter 1 - Review of Logic Design Fundamentals. Dr. Rhonda Kay Gaede UAH. 1.1 Combinational Logic

EET 310 Flip-Flops 11/17/2011 1

EE213, Spr 2017 HW#3 Due: May 17 th, in class. Figure 1

MOSIS REPORT. Spring MOSIS Report 1. MOSIS Report 2. MOSIS Report 3

ALU, Latches and Flip-Flops

COMP 103. Lecture 16. Dynamic Logic

Fundamentals of Computer Systems

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design

Latches. October 13, 2003 Latches 1

9/18/2008 GMU, ECE 680 Physical VLSI Design

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING QUESTION BANK

THE UNIVERSITY OF MICHIGAN. Timing Analysis of Domino Logic

Clocking Issues: Distribution, Energy

Hw 6 due Thursday, Nov 3, 5pm No lab this week

Memory, Latches, & Registers

74LCX16374 Low Voltage 16-Bit D-Type Flip-Flop with 5V Tolerant Inputs and Outputs

Lecture Outline. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. Total Power. Energy and Power Optimization. Worksheet Problem 1

Fundamentals of Computer Systems

Introduction to Digital Logic

74LCX16374 Low Voltage 16-Bit D-Type Flip-Flop with 5V Tolerant Inputs and Outputs

Itanium TM Processor Clock Design

Vidyalankar S.E. Sem. III [CMPN] Digital Logic Design and Analysis Prelim Question Paper Solution

INTEGRATED CIRCUITS. For a complete data sheet, please also download:

THE UNIVERSITY OF MICHIGAN. Faster Static Timing Analysis via Bus Compression

VLSI Design. [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI DEsign.1

EEC 216 Lecture #3: Power Estimation, Interconnect, & Architecture. Rajeevan Amirtharajah University of California, Davis

EECS 427 Lecture 8: Adders Readings: EECS 427 F09 Lecture 8 1. Reminders. HW3 project initial proposal: due Wednesday 10/7

LOGIC CIRCUITS. Basic Experiment and Design of Electronics. Ho Kyung Kim, Ph.D.

Chapter 3. Chapter 3 :: Topics. Introduction. Sequential Circuits

Lecture 34: Portable Systems Technology Background Professor Randy H. Katz Computer Science 252 Fall 1995

Issues on Timing and Clocking

Adders, subtractors comparators, multipliers and other ALU elements

A Mathematical Solution to. by Utilizing Soft Edge Flip Flops

Sequential Logic. Rab Nawaz Khan Jadoon DCS. Lecturer COMSATS Lahore Pakistan. Department of Computer Science

Homework Assignment #1 Solutions EE 477 Spring 2017 Professor Parker

Hold Time Illustrations

Transcription:

Skew-Tolerant Circuit Design David Harris David_Harris@hmc.edu December, 2000 Harvey Mudd College Claremont, CA

Outline Introduction Skew-Tolerant Circuits Traditional Domino Circuits Skew-Tolerant Domino Circuits Design Methodology Example Talk based on: Skew-Tolerant Circuit Design, Morgan Kaufmann Publishers, 2001. Skew-Tolerant Circuit Design David Harris Page 2 of 50

Sequencing Overhead in Flip-Flop Systems Ideally, each full clock cycle T c is available for useful computation But we must keep token in current pipestage from catching up with token in next Purpose of sequencing elements such as flip-flops is not to remember data but merely to retard fast paths so they don t corrupt tokens further along in the pipeline This inherently retards the slower paths as well Introduces sequencing overhead inherent to keeping tokens sequenced T c CQ logic DC Flop Flop Cycle 1 For flip-flops: logic = T c CQ DC Skew-Tolerant Circuit Design David Harris Page 3 of 50

Clock Skew & Time Borrowing Clock skew increases sequencing overhead T c CQ logic DC t skew Flop Flop Cycle 1 Data may launch on the latest skewed clock edge Must setup before the earliest skewed clock edge Clock skew directly cuts into time available for computation logic = T c CQ DC t skew Any unused time at end of cycle is wasted No time borrowing possible to balance logic between stages Skew-Tolerant Circuit Design David Harris Page 4 of 50

Cycle Time Trends Much of CPU performance improvement comes from higher frequencies Clock rates increasing faster than simple process improvement allows Fixed sequencing overhead becomes greater portion of cycle time 100 1000 10 SpecInt95 1 0.1 80386 80486 Pentium Pentium II / III MHz 100 80386 80486 Pentium Pentium II / III 0.01 1985 1988 1991 1994 1997 2000 10 1985 1988 1991 1994 1997 2000 100 Fanout-of-4 (FO4) Inverter Delay (ps) 500 200 100 50 VDD = 5 VDD = 3.3 VDD = 2.5 2.0 µ 1.2 µ 0.8 µ 0.6 µ 0.35 µ 0.25 µ FO4 inverter delays / cycle 50 20 80386 80486 Pentium Pentium II / III 10 1985 1988 1991 1994 1997 2000 Process Skew-Tolerant Circuit Design David Harris Page 5 of 50

SIA Frequency Roadmap Semiconductor Industry Association has predicted clock frequencies Estimate cycle time in FO4 inverter delays Cycles are getting very short Minimizing sequencing overhead will be ever more important Process (mm) Year Frequency (MHz) Cycle time (FO4) 0.25 1997 750 (600) 13.3 (16.7) 0.18 1999 1250 (733) 11.1 (18.9) 0.15 2001 1500 (> 1500) 11.1 (<11.1) 0.13 2003 2100 9.2 0.10 2006 3500 7.1 0.07 2009 6000 6.0 0.05 2012 10000 5.0 Skew-Tolerant Circuit Design David Harris Page 6 of 50

Outline Introduction Skew-Tolerant Circuits Traditional Domino Circuits Skew-Tolerant Domino Circuits Design Methodology Example Skew-Tolerant Circuit Design David Harris Page 7 of 50

Transparent Latches Transparent latch-based systems use a latch in each half-cycle Latch is transparent when controlling clock is high, opaque when low Two latches required so tokens does not skip ahead to next pipeline stage T c _b Transparent Opaque Opaque Transparent DQ DQ _b Latch Latch Half-Cycle 1 Half-Cycle 2 logic = T c 2 DQ Skew-Tolerant Circuit Design David Harris Page 8 of 50

Transparent Latches with Skew Flip-flops are sensitive to skew because of the hard edges Data not launched until latest rising edge of clock Data must setup before earliest falling edge of clock Overhead would shrink if we could soften an edge Latch-based systems may still use entire cycle even if there is skew T c _b _b Latch Latch Half-Cycle 1 Half-Cycle 2 logic = T c 2 DQ Skew-Tolerant Circuit Design David Harris Page 9 of 50

Pulsed Latches Can we keep tokens in sequence with only one latch per cycle? Pulse the latch transparent briefly compared to delay through pipestage T c t pw φ p t pw DQ logic φ p DC t skew Case 1: pulse wider than setup time + clock skew data arrives while latch is transparent, no time wasted φ p t pw φ p Latch DC t skew wasted time Cycle 1 Case 2: pulse narrower than setup time + clock skew data may arrive while latch is opaque, leaving wasted time T c DQ if t pw DC + t skew logic = T c + t pw DQ DC t skew otherwise Skew-Tolerant Circuit Design David Harris Page 10 of 50

Min-Delay Min-Delay (a.k.a hold time or race problem) is especially insidious Setup time problems can be fixed by slowing clock Hold time problems require redesign of chip Consider min-delay constraint for flip-flops with little intervening logic δ CQ δ logic input arrives at Flop 2 t skew CD δ CQ + δ logic t skew + CD safe for Flop 2 input to change Flop 1 δ logic Flop 2 δ logic CD + t skew δ CQ Skew-Tolerant Circuit Design David Harris Page 11 of 50

Min-Delay (Transparent Latches) Latches may use nonoverlapping clocks to reduce min-delay problems But min-delay constraints exist between each latch (twice per cycle) _b δ CQ t skew Latch 1 Latch 2 t nonoverlap δ logic CD δ logic _b input arrives at Latch 2 safe for Latch 2 input to change δ CQ + δ logic t skew + CD t nonoverlap δ logic CD + t skew δ CQ t nonoverlap (in each half-cycle paths) Skew-Tolerant Circuit Design David Harris Page 12 of 50

Min-Delay (Pulsed Latches) Pulsed latches with wide pulses are especially susceptible to min-delay Data launches off earliest rising edge of pulse Must hold until after latest falling edge of pulse φ p t pw δ CQ δ logic input arrives at Latch 2 φ p t skew CD safe for Latch 2 input to change Latch 1 δ logic φ p Latch 2 δ logic t pw + CD + t skew δ CQ Skew-Tolerant Circuit Design David Harris Page 13 of 50

Time Borrowing Logic between latches may occupy more than exactly 1/2 cycle Borrow time so long as setup time at next latch is met Soft edges provide flexibility to partition and balance logic (unlike flops) T c _b T c /2 DC t borrow _b t skew Latch Combinational Logic Latch Half-Cycle 1 Half-Cycle 2 t borrow = T ----- c 2 DC t skew Skew-Tolerant Circuit Design David Harris Page 14 of 50

Time Borrowing with Pulses What if the clock duty cycle is less than 1/2 cycle (pulses or nonoverlap)? Still meet setup time before end of transparency pulse T c t pw _b T c /2 DC t borrow _b t skew t nonoverlap Latch Combinational Logic Latch Half-Cycle 1 Half-Cycle 2 t pw DC t skew pulsed latches t borrow = T ----- c t 2 nonoverlap DC t skew transparent latches Skew-Tolerant Circuit Design David Harris Page 15 of 50

Skew-Tolerant Circuits Summary Flip-Flops: Popular because CAD tools and junior engineers understand them Worst sequencing overhead and no time borrowing Transparent Latches: Better sequencing overhead Excellent time borrowing Understood by most but not all engineers and CAD tools Pulsed Latches: Lowest sequencing overhead Can be treated like flip-flops Requires thorough min-delay analysis Minimal time borrowing Element Sequencing overhead Time borrowing Min-delay δ logic Flip-flop CQ + DC + t skew 0 CD + t skew δ CQ Transparent latch Pulsed latch 2 DQ T ----- c 2 DC t skew t nonoverlap CD + t skew δ CQ t nonoverlap (in each half-cycle) DQ + max( 0, DC + t skew t pw ) t pw DC t skew t pw + CD + t skew δ CQ Skew-Tolerant Circuit Design David Harris Page 16 of 50

Outline Introduction Skew-Tolerant Circuits Traditional Domino Circuits Skew-Tolerant Domino Circuits Design Methodology Example Skew-Tolerant Circuit Design David Harris Page 17 of 50

Circuit Review CMOS circuits are slow because fat PMOS devices load input gates use precharged circuitry to remove PMOS from input Precharge: φ = 0 output set high through clocked PMOS transistor Evaluate: φ = 1 output possibly pulls low through logic transistors φ Y X A B C D A B C D φ NOR4 NOR4 φ Precharge Evaluate Precharge Y Skew-Tolerant Circuit Design David Harris Page 18 of 50

Domino Circuits inputs must be monotonically rising during evaluation Place inverting static gate between each dynamic gate / static pair is called a domino gate φ φ Domino Gate Domino gates can be safely cascaded Inputs from static CMOS logic must still be latched to avoid glitches Domino evaluation delays 1.5-2x faster than static CMOS Challenge is how to prevent precharge time from impacting critical path Traditional domino clocking approaches introduce severe overhead Skew-tolerant domino circuits can hide this overhead Skew-Tolerant Circuit Design David Harris Page 19 of 50

Traditional Domino Circuits How do we prevent precharge time from appearing in the critical path? Hide precharge time by ping-ponging between half-cycles T c _b Evaluate Precharge Precharge Evaluate _b _b _b _b Latch Latch Half-Cycle 1 Half-Cycle 2 One evaluates while other precharges Latches hold results during precharge DQ logic = T c 2 DQ Skew-Tolerant Circuit Design David Harris Page 20 of 50

Clock Skew Clock skew increases sequencing overhead Eval begins T c 1 2 1 1 2 Setup before here Latch Half-Cycle 1 DC + t skew Evaluation begins at latest rising edge Latch input setup before earliest falling edge Budget clock skew twice each cycle logic = T c 2 DC 2t skew Skew-Tolerant Circuit Design David Harris Page 21 of 50

Balancing Logic Logic may not exactly fit half-cycle T c _b _b _b _b Latch Latch Half-Cycle 1 Half-Cycle 2 Unusable time No flexibility to borrow time to balance logic between half-cycles logic = T c 2 DC 2t skew t imbalanced Skew-Tolerant Circuit Design David Harris Page 22 of 50

How bad is it? Consider 500 MHz system similar to Alpha 21164 T c = 2ns t skew = 200 ps (reported budget) DC = 50 ps (optimistic estimate) 2 ns Half-Cycle 1 Half-Cycle 2 Logic Setup Skew Logic Setup Skew 25% of cycle consumed by skew & setup! Skew-Tolerant Circuit Design David Harris Page 23 of 50

Relaxing the Timing Constraints Sequencing overhead caused by hard edges (just as for flip-flops) Data not launched until latest rising edge of clock Data must setup before earliest falling edge of clock Overhead would shrink if we could soften an edge Let s look at the role of the latch on the falling edge Latch function (1) Prevent glitches on inputs of domino gates (2) Hold results during precharge Latches would be unnecessary if we could do these tasks in another way Glitches will not occur if inputs come from domino logic Can we build domino circuits which consume the results before previous half-cycle precharges? Skew-Tolerant Circuit Design David Harris Page 24 of 50

Outline Introduction Skew-Tolerant Circuits Traditional Domino Circuits Skew-Tolerant Domino Circuits Design Methodology Example Skew-Tolerant Circuit Design David Harris Page 25 of 50

Skew-Tolerant Domino Circuits Overlap clocks so gate Y evaluates before X precharges T c φ 1 φ 2 overlap φ 1 φ 1 φ 1 φ 2 φ 2 φ 2 Latch X Y Phase 1 DC + t skew Phase 2 We can remove latches within domino pipeline Eliminate all sequencing overhead within the domino pipeline Still budget skew and setup at static / domino interface Skew-Tolerant Circuit Design David Harris Page 26 of 50

Skew-Tolerant Domino Waveforms In general, we can use N phases T c t e t p φ 1 φ 2 t overlap φ 3 φ 4 φ 1 φ 1 φ 2 φ 3 φ 3 φ 4 Phase 1 Phase 2 Phase 3 Phase 4 Each phase delayed T c /N from the previous What is the best duty cycle (t e and t p )? Skew-Tolerant Circuit Design David Harris Page 27 of 50

Precharge Time (t p ) Precharge time set by precharge of two gates in a phase with skewed clocks φ 1a t p φ 1b t prech t skew φ 1a φ 1b First gate begins precharging late Requires t prech to fully precharge Precharge must complete before second gate reenters evaluation t p = t prech + t skew Skew-Tolerant Circuit Design David Harris Page 28 of 50

Evaluation Time (t e ) Evaluation time set by next phase consuming result before first phase precharges t e φ 1b φ 2a t skew T c / N φ 1b φ 2a t hold Phase 1 Phase 2 First phase begins evaluation early Must not precharge until some hold time t hold after second phase evaluates t e = T ----- c + t N hold + t skew Skew-Tolerant Circuit Design David Harris Page 29 of 50

Skew Tolerance We ve found the best duty cycle Precharge: t p = t prech + t skew Evaluation: t e = T ----- c + t N hold + t skew Solve for the maximum tolerable skew N 1 -------------T t N c t prech t hold skew-max = -------------------------------------------------------- 2 Observations Skew tolerance increases with N Precharge and hold time cut into skew tolerance In the limit of large N and long cycles, skew tolerance ~ 1/2 cycle Skew-Tolerant Circuit Design David Harris Page 30 of 50

Local Skew Local clock domains Most circuits experience less skew in a small area than across chip Reduce skew within a phase of logic, thereby tolerating more skew globally local t p = t prech + t skew T t c global e = ----- + t N hold + t skew global N 1 local t skew-max = ------------T N c t prech t hold t skew This is the nominal overlap between phases Skew-Tolerant Circuit Design David Harris Page 31 of 50

Time Borrowing Time borrowing Excess overlap allows time borrowing into next phase (e.g. gate X) T c φ 1 φ 2 t borrow global t skew global t skew-max φ 1 φ 1 φ 1 φ 2 t borrow global t skew-max X Phase 1 Phase 2 global t skew = = N 1 ------------T N c t prech t hold local global t skew t skew Skew-Tolerant Circuit Design David Harris Page 32 of 50

Example and Summary Again consider a 500 MHz processor: T c = 2ns t skewg = 200 ps / t skewl = 50 ps t hold = 0 (conservative) t borrow (ps) 1000 750 500 250 0 Available Time Borrowing vs. # Clock Phases (assuming t prech = 500 ps) 250 583 750 2 3 4 6 8 N (# of clock phases) 917 1000 Domino Style Sequencing overhead Skew Tolerance Time borrowing Traditional Skew-Tolerant 2 DC 2t skew t imbalanced n/a 0 0 N 1 local -------------T N c t prech t hold t skew N 1 local -------------T N c t prech t hold t skew t skew global Skew-Tolerant Circuit Design David Harris Page 33 of 50

Outline Introduction Skew-Tolerant Circuits Traditional Domino Circuits Skew-Tolerant Domino Circuits Design Methodology Results Skew-Tolerant Circuit Design David Harris Page 34 of 50

Clock Generation Two-Phase Clock Generator φ 1 g φ 2 Low-skew complement generator Four-Phase Clock Generator Clock choppers φ 1 g φ 2 φ 3 Low-skew complement generator 1/4 cycle delay Optional clock choppers Inverters used to delay clock by quarter of nominal cycle time φ 4 Skew-Tolerant Circuit Design David Harris Page 35 of 50

Generator-Induced Clock Skew Variation in clock generator delays appears as clock skew 750 Complement Skew Overlap (ps) 500 250 Chopper Skew Delay Skew t borrow 0 2-Phase 4-Phase Variation in delay 20-30% relative to other clocks 4-Phase is still a large net benefit Skew-Tolerant Circuit Design David Harris Page 36 of 50

Interface with Logic Real systems mix both static and domino logic Transparent latches use φ 1 and φ 3 Pulsed latches use φ p (see book) Four-phase skew-tolerant domino uses φ 1, φ 2, φ 3, and φ 4 Which connections are legal? Domino outputs must be latched before driving static logic Use N-C 2 MOS Latch for speed and to avoid clock skew problems Q To static logic From domino logic D φ gate N-C 2 MOS latch Skew-Tolerant Circuit Design David Harris Page 37 of 50

Timing Types Timing types define legal interfaces between gates Signal suffixes define when data is legal to use, allow static verification _s indicates that a signal is stable throughout that phase _v indicates that a signal becomes valid before the end of that phase φ 1 φ 2 φ 3 φ 4 x_s23 y_v2 φ 1 φ 2 Latch x_s23 y_v2 Skew-Tolerant Circuit Design David Harris Page 38 of 50

4-Phase Skew-Tolerant Domino Timing Types gate output suffix is same as input suffix Clocked gates and latches set output suffix Element Type Clock Input Output Domino φ 1 _s1, _v4 (first) / v1 (others) _v1 φ 2 _s2, _v1 (first) / _v2 (others) _v2 φ 3 _s3, _v2 (first) / _v3 (others) _v3 φ 4 _s4, _v3 (first) / _v4 (others) _v4 Transparent Latch φ 1 _s1 _s23 φ 3 _s3 _s41 N-C 2 MOS Latch φ 2 _v2 _s34 φ 4 _v4 _s12 Skew-Tolerant Circuit Design David Harris Page 39 of 50

Example of Legal Connections φ 1 φ 2 φ 3 φ 4 φ 1 φ 3 φ 1 Latch s23 s23 Latch s23 s23 s3 s41 s41 s41 s41 s41 s23 Latch φ 1 φ 2 φ 4 NC 2 MOS s34 φ 1 φ 2 φ 2 φ 3 φ 3 φ 4 φ 4 φ 1 NC 2 MOS s12 v1 v1 v1 v1 v2 v2 v2 v2 v3 v3 v3 v3 v4 v4 v4 v4 v1 Skew-Tolerant Circuit Design David Harris Page 40 of 50

Example of Illegal Connections φ 1 φ 2 φ 3 φ 4 φ 1 φ 3 φ 1 Latch s23 Latch s23 s23 s23 s3 s41 s41 s41 s41 s41 s23 Latch φ 1 φ 1 φ 2 φ 2 φ 3 φ 3 φ 4 φ 4 φ 1 v1 v1 v1 v1 v2 v2 v2 v2 v3 v3 v3 v3 v4 v4 v4 v4 v1 Skew-Tolerant Circuit Design David Harris Page 41 of 50

Testability To test chips, we would like to be able to stop the clock and observe and control one element in each path through each cycle of logic. Stop with g low (φ 1 and φ 2 low, φ 3 and φ 4 high) Add full keeper to first φ 3 gate of each cycle to avoid floating high or low Different from half keeper of ordinary dynamic gates which only float high x_v2 from φ 2 domino weak y_v3 φ 3 φ 2 φ 3 x_v2 y_v3 output would float Skew-Tolerant Circuit Design David Harris Page 42 of 50

Transparent Latch Scan Add slave latch and access transistor to scan φ 1 static latches D φ 1 SCA SDO SDI SCB φ 1 Q slave latch Scan Procedure: 1. Stop g low 2. Toggle SCA and SCB to march data through the scan chain 3. Restart g How do we scan cycles of domino logic with no transparent latches? Skew-Tolerant Circuit Design David Harris Page 43 of 50

Domino Scan Scanning domino logic is very similar. Which gate should be scanned? SCA SDO SDI SCB φ 4 φ 4s pulldown network slave latch Next gate should be precharging so glitches do not propagate When normal operation resumes, scan data should remain until consumed Thus, choose last φ 4 domino gate of each cycle for scan Scan Procedure 1. Stop g low 2. Stop φ 4s low 3. Toggle SCA and SCB to march data through the scan chain 4. Restart g 5. Release φ 4s once scannable gate begins precharge (race) Skew-Tolerant Circuit Design David Harris Page 44 of 50

Improved Clock Generator Putting together skew-tolerant domino, stop clock, and scan: en g φ 1 φ 2 φ 3 φ 4 φ 4s SCA φ 4 R S SR latch solves the race reenabling φ 4s 1. Stop g low 2. Toggle SCA and SCB to march data through the scan chain. The first pulse of SCA will force φ 4s low. 3. Restart g. The falling edge of φ 4 will release φ 4s to track φ 4. Skew-Tolerant Circuit Design David Harris Page 45 of 50

Delayed Clocking / Delayed Reset Domino Alternative approach uses a single clock phase per gate: Locally generate clocks with matched delay buffers φ 1 φ 2 φ 3 φ 4 φ 5 φ 6 g φ 1 φ 2 φ 3 φ 4 φ 5 φ 6 φ 1 Skew-Tolerant Circuit Design David Harris Page 46 of 50

Outline Introduction Skew-Tolerant Circuits Traditional Domino Circuits Skew-Tolerant Domino Circuits Design Methodology Example Skew-Tolerant Circuit Design David Harris Page 47 of 50

Example 64-bit superscalar ALU self-bypass path _b _b _b x2 64-bit Adder Latch 1 mm 1 mm 2 mm x4 Other Result Mux ALU Block Bypass Mux (150 ff) Latch Traditional Domino To Data Cache φ 1 φ 2 φ 3 φ 3 φ 4 x2 64-bit Adder 1 mm 1 mm 2 mm x4 Other Result Mux ALU Block Bypass Mux (150 ff) Skew-Tolerant Domino To Data Cache Skew-Tolerant Circuit Design David Harris Page 48 of 50

Simulation Results Adders were simulated in 0.6µ HP14 technology Time (FO4 inverter delays) 15.0 13.0 18.6 16.6 No Skew 1 FO4 local skew 11.9 11.9 Traditional Latency Traditional Cycle Time Skew-tolerant Latency Skew-tolerant Cycle Time Skew-tolerant domino 25% + faster Skew-Tolerant Circuit Design David Harris Page 49 of 50

Skew-Tolerant Domino Conclusions Domino is very attractive for high speed ICs Traditional domino limited by sequencing overhead Latch delay Clock skew Inability to balance logic through time borrowing Skew-tolerant domino eliminates overhead Use overlapping clocks and remove latches Overhead at static / domino interface motivates building critical loops entirely from domino Domino design issues Four-phase skew-tolerant domino is a reasonable choice Timing types specify legal connections among static and dynamic gates Latchless pipelines can still be fully scanned for testability Simple clock generators Now in widespread use among top design teams DEC Alpha ALUs, Intel OTB Domino, IBM Delayed-Reset Domino, Sun Delayed Clocking, etc. Skew-Tolerant Circuit Design David Harris Page 50 of 50