Timing Analysis with Clock Skew

Similar documents
Skew-Tolerant Circuit Design

EE115C Winter 2017 Digital Electronic Circuits. Lecture 19: Timing Analysis

The Linear-Feedback Shift Register

Timing Issues. Digital Integrated Circuits A Design Perspective. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolić. January 2003

Problem Set 9 Solutions

GMU, ECE 680 Physical VLSI Design 1

UNIVERSITY OF CALIFORNIA, BERKELEY College of Engineering Department of Electrical Engineering and Computer Sciences

THE UNIVERSITY OF MICHIGAN. Faster Static Timing Analysis via Bus Compression

Lecture 9: Clocking, Clock Skew, Clock Jitter, Clock Distribution and some FM

Xarxes de distribució del senyal de. interferència electromagnètica, consum, soroll de conmutació.

Digital Integrated Circuits A Design Perspective

EECS 427 Lecture 14: Timing Readings: EECS 427 F09 Lecture Reminders

Fundamentals of Computer Systems

Lecture 9: Sequential Logic Circuits. Reading: CH 7

CSE241 VLSI Digital Circuits Winter Lecture 07: Timing II

Lecture 25. Dealing with Interconnect and Timing. Digital Integrated Circuits Interconnect

THE UNIVERSITY OF MICHIGAN. Timing Analysis of Domino Logic

Chapter 7 Sequential Logic

INTEGRATED CIRCUITS. For a complete data sheet, please also download:

Issues on Timing and Clocking

Homework #4. CSE 140 Summer Session Instructor: Mohsen Imani. Only a subset of questions will be graded

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Data AND. Output AND. An Analysis of Pipeline Clocking. J. E. Smith. March 19, 1990

Statistical Clock Skew Modeling with Data Delay Variations

EE371 - Advanced VLSI Circuit Design

ALU, Latches and Flip-Flops

EE241 - Spring 2007 Advanced Digital Integrated Circuits. Announcements

Lecture 10: Sequential Networks: Timing and Retiming

Statistical Clock Skew Modeling With Data Delay Variations

Clocked Synchronous State-machine Analysis

Synchronous Sequential Logic

Clock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements.

Timing Constraints in Sequential Designs. 63 Sources: TSR, Katz, Boriello & Vahid

Designing Sequential Logic Circuits

A Random Walk from Async to Sync. Paul Cunningham & Steev Wilcox

TAU 2014 Contest Pessimism Removal of Timing Analysis v1.6 December 11 th,

Lecture 27: Latches. Final presentations May 8, 1-5pm, BWRC Final reports due May 7 Final exam, Monday, May :30pm, 241 Cory

Adders, subtractors comparators, multipliers and other ALU elements

Lecture 16: Circuit Pitfalls

Jin-Fu Li Advanced Reliable Systems (ARES) Lab. Department of Electrical Engineering. Jungli, Taiwan

iretilp : An efficient incremental algorithm for min-period retiming under general delay model

Lecture 23. Dealing with Interconnect. Impact of Interconnect Parasitics

Logical Effort: Designing for Speed on the Back of an Envelope David Harris Harvey Mudd College Claremont, CA

INTEGRATED CIRCUITS. For a complete data sheet, please also download:

For smaller NRE cost For faster time to market For smaller high-volume manufacturing cost For higher performance

Skew Management of NBTI Impacted Gated Clock Trees

Dual D Flip-Flop with Set and Reset High-Speed Silicon-Gate CMOS

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

MOSIS REPORT. Spring MOSIS Report 1. MOSIS Report 2. MOSIS Report 3

Chapter 4. Sequential Logic Circuits

A Mathematical Solution to. by Utilizing Soft Edge Flip Flops


CMSC 313 Lecture 25 Registers Memory Organization DRAM

Digital Logic Design - Chapter 4

Combinatorial Logic Design Multiplexers and ALUs CS 64: Computer Organization and Design Logic Lecture #13

Topics. CMOS Design Multi-input delay analysis. John A. Chandy Dept. of Electrical and Computer Engineering University of Connecticut

EE141- Spring 2007 Digital Integrated Circuits

EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006

! Charge Leakage/Charge Sharing. " Domino Logic Design Considerations. ! Logic Comparisons. ! Memory. " Classification. " ROM Memories.

Digital Integrated Circuits A Design Perspective

ABSTRACT TIMING VERIFICATION AND OPTIMIZATION OF CIRCUITS WITH LEVEL-SENSITIVE LATCHES. by Timothy Michael Burks

CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 17: Dynamic Sequential Circuits And Timing Issues

GMU, ECE 680 Physical VLSI Design

Homework 2 due on Wednesday Quiz #2 on Wednesday Midterm project report due next Week (4 pages)

9/18/2008 GMU, ECE 680 Physical VLSI Design

Digital Integrated Circuits A Design Perspective

Adders, subtractors comparators, multipliers and other ALU elements

Lecture 17: Designing Sequential Systems Using Flip Flops

Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic. November Digital Integrated Circuits 2nd Sequential Circuits

Review for Final Exam

EECS 427 Lecture 15: Timing, Latches, and Registers Reading: Chapter 7. EECS 427 F09 Lecture Reminders

Methodology to Achieve Higher Tolerance to Delay Variations in Synchronous Circuits

Latch/Register Scheduling For Process Variation

Sequential Logic Circuits

INTEGRATED CIRCUITS. For a complete data sheet, please also download:

EECS 151/251A Homework 5

Chapter 7. Sequential Circuits Registers, Counters, RAM

Memory Elements I. CS31 Pascal Van Hentenryck. CS031 Lecture 6 Page 1

Lecture 14: State Tables, Diagrams, Latches, and Flip Flop

Structural Delay Testing Under Restricted Scan of Latch-based Pipelines with Time Borrowing

Lecture 8: Combinational Circuits

74F193 Up/Down Binary Counter with Separate Up/Down Clocks

CPE/EE 422/522. Chapter 1 - Review of Logic Design Fundamentals. Dr. Rhonda Kay Gaede UAH. 1.1 Combinational Logic

Lecture #4: Potpourri

UNIVERSITY OF CALIFORNIA

Topic 8: Sequential Circuits

Chapter 3. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 3 <1>

Vidyalankar S.E. Sem. III [ETRX] Digital Circuits and Design Prelim Question Paper Solution

Itanium TM Processor Clock Design

Design for Variability and Signoff Tips

LOGIC CIRCUITS. Basic Experiment and Design of Electronics

INTEGRATED CIRCUITS. For a complete data sheet, please also download:

The Design Procedure. Output Equation Determination - Derive output equations from the state table

ALU A functional unit

EECS 579: Logic and Fault Simulation. Simulation

Sequential Logic Worksheet

CS470: Computer Architecture. AMD Quad Core

VLSI Design Verification and Test Simulation CMPE 646. Specification. Design(netlist) True-value Simulator

ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN. Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering

Transcription:

, Mark Horowitz 1, & Dean Liu 1 David_Harris@hmc.edu, {horowitz, dliu}@vlsi.stanford.edu March, 1999 Harvey Mudd College Claremont, CA 1 (with Stanford University, Stanford, CA)

Outline Introduction Timing Analysis Formulation Timing Verification Algorithm Results Conclusion

Introduction Clock skew, as a fraction of the cycle time, is a growing problem for fast chips Fewer gate delays per cycle Poor transistor length, threshold tolerances Larger clock loads Bigger dice

Introduction Clock skew, as a fraction of the cycle time, is a growing problem for fast chips Fewer gate delays per cycle Poor transistor length, threshold tolerances Larger clock loads Bigger dice The designer may: Reduce skew Very hard; clock networks are already well optimized

Introduction Clock skew, as a fraction of the cycle time, is a growing problem for fast chips Fewer gate delays per cycle Poor transistor length, threshold tolerances Larger clock loads Bigger dice The designer may: Reduce skew Very hard; clock networks are already well optimized Tolerate skew Flip-flops and traditional domino circuits reduce cycle time by skew es and skew-tolerant domino can hide modest amounts of skew

Introduction Clock skew, as a fraction of the cycle time, is a growing problem for fast chips Fewer gate delays per cycle Poor transistor length, threshold tolerances Larger clock loads Bigger dice The designer may: Reduce skew Very hard; clock networks are already well optimized Tolerate skew Flip-flops and traditional domino circuits reduce cycle time by skew es and skew-tolerant domino can hide modest amounts of skew Only budget necessary skews Skew between nearby latches is often much less than skew across die Need better timing analysis for different skews between different latches

Timing Analysis Formulation Build on Sakallah, Mudge, Olukotun (SMO) analysis of latch-based systems. System contains: k clocks C = {, φ 2,..., φ k ) l latches L = {L 1, L 2,..., L l } φ 2 φ 2 L 1 L 2 L 3

Clock Waveforms 0 T c φ 2 φ 2 L 1 L 2 L 3 T c : cycle time

Clock Waveforms 0 T c Tφ1 = T c /2 T φ2 = T c /2 φ 2 φ 2 L 1 L 2 L 3 T c T φi : cycle time : duration for which φ i is high

Clock Waveforms 0 T T φ1 = T c /2 c φ 2 s φ1 = 0 s φ2 = T c /2 T φ2 = T c /2 φ 2 L 1 L 2 L 3 T c T φi s φi : cycle time : duration for which φ i is high : start time, relative to beginning of common clock, of φ i being high

Clock Waveforms 0 T T φ1 = T c /2 c φ 2 s φ1 = 0 s φ2 = T c /2 T φ2 = T c /2 S φ1φ2 = S φ2φ1 = -T c /2 φ 2 L 1 L 2 L 3 T c T φi s φi S φi φ j : cycle time : duration for which φ i is high : start time, relative to beginning of common clock, of φ i being high : phase shift from φ i to next occurrence of φ j. Used to translate times relative to particular clock phases.

Variables p i : clock phase controlling latch i φ 2 φ 2 L 1 L 2 L 3 p 1 : 1 p 2 : 2 p 3 : 1

Variables p i : clock phase controlling latch i DCi : setup time for latch i φ 2 φ 2 L 1 L 2 L 3 p 1 : 1 p 2 : 2 p 3 : 1

Variables p i : clock phase controlling latch i DCi DQi : setup time for latch i : propagation delay through latch i φ 2 φ 2 L 1 L 2 L 3 p 1 : 1 p 2 : 2 p 3 : 1

Variables p i : clock phase controlling latch i DCi DQi Assume: : setup time for latch i : propagation delay through latch i T c DQ = 1000 ps = 80 ps φ 2 0 500 1000 φ 2 L 1 L 2 L 3 12 = 670 23 =70 p 1 : 1 p 2 : 2 p 3 : 1 ij : propagation delay through logic between latches i and j

Variables p i : clock phase controlling latch i DCi DQi Assume: : setup time for latch i : propagation delay through latch i T c DQ = 1000 ps = 80 ps φ 2 0 500 1000 φ 2 L 1 L 2 L 3 12 = 670 23 =70 p 1 : 1 p 2 : 2 p 3 : 1 A 1 : 0 ps A 2 : 250 A 3 : -100 ij A i : propagation delay through logic between latches i and j : arrival time at latch i, relative to start of p i

Variables p i : clock phase controlling latch i DCi DQi Assume: : setup time for latch i : propagation delay through latch i T c DQ = 1000 ps = 80 ps φ 2 0 500 1000 φ 2 L 1 L 2 L 3 12 = 670 23 =70 p 1 : 1 p 2 : 2 p 3 : 1 A 1 : 0 ps A 2 : 250 A 3 : -100 D 1 : 0 D 2 : 250 D 3 : 0 ij A i D i : propagation delay through logic between latches i and j : arrival time at latch i, relative to start of p i : departure time from latch i

Variables p i : clock phase controlling latch i DCi DQi Assume: : setup time for latch i : propagation delay through latch i T c DQ = 1000 ps = 80 ps φ 2 0 500 1000 φ 2 L 1 L 2 L 3 12 = 670 23 =70 p 1 : 1 p 2 : 2 p 3 : 1 A 1 : 0 ps A 2 : 250 A 3 : -100 D 1 : 0 D 2 : 250 D 3 : 0 Q 1 : 80 Q 2 : 330 Q 3 : 80 ij A i D i Q i : propagation delay through logic between latches i and j : arrival time at latch i, relative to start of : departure time from latch i : output time of latch i p i

Timing Constraints Departure: i L D i = max( 0A, i )

Timing Constraints Departure: i L D i = max( 0A, i ) Output: i L Q i = D i + DQi

Timing Constraints Departure: i L D i = max( 0A, i ) Output: i L Q i = D i + DQi Arrival: ij, L A i = max( Q j + ji + S pj p ) i

Timing Constraints Departure: i L D i = max( 0A, i ) Output: i L Q i = D i + DQi Arrival: ij, L A i = max( Q j + ji + S pj p ) i Propagation Constraints: ij, L D i = max( 0max, ( D j + DQj + ji + S pj p ) i

Timing Constraints Departure: i L D i = max( 0A, i ) Output: i L Q i = D i + DQi Arrival: ij, L A i = max( Q j + ji + S pj p ) i Propagation Constraints: ij, L D i = max( 0max, ( D j + DQj + ji + S pj p ) i Setup Constraints: i L D i + DCi T pi

Clock skew is the difference between nominal and actual interarrival times of a pair of clocks. Enlarge set of physical clocks C to model skew between nominally identical clocks. Example: φ 2a L 3 D 3 C = {a, φ 2a, b, φ 2b } local t skew within domains a 4 L 4 D 4 b 6 L 6 D 6 global t skew between domains 5 7 φ 2a L 5 D 5 φ 2b L 7 D 7 ALU (clock domain a) Data Cache (clock domain b)

Single Skew Formulation Easy and conservative to budget global skew everywhere Effectively increases setup time at each latch Setup Constraints: global i L D i + DCi + t skew T pi Too conservative for high-speed designs with big global skews

Exact Skew Budgets How much skew must be budgeted? L 3 to L 4 : local skew φ 2a L 3 D 3 4 6 a L 4 D 4 b L 6 D 6 5 7 φ 2a L 5 D 5 φ 2b L 7 D 7 ALU (clock domain a) Data Cache (clock domain b)

Exact Skew Budgets How much skew must be budgeted? L 3 to L 4 : L 7 to L 4 : local skew global skew φ 2a L 3 D 3 4 6 a L 4 D 4 b L 6 D 6 5 7 φ 2a L 5 D 5 φ 2b L 7 D 7 ALU (clock domain a) Data Cache (clock domain b)

Exact Skew Budgets How much skew must be budgeted? L 3 to L 4 : local skew L 7 to L 4 : global skew L 5 to L 4 through transparent L 6, L 7 : local skew Must track launching clock to determine skew budget D 3 φ 2a L 3 4 6 a L 4 D 4 b L 6 D 6 5 7 φ 2a L 5 D 5 φ 2b L 7 D 7 ALU (clock domain a) Data Cache (clock domain b)

Exact Skew Formulation Define arrival and departure times with respect to launching clocks: A i c D i c φ i, φ j t skew : arrival time at latch i for path launched by clock c : departure time from latch i for path launched by clock c : skew between clocks φ i, φ j

Negative Departure Times Must now allow negative departure times with respect to other clocks: Path from L 5 to L 7 is earlier than L 6 to L 7, but sees more skew, miss setup Reaches L 6 at -50 ps, but L 6 may be transparent by then because of skew Departure times w.r.t. latch s own clock still must be nonnegative φ 2a L 3 T c = 1000 ps a φ 2a 4 6 450 ps L 4 b L 6 5 7 950 ps L φ 5 2b L 7 2a D 6 1b D 6 2a D 7 1b D 7 = 50 = 0 = 400 = 450 ALU (clock domain a) Data Cache (clock domain b) 2a, 2b t skew 1b, 2b t skew = = DC = 0 DQ = 0 150 ps 25 ps

Exact Constraints with Skew: Propagation Constraints (single skew): ij, L D i = max( 0max, ( D j + DQj + ji + S pj p ) i Setup Constraints (single skew): global i L D i + DCi + t skew T pi Propagation Constraints (exact skew): ij, Lc, C if c = p i c c then D i = max( 0max, ( D j + DQj + ji + S pj p )) i c c else D i = max( D j + DQj + ji + S pj p ) i Setup Constraints (exact skew): i Lc, C D i c + + DCi cp, i t skew T pi

Other Timing Constraints Flip-flops: No transparency, easier than latches Still budget skew between launching and receiving clocks Min-delay: Only requires checks between consecutive pairs of clocked elements Standard verification algorithms work if proper skew is used

Verification Algorithm Check constraints with generalized Szymanski-Shenoy relaxation algorithm 1 For each latch i: p 2 i max max D i = 0 ; Di = 0; c i = // initialize departure times 3 Enqueue 4 While queue is not empty 5 Dequeue 6 For each latch i in fanout of j c 7 A = D j + + + // calculate arrival time c 8 If A> D i AND c max i, c // is it possibly critical? cp, 9 If ( + + i > ) // does it violate setup time? 10 Report setup time violation 11 Else D i p i D j c DQj ji S pj p i c c 12 D i = A; Enqueue D i // keep following path max max 13 If ( A> D i ) D i A; p i ( ) A t + skew A DCi t skew T pi > D i max max = c i = c

Results Analyzed MAGIC: Memory & General Interconnect Controller of FLASH supercomputer Assume local t skew global = 250ps t skew Model A: As designed, from MAGIC.sdf database Model B: Flops converted to latch pairs, logic balanced between pairs = 500ps

Results Analyzed MAGIC: Memory & General Interconnect Controller of FLASH supercomputer Assume local t skew global = 250ps t skew Model A: As designed, from MAGIC.sdf database Model B: Flops converted to latch pairs, logic balanced between pairs CPU time < 1 second in all cases = 500ps Model A # Flip-Flops 10559 0 Model B # es 1819 22937 Single Skew T c 9.43 ns 8.05 ns # Departures Checked 3866 24995 Exact Skew T c 9.38 7.96 # Departures Checked 4009 25328

Conclusions Global skews will be too large for GHz + systems Use skew-tolerant circuit techniques such as latches Take advantage of smaller local skews where possible Requires support of timing analyzer Budget appropriate skew at each receiver Track departure times with respect to launching clocks Allow negative departure times with respect to other clocks Leads to explosion in number of timing constraints. However... Most are not tight because most critical paths do not borrow time across many latches Relaxation algorithm automatically prunes loose constraints Very small increase in runtime Expect synchronous systems well beyond 1 GHz