CSE241 VLSI Digital Circuits Winter Lecture 07: Timing II

CSE241 VLSI Digital Circuits Winter 2003 Lecture 07: Timing II CSE241 L3 ASICs.1

Delay Calculation
Cell Fall (ns)
  Cap (pF) \ Tr (ns)    0.05    0.2     0.5
  0.01                  0.02    0.16    0.30
  0.5                   0.04    0.32    0.60
  2.0                   0.08    0.64    1.20
Cell Rise (ns)
  Cap (pF) \ Tr (ns)    0.05    0.2     0.5
  0.01                  0.03    0.18    0.33
  0.5                   0.06    0.36    0.66
  2.0                   0.09    0.72    1.32
Fall Transition (ns)
  Cap (pF) \ Tr (ns)    0.05    0.2     0.5
  0.01                  0.01    0.09    0.15
  0.5                   0.03    0.27    0.45
  2.0                   0.06    0.54    0.90
[Figure annotations: transition times 0.1 ns, 0.12 ns and 0.147 ns; load 1.0 pF]
- Fall delay = 0.178 ns
- Rise delay = 0.261 ns
- Fall transition = 0.147 ns
- Rise transition = (not given)
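The quoted delays can be reproduced by bilinear interpolation in the lookup tables. The sketch below is an illustration rather than any particular tool's algorithm. It assumes the table axes are load capacitance (pF) and input transition (ns), a 1.0 pF load, and input transitions of 0.1 ns for the fall tables and 0.12 ns for the rise table, which is the reading of the figure annotations under which the slide's numbers come out. The function name `interpolate` is hypothetical.

```python
from bisect import bisect_right

CAPS = [0.01, 0.5, 2.0]   # assumed axis: load capacitance in pF
TRS = [0.05, 0.2, 0.5]    # assumed axis: input transition in ns

CELL_FALL = [[0.02, 0.16, 0.30], [0.04, 0.32, 0.60], [0.08, 0.64, 1.20]]
CELL_RISE = [[0.03, 0.18, 0.33], [0.06, 0.36, 0.66], [0.09, 0.72, 1.32]]
FALL_TRANSITION = [[0.01, 0.09, 0.15], [0.03, 0.27, 0.45], [0.06, 0.54, 0.90]]


def interpolate(table, cap, tr):
    """Bilinear interpolation in a (cap x tr) table; rows follow CAPS, columns follow TRS."""
    def bracket(axis, x):
        # Index of the lower grid point and the fractional position in that interval.
        i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
        return i, (x - axis[i]) / (axis[i + 1] - axis[i])

    i, frac_cap = bracket(CAPS, cap)
    j, frac_tr = bracket(TRS, tr)
    low = table[i][j] + frac_tr * (table[i][j + 1] - table[i][j])
    high = table[i + 1][j] + frac_tr * (table[i + 1][j + 1] - table[i + 1][j])
    return low + frac_cap * (high - low)


fall_delay = round(interpolate(CELL_FALL, cap=1.0, tr=0.1), 3)             # 0.178
rise_delay = round(interpolate(CELL_RISE, cap=1.0, tr=0.12), 3)            # 0.261
fall_transition = round(interpolate(FALL_TRANSITION, cap=1.0, tr=0.1), 3)  # 0.147
```

Points outside the grid are extrapolated from the nearest interval in this sketch; a real flow may clamp or flag them instead.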

PVT (Process, Voltage, Temperature) Derating
- Actual cell delay = Original delay × K_PVT

PVT Derating: Example + Min/Typ/Max Triples
- Proc_var (0.5 : 1.0 : 1.3), Voltage (5.5 : 5.0 : 4.5), Temperature (0 : 20 : 50)
- K_P = 0.80 : 1.00 : 1.30
- K_V = 0.93 : 1.00 : 1.08
- K_T = 0.80 : 1.07 : 1.35
- K_PVT = 0.60 : 1.07 : 1.90
- Cell delay = 0.261 ns
- Derated delay = 0.157 : 0.279 : 0.496 ns {min : typical : max}
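The arithmetic of this example can be checked with a few lines. This is a minimal sketch that assumes K_PVT is the product K_P × K_V × K_T rounded to two decimals before it is applied, which is what matches the slide's figures; the function name `derate` is hypothetical.

```python
def derate(delay_ns, k_p, k_v, k_t):
    """Return (K_PVT, derated delay) for one process/voltage/temperature corner."""
    k_pvt = round(k_p * k_v * k_t, 2)      # assumption: factor rounded as on the slide
    return k_pvt, round(delay_ns * k_pvt, 3)


CELL_DELAY_NS = 0.261
corners = {
    "min": derate(CELL_DELAY_NS, 0.80, 0.93, 0.80),   # (0.6, 0.157)
    "typ": derate(CELL_DELAY_NS, 1.00, 1.00, 1.07),   # (1.07, 0.279)
    "max": derate(CELL_DELAY_NS, 1.30, 1.08, 1.35),   # (1.9, 0.496)
}
```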

Conservatism of Gate Delay Modeling
- True gate delay depends on input arrival time patterns
- STA will assume that only 1 input is switching
- Will use worst slope among several inputs
[Figure: gate with inputs A and B, output F, and load C_L; voltage-versus-time waveforms of the inputs and F, with t_pd marked]

This Class + Logistics
- Reading
  - Smith, Chapters 15, 16
  - http://vlsicad.ucsd.edu/presentations/iccad00tutorial/
  - Possibly: Sarrafzadeh/Wong, Chapters 2 (placement), 3 (routing), (4, performance modeling)
- Schedule
  - MT will be take-home (and easy), BUT you lose 5% if you don't show up on Thursday (attendance will be taken by Ben)
  - Thursday: surprise guest lecturer on floorplan / placement
- HW #12: Suppose that you want to work on timing edges that are most critical according to some F(slack of the edge, #paths through the edge). How would you modify the STA calculation (longest path in a DAG) so that it also calculates the number of paths through each edge?
Slide courtesy of S. P. Levitan, U. Pittsburgh
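One possible way to approach HW #12, offered as a sketch and not as the intended solution: the number of source-to-sink paths through an edge (u, v) equals (paths from any source to u) × (paths from v to any sink), and both counts can be accumulated in the same topological sweeps that STA already performs. The function name, the edge-list input format and the toy graph are assumptions for illustration.

```python
from collections import defaultdict, deque


def sta_with_path_counts(edges):
    """edges: list of (u, v, delay). Returns (arrival times, paths through each edge)."""
    succ, pred, nodes = defaultdict(list), defaultdict(list), set()
    for u, v, delay in edges:
        succ[u].append((v, delay))
        pred[v].append(u)
        nodes.update((u, v))

    # Topological order (Kahn's algorithm).
    indegree = {n: len(pred[n]) for n in nodes}
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in succ[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)

    # Forward sweep: longest-path arrival time and number of paths from sources.
    arrival = {n: 0.0 for n in nodes}
    from_source = {n: (0 if pred[n] else 1) for n in nodes}
    for u in order:
        for v, delay in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + delay)
            from_source[v] += from_source[u]

    # Backward sweep: number of paths from each node to sinks.
    to_sink = {n: (0 if succ[n] else 1) for n in nodes}
    for u in reversed(order):
        for v, _ in succ[u]:
            to_sink[u] += to_sink[v]

    through = {(u, v): from_source[u] * to_sink[v] for u, v, _ in edges}
    return arrival, through


# Toy graph: two sources (a, b) merge at c, which fans out to two sinks (d, e).
arrival, through = sta_with_path_counts(
    [("a", "c", 1.0), ("b", "c", 2.0), ("c", "d", 3.0), ("c", "e", 1.0)]
)
```

In the toy graph there are four source-to-sink paths in total and each edge lies on two of them, so `through` maps every edge to 2.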

Buffer Clustering
- Hierarchical clustering starting from clock sinks = leaves of clustering tree
- Fanout at each level between 5 and 200 (depends on buffer library)
- Often specify a clock topology in the tool as, e.g., (1)-6-8-5
  - Root has 6 children, each of which has 8 children, each of which has 5 (leaf) children → 240 clock sinks
- Big question: how to perform the hierarchical buffer clustering? What makes a good cluster?
Sylvester / Shepard, 2001
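The sink count implied by a topology string such as (1)-6-8-5 is the product of the per-level fanouts. A small sketch follows; the function name `nodes_per_level` is hypothetical.

```python
def nodes_per_level(fanouts):
    """Node count at each level of a clock tree, root first, leaf sinks last."""
    counts = [1]                      # the single root, written (1) in the topology
    for fanout in fanouts:
        counts.append(counts[-1] * fanout)
    return counts


levels = nodes_per_level([6, 8, 5])   # [1, 6, 48, 240]
sinks = levels[-1]                    # 240 clock sinks
internal_nodes = sum(levels[:-1])     # 55 non-leaf nodes (root plus two buffer levels)
```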

Buffer Clustering by Space Partitioning
- Example: Cadence CT-Gen
  - Pick fanout (e.g., 6-4)
  - Pick long axis of bounding box of sinks
  - Place buffers at medians (essentially) of chunks of sinks identified by space partitioning
- Why is this good? E.g., conservative (in what sense?), easy to predict
- Why is it bad? E.g., wastes a lot of resources
Sylvester / Shepard, 2001

Buffer Clustering by Traditional Clustering
- Example: SPC, old Cell3 CTS
  - Pick fanout (e.g., 6)
  - Find clusters of size 6
  - Place buffers at centers or centroids of clusters
  - Recurse
- Why is this good? E.g., uses less wire
- Why is this bad? E.g., hard to predict the results, very brittle under ECOs
- HW #13: Propose a hierarchical clustering strategy for buffered clock trees, and explain its pros and cons
Sylvester / Shepard, 2001

Outline
- Clocking
  - Storage elements
  - Clocking metrics and methodology
  - Clock distribution
  - Package and useful-skew degrees of freedom
  - Clock power issues
- Gate timing models

Skew Reduction Using Package
- Most clock network latency occurs at global level (largest distances spanned)
- Latency ∝ skew
- With reverse scaling, routing low-RC signals at global level becomes more difficult & area-consuming
Sylvester / Shepard, 2001

Skew Reduction Using Package
[Figure: µP/ASIC mounted on substrate through solder bumps, with the system clock routed in the substrate]
- Incorporate global clock distribution into the package
- Flip-chip packaging allows for high-density, low-parasitic access from substrate to IC
- RC of package-level wiring up to 4 orders of magnitude smaller than on-chip wiring
- Global skew reduced
- Lower capacitance → lower power
- Opens up global routing tracks
- Results not yet conclusive
Sylvester / Shepard, 2001

Useful Skew (= cycle-stealing)
[Figure: FF → fast path → FF → slow path → FF, with hold/setup timing slacks shown for the zero-skew and useful-skew cases]
- Zero skew
  - Global skew constraint
  - All skew is bad
- Useful skew
  - Local skew constraints
  - Shift slack to critical paths
W. Dai, UC Santa Cruz

Skew = Local Constraint
- Timing is correct as long as the signal arrives in the permissible skew range
- Between two FFs, D: longest path, d: shortest path
- -d + t_hold < Skew < T_period - D - t_setup
- Below the range: race condition; inside: safe (permissible range); above: cycle time violation
W. Dai, UC Santa Cruz
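The permissible range follows directly from the inequality on this slide. The sketch below assumes skew is measured as clock arrival at the launching FF minus clock arrival at the capturing FF (the sign convention under which the slide's inequality expresses the hold and setup checks) and that clock-to-Q delay is folded into d and D. The function names and the numeric values are hypothetical.

```python
def permissible_skew_range(d_short, d_long, t_period, t_setup=0.0, t_hold=0.0):
    """Open interval (lower, upper) of safe skew for one FF-to-FF path."""
    lower = -d_short + t_hold              # at or below this: race condition
    upper = t_period - d_long - t_setup    # at or above this: cycle time violation
    return lower, upper


def classify(skew, lower, upper):
    """Name the region of the skew axis that a given skew falls in."""
    if skew <= lower:
        return "race condition"
    if skew >= upper:
        return "cycle time violation"
    return "safe"


# Illustrative numbers only (ns): d = 2, D = 5, T = 6, t_setup = 0.25, t_hold = 0.125.
lower, upper = permissible_skew_range(2.0, 5.0, 6.0, t_setup=0.25, t_hold=0.125)
# lower = -1.875, upper = 0.75
```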

Skew Scheduling for Design Robustness
- Design will be more robust if clock signal arrival time is in the middle of permissible skew range, rather than on edge
- Can solve a linear program to maximize robustness = determine prescribed sink skews
[Figure: three FFs in series with path delays 2 ns and 6 ns, T = 6 ns; labeled "0 0 0: at verge of violation" and "2 0 2: more safety margin"]
W. Dai, UC Santa Cruz
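The gain in safety margin can be illustrated by measuring how far each path's skew sits from the edges of its permissible range. This sketch does not solve the linear program; it only evaluates two given schedules. It assumes the figure's triples are clock arrival times (ns) at the three FFs, that each path's shortest and longest delays are equal (2 ns and 6 ns), that setup and hold times are zero, and that skew is launch arrival minus capture arrival. The function name `min_skew_margin` is hypothetical.

```python
def min_skew_margin(arrivals, paths, t_period, t_setup=0.0, t_hold=0.0):
    """Smallest distance from any path's skew to an edge of its permissible range."""
    worst = float("inf")
    for launch, capture, d_short, d_long in paths:
        skew = arrivals[launch] - arrivals[capture]
        lower = -d_short + t_hold              # race-condition bound
        upper = t_period - d_long - t_setup    # cycle-time bound
        worst = min(worst, skew - lower, upper - skew)
    return worst


# (launch FF index, capture FF index, shortest delay, longest delay), all in ns.
paths = [(0, 1, 2.0, 2.0), (1, 2, 6.0, 6.0)]
zero_skew = min_skew_margin([0.0, 0.0, 0.0], paths, t_period=6.0)   # 0.0
scheduled = min_skew_margin([2.0, 0.0, 2.0], paths, t_period=6.0)   # 2.0
```

Under these assumptions the zero-skew schedule has no margin on the 6 ns path, while the 2-0-2 schedule leaves 2 ns on both paths.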

Potential Advantages of Useful Skew
- Reduce peak current consumption by distributing the FF switch point in the range of permissible skew
- Affords extra margin to increase clock frequency or reduce sizing (= power)
[Figure: CLK under 0-skew vs. U-skew]
W. Dai, UC Santa Cruz

Conventional Zero-Skew Flow
1. Synthesis
2. Placement
3. 0-Skew Clock Synthesis
4. Clock Routing
5. Signal Routing
6. Extraction & Delay Calculation
7. Static Timing Analysis
W. Dai, UC Santa Cruz

Useful-Skew Flow
- Flow stages: Existing Placement → U-Skew Clock Synthesis → Clock Routing → Signal Routing → Extraction & Delay Calculation → Static Timing Analysis
- Useful-skew clock steps:
  - Permissible range generation
  - Initial skew scheduling
  - Clock tree topology synthesis
  - Clock net routing
  - Clock timing verification
W. Dai, UC Santa Cruz

Outline
- Clocking
  - Storage elements
  - Clocking metrics and methodology
  - Clock distribution
  - Package and useful-skew degrees of freedom
  - Clock power issues
- Gate timing models

Clock Power
- Power consumption in clocks due to:
  - Clock drivers
  - Long interconnections
  - Large clock loads: all clocked elements (latches, FFs) are driven
- Different components dominate
  - Depending on type of clock network used
  - Ex.: Grid → huge pre-drivers & wire cap. drown out load cap.
Sylvester / Shepard, 2001

Clock Power Is LARGE
- P = α C V_dd^2 f
- Not only is the clock capacitance large, it switches every cycle!
Sylvester / Shepard, 2001
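A quick numeric illustration of the formula shows why the activity factor matters. The capacitance, supply voltage and frequency below are made-up values chosen for easy arithmetic, not figures from the lecture; the comparison uses α = 1 for the clock (it switches every cycle) and an assumed α = 0.1 for a logic net of the same capacitance.

```python
def dynamic_power(alpha, c_farads, vdd_volts, f_hz):
    """Dynamic switching power P = alpha * C * Vdd^2 * f, in watts."""
    return alpha * c_farads * vdd_volts ** 2 * f_hz


# Hypothetical values: 1 nF of switched capacitance, 1.5 V supply, 1 GHz.
clock_power = dynamic_power(1.0, 1e-9, 1.5, 1e9)   # about 2.25 W
logic_power = dynamic_power(0.1, 1e-9, 1.5, 1e9)   # about 0.225 W
```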

Low-Power Clocking
- Gated clocks
  - Prevent switching in areas of chip not being used
  - Easier in static designs
- Edge-triggered flops in ARM rather than transparent latches in Alpha
  - Reduced load on clock for each latch/flop
  - Eliminated spurious power-consuming transitions during latch flow-through (transparency)
Sylvester / Shepard, 2001

Clock Area
- Clock networks consume silicon area (clock drivers, PLL, etc.) and routing area
- Routing area is most vital
- Top-level metals are used to reduce RC delays
  - These levels are precious resources (unscaled)
  - Power routing, clock routing, key global signals
- Reducing area also reduces wiring capacitance and power
- Typical #s: Intel Itanium: 4% of M4/5 used in clock routing
Sylvester / Shepard, 2001

Clock Slew Rates
- To maintain signal integrity and latch performance, minimum slew rates are required
- Too slow: clock is more susceptible to noise, latches are slowed down, setup times eat into timing budget [T_setup = 200 + 0.33 * T_slew (ps)], more short-circuit power for large clock drivers
- Too fast: burns too much power, overdesigned network, enhanced ground bounce
- Rule of thumb: T_rise and T_fall of clock are each between 10% and 20% of clock period (10% is an aggressive target)
  - 1 GHz clock: T_rise = T_fall = 100-200 ps
Sylvester / Shepard, 2001
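The rule of thumb and the setup-time expression on this slide can be combined into a small calculation. This is a sketch; the function names are hypothetical, and the setup-time coefficients are simply the ones quoted on the slide, not a general model.

```python
def slew_window_ps(f_hz, low_fraction=0.10, high_fraction=0.20):
    """Target range for clock rise/fall time, as a fraction of the clock period."""
    period_ps = 1e12 / f_hz
    return low_fraction * period_ps, high_fraction * period_ps


def setup_time_ps(t_slew_ps):
    """Slide's example relation: T_setup = 200 + 0.33 * T_slew (ps)."""
    return 200 + 0.33 * t_slew_ps


fast_edge, slow_edge = slew_window_ps(1e9)   # 100 ps and 200 ps for a 1 GHz clock
setup_fast = setup_time_ps(fast_edge)        # about 233 ps
setup_slow = setup_time_ps(slow_edge)        # about 266 ps
```

With a 1000 ps period, moving from the aggressive 10% target to the 20% end of the window costs roughly 33 ps of extra setup time under this relation.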

Example: Alpha 21264
- Grid + H-tree approach
- Power = 32% of total
- Wire usage = 3% of metals 3 & 4
- 4 major clock quadrants, each with a large driver connected to local grid structures
Sylvester / Shepard, 2001

Alpha 21264 Skew Map
[Figure: skew map. Ref: Compaq, ASP-DAC00]
Sylvester / Shepard, 2001

Power vs. Skew
- Fundamental design decision
- Meeting skew requirements is easy with unlimited power budget
  - Wide wires reduce RC product but increase total C
  - Driver upsizing reduces latency (and so reduces skew as well) but increases buffer cap
- SOC context: plastic package power limit is 2-3 W
Sylvester / Shepard, 2001

Clock Distribution Trends
- Timing
  - Clock period dropping fast, skew must follow
  - Slew rates must also scale with cycle time
- Jitter
  - PLLs get better with CMOS scaling, but other sources of noise increase
    - Power supply noise more important
    - Switching-dependent temperature gradients
- Materials
  - Cu reduces RC slew degradation, potential skew
  - Low-k decreases power, improves latency, skew, slews
- Power
  - Complexity, dynamic logic, pipelining → more clock sinks
  - Larger chips → bigger clock networks
Sylvester / Shepard, 2001