Clock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements.

Similar documents
CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 17: Dynamic Sequential Circuits And Timing Issues

EE115C Winter 2017 Digital Electronic Circuits. Lecture 19: Timing Analysis

Xarxes de distribució del senyal de. interferència electromagnètica, consum, soroll de conmutació.

The Linear-Feedback Shift Register

Lecture 9: Clocking, Clock Skew, Clock Jitter, Clock Distribution and some FM

CMPEN 411. Spring Lecture 18: Static Sequential Circuits

Lecture 25. Dealing with Interconnect and Timing. Digital Integrated Circuits Interconnect

Lecture 23. Dealing with Interconnect. Impact of Interconnect Parasitics

Timing Issues. Digital Integrated Circuits A Design Perspective. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolić. January 2003

ΗΜΥ 307 ΨΗΦΙΑΚΑ ΟΛΟΚΛΗΡΩΜΕΝΑ ΚΥΚΛΩΜΑΤΑ Εαρινό Εξάμηνο 2018

GMU, ECE 680 Physical VLSI Design 1

Lecture 9: Sequential Logic Circuits. Reading: CH 7

Topics. Dynamic CMOS Sequential Design Memory and Control. John A. Chandy Dept. of Electrical and Computer Engineering University of Connecticut

MODULE 5 Chapter 7. Clocked Storage Elements

Integrated Circuits & Systems

Luis Manuel Santana Gallego 31 Investigation and simulation of the clock skew in modern integrated circuits

Digital Integrated Circuits A Design Perspective

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

! Charge Leakage/Charge Sharing. " Domino Logic Design Considerations. ! Logic Comparisons. ! Memory. " Classification. " ROM Memories.

CSE241 VLSI Digital Circuits Winter Lecture 07: Timing II

CMOS Inverter. Performance Scaling

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Issues on Timing and Clocking

GMU, ECE 680 Physical VLSI Design

Lecture 21: Packaging, Power, & Clock

Clock Strategy. VLSI System Design NCKUEE-KJLEE

Chapter 13. Clocked Circuits SEQUENTIAL VS. COMBINATIONAL CMOS TG LATCHES, FLIP FLOPS. Baker Ch. 13 Clocked Circuits. Introduction to VLSI

CARNEGIE MELLON UNIVERSITY DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING DIGITAL INTEGRATED CIRCUITS FALL 2002

Clocking Issues: Distribution, Energy

9/18/2008 GMU, ECE 680 Physical VLSI Design

Digital Integrated Circuits A Design Perspective

EEC 216 Lecture #3: Power Estimation, Interconnect, & Architecture. Rajeevan Amirtharajah University of California, Davis

Digital Integrated Circuits A Design Perspective. Semiconductor. Memories. Memories

Digital Integrated Circuits A Design Perspective

Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic. November Digital Integrated Circuits 2nd Sequential Circuits

Topics. CMOS Design Multi-input delay analysis. John A. Chandy Dept. of Electrical and Computer Engineering University of Connecticut

Semiconductor Memories

EE141- Spring 2007 Digital Integrated Circuits

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Very Large Scale Integration (VLSI)

Memory, Latches, & Registers

Lecture 27: Latches. Final presentations May 8, 1-5pm, BWRC Final reports due May 7 Final exam, Monday, May :30pm, 241 Cory

VLSI Design, Fall Logical Effort. Jacob Abraham

Digital Integrated Circuits A Design Perspective

Y. Tsiatouhas. VLSI Systems and Computer Architecture Lab

Name: Answers. Mean: 83, Standard Deviation: 12 Q1 Q2 Q3 Q4 Q5 Q6 Total. ESE370 Fall 2015

UMBC. At the system level, DFT includes boundary scan and analog test bus. The DFT techniques discussed focus on improving testability of SAFs.

5. Sequential Logic x Computation Structures Part 1 Digital Circuits. Copyright 2015 MIT EECS

Chapter Overview. Memory Classification. Memory Architectures. The Memory Core. Periphery. Reliability. Memory

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

EE141Microelettronica. CMOS Logic

Designing Sequential Logic Circuits

ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN. Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering

! Crosstalk. ! Repeaters in Wiring. ! Transmission Lines. " Where transmission lines arise? " Lossless Transmission Line.

9/18/2008 GMU, ECE 680 Physical VLSI Design

EE 447 VLSI Design. Lecture 5: Logical Effort

Sequential vs. Combinational

L4: Sequential Building Blocks (Flip-flops, Latches and Registers)

LOGIC CIRCUITS. Basic Experiment and Design of Electronics. Ho Kyung Kim, Ph.D.

Lecture 6: Logical Effort

Logical Effort: Designing for Speed on the Back of an Envelope David Harris Harvey Mudd College Claremont, CA

Topic 8: Sequential Circuits

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

EE141-Fall 2011 Digital Integrated Circuits

EE241 - Spring 2006 Advanced Digital Integrated Circuits

Lecture 8: Combinational Circuit Design

Chapter 5 CMOS Logic Gate Design

Fault Modeling. 李昆忠 Kuen-Jong Lee. Dept. of Electrical Engineering National Cheng-Kung University Tainan, Taiwan. VLSI Testing Class

Chapter 7. Sequential Circuits Registers, Counters, RAM

Semiconductor memories

Itanium TM Processor Clock Design

EECS 427 Lecture 14: Timing Readings: EECS 427 F09 Lecture Reminders

Hold Time Illustrations

Implementation of Clock Network Based on Clock Mesh

MOSIS REPORT. Spring MOSIS Report 1. MOSIS Report 2. MOSIS Report 3

ECE429 Introduction to VLSI Design

MODULE III PHYSICAL DESIGN ISSUES

EE241 - Spring 2007 Advanced Digital Integrated Circuits. Announcements

Next, we check the race condition to see if the circuit will work properly. Note that the minimum logic delay is a single sum.

LOGIC CIRCUITS. Basic Experiment and Design of Electronics

Static CMOS Circuits. Example 1

EECS 151/251A Homework 5

NTE74HC299 Integrated Circuit TTL High Speed CMOS, 8 Bit Universal Shift Register with 3 State Output

UNIVERSITY OF CALIFORNIA, BERKELEY College of Engineering Department of Electrical Engineering and Computer Sciences

Managing Physical Design Issues in ASIC Toolflows Complex Digital Systems Christopher Batten February 21, 2006

Digital System Clocking: High-Performance and Low-Power Aspects. Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M.

CMOS Digital Integrated Circuits Lec 13 Semiconductor Memories

ALU A functional unit

Problems in VLSI design

C.K. Ken Yang UCLA Courtesy of MAH EE 215B

L4: Sequential Building Blocks (Flip-flops, Latches and Registers)

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006

EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining

CMOS logic gates. João Canas Ferreira. March University of Porto Faculty of Engineering

Spiral 2 7. Capacitance, Delay and Sizing. Mark Redekopp

ALU, Latches and Flip-Flops

Adders, subtractors comparators, multipliers and other ALU elements

Introduction to Computer Engineering. CS/ECE 252, Fall 2012 Prof. Guri Sohi Computer Sciences Department University of Wisconsin Madison

Skew-Tolerant Circuit Design

ESE570 Spring University of Pennsylvania Department of Electrical and System Engineering Digital Integrated Cicruits AND VLSI Fundamentals

Transcription:

1

2

Introduction Clock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements. Defines the precise instants when the circuit is allowed to change the state. Clock signal should appear to all the processing elements at the same instance. 3

Review: Sequential Definitions Use two level sensitive latches of opposite type to build one master-slave flipflop that changes state on a clock edge (when the slave is transparent) Static storage static uses a bistable element with feedback to store its state and thus preserves state as long as the power is on Loading new data into the element: 1) cutting the feedback path (mux based); 2) overpowering the feedback path (SRAM based) Dynamic storage dynamic stores state on parasitic capacitors so the state held for only a period of time (milliseconds); requires periodic refresh dynamic is usually simpler (fewer transistors), higher speed, lower power but due to noise immunity issues always modify the circuit so that it is pseudostatic 4

Timing Classifications Synchronous systems All memory elements in the system are simultaneously updated using a globally distributed periodic synchronization signal (i.e., a global clock signal) Functionality is ensure by strict constraints on the clock signal generation and distribution to minimize Clock skew (spatial variations in clock edges) Clock jitter (temporal variations in clock edges) Asynchronous systems Self-timed (controlled) systems No need for a globally distributed clock, but have asynchronous circuit overheads (handshaking logic, etc.) Hybrid systems Synchronization between different clock domains Interfacing between asynchronous and synchronous domains 5

Clock Definition and Parameters The Clock is a periodic synchronization signal used as a time reference for data transfer in synchronous digital system. Skew Spatial variation of clock signal as distributed through the chip Clock Jitter Temporal variation of the clock with respect to reference edge Duty cycle variation 50/50 design target 6

Review: Synchronous Timing Basics R1 R2 In D Q Combinational logic D Q clk t clk1 t c-q, t su, t hold, t cdreg t plogic, t cdlogic t clk2 7

Review: Synchronous Timing Basics Under ideal conditions (i.e., when tclk1 = tclk2) T tc-q + tplogic + tsu thold tcdlogic + tcdreg Under real conditions, the clock signal can have both spatial (clock skew) and temporal (clock jitter) variations skew is constant from cycle to cycle (by definition); skew can be positive (clock and data flowing in the same direction) or negative (clock and data flowing in opposite directions) jitter causes T to change on a cycle-by-cycle basis 8

9

10

11

Sources of Clock Skew and Jitter in Clock Network 4 power supply 1 clock generation 3 interconnect 6 capacitive load PLL 2 clock drivers 7 capacitive coupling Skew manufacturing device variations in clock drivers interconnect variations environmental variations (power supply and temperature) 5 temperature Jitter clock generation capacitive loading and coupling environmental variations (power supply and temperature) 12

13

Clock-special signal Clock signals are often regarded as simple control signals; however, these signals have some very special characteristics and attributes. loaded with the greatest fanout, ravel over the longest distances, and operate at the highest speeds of any signal, either control or data, within the entire system. 14

Integral part of system design Tradeoffs --- system speed, physical die area, and power dissipation are greatly affected by the clock distribution network. The design methodology and structural topology of the clock distribution network should be considered in the development of a system for distributing the clock signals. 15

Requirements Clock waveforms must be particularly clean and sharp. No skew 16

Difficulty The requirement of distributing a tightly controlled clock signal to each synchronous register on a large hierarchically structured integrated circuit within specific temporal bounds is difficult. 17

Again: Synchronous Timing Basics R1 R2 In D Q Combinational logic D Q clk t clk1 t c-q, t su, t hold, t cdreg t plogic, t cdlogic t clk2 18

Review: Synchronous Timing Basics Under ideal conditions (i.e., when tclk1 = tclk2) T tc-q + tplogic + tsu thold tcdlogic + tcdreg Under real conditions, the clock signal can have both spatial (clock skew) and temporal (clock jitter) variations skew is constant from cycle to cycle (by definition); skew can be positive (clock and data flowing in the same direction) or negative (clock and data flowing in opposite directions) jitter causes T to change on a cycle-by-cycle basis 19

Sources of Clock Skew and Jitter in Clock Network 4 power supply 1 clock generation 3 interconnect 6 capacitive load PLL 2 clock drivers 7 capacitive coupling Skew manufacturing device variations in clock drivers interconnect variations environmental variations (power supply and temperature) 5 temperature Jitter clock generation capacitive loading and coupling environmental variations (power supply and temperature) 20

Positive Clock Skew Clock and data flow in the same direction In clk D R1 1 Q t clk1 δ > 0 2 Combinational logic T T + δ delay 3 4 R2 D Q t clk2 T : δ + t hold t hold : 21

Positive Clock Skew Clock and data flow in the same direction In clk D R1 1 Q t clk1 δ > 0 Combinational logic T T + δ delay 3 R2 D Q t clk2 2 4 T : t hold : δ + t hold T + δ t c-q + t plogic + t su so T t c-q + t plogic + t su - δ t hold + δ t cdlogic + t cdreg so t hold t cdlogic + t cdreg - δ 22

Positive Clock Skew δ > 0: Improves performance, but makes t hold harder to meet. If t hold is not met (race conditions), the circuit malfunctions independent of the clock period! 23

Negative Clock Skew Clock and data flow in opposite directions In D R1 Q t clk1 1 Combinational logic T T + δ delay 3 R2 D Q t clk2 clk 2 δ < 0 4 T : t hold : 24

Negative Clock Skew Clock and data flow in opposite directions In D R1 Q t clk1 1 Combinational logic T T + δ delay 3 R2 D Q t clk2 clk 2 δ < 0 4 T T : + δ t c-q + t plogic + t su so T t c-q + t plogic + t su - δ t hold t : hold + δ t cdlogic + t cdreg so t hold t cdlogic + t cdreg - δ 25

Negative Clock Skew δ < 0: Degrades performance, but thold is easier to meet (eliminating race conditions) 26

Clock Jitter Jitter causes T to vary on a cycleby-cycle basis In clk R1 t clk T Combinational logic -t jitter +t jitter T : 27

Clock Jitter Jitter causes T to vary on a cycle-by-cycle basis In clk R1 t clk T Combinational logic -t jitter +t jitter T T : - 2t jitter t c-q + t plogic + t su so T t c-q + t plogic + t su + 2t jitter Jitter directly reduces the performance of a sequential circuit 28

Combined Impact of Skew and Jitter Constraints on the minimum clock period (δ > 0) In D 1 R1 Q t clk1 δ > 0 -t jitter Combinational logic T T + δ 6 12 D R2 Q t clk2 T t c-q + t plogic + t su - δ + 2t jitter t hold t cdlogic + t cdreg δ 2t jitter δ > 0 with jitter: Degrades performance, and makes t hold even harder to meet. (The acceptable skew is reduced by jitter.) 29

30

Technology scaling Technology scaling, in that long global interconnect lines become much more highly resistive as line dimensions are decreased. This increased line resistance is one of the primary reasons for the growing importance of clock distribution on synchronous performance. 31

Clock distribution strategies (only relative phase between two clocking element is important) 32

Achieve Zero skew routing Route clock to destinations such that clock edges appear at the same time 33

Clock tree Single driver---if the interconnect resistance of the buffer at the clock source is small as compared to the buffer output resistance, maintaining high-quality waveform shapes (i.e., short transition times) Use elmore formula to compute delay Balance delay paths Drawback---large delay, drive capability should be high 34

Terminology The unique clock source is frequently described as the root of the tree, the initial portion of the tree as the trunk, individual paths driving each register as the branches, and the registers being driven as the leaves 35

Buffered clock Tree interconnect resistance large The most common and general approach to equi-potential clock distribution is the use of buffered trees, It leads to an asymmetric structure ALL PATHS ARE BALANCED 36

37

38

Buffered clock Tree Insert buffers either at the clock source and/or along a clock path, forming a tree structure. 39

Buffers The distributed buffers serve the double function of amplifying the clock signals degraded by the distributed interconnect impedances and isolating the local clock nets from upstream load impedances 40

DESIGN All nodes have capacitance All branches have resistance Fix the load (fan out ) of each buffer Compute no.of levels required Position the buffers optimally Guidelines- minimize delay buffer delay=segment delay 41

42

3D Skew Visualization 43

Mesh version of clk tree 44

Mesh version of clock tree Shunt paths further down the clock distribution network are placed to minimize the interconnect resistance within the clock tree. This mesh structure effectively places the branch resistances in parallel, minimizing the clock skew. 45

46

CDN properties H TREE symmetric,regular array, clk skew can be small X TREE- variant of H TREE Zero skew is achieved maintaining the distributed interconnect and buffers to be identical from the clock signal source to the clocked register of each clock path. each clock path from the clock source to a clocked register has practically the same delay. 47

Skew The primary delay difference between the clock signal paths is due to variations in process parameters that affect the interconnect impedance and, in particular, any active distributed buffer amplifiers. The amount of clock skew within an H-tree structured clock distribution network is strongly dependent upon the physical size, the control of the semiconductor process, and the degree to which active buffers are distributed within the H- 48 tree structure

Clock Distribution H-tree CLK Clock is distributed in a tree-like fashion 49

H-Tree Clock Network If the paths are perfectly balanced, clock skew is zero Clock Can insert clock gating at multiple levels in clock tree Can shut off entire subtree if all gating conditions are satisfied Idle condition Clock Gated clock 50

More realistic H-tree [Restle98] 51

Tapered H tree The conductor widths in H-tree structures are designed to progressively decrease as the signal propagates to lower levels of the hierarchy. This strategy minimizes reflections of the high-speed clock signals at the branching points. 52

H Tree---Difficulty -1 Clock routed in both the vertical and horizontal directions. For a standard twolevel metal CMOS process, this manhattan structure creates added difficulty in routing the clock lines without using either resistive interconnect or multiple high resistance vias between the two metal lines. 3 level metal process 53

Difficulty -2 Furthermore, the interconnect capacitance (and therefore the power dissipation) is much greater for the H-tree as compared with the standard clock tree since the total wire length tends to be much greater An important tradeoff between clock delay and clock skew in the design of highspeed clock distribution networks. 54

Grid Low skew achievable Lots of excess interconnect Large power dissipation 55

Clock distribution-hierarchical Distribute global reference to various parts of the chip with zero skew Local distribution of the clock while considering local load variations., permitted clk skew,. Power saving strategies are used here. 56

GCLK Gridded global clock signal (GCLK) is distributed over the entire IC in order to maintain a low-resistance reference clock signal and to distribute the power dissipated by the clock distribution network across the die area The global clock signal GCLK is the source of thousands of buffered and conditional (or gated) clock signals driving registers across the IC 57

58

Example: DEC Alpha 21164 Clock Frequency: 300 MHz - 9.3 Million Transistors Total Clock Load: 3.75 nf Power in Clock Distribution network : 20 W (out of 50) Uses Two Level Clock Distribution: Single 6-stage driver at center of chip Secondary buffers drive left and right side clock grid in Metal3 and Metal4 Total driver size: 58 cm! 59

DEC Alpha 21164 (EV5) 300 MHz clock (9.3 million transistors on a 16.5x18.1 mm die in 0.5 micron CMOS technology) single phase clock 3.75 nf total clock load Extensive use of dynamic logic 20 W (out of 50) in clock distribution network Two level clock distribution Single 6 stage driver at the center of the chip Secondary buffers drive the left and right sides of the clock grid in m3 and m4 Total equivalent driver size of 58 cm!! 60

21164 Clocking t rise = 0.35ns t cycle = 3.3ns Clock waveform final drivers pre-driver Location of clock driver on die t skew = 150ps 2 phase single wire clock, distributed globally 2 distributed driver channels Reduced RC delay/skew Improved thermal distribution 3.75nF clock load 58 cm final driver width Local inverters for latching Conditional clocks in caches to reduce power More complex race checking Device variation 61

Clock Drivers 62

EV6 (Alpha 21264) Clocking 600 MHz 0.35 micron CMOS t cycle = 1.67ns t rise = 0.35ns Global clock waveform t skew = 50ps PLL 2 Phase, with multiple conditional buffered clocks 2.8 nf clock load 40 cm final driver width Local clocks can be gated off to save power Reduced load/skew Reduced thermal issues Multiple clocks complicate race checking 63

21264 Clocking 64

PLL ps 5 10 15 20 25 30 35 40 45 50 65

Clock Skew in Alpha Processor Absolute skew smaller than 90 ps The critical instruction and execution units all see the clock within 65 ps 66

ps 5 10 15 20 25 30 35 40 45 50 EV6 Clock Results ps 300 305 310 315 320 325 330 335 340 345 GCLK Skew (at Vdd/2 Crossings) GCLK Rise Times (20% to 80% Extrapolated to 0% to 100%) 67