EECS 151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: Nick Weaver & John Wawrzynek. Lecture 10 EE141

Similar documents
Very Large Scale Integration (VLSI)

EE141-Fall Digital Integrated Circuits. Announcements. Lab #2 Mon., Lab #3 Fri. Homework #3 due Thursday. Homework #4 due next Thursday

Spiral 2 7. Capacitance, Delay and Sizing. Mark Redekopp

CSE241 VLSI Digital Circuits Winter Lecture 07: Timing II

C.K. Ken Yang UCLA Courtesy of MAH EE 215B

EECS 151/251A Homework 5

Dynamic operation 20

Announcements. EE141- Spring 2003 Lecture 8. Power Inverter Chain

COMP 103. Lecture 10. Inverter Dynamics: The Quest for Performance. Section 5.4.2, What is this lecture+ about? PERFORMANCE

EE 466/586 VLSI Design. Partha Pande School of EECS Washington State University

Chapter 5 CMOS Logic Gate Design

EE241 - Spring 2000 Advanced Digital Integrated Circuits. Announcements

EEC 118 Lecture #6: CMOS Logic. Rajeevan Amirtharajah University of California, Davis Jeff Parkhurst Intel Corporation

The Inverter. Digital Integrated Circuits A Design Perspective. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic

Lecture 9: Clocking, Clock Skew, Clock Jitter, Clock Distribution and some FM

CARNEGIE MELLON UNIVERSITY DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING DIGITAL INTEGRATED CIRCUITS FALL 2002

For smaller NRE cost For faster time to market For smaller high-volume manufacturing cost For higher performance

Lecture 5. MOS Inverter: Switching Characteristics and Interconnection Effects

ENEE 359a Digital VLSI Design

Lecture 7 Circuit Delay, Area and Power

CMPEN 411 VLSI Digital Circuits Spring 2012

Lecture 8: Combinational Circuit Design

Interconnect (2) Buffering Techniques. Logical Effort

EECS 141: FALL 05 MIDTERM 1

ECE 407 Computer Aided Design for Electronic Systems. Simulation. Instructor: Maria K. Michael. Overview

Issues on Timing and Clocking

EE115C Digital Electronic Circuits Homework #4

Lecture 23. Dealing with Interconnect. Impact of Interconnect Parasitics

Chapter 5. The Inverter. V1. April 10, 03 V1.1 April 25, 03 V2.1 Nov Inverter

Digital Integrated Circuits A Design Perspective

The CMOS Inverter: A First Glance

EE115C Digital Electronic Circuits Homework #5

EECS 579: Logic and Fault Simulation. Simulation

VLSI Design, Fall Logical Effort. Jacob Abraham

Digital Integrated Circuits 2nd Inverter

Logical Effort: Designing for Speed on the Back of an Envelope David Harris Harvey Mudd College Claremont, CA

UNIVERSITY OF CALIFORNIA, BERKELEY College of Engineering Department of Electrical Engineering and Computer Sciences

EE141Microelettronica. CMOS Logic

EE115C Winter 2017 Digital Electronic Circuits. Lecture 19: Timing Analysis

The CMOS Inverter: A First Glance

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Digital Integrated Circuits A Design Perspective

and V DS V GS V T (the saturation region) I DS = k 2 (V GS V T )2 (1+ V DS )

! Crosstalk. ! Repeaters in Wiring. ! Transmission Lines. " Where transmission lines arise? " Lossless Transmission Line.

EEC 116 Lecture #5: CMOS Logic. Rajeevan Amirtharajah Bevan Baas University of California, Davis Jeff Parkhurst Intel Corporation

Interconnect (2) Buffering Techniques.Transmission Lines. Lecture Fall 2003

Digital Integrated Circuits. The Wire * Fuyuzhuo. *Thanks for Dr.Guoyong.SHI for his slides contributed for the talk. Digital IC.

Integrated Circuits & Systems

EE241 - Spring 2001 Advanced Digital Integrated Circuits

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

EEC 216 Lecture #3: Power Estimation, Interconnect, & Architecture. Rajeevan Amirtharajah University of California, Davis

Properties of CMOS Gates Snapshot

The Linear-Feedback Shift Register

EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining

Name: Answers. Mean: 83, Standard Deviation: 12 Q1 Q2 Q3 Q4 Q5 Q6 Total. ESE370 Fall 2015

EECS 151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: Nick Weaver & John Wawrzynek. Lecture 12 EE141

Lecture 21: Packaging, Power, & Clock

Digital Microelectronic Circuits ( )

Lecture 12 CMOS Delay & Transient Response

CMOS logic gates. João Canas Ferreira. March University of Porto Faculty of Engineering

ESE570 Spring University of Pennsylvania Department of Electrical and System Engineering Digital Integrated Cicruits AND VLSI Fundamentals

Introduction to CMOS VLSI Design (E158) Lecture 20: Low Power Design

Fig. 1 CMOS Transistor Circuits (a) Inverter Out = NOT In, (b) NOR-gate C = NOT (A or B)

STATIC TIMING ANALYSIS

MODULE 5 Chapter 7. Clocked Storage Elements

MOSIS REPORT. Spring MOSIS Report 1. MOSIS Report 2. MOSIS Report 3

EE5780 Advanced VLSI CAD

Next, we check the race condition to see if the circuit will work properly. Note that the minimum logic delay is a single sum.

EECS 312: Digital Integrated Circuits Midterm Exam 2 December 2010

EE 447 VLSI Design. Lecture 5: Logical Effort

EECS150 - Digital Design Lecture 17 - Combinational Logic Circuits. Limitations on Clock Rate - Review

CMPEN 411 VLSI Digital Circuits Spring 2011 Lecture 07: Pass Transistor Logic

Topics to be Covered. capacitance inductance transmission lines

Logical Effort. Sizing Transistors for Speed. Estimating Delays

Lecture 5. Logical Effort Using LE on a Decoder

Lecture 23. CMOS Logic Gates and Digital VLSI I

Managing Physical Design Issues in ASIC Toolflows Complex Digital Systems Christopher Batten February 21, 2006

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006

9/18/2008 GMU, ECE 680 Physical VLSI Design

Homework #2 10/6/2016. C int = C g, where 1 t p = t p0 (1 + C ext / C g ) = t p0 (1 + f/ ) f = C ext /C g is the effective fanout

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Sciences

Clock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements.

Lecture 5: DC & Transient Response

EE141. Administrative Stuff

UNIVERSITY OF CALIFORNIA College of Engineering Department of Electrical Engineering and Computer Sciences. Professor Oldham Fall 1999

Lecture 6 Power Zhuo Feng. Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 2010

ESE570 Spring University of Pennsylvania Department of Electrical and System Engineering Digital Integrated Cicruits AND VLSI Fundamentals

Lecture 6: Logical Effort

EE115C Digital Electronic Circuits Homework #6

EE115C Winter 2017 Digital Electronic Circuits. Lecture 6: Power Consumption

5.0 CMOS Inverter. W.Kucewicz VLSICirciuit Design 1

Lecture 34: Portable Systems Technology Background Professor Randy H. Katz Computer Science 252 Fall 1995

Testability. Shaahin Hessabi. Sharif University of Technology. Adapted from the presentation prepared by book authors.

DC and Transient. Courtesy of Dr. Daehyun Dr. Dr. Shmuel and Dr.

Chapter 9. Estimating circuit speed. 9.1 Counting gate delays

EE M216A.:. Fall Lecture 5. Logical Effort. Prof. Dejan Marković

CMOS Digital Integrated Circuits Lec 13 Semiconductor Memories

EE141-Fall 2011 Digital Integrated Circuits

Announcements. EE141- Fall 2002 Lecture 7. MOS Capacitances Inverter Delay Power

EE 466/586 VLSI Design. Partha Pande School of EECS Washington State University

Transcription:

EECS 151/251A Spring 2018 Digital Design and Integrated Circuits Instructors: Nick Weaver & John Wawrzynek Lecture 10 EE141 1

What do ASIC/FPGA Designers need to know about physics? Physics effect: Area cost Delay performance Energy performance & cost Ideally, zero delay, area, and energy However, the physical devices occupy area, take time, and consume energy CMOS process lets us build transistors, wires, connections, and we get capacitors, inductors, and resistors whether or not we want them 2

Performance, Cost, Power How do we measure performance? operations/sec? cycles/sec? Performance is directly proportional to clock frequency Although it may not be the entire story: Ex: CPU performance = # instructions X CPI X clock period Spring 2018 EECS151 Page 3

Limitations on Clock Rate 1 Logic Gate Delay 2 Delays in flip-flops What are typical delay values? Both times contribute to limiting the clock period What must happen in one clock cycle for correct operation? All signals connected to FF (or memory) inputs must be ready and setup before rising edge of clock For now we assume perfect clock distribution (all flip-flops see the clock at the same time) Spring 2018 EECS151 Page 4

Example Parallel to serial converter circuit clk a T time(clk Q) + time(mux) + time(setup) T τ clk Q + τ mux + τ setup b Spring 2018 EECS151 Page 5

In General For correct operation: T τ clk Q + τ CL + τ setup for all paths How do we enumerate all paths? Any circuit input or register output to any register input or circuit output? Note: setup time for outputs is a function of what it connects to clk-to-q for circuit inputs depends on where it comes from Spring 2018 EECS151 Page 6

Gate Delay Modern CMOS gate delays on the order of a few picoseconds (However, highly dependent on gate context) Often expressed as FO4 delays (fan-out of 4) - as a process independent delay metric: the delay of an inverter, driven by an inverter 4x smaller than itself, and driving an inverter 4x larger than itself For a 90nm process FO4 is around 20ps Less than 10ps for a 32nm process 7

Path Delay For correct operation: Total Delay clock_period - FFsetup_time - FFclk_to_q on all paths High-speed processors critical paths have around 20 FO4 delays 8

FO4 Delays per clock period FO4 Delays 100 90 CPU Clock Periods 1985-2005 intel 386 80 intel 486 intel pentium 70 60 MIPS 2000 5 pipeline stages intel pentium 2 intel pentium 3 intel pentium 4 intel itanium Alpha 21064 50 Pentium Pro 10 pipeline stages Alpha 21164 Alpha 21264 Sparc Historical limit: about 12 40 30 20 10 Pentium 4 20 pipeline stages SuperSparc Sparc64 Mips HP PA Power PC AMD K6 AMD K7 AMD x86-64 0 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 00 01 02 03 04 05 Thanks to Francois Labonte, Stanford 9

CPU DB: Recording Microprocessor History With this open database, you can mine microprocessor trends over the past 40 years Andrew Danowitz, Kyle Kelley, James Mao, John P Stevenson, Mark Horowitz, Stanford University F04 Delays Per Cycle for Processor Designs 140 120 100 F04 / cycle 80 60 40 20 0 1985 1990 1995 2000 2005 2010 2015 FO4 delay per cycle is roughly proportional to the amount of computation completed per cycle 10

Gate Delay What determines the actual delay of a logic gate? Transistors are not perfect switches - cannot change terminal voltages instantaneously Consider the NAND gate: Current (I) value depends on: process parameters, transistor size CL / I CL models gate output, wire, inputs to next stage (Cap of Load) C integrates I creating a voltage change at output 11

More on transistor Current Transistors act like a cross between a resistor and current source ISAT depends on process parameters (higher for nfets than for pfets) and transistor size (layout): ISAT W/L 12

Physical Layout determines FET strength Switch-level abstraction gives a good way to understand the function of a circuit nfet (g=1? short circuit : open) pfet (g=0? short circuit : open) Understanding delay means going below the switch-level abstraction to transistor physics and layout details 13

Transistors as water valves (Cartoon physics) If electrons are water molecules, transistor strengths (W/L) are pipe diameters, and capacitors are buckets Vdd 1 A on p-fet fills up the capacitor with charge Open Charge 0 Water level Time Vdd Vdd 1 A on n-fet empties the bucket Open Out n Discharge 0 Water level 14 Time CS 250 L4: Timing UC Regents Spring 2016 UCB

The Switch Dynamic Model (Simplified) V GS S V GS V T S C S G G R on D C G D C D EE141 15

Switch Sizing What happens if we make a switch W times larger (wider) G V GS S V GS V T S C S W W G R on /W D C G W D C D W EE141 16

Switch Parasitic Model The pull-down switch (NMOS) V out V out V in R N C D V in R N W WC D C G WC G Minimum-size switch Sizing the transistor (factor W) EE141 We assume transistors of minimal length (or at least constant length) R s and C s in units of per unit width 17

Switch Parasitic Model The pull-up switch (PMOS) V in R P = 2R N V in R N V in R N W C G V out 2C G V out 2WC G V out C D 2C D 2WC D Minimum-size switch Sized for symmetry General sizing EE141 18

Inverter Parasitic Model Drain and gate capacitance of transistor are directly related by process (γ 1) V in R N W V out C D = γc G C in = 3WC G C int = 3WC D = 3WγC G R N W t p = 069 R N (3WγC G ) = 069(3γ)R N C G W Intrinsic delay of inverter independent of size EE141 19

Turning Rise/Fall Delay into Gate Delay Cascaded gates: 1 0 0 1 1 0 0 1 transfer curve for inverter 20

The Switch Inverter: Transient Response V(t) = V0 e t/rc t1/2 = ln(2) RC t phl = f(r on C L ) = 069 R n C L V in V in C in C in (a) Low-to-high (b) High-to-low EE141 21

More on CL Everything that connects to the output of a logic gate (or transistor) contributes capacitance: I Transistor drains Interconnection (wires/contacts/ vias) Transistor Gates 22

Inverter with Load Capacitance C in = 3WC G V in R N W R N W V out C int = 3WγC G C L t p = = = = R 069 W R 069 W 069(3C t inv N N G C ( γ + C ( C int ) = t + C 0 ( γ ) (3WγC G + C CL RN )( γ + ) C L in L in L + f ) ) f = fanout = ratio between load and input capacitance of gate EE141 23

Inverter Delay Model t p =t inv (γ+f) t inv technology constant Can be dropped from expression Delay unit-less variable (expressed in unit delays) Delay γ t p =γ+f f Question: how does transistor sizing (W) impact delay? EE141 24

Wire Delay Ideally, wires behave as transmission lines : signal wave-front moves close to the speed of light ~1ft/ns Time from source to destination is called the transit time In ICs most wires are short, and the transit times are relatively short compared to the clock period and can be ignored Not so on PC boards Spring 2018 EECS151 Page 25

Wires As parallel plate capacitors: C Area = width length Wires have finite resistance, so have distributed R and C: with r = res/length, c = cap/length, rcl 2 rc + 2rc +3rc + v1 v2 v3 v4 v1 v2 v3 v4 time 26

Even in those cases where the transmission line effect is negligible: Wires posses distributed resistance and capacitance v1 v2 v3 v4 Time constant associated with distributed RC is proportional to the square of the length Wire Delay For short wires on ICs, resistance is insignificant (relative to effective R of transistors), but C is important Typically around half of C of gate load is in the wires For long wires on ICs: busses, clock lines, global control signal, etc Resistance is significant, therefore distributed RC effect dominates signals are typically rebuffered to reduce delay: v1 v2 v3 v4 time Spring 2018 EECS151 Page 27

Recall: Positive edge-triggered flip-flop D Q A flip-flop samples right before the edge, and then holds value clk Sampling circuit clk Holds value clk clk clk clk clk Clock to Q delay results fr clk 28 EECS 151 UC Regents Spring 2018 UCB

Sensing: When clock is low D Q A flip-flop samples right before the edge, and then holds value clk Sampling circuit clk Holds value clk clk clk clk clk clk clk Clock to Q delay results fr clk = 0 clk = 1 clk clk clk clk clk EECS 151 Will capture new value clk Clock to Q delay on posedge results fr 29 Outputs clk last value captured UC Regents Spring 2018 UCB

Capture: When clock goes high D Q A flip-flop samples right before the edge, and then holds value clk Sampling circuit clk Holds value clk clk clk clk clk clk Clock to clk Q delay results fr clk = 1 clk = 0 clk clk clk clk clk EECS 151 Remembers value clk Clock to Q delay just captured results fr 30 Outputs value just clk captured UC Regents Spring 2018 UCB

Flip Flop delays: clk-to-q? setup? hold? clk clk D Q CLK clk clk clk clk CLK == 0 Sense D, but Q outputs old value clk Clock to Q delay results fr setup clk CLK 0->1 Capture D, pass value to Q hold? clk-to-q 31 EECS 151 UC Regents Spring 2018 UCB

Timing Analysis and Logic Delay Register: An Array of Flip-Flops Combinational Logic If our clock period T > worst-case delay 32through CL, does this ensure correct operation? EECS 151 UC Regents Spring 2018 UCB

Flip-Flop delays eat into time budget Combinational Logic ALU time budget T! # clk"q + # CL + # setup 33 EECS 151 UC Regents Spring 2018 UCB

Clock skew also eats into time budget CLKd CLK CLK CLK CLKd CLK CL As T 0, which circuit fails first? CL CLK CLK CLKd clock skew, delay in distribution EECS 151 T " T CL +T setup +T clk!q + worst case skew ost modern large high-performance chi 34 UC Regents Spring 2018 UCB

Delay Grid Tuned sector trees Delay Sector buffers x CS 250 L3: Timing Clock Tree Delays, IBM Power CPU y Buffer level 2 Buffer level 35 1 UC Regents Fall 2013 UCB

15 10 Delay Volts (V) 20 ps skew 05 00 0 500 1000 1500 2000 2500 Time (ps) Multiplefingered transmissio line x CS 250 L3: Timing Clock Tree Delays, IBM Power 36 y UC Regents Fall 2013 UCB

Clock Skew (cont) CLK CLK CL CLK CLK clock skew, delay in distribution Note reversed buffer In this case, clock skew actually provides extra time (adds to the effective clock period) This effect has been used to help run circuits as higher clock rates Risky business! Spring 2018 EECS151 Page 37

Components of Path Delay 1 # of levels of logic 2 Internal cell delay 3 wire delay 4 cell input capacitance 5 cell fanout 6 cell output drive strength 38

Who controls the delay? foundary engineer (TSMC) Library Developer (Aritsan) CAD Tools (DC, IC Compiler) Designer (you!) 1 # of levels synthesis RTL 2 Internal cell delay physical parameters cell topology, trans sizing cell selection 3 Wire delay physical parameters place & route layout generator 4 Cell input capacitance physical parameters cell topology, trans sizing cell selection instantiation 5 Cell fanout synthesis RTL 6 Cell drive strength physical parameters transistor sizing cell selection instantiation 39

From Delay Models to Timing Analysis clk Timing Analysis What is the smallest T that produces correct operation? Or, can we meet a target T? f T 1 MHz 1 μs 10 MHz 100 ns 100 MHz 10 ns 1 GHz 1 ns 40 EECS 151 UC Regents Spring 2018 UCB

Timing Closure: Searching for and beating down the critical path? Must consider all connected register pairs, paths, plus from input to register, plus register to output Design tools help in the search Synthesis tools work to meet clock constraint, report delays on paths, Special static timing analyzers accept a design netlist and report path delays, and, of course, simulators can be used to determine timing performance Tools that are expected to do something about the timing behavior (such as synthesizers), also include provisions for specifying input arrival times (relative to the clock), and output requirements (set-up 41 times of next stage)

Timing Analysis, real example The critical path Most paths have hundreds of picoseconds to spare Late-mode timing checks (thousands) 200 150 100 50 0 40 20 0 20 40 60 80 100 120 140 160 180 200 220 240 260 280 Timing slack (ps) From The circuit and physical design of the POWER4 microprocessor, IBM J Res and Dev, 46:1, Jan 2002, JD Warnock et al 42

Timing Optimization As an ASIC/FPGA designer you get to choose: The algorithm The Microarchitecture (block diagram) The RTL description of the CL blocks (number of levels of logic) Where to place registers and memory (the pipelining) Overall floorplan and relative placement of blocks 43

How to retime logic Circles are combinational logic, labelled with delays Critical path is 5 We want to improve it without changing circuit semantics IN 1 1 1 1 2 2 OUT Figure 1: A small graph before retiming The nodes represent logic delays, with the inputs and outputs passing through mandatory, fixed registers The critical path is 5 Add a register, move one circle Performance improves by 20% IN 1 1 1 1 2 2 OUT Figure 2: The example in Figure 2 after retiming The critical path is reduced from 5 to 4 Post-Placement C-slow Retiming for the Xilinx Virtex FPGA Logic Synthesis tools can do 44 this in simple cases Nicholas Weaver UC Berkeley Berkeley, CA Yury Markovskiy UC Berkeley Berkeley, CA Yatish Patel UC Berkeley Berkeley, CA John Wawrzynek UC Berkeley Berkeley, CA

Floorplaning: essential to meet timing 45 EECS 151 (Intel XScale 80200) UC Regents Spring 2018 UCB

Timing Analysis Tools Static Timing Analysis: Tools use delay models for gates and interconnect Traces through circuit paths Cell delay model capture For each input/output pair, internal delay (output load independent) output dependent delay Standalone tools (PrimeTime) and part of logic synthesis delay Back-annotation takes information from results of place and route to improve accuracy of timing analysis DC in topographical mode uses preliminary layout information to model interconnect parasitics Prior versions used a simple fan-out model of gate loading output load 46 EECS151, UC Berkeley Sp18

clk Hold-time Violations d FF q Some state elements have positive hold time requirements How can this be? Fast paths from one state element to the next can create a violation (Think about shift registers!) CAD tools do their best to fix violations by inserting delay (buffers) Of course, if the path is delayed too much, then cycle time suffers Difficult because buffer insertion changes layout, which changes path delay 47 EECS151, UC Berkeley Sp18

Driving Large Loads Large fanout nets: clocks, resets, memory bit lines, off-chip Relatively small driver results in long rise time (and thus large gate delay) Strategy: Staged Buffers How to optimally scale drivers? Optimal trade-off between delay per stage and total number of stages? 48

Inverter Chain In Out C L For some given C L : How many stages are needed to minimize delay? How to size the inverters? Anyone want to guess the solution? EE141 49

Careful about Optimization Problems Get fastest delay if build one very big inverter So big that delay is set only by self-loading Cload Likely not the problem you re interested in Someone has to drive this inverter EE141 50

Delay Optimization Problem #1 You are given: A fixed number of inverters The size of the first inverter The size of the load that needs to be driven Your goal: Minimize the delay of the inverter chain Need model for inverter delay vs size EE141 51

Apply to Inverter Chain In Out 1 2 N C L t p = t p1 + t p2 + + t pn t pj = tinv γ + C in, j+ 1 C in, j t t t C C C N N in, j+ 1 p = p, j = inv γ +, in, N + 1 = L j= 1 i= 1 C in, j EE141 52

Optimal Sizing for Given N Delay equation has N-1 unknowns, C in,2 C in,n To minimize the delay, find N-1 partial derivatives: C C t = + t + t + in, j in, j+ 1 p inv inv Cin, j 1 Cin, j dt 1 C = = dc C C p in, j+ 1 tinv tinv 2 in, j in, j 1 in, j 0 EE141 53

Optimal Sizing for Given N (cont d) Result: every stage has equal fanout (f): C C in, j in, j+ 1 in, j 1 in, j Size of each stage is geometric mean of two neighbors: = C C C = C C in, j in, j 1 in, j + 1 Equal fanout à every stage will have same delay EE141 54

Optimum Delay and Number of Stages When each stage has same fanout f : Fanout of each stage: Minimum path delay: N f = F = C / C f = N F p inv L in,1 ( N γ ) t = Nt + F EE141 55

Example In C 1 1 f f 2 Out C L = 8 C 1 C L /C 1 has to be evenly distributed across N = 3 stages: EE141 56

Delay Optimization Problem #2 You are given: The size of the first inverter The size of the load that needs to be driven Your goal: Minimize delay by finding optimal number and sizes of gates So, need to find N that minimizes: ( γ ) N t = Nt + C C p inv L in EE141 57

Untangling the Optimization Problem Rewrite N in terms of fanout/stage f: N f CL Cin N = = ln( CL Cin ) ln f ( ) 1/ N ) f + γ t p = Ntinv CL Cin + γ = tinv ln( CL Cin ) ln f t p ln f 1 γ f = tinv ln( CL Cin ) = 0 2 f ln f f = exp 1+ ( γ f ) For γ = 0, f = e, N = ln (C L /C in ) (no explicit solution) EE141 58

Optimum Effective Fanout f Optimum f for given process defined by γ 5 f = exp 1+ ( γ f ) 45 f opt 4 35 e 3 f opt = 36 for γ = 1 25 0 05 1 15 2 25 3 γ EE141 59

In Practice: Plot of Total Delay [Hodges, p281] Why the shape? Curves very flat for f > 2 Simplest/most common choice: f = 4 EE141 60

Normalized Delay As a Function of F N t = Nt ( γ + F ), F = C C p inv L in [Rabaey: page 210] (γ = 1) EE141 61

Conclusion Timing Optimization: You start with a target on clock period What control do you have? Biggest effect is RTL manipulation ie, how much logic to put in each pipeline stage We will be talking later about how to manipulate RTL for better timing results In most cases, the tools will do a good job at logic/circuit level: Logic level manipulation Transistor sizing Buffer insertion But some cases may be difficult and you may need to help The tools will need some help at the floorpan and layout 62 EECS151, UC Berkeley Sp18