EE M216A.:. Fall Lecture 4. Speed Optimization. Prof. Dejan Marković Speed Optimization via Gate Sizing


EE M216A, Fall 2010. Lecture 4: Speed Optimization. Prof. Dejan Marković, ee216a@gmail.com

Speed Optimization via Gate Sizing
- Gate sizing basics
  - P:N ratio
  - Complex gates
  - Velocity saturation
  - Tapering
- Developing intuition
  - Number of stages vs. fanout
  - Popular inverter chain example
- Formal approach: logical effort
  - Sizing optimization for speed

Basic Gate Sizing Relationships
- Rise and fall delays are determined by the pull-up and pull-down strength
  - Besides the dimensions, strength depends on $\mu$, $C_{OX}$, $V_T$
  - PMOS is weaker because of lower $\mu_P$
  - Larger P network than N network
- Increasing the size of a gate can reduce its delay
  - Inverse ($1/W$) relationship with resistance (and hence delay)
- BUT it can slow down the gate driving it
  - Proportional ($W$) relationship with capacitance
- So be careful!

P:N Ratio for Equal Rise and Fall Delay
- Good to have roughly equal delays for different transitions
  - Don't need to worry about a worst-case sequence
- Size the P devices to compensate for mobility
  - $C_{OX}$, $V_T$, $L$ are roughly the same
  - $R_{DRV} \propto 1/I \propto 1/(\mu W)$
- Make the pull-up and pull-down resistances equal
  - $R_N/R_P = 1 = \mu_P W_P / (\mu_N W_N) = k\beta$, where $k = \mu_P/\mu_N$ is the mobility ratio and $\beta = W_P/W_N$ is the P:N ratio
  - $W_P/W_N = \mu_N/\mu_P$
- Approximately the same as making $V_{THL} = V_{DD}/2$
- Easy for an inverter. What about more complex gates?

Complex Gate Sizing
- A stack of N series devices needs N times lower resistance per device: N × width
- Make the worst-case strength of each path equal
- A multi-input transition can result in a stronger network
- Long series stacking is VERY bad
- [Figure: example gate with inputs A, B, C sized for $\beta = 2$; device widths labeled 6, 6, 6 and 2, 2, 2]

Accounting for Velocity Saturation
- Series stacking is actually less velocity saturated
- If we use $R_{no\_stack} = (4/3) R_{stack}$
  - Adjust the single device size to account for velocity saturation
- [Figure: the same example gate with inputs A, B, C for $\beta = 2$; stacked devices keep widths 6, 6, 6 and 2, 2, while the single (unstacked) devices are labeled 4/3]

P:N Ratio for Minimum Delay
- Delay of an inverter chain (2 inverters, to include both $t_{pLH}$ and $t_{pHL}$)
- [Figure: inverter chain from in to out, every stage sized $W_P$ : $W_N$]
- Let $R_{PDRV} \sim R_0/(W_P \mu_P)$, $R_{NDRV} \sim R_0/(W_N \mu_N)$, $C_G \sim C_0 (1 + W_P/W_N)$
- $t_{PD} = t_{D1} + t_{D2} = R_0 \left( \frac{1}{W_P \mu_P} + \frac{1}{W_N \mu_N} \right) C_0 \left( 1 + \frac{W_P}{W_N} \right) = \tau_N \left( 1 + \frac{1}{k\beta} \right)(1 + \beta)$, with $\tau_N = R_0 C_0/(W_N \mu_N)$, $k = \mu_P/\mu_N$, $\beta = W_P/W_N$
- Min($t_{PD}$): $dt_{PD}/d\beta = 0 = \tau_N \left( 1 - \frac{1}{k\beta^2} \right)$
- So $\beta = W_P/W_N = \sqrt{\mu_N/\mu_P}$
- Intuition: since NMOS has more drive for a given size, it is better to use more NMOS

FO4 Inverter Delay vs. P:N Ratio β
- Optimal $\beta = \sqrt{\mu}$ for minimum delay, where $\mu = \mu_N/\mu_P$
- The curve is relatively flat, so this is not a strong delay tradeoff
- [Figure: FO4 inverter delay (in units of τ) versus β]
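As a quick numerical check of the minimum-delay P:N ratio, the sketch below evaluates the normalized two-inverter delay $(1 + 1/(k\beta))(1 + \beta)$ on a grid and compares the minimizer with $\sqrt{\mu_N/\mu_P}$. The function names, the grid range, and the choice $\mu_N/\mu_P = 2$ are illustrative assumptions, not values taken from the lecture.

```python
import math

def normalized_delay(beta, k):
    """Two-inverter delay in units of tau_N; k = mu_P / mu_N, beta = W_P / W_N."""
    return (1.0 + 1.0 / (k * beta)) * (1.0 + beta)

def best_beta(k, lo=0.5, hi=4.0, steps=3501):
    """Grid search (step 0.001 by default) for the beta that minimizes the delay."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda b: normalized_delay(b, k))

k = 0.5  # assumed mobility ratio mu_P / mu_N, i.e. mu_N / mu_P = 2
beta_opt = best_beta(k)
print(beta_opt, math.sqrt(1.0 / k))

# Compare the minimum with the equal-rise/fall choice beta = mu_N / mu_P.
print(normalized_delay(beta_opt, k), normalized_delay(1.0 / k, k))
```

Under these assumptions the delay at $\beta = \mu_N/\mu_P$ is only a few percent above the minimum, which is consistent with the remark that the curve is relatively flat.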

Tapering
- One observation from Elmore delay is that capacitance closer to the voltage source has less effect on delay
  - $\tau_{delay} = R_1 C_1 + (R_1 + R_2) C_2$
  - $C_1$ has less effect on delay than $C_2$
- So taper stacked devices to speed them up
  - Make the bottom ones bigger
  - $R_1$ (many occurrences) has less resistance
  - $C_3$ (multiplying larger R) has smaller capacitance
- In reality, tapering doesn't win as much because layout is less compact when stacking unequal-sized transistors (causing more C)
- [Figure: series stack $R_1$, $R_2$, $R_3$ between out and GND with node capacitances $C_1$, $C_2$, $C_3$]

Example (1/5): Delay of an N-Input AND Function
- Series stacking: larger devices without improving drive strength
  - Greater self-loading capacitance
- We expect that with a large number of inputs, it is no longer better to build bigger gates
- Comparison (approximate)
  1. N-input NAND gate driving an inverter
  2. N/2-input NAND gates driving a NOR gate (to combine)
- Both drive the same output load
- [Figure: device widths (PMOS, NMOS) for the two paths.
  $t_{PD1}$: N-input NAND ($\beta W_N/f$, $N W_N/f$), inverter ($\beta W_N$, $W_N$), load ($\beta f W_N$, $f W_N$).
  $t_{PD2}$: N/2-input NAND ($\beta W_N/f$, $(N/2) W_N/f$), NOR2 ($2\beta W_N$, $W_N$), load ($\beta f W_N$, $f W_N$)]
- Let's analyze the building blocks: NAND, NOR, INV

Example (2/5): Delay of an N-Input NAND
- Assume $C_{GN/P} = C_{DN/P} = C_0$ fF/µm
- NMOS resistance = $R_0$ Ω·µm
- NAND: NMOS size is $N W_N/f$
- For N inputs:
  - $R_1 = R_2 = \dots = R_0/(N W_N/f)$
  - $C_1 = C_2 = \dots = C_N = N C_0 W_N/f$
- Let $\beta = 2$ ($t_{pLH} = t_{pHL}$ for simplified analysis)
- [Figure: N-input NMOS stack, NMOS width = $N W_N/f$, resistances $R_1 \dots R_N$, intermediate capacitances up to $C_{N-2}$, $C_{N-1}$, output node "out" with $C_{LOAD}$]
- [Equation not recovered: delay expression whose terms are labeled "NMOS devices" and "PMOS and output"]

Example (3/5): Delay of Inverter and NOR
- Inverter
  - $R_{INV} = R_0/W_N$
  - $C_{L\_INV} = C_{DIFF} + C_{GATE} = C_0 \left( W_N(1+\beta) + f W_N(1+\beta) \right)$
  - For $\beta = 2$, $t_{INV} = R_0 C_0 (3 + 3f)$
  - $C_{gate\_INV} = C_0 W_N(1+\beta)$: input capacitance of the inverter
- NOR2
  - $R_{NOR} = R_0/W_N$
  - $C_{L\_NOR} = C_{DIFF} + C_{GATE} = C_0 \left( W_N(2+2\beta) + f W_N(1+\beta) \right)$
  - For $\beta = 2$, $t_{NOR} = R_0 C_0 (6 + 3f)$
  - $C_{gate\_NOR} = C_0 W_N(1+2\beta)$: input capacitance of the NOR2

Example (4/5): Comparison
- N-input NAND and inverter [delay equation not recovered]
- N/2-input NAND and NOR (N/2-input NAND NMOS width = $(N/2) W_N/f$) [delay equation not recovered]
- Crossover at N = 5 with f = 4 (note the unequal C)

Example (5/5): Table of Comparison

  N    t_p1       t_p1 (f = 4)    t_p2       t_p2 (f = 4)
  4    21 + 6f    45              13 + 8f    45
  6    36 + 6f    60              21 + 8f    53
  8    55 + 6f    79              30 + 8f    62

- In terms of delay, it does not make sense to build large fan-in static CMOS gates (fan-in greater than 4)!
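The comparison table can be evaluated for other fanouts with a few lines of code. This is only a sketch: the coefficients are copied from the table above (delays in units of $R_0 C_0$), while the dictionary and function names are made up for illustration.

```python
# (intercept, slope) of t_p = intercept + slope * f, in units of R0*C0,
# copied from the comparison table in the slides.
T_P1 = {4: (21, 6), 6: (36, 6), 8: (55, 6)}  # N-input NAND + inverter
T_P2 = {4: (13, 8), 6: (21, 8), 8: (30, 8)}  # two N/2-input NANDs + NOR2

def delay(table, n_inputs, fanout):
    """Evaluate t_p for one row of the table at the given fanout f."""
    intercept, slope = table[n_inputs]
    return intercept + slope * fanout

def faster_option(n_inputs, fanout):
    """Name the faster implementation for this fan-in and fanout."""
    d1 = delay(T_P1, n_inputs, fanout)
    d2 = delay(T_P2, n_inputs, fanout)
    if d1 == d2:
        return "tie"
    return "single NAND" if d1 < d2 else "split NAND+NOR"

for n in (4, 6, 8):
    print(n, delay(T_P1, n, 4), delay(T_P2, n, 4), faster_option(n, 4))
```

At f = 4 the two options tie for N = 4 and the split implementation wins for N = 6 and N = 8, matching the table.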

Transmission Gate Sizing
- Attempt to make a T-gate have equal pull-up and pull-down resistance
- P:N ratio of k is not good for delay:
  - NMOS still has some significant pull-up strength (even if not all the way to $V_{DD}$)
  - PMOS has some pull-down (but very weak)
- Using some common numbers
  - $R_{N\_DN} = R_O$ kΩ·µm, $R_{N\_UP} = 2R_O$ kΩ·µm (2× penalty, weak trans.)
  - $R_{P\_UP} = 2.5R_O$ kΩ·µm, $R_{P\_DN} = 5R_O$ kΩ·µm (2× penalty, weak trans.)
- Let's try $W_P = W_N$
  - Parallel up: $R_{TGUP} = R_{N\_UP} \parallel R_{P\_UP} = 1.1R_O$
  - Parallel down: $R_{TGDN} = R_{N\_DN} \parallel R_{P\_DN} = 0.83R_O$
- So using $W_P/W_N = 1$ is fairly reasonable
  - Actual size may depend on the process technology

Delay Analysis (So Far) Summary
- The capacitance and resistance of the devices determine the performance of the circuit
  - The Elmore delay approximation gives initial insight into design
  - It is a step response and does not account for signal slopes
- The sizing of the transistors (a first glimpse)
  - Determines the logical threshold
  - Determines the drive strength of the gate as well as the load it presents to the preceding gate, which affects the delay
  - Determines the capacitance and hence the power dissipated by the gate
- Large fan-in gates imply large self-loading and gate loading to the preceding gate
  - Better to split into 2 gates when fan-in is greater than 4
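The transmission-gate numbers quoted for $W_P = W_N$ follow from the usual parallel-resistance formula. A minimal sketch, with all resistances expressed in units of $R_O$ and a hypothetical helper name `parallel`:

```python
def parallel(r1, r2):
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)

# Resistances in units of R_O for W_P = W_N, from the slide's "common numbers".
R_N_DN, R_N_UP = 1.0, 2.0   # NMOS pulling down / pulling up (weak)
R_P_UP, R_P_DN = 2.5, 5.0   # PMOS pulling up / pulling down (weak)

r_tg_up = parallel(R_N_UP, R_P_UP)
r_tg_dn = parallel(R_N_DN, R_P_DN)
print(round(r_tg_up, 2), round(r_tg_dn, 2))
```

The two results are within roughly 30% of each other, which is the basis for calling $W_P/W_N = 1$ fairly reasonable.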

Problem Statement
- Given:
  - An arbitrary logical function
  - A given implementation
- How do we decide the relative size of each gate?
- Constraints: $C_{out}$ (load), $C_{in}$ (load presented to input), and maximum delay
- [Figure: combinational logic network with inputs in_0, in_1 (input capacitances $C_{i0}$, $C_{i1}$) and outputs out_0, out_1 (load capacitances $C_{o0}$, $C_{o1}$), subject to $t_{pd} < t_{pdmax}$]

Simplified Problem: Buffering
- [Figure: chain of N inverter stages with NMOS widths $W_N = W_0,\ \alpha_1 W_0,\ \alpha_2 W_0,\ \dots,\ \alpha_{N-1} W_0$; $C_{in}$ at the input of stage 1 and $C_{out}$ at the output of the last stage]
- Assume $\beta$ (P:N ratio) = $\mu$ (mobility ratio), so $I_{PSAT} = I_{NSAT}$
- $R_0$ = pull-down resistance of an NMOS of size $W_0$ (or pull-up resistance of a PMOS of size $\beta W_0$)
- $C_0$ = gate capacitance of the NMOS plus PMOS of sizes $W_0$, $\beta W_0$
- Ignore source/drain and wire capacitance
- $\tau_0 = R_0 C_0$
- Goal: size each of the N stages for minimum delay
  - Delay for stage 1: $\alpha_1 C_0 R_0 = \alpha_1 \tau_0$
  - Delay for stage 2: $\alpha_2 C_0 R_0/\alpha_1 = (\alpha_2/\alpha_1) \tau_0$

Optimal Fanout
- Fanout of each stage of the inverter chain
  - Stage 1 = $\alpha_1$, Stage 2 = $\alpha_2/\alpha_1$
- Assume that the fanout of each stage is equal, $\alpha_0$
  - Let $\alpha_1 = \alpha_0$, $\alpha_2 = \alpha_0^2$, $\alpha_3 = \alpha_0^3$
  - Let $C_{out} = C_0 \alpha_0^N$
- Total delay = sum of the delays of stages 1 to N
  - Delay = $\tau_0 N \alpha_0$
- Since $C_{in} = C_0$ (remember: both $C_{in}$ and $C_{out}$ are given)
  - $C_{out}/C_{in} = \alpha_0^N$
  - For a given N, the optimal $\alpha_0 = (C_{out}/C_{in})^{1/N}$

Optimum Number of Stages
- For an arbitrary N:
- [Figure: "Delay versus Fanout": delay versus fan-out (1 to 6), with the minimum delay at fan-out = e]
- Optimum buffer fanout is e (2.718) when the self-loading is neglected

Constant Fanout Per Stage?
- Intuition: what if we increase the size of one stage by $(1+\Delta)$?
  - $R_{drv}$ reduces by $1/(1+\Delta)$
  - $C_{load}$ (of the previous stage) increases by $(1+\Delta)$
  - Delay is summed, and R reduces less quickly than C increases
  - So delay would increase if we deviate
- Mathematically:
  - Delay = $\tau_0 (\alpha_1 + \alpha_2/\alpha_1 + \alpha_3/\alpha_2 + \alpha_4/\alpha_3 + \alpha_5/\alpha_4 + \dots)$
  - Setting $d\,\mathrm{Delay}/d\alpha_1 = \tau_0 (1 - \alpha_2/\alpha_1^2) = 0$ gives $\alpha_1^2 = \alpha_2$
  - Setting $d\,\mathrm{Delay}/d\alpha_2 = \tau_0 (1/\alpha_1 - \alpha_3/\alpha_2^2) = 0$ gives $\alpha_2^2 = \alpha_1 \alpha_3$, thus $\alpha_1^3 = \alpha_3$
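The equal-fanout result can be explored numerically. The sketch below uses the chain delay $\tau_0 N \alpha_0$ with $\alpha_0 = (C_{out}/C_{in})^{1/N}$, ignores self-loading, and searches over integer stage counts. The total fanout of 1000 and the function names are assumptions chosen for illustration.

```python
import math

def chain_delay(total_fanout, n_stages):
    """Delay in units of tau_0 for n equal-fanout stages, self-loading ignored."""
    alpha = total_fanout ** (1.0 / n_stages)
    return n_stages * alpha

def best_stage_count(total_fanout, max_stages=20):
    """Integer number of stages that minimizes chain_delay."""
    return min(range(1, max_stages + 1),
               key=lambda n: chain_delay(total_fanout, n))

F = 1000.0  # assumed C_out / C_in
n_best = best_stage_count(F)
print(n_best, F ** (1.0 / n_best), chain_delay(F, n_best))
print(math.log(F))  # continuous optimum N = ln(F), where the fanout is exactly e
```

Because N must be an integer, the per-stage fanout lands close to e rather than exactly on it.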

Optimal Buffering with Self Loading
- Intuition: without self-loading
  - Delay decreases proportionally with decreasing the number of stages
  - But increasing fanout increases delay proportionally
  - The two are equal at the optimum number of stages and fanout
- Intuition: with self-loading
  - Increasing fanout no longer increases delay proportionally
  - Delay = $R_0 (\alpha C_0 + C_{sd})$
  - The new optimum has fewer stages and a bigger fanout
- All equations remain the same except Delay

Optimal Buffering as a Function of Self Loading
- The optimum changes with self-loading
- A reasonable number to use for optimal delay is a fanout of 4
- [Figure: "Delay versus Fanout": delay versus fan-out (2 to 6) for p = 0, 1, 2, 3, 4, 5, where $p = C_{sd}/C_0$]
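One way to see how the optimum fanout moves with self-loading is to solve for it directly. The sketch below assumes the per-stage delay $\tau_0(\alpha + p)$ with $p = C_{sd}/C_0$ and treats the number of stages $N = \ln F / \ln \alpha$ as continuous, so the total delay is $\tau_0 \ln F \,(\alpha + p)/\ln \alpha$. Setting its derivative with respect to $\alpha$ to zero gives $\alpha(\ln \alpha - 1) = p$. That closed-form condition is a standard textbook step rather than something stated on the slides, and the solver name is hypothetical.

```python
import math

def optimal_fanout(p, tol=1e-9):
    """Solve alpha * (ln(alpha) - 1) = p for alpha by bisection (p >= 0).

    The left-hand side is 0 at alpha = e and increases for alpha > 1,
    so the root lies in [e, 100] for the small p values of interest.
    """
    lo, hi = math.e, 100.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * (math.log(mid) - 1.0) > p:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

for p in range(6):
    print(p, round(optimal_fanout(p), 2))
```

With p = 0 the solver returns e, and for p around 1 the optimum moves to roughly 3.6, in line with the rule of thumb of a fanout of about 4.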

Buffer Optimization for Energy Delay
- Optimizing for energy (power) alone doesn't make sense because the optimum will be the smallest possible device size
- Instead, optimize for the best energy-delay tradeoff
- Assuming the fanout (FO) is constant, $\alpha_0$:
  - Results in a larger FO
  - FO = 5 is pretty reasonable
- [Figure: "Energy Delay versus Fanout": energy-delay product (×10^4, 0 to 7) versus fan-out (2 to 10) for p = 0, 1, 2, 3, 4, 5]

Issue with Optimal Energy Delay
- Constant fanout is not a good assumption
- Intuition:
  - Reduce a lot of power by reducing the size of the final driver
  - Large fanout at the last stage
  - Reduce the fanout of prior stages to compensate
- Example: $C_{in} = 1$, $C_{out} = 1000$; the equal-fanout result is 4 stages

                            Stage 1   Stage 2     Stage 3      Stage 4       EDP
  Equal FO = 5.62           1         5.62        31.6         177.8         32200
  Unequal FO (tapered FO)   1         4.8 (4.8)   23.1 (4.9)   124.5 (5.4)   31100
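The energy-delay comparison in the table can be approximated with a very simple model. The sketch below is an assumption, not the lecture's stated method: energy is taken as the sum of all switched capacitances (stage inputs plus the load), delay as the sum over stages of (fanout + p) in units of $\tau_0$ with p = 1. Under that model the numbers come out close to, but not exactly equal to, the EDP values in the table.

```python
def edp(sizes, c_out, p=1.0):
    """Energy-delay product of an inverter chain under a simple assumed model.

    sizes: input capacitance of each stage (first entry is C_in).
    Energy ~ sum of all switched capacitances (stages + load).
    Delay  ~ sum over stages of (fanout + p), in units of tau_0.
    """
    loads = list(sizes[1:]) + [c_out]
    fanouts = [load / size for size, load in zip(sizes, loads)]
    delay = sum(f + p for f in fanouts)
    energy = sum(sizes) + c_out
    return energy * delay

c_out = 1000.0
alpha = c_out ** 0.25                  # equal fanout for 4 stages, about 5.62
equal = [alpha ** i for i in range(4)]
tapered = [1.0, 4.8, 23.1, 124.5]      # stage sizes quoted in the table

print(round(edp(equal, c_out)), round(edp(tapered, c_out)))
```

The tapered chain gives the lower product in this model as well, which supports the point that a larger fanout in the last stage can reduce the energy-delay product.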

Ultimately, We Will Get Here (~Lecture 8)
- Energy-delay optimization
  - Gate size, supply voltage, threshold voltage
- General form: $E^{\alpha} D^{\beta}$
- [Figure: energy versus delay curve under W, $V_{DD}$, $V_{th}$ optimization, with the points $E^0 D$ (min D), $ED$ (min EDP), and $E D^0$ (min E) marked]

Application of Fanout to Logic?
- When logic needs to drive a large capacitive load: fanout ~ 4
- What is fanout?
  - Effective load capacitance driven by the gate (normalized to $C_{inverter}$)
  - Example: NAND gate with $W_P = 5$, $W_N = 5$ driving 5 equal NAND gates
  - Equivalent inverter: $W_P = 5$, $W_N = 2.5$; total gate width = 7.5
  - Total load gate width = 5 × 10 = 50
  - Fanout = 50/7.5 ≈ 6.7
  - Try to reorganize the logic and add inverters so that fanout ~ 4
- When logic has a large number of stages N, so each stage drives a small fanout:
  - Delay is logic limited, so reduce N
  - Balance the fanouts so that they are equal
- OK, but not very systematic

Concept of Logical Effort
- Instead of running lots of simulations
  - Simplified: (almost) back-of-envelope calculations of delay
- Basic concept:
  - Delay = $R_{gate} (C_{load} + C_{self}) = R_{gate} C_{load} + R_{gate} C_{self}$
- Logical effort basic equation: $d = f + p$
  - $d$ is the delay (normalized)
  - $f$ is known as the effort delay
  - $p$ is known as the parasitic delay
- $d = \mathrm{Delay}/\tau = (R_{gate} C_{load} + R_{gate} C_{self}) / (R_0 C_0)$
  - Normalized to the delay of an FO1 inverter (no self-load)
  - With $R_0 = R_{gate}$, $d$ = fanout + normalized parasitic
  - So $f$ is essentially equivalent to fanout
- $d$ is a measure that is independent of process, voltage, and temperature

The Logical Effort Way of Thinking
- Gate delay we used up to now: [equation not recovered]
- Another way to write this formula is: [equation not recovered]

Now Normalize the Delay
- Strategy: normalize to the time constant of an inverter
  - Approach 1: normalize to a fictitious technology time constant
  - Approach 2: normalize to the intrinsic delay of an inverter
- Both formulations exist in the literature
  - We use approach 1 (as in the original logical effort theory)
  - It doesn't really matter; it's just a constant

Normalized Delay
- Strategy: normalize to a time constant of an inverter
  - Approach 1: normalize to a fictitious technology time constant
- Normalized delay: [equation not recovered]
- Even simpler: [equation not recovered]
- Logical effort terms
  - Logical effort (g)
  - Electrical fanout (h)
  - Parasitic delay (p)

The Meaning of Logical Effort Terms
- Logical effort terms: logical effort (g), electrical fanout (h), parasitic delay (p)
- Intuition
  - Logical effort (g): $R_{on}$ ratio for equal $C_{in}$, or $C_{in}$ ratio for equal $R_{on}$
  - Electrical fanout (h): $C_{out}/C_{in}$ ratio (gate capacitance only; diffusion counts in the p term)
  - Parasitic delay (p): ratio of parasitic capacitances for equal $R_{on}$

Calibrating the Model
- The values for g and p can be extracted from simulation, because $d = g \cdot h + p$
- Simulate the delay of the gate for different loads
  - Drive itself with different multiplication factors
  - Extract τ using an inverter with no self-loading (AS, AD, PS, PD = 0)
  - Vary the inputs (and rise/fall) for different g and p
- [Figure: Delay/τ versus $C_{load}/C_{in}$ for a gate; the intercept is p and the slope is g]

Logical and Electrical Effort
- Instead of just $d = f + p$, let $f = gh$
  - g = logical effort (of a gate): cost of implementing logic
  - h = electrical effort: cost of driving a load
- $f = R_{gate} C_{load} / (R_0 C_0)$, $p = R_{gate} C_{self} / (R_0 C_0)$
- Let $R_0 = R_{inv}$ where $R_{inv} = R_{gate}$, and $C_0 = C_{inv}$
  - $p = C_{self}/C_{inv}$, $f = (C_{in}/C_{inv}) \cdot (C_{load}/C_{in})$
  - $C_{in}$ is the gate's input capacitance (for the particular input)
- $g = C_{in}/C_{inv}$
  - Each gate (and each input of every gate) has different values
- $h = C_{load}/C_{in}$
  - Output to input capacitance ratio
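The calibration procedure amounts to a straight-line fit of normalized delay against electrical effort. The sketch below uses an ordinary least-squares fit written out by hand. The data points are hypothetical, generated from $d = (4/3)h + 2$ instead of coming from a real simulation, so the fit should simply recover g = 4/3 and p = 2.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical "simulation" data: normalized delays (delay / tau) of a gate
# at several electrical efforts h = C_load / C_in.
h_values = [1.0, 2.0, 3.0, 4.0, 5.0]
d_values = [(4.0 / 3.0) * h + 2.0 for h in h_values]

g, p = fit_line(h_values, d_values)  # slope is g, intercept is p
print(g, p)
```

With measured delays, the same fit would be repeated per input and per transition direction, since each can have its own g and p.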

Typical Simulation Data (*)
- [Figure: normalized delay d (1 to 6) versus electrical effort $h = C_{out}/C_{in}$ (1 to 5), with the effort delay and parasitic delay portions marked. Two lines are shown: g = 4/3, p = 2, $d = (4/3)h + 2$; and g = 1, p = 1, $d = h + 1$]
- $d_{gate} = g \cdot h + p$ = effort delay + parasitic delay
- (*) assumes $g_{INV} = 1$

Computing Logical Effort: g
- g is a unitless, inherent characteristic of the gate
  - Not a function of gate size
  - It is a function of the construction of the gate (connection and relative size between transistors)
  - An indication of the cost of implementing the function
- Procedure:
  1. Choose an input and find the total device width driven by that input
  2. Find $W_P$, the pull-up width of a single device that has the same drive strength as the gate's pull-up for that input
  3. For a reference inverter with equal rise/fall ($\beta = \mu$) and $W_P$ from Step 2, determine the total gate width of the inverter devices
  4. Divide Step 1 by Step 3 to determine $g_{up}$
  5. Repeat Steps 2 to 4 for the pull-down device to get $g_{down}$
- The two g's would only be different if β of the gate is not µ

Example: Calculating Logical Effort
- Def: Logical effort is the ratio of the gate input capacitance to the input capacitance of an inverter delivering the same output current
- Inverter (reference): $C_{in} = 3$, LE = 1 (by definition)
- NAND2: $C_{in} = 4$, LE = 4/3
- NOR2: $C_{in} = 5$, LE = 5/3

Example: NOR Gate with β = 3
- Common assumptions
  - $C_{gate}$ is proportional to device width
  - $R_{gate}$ is inversely proportional to device width
- For a NOR gate with $\beta = \mu = 3$ (units are not so important):
  - [Figure: NOR2 with inputs A and B, two series PMOS of width 12 and two parallel NMOS of width 2 at the output]
  - Equivalent inverter: $W_P : W_N = 6 : 2$, $C_{G\_INV} = 8$
  - NOR gate input capacitance: $C_{G\_NOR} = 14$
  - Logical effort = 14/8 = 7/4
- Caveat: don't get confused with absolute transistor sizing!
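The logical-effort examples generalize to n inputs and other mobility ratios. The sketch below assumes a reference inverter with $W_N = 1$, $W_P = \mu$ and gates sized for the same drive: series stacks are widened by the stack depth, parallel devices are left at the inverter size. The function names are made up for illustration, and `mu` is expected to be an integer or a `Fraction`.

```python
from fractions import Fraction

def g_nand(n, mu=2):
    """Logical effort of an n-input NAND.

    Per input: one NMOS of width n (n in series) and one PMOS of width mu,
    compared with the reference inverter's input width of 1 + mu.
    """
    return Fraction(n + mu, 1 + mu)

def g_nor(n, mu=2):
    """Logical effort of an n-input NOR.

    Per input: one NMOS of width 1 and one PMOS of width n*mu (n in series),
    compared with the reference inverter's input width of 1 + mu.
    """
    return Fraction(1 + n * mu, 1 + mu)

print(g_nand(2), g_nor(2))   # beta = mu = 2
print(g_nor(2, mu=3))        # beta = mu = 3
```

For µ = 2 these reduce to (n+2)/3 and (2n+1)/3, and for the NOR2 with µ = 3 the result is 7/4, matching the worked example.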

Example: Calculating Parasitic Delay
- Def: Parasitic delay is the ratio of the intrinsic capacitance at the gate output to the intrinsic capacitance at the output of an equivalent inverter
- Inverter (reference): $C_{int} = 3$, p = 1 (by definition)
- NAND2: $C_{int} = 6$, p = 2
- NOR2: $C_{int} = 6$, p = 2

Calculating Parasitic Delay: p
- Typically given, since it depends on $C_{diffusion}$ of a gate
- Example: assume $C_{S/D} = 0.5 C_G = 0.5 C_o$
  - For an inverter, $C_{self}/C_{inv} = p_{INV} = 0.5$
  - Higher $C_{S/D}/C_G$ results in larger p (penalizing delay more). $C_{S/D}/C_G$ is often close to 1
- [Figure: inverter with PMOS width 6 and NMOS width 2; NOR2 with inputs A and B, two series PMOS of width 12 and two parallel NMOS of width 2]
- NOR gate
  - $C_{S/D,NOR}$: 12 + 2 + 2 = 16 units of width, i.e. $8C_o$
  - $C_{INV} = 4C_o$
  - $p_{NOR} = 2$ in units of $p_{INV}$ (that is, $2p_{INV}$)
- Caveat: $\gamma_{INV}$ is included in $p_{INV}$ (it is not always 1)!

Calculating p Including Series Stacking
- What about the intermediate nodes?
  - One way to account for them is to use an effective p
- For example: NOR pull-up of the B input
  - $R_{NOR} = 2 R_{PMOS}$
  - Delay = $(R_{NOR}/2) C_1 + R_{NOR} C_2 + R_{NOR} C_{load}$
  - Self-loading: $p_{BUP} = [(R_{NOR}/2) C_1 + R_{NOR} C_2] / (R_{inv} C_{inv})$ (where $R_{inv} = R_{gate}$)
  - $p_{BUP} = (C_1/2 + C_2)/C_{inv}$
- Using $C_{S/D} = 0.5 C_G$
  - $C_1 = 6C_o$ (shared)
  - $C_2 = 8C_o$
  - $p_{BUP} = 11/4$
- [Figure: NOR2 with inputs B and A driving two series PMOS of width 12, intermediate node $C_1$, output node $C_2$, and two parallel NMOS of width 2]
- Note: this increased accuracy requires different p's for different inputs AND for pull-up/pull-down. Simplify by ignoring these nodes (unless otherwise specified)

Generalize: N-Input NAND
- [Figure: N-input NAND with PMOS of size 2 and NMOS of size N; output node $C_1 = (2N + N) C_o$, intermediate nodes $C_2 = N C_o$, $C_3 = N C_o$]
- Output load = 3N
  - N PMOS of size 2 = 2N
  - 1 NMOS of size N = N
- Intermediate load = N (shared)
- Total pull-down delay: $T = R(3N C_o) + \sum_{i=1}^{N-1} (iR/N) \cdot N C_o$
- $d$ (normalized) $= 3N + (N^2/2 - N/2)$
- $p = (N^2/2 - N/2)$: proportional to $N^2$!!!
- This is bad news for large series stacking
  - Even worse for PMOS (NOR)
- Reality is even worse, since $C_{GS}$ makes each intermediate node capacitance larger than $N C_o$
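The quadratic growth for the N-input NAND can be checked by evaluating the Elmore sum directly. This sketch follows the model stated above (total stack resistance R, output capacitance $3N C_o$, intermediate nodes of $N C_o$ each) and reports both terms in units of $R C_o$. The function name is hypothetical.

```python
from fractions import Fraction

def nand_pulldown_terms(n):
    """Return (output_term, stack_term) of the pull-down delay in units of R*C_o.

    Each of the n series NMOS contributes R/n, so intermediate node i
    (counted from ground) sees a resistance of i*R/n and has capacitance n*C_o.
    """
    output_term = 3 * n
    stack_term = sum(Fraction(i, n) * n for i in range(1, n))
    return output_term, stack_term

for n in (2, 4, 8):
    out_term, stack = nand_pulldown_terms(n)
    closed_form = Fraction(n * n, 2) - Fraction(n, 2)
    print(n, out_term, stack, closed_form)
```

The stack term equals $N^2/2 - N/2$ for every N, while the output term grows only linearly, so the intermediate nodes dominate for long stacks.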

A Catalog of Gates

Logical effort g for different numbers of inputs:

  Gate type     1    2     3     4     5      n
  Inverter      1
  NAND               4/3   5/3   6/3   7/3    (n+2)/3
  NOR                5/3   7/3   9/3   11/3   (2n+1)/3
  Multiplexer        2     2     2     2      2
  XOR, XNOR          4     12    32

Parasitic delay:

  Gate type                  Parasitic delay
  Inverter                   p_inv
  n-input NAND               n p_inv
  n-input NOR                n p_inv
  n-way multiplexer          2n p_inv
  2-input XOR, XNOR (sym)    n 2^(n-1) p_inv

- $\beta = \mu = 2$
- The mux is tri-state inverters shorted together
- XOR assumes that the input is bundled (a, a')
- $p_{INV} \sim 1$
- $p_{GATE}$ in this table does not include intermediate nodes

Example #1: Ring Oscillator
- Estimate the frequency of an N-stage ring oscillator
- Logical effort: g = 1
- Electrical effort: $h = C_{out}/C_{in} = 1$
- Parasitic delay: $p = p_{inv} = 1$
- Stage delay: $d = g \cdot h + p = 2$
- Oscillator frequency: $f_{OSC} = \dfrac{1}{2Nd\tau} = \dfrac{1}{4N\tau}$
- gpdk090: $t_{stage}$ = 13 ps (TT)
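The ring-oscillator estimate is easy to turn into numbers. The sketch below backs out τ from the quoted 13 ps stage delay using d = 2 and then applies $f_{OSC} = 1/(2Nd\tau)$. The ring length of 5 stages is an assumption for illustration only, and the function names are hypothetical.

```python
def stage_delay(g, h, p):
    """Normalized logical-effort delay of one stage: d = g*h + p."""
    return g * h + p

def ring_oscillator_frequency(n_stages, tau, g=1.0, h=1.0, p=1.0):
    """f_osc = 1 / (2 * N * d * tau), with d = g*h + p per stage."""
    return 1.0 / (2.0 * n_stages * stage_delay(g, h, p) * tau)

t_stage = 13e-12                             # gpdk090 stage delay (TT), from the slide
tau = t_stage / stage_delay(1.0, 1.0, 1.0)   # implied tau, since d = 2
n_stages = 5                                 # assumed ring length, for illustration only
f_osc = ring_oscillator_frequency(n_stages, tau)
print(tau, f_osc)
```

The implied τ of 6.5 ps can then be reused for other gates, since d is expressed in units of τ.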

Example #2: Fanout-of-4 Inverter
- Estimate the delay of a fanout-of-4 (FO4) inverter
- Logical effort: g = 1
- Electrical effort: $h = C_{out}/C_{in} = 4$
- Parasitic delay: $p = p_{inv} = 1$
- Stage delay: $d = g \cdot h + p = 5$
- gpdk090: $t_{FO4}$ = 33 ps (TT)

Example #3: Gate Delays
- Delay of the path from A to B, where $\beta = \mu = 2$ and $p_{INV} = 1$
- [Figure: input A drives G1 ($W_P : W_N$ = 4:4); G1 drives G2 (10:5) and G3 (20:10); G2 drives G4 (12:3) at output B]
- $g_{G1} = 4/3$, $p_{G1} = 2$, $C_{IN\_G1} = 8$
- $g_{G2} = 5/3$, $p_{G2} = 2$, $C_{IN\_G2} = 15$
- $g_{G3} = 4$, $p_{G3} = 4$, $C_{IN\_G3} = 30$
- $C_{IN\_G4} = 15$
- $h_{G1} = (C_{IN\_G2} + C_{IN\_G3})/C_{IN\_G1} = 5.625$, $h_{G2} = C_{IN\_G4}/C_{IN\_G2} = 1$
- $d_{G1} = g_{G1} h_{G1} + p_{G1} = 9.5$
- $d_{G2} = g_{G2} h_{G2} + p_{G2} \approx 3.67$
- Normalized delay = $d_{G1} + d_{G2} \approx 13.17$
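The path-delay arithmetic of Example #3 can be reproduced with exact fractions. The sketch below uses only the g, p, and input-capacitance values quoted in the example; the dictionary layout is an illustrative choice.

```python
from fractions import Fraction

# Values quoted in Example #3 (beta = mu = 2, p_INV = 1).
g = {"G1": Fraction(4, 3), "G2": Fraction(5, 3)}
p = {"G1": 2, "G2": 2}
c_in = {"G1": 8, "G2": 15, "G3": 30, "G4": 15}

# Electrical effort = load capacitance / input capacitance.
h_g1 = Fraction(c_in["G2"] + c_in["G3"], c_in["G1"])  # G1 drives G2 and G3
h_g2 = Fraction(c_in["G4"], c_in["G2"])               # G2 drives G4

d_g1 = g["G1"] * h_g1 + p["G1"]
d_g2 = g["G2"] * h_g2 + p["G2"]
path_delay = d_g1 + d_g2

print(float(h_g1), float(d_g1), float(d_g2), float(path_delay))
```

Exact arithmetic gives 19/2 for G1, 11/3 for G2, and 79/6 for the path, in units of τ.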

Summary
- Delay and/or power of a logic network depend significantly on the relative sizes of logic gates (not transistors within a gate)
- Inverter buffering is a simple example of the analysis
  - The analysis leads to ~FO4 as the optimal fanout for driving larger capacitive loads
- To generalize the analysis of delay, we introduce logical effort
  - Delay normalized by inverter delay, $d = gh + p$
  - g and p are characteristics of a logic gate that depend on its structure and do not depend on gate size
  - There may be different g's and p's for different inputs and for pull-up/pull-down
  - Simplify by using $g_{AVG}$ and ignoring the capacitances of intermediate nodes
- Once a table of g's and p's is created for the catalog of gates, delay can be calculated quickly and easily
- Next we will look at how to size a network instead of just analyzing it