Lecture 5. Logical Effort Using LE on a Decoder

Similar documents
C.K. Ken Yang UCLA Courtesy of MAH EE 215B

ECE429 Introduction to VLSI Design

EE141-Fall 2011 Digital Integrated Circuits

Lecture 6: Logical Effort

Logical Effort. Sizing Transistors for Speed. Estimating Delays

EE 447 VLSI Design. Lecture 5: Logical Effort

EE M216A.:. Fall Lecture 5. Logical Effort. Prof. Dejan Marković

Logical Effort: Designing for Speed on the Back of an Envelope David Harris Harvey Mudd College Claremont, CA

VLSI Design, Fall Logical Effort. Jacob Abraham

Lecture 8: Combinational Circuit Design

Introduction to CMOS VLSI Design. Lecture 5: Logical Effort. David Harris. Harvey Mudd College Spring Outline

Very Large Scale Integration (VLSI)

Logical Effort EE141

Lecture 8: Logic Effort and Combinational Circuit Design

Logic Gate Sizing. The method of logical effort. João Canas Ferreira. March University of Porto Faculty of Engineering

Integrated Circuits & Systems

7. Combinational Circuits

Digital Microelectronic Circuits ( ) Logical Effort. Lecture 7: Presented by: Adam Teman

EE M216A.:. Fall Lecture 4. Speed Optimization. Prof. Dejan Marković Speed Optimization via Gate Sizing

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Logic Effort Revisited

Introduction to CMOS VLSI Design. Logical Effort B. Original Lecture by Jay Brockman. University of Notre Dame Fall 2008

Interconnect (2) Buffering Techniques. Logical Effort

Spiral 2 7. Capacitance, Delay and Sizing. Mark Redekopp

Digital Integrated Circuits A Design Perspective

Static CMOS Circuits. Example 1

Lecture 4. Adders. Computer Systems Laboratory Stanford University

Lecture 8: Combinational Circuits

and V DS V GS V T (the saturation region) I DS = k 2 (V GS V T )2 (1+ V DS )

Topics. CMOS Design Multi-input delay analysis. John A. Chandy Dept. of Electrical and Computer Engineering University of Connecticut

CARNEGIE MELLON UNIVERSITY DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING DIGITAL INTEGRATED CIRCUITS FALL 2002

Lecture Outline. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. Total Power. Energy and Power Optimization. Worksheet Problem 1

Lecture 1: Gate Delay Models

Lecture 6: Circuit design part 1

Introduction to CMOS VLSI Design (E158) Lecture 20: Low Power Design

Energy Delay Optimization

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Name: Answers. Mean: 83, Standard Deviation: 12 Q1 Q2 Q3 Q4 Q5 Q6 Total. ESE370 Fall 2015

Announcements. EE141-Spring 2007 Digital Integrated Circuits. CMOS SRAM Analysis (Read/Write) Class Material. Layout. Read Static Noise Margin

Lecture 8: Combinational Circuits

EE141Microelettronica. CMOS Logic

EECS 151/251A Homework 5

Arithmetic Building Blocks

Lecture 9: Combinational Circuit Design

EECS 427 Lecture 8: Adders Readings: EECS 427 F09 Lecture 8 1. Reminders. HW3 project initial proposal: due Wednesday 10/7

EE141-Fall 2012 Digital Integrated Circuits. Announcements. Homework #3 due today. Homework #4 due next Thursday EECS141 EE141

EE371 - Advanced VLSI Circuit Design

EE141. Administrative Stuff

Lecture 9: Combinational Circuits

EE115C Digital Electronic Circuits Homework #5

Errata of K Introduction to VLSI Systems: A Logic, Circuit, and System Perspective

Advanced VLSI Design Prof. A. N. Chandorkar Department of Electrical Engineering Indian Institute of Technology- Bombay

EE115C Digital Electronic Circuits Homework #6

Lecture Outline. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. Review: 1st Order RC Delay Models. Review: Two-Input NOR Gate (NOR2)

Chapter 4. Digital Integrated Circuit Design I. ECE 425/525 Chapter 4. CMOS design can be realized meet requirements from

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: Digital Logic

Lecture 6: Time-Dependent Behaviour of Digital Circuits

EE 560 CHIP INPUT AND OUTPUT (I/0) CIRCUITS. Kenneth R. Laker, University of Pennsylvania

CMPEN 411 VLSI Digital Circuits Spring 2011 Lecture 07: Pass Transistor Logic

UNIVERSITY OF CALIFORNIA, BERKELEY College of Engineering Department of Electrical Engineering and Computer Sciences

Lecture 34: Portable Systems Technology Background Professor Randy H. Katz Computer Science 252 Fall 1995

Floating Point Representation and Digital Logic. Lecture 11 CS301

CMOS logic gates. João Canas Ferreira. March University of Porto Faculty of Engineering

Lecture 7 Circuit Delay, Area and Power

Lecture 6: DC & Transient Response

Homework #2 10/6/2016. C int = C g, where 1 t p = t p0 (1 + C ext / C g ) = t p0 (1 + f/ ) f = C ext /C g is the effective fanout

THE INVERTER. Inverter

EE 434 Lecture 33. Logic Design

EE 466/586 VLSI Design. Partha Pande School of EECS Washington State University

UNIVERSITY OF CALIFORNIA, BERKELEY College of Engineering Department of Electrical Engineering and Computer Sciences

Lecture 4: DC & Transient Response

Hw 6 and 7 Graded and available Project Phase 2 Graded Project Phase 3 Launch Today

Lecture 12 CMOS Delay & Transient Response

9/18/2008 GMU, ECE 680 Physical VLSI Design

Chapter 11. Inverter. DC AC, Switching. Layout. Sizing PASS GATES (CHPT 10) Other Inverters. Baker Ch. 11 The Inverter. Introduction to VLSI

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Using MOS Models. C.K. Ken Yang UCLA Courtesy of MAH EE 215B

Lecture 8-1. Low Power Design

Name: Grade: Q1 Q2 Q3 Q4 Q5 Total. ESE370 Fall 2015

Testability. Shaahin Hessabi. Sharif University of Technology. Adapted from the presentation prepared by book authors.

Topic 4. The CMOS Inverter

Lecture 5: DC & Transient Response

EE371 - Advanced VLSI Circuit Design

! Crosstalk. ! Repeaters in Wiring. ! Transmission Lines. " Where transmission lines arise? " Lossless Transmission Line.

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Sciences

Homework 4 due today Quiz #4 today In class (80min) final exam on April 29 Project reports due on May 4. Project presentations May 5, 1-4pm

Interconnect (2) Buffering Techniques.Transmission Lines. Lecture Fall 2003

Digital Integrated Circuits A Design Perspective

CMPEN 411 VLSI Digital Circuits. Lecture 04: CMOS Inverter (static view)

University of Toronto. Final Exam

Chapter 2 Combinational Logic Circuits

EE 330 Lecture 37. Digital Circuits. Other Logic Families. Propagation Delay basic characterization Device Sizing (Inverter and multiple-input gates)

CMPEN 411 VLSI Digital Circuits Spring Lecture 14: Designing for Low Power

! Energy Optimization. ! Design Space Exploration. " Example. ! P tot P static + P dyn + P sc. ! Steady-State: V in =V dd. " PMOS: subthreshold

EECS 270 Midterm 2 Exam Answer Key Winter 2017

Lecture 14: Circuit Families

EE241 - Spring 2000 Advanced Digital Integrated Circuits. Announcements

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design

Homework Assignment #1 Solutions EE 477 Spring 2017 Professor Parker

Transcription:

Lecture 5 Logical Effort Using LE on a Decoder Mark Horowitz Computer Systems Laboratory Stanford University horowitz@stanford.edu Copyright 00 by Mark Horowitz Overview Reading Harris, Logical Effort talk slides Read book if you want more information For sizing, there is nothing that good in either book Chandrakasan.3 describes some issues Amrutur, Fast Low Power Decoders Section I, II cover this lecture, Section III covers Lecture 6 Overview Having setup the basic optimization problems, we will next develop a formalism for doing sizing with real gates. This formalism is called logical effort. To get some practice using this method we will apply it to the memory decoder we talked about in Lecture 3. This lecture will use logical effort to optimize the performance of a decoder, and in the process motivate some creative ways to build CMOS logic gates.

Logical Effort Formalism Let us express delays in terms of τ inv Delay* = Delay/ τ inv Delay of logic gate has two components: Delay* = EffortDelay + ParasiticDelay Effort delay again has two components: EffortDelay = LogicalEffort * ElectricalEffort ElectricalEffort is just fanout ElectricalEffort = Cload/Cin LogicalEffort describes the relative ability of gate topology to deliver current for a given input capacitance LogicalEffort = τ gate / τ inv 3 Logical Effort s View of Gate Delays Gate delay 0 8 6 Graphical model Slope is logical effort Y intercept is parasitic delay Effort Delay More complex gates Have larger LE Have larger parasitics 0 0 3 5 6 inv NAND NOR parasitic fanout Logical Effort Cin gate *Rdrive gate Cin inv *Rdrive inv

Calculating Logical Effort for a Gate Build the gates to have the same drive strength as a x pmos, x nmos inverter. The numbers on each transistor is relative to the x nmos transistor in the inverter. The Cin of inverter is 3x. LE = /3 LE=5/3 LE=; /3 Note that the logical effort of all inputs does not always match 5 Warm-ups with Logical Effort Frequency of an N stage ring oscillator LE=, FO=, γ=; delay = per inverter Delay through N inverters is *N, Frequency is /(N) since it takes N time to change from a high to a low, and another N to change from a low to a high Delay of a inverter with a fanout of LE=, FO=, γ=; delay = 5 per inverter Normally we will report delays in terms of FO inverter delays Roughly equal to normalized delay divided by 5 6

Logical Effort for Transmission Gates Need to consider the stage TG and drive gates stage Rdrive Set by path to Vdd/Gnd Resistance of logic gate + TG In this example it is x a / inverter from all paths Cin Depends on input 3 for inv, for NAND, TG LE for inv, 8/3 for NAND, /3 TG 7 Using Logical Effort to Size Gates x y z 0CIn LE /3 5/3 Fanout x/ y/x z/y 0/z Know the effort delay is the same per stage Call this the effective fanout, (EF) since it is the delay for an inverter with this fanout Need to find the total effective fanout for the chain This is the product of LE * Fanout of every stage =Product of LEs *fanout of chain In example: Total LE =., Fanout = 5 Effective Fanout = 8 j C inj LE. j C inj C load. C in j LE j

Sizing Gates x y z 0CIn LE /3 5/3 Fanout x/ y/x z/y 0/z Given we know the total EF EF for each stage must be EF /N Makes the EF of each stage the same Matches the required total Given the EF of each stage Fanout is EF/LE of that stage In the example the EF of each stage is.8= / This is too small. We have too many stages 9 Reducing Number of Logic Stages Reformulate logic If you have too low EF, use more complex gates, with fewer stages Often need to use AND- OR-Invert gates or OR- AND-Invert gates EF = 8.33, with two stages EF per stage is.9 which is quite reasonable X = *.9 = 5.8 x 0CIn LE 5/3 Fanout x/ 0/x Note: this was the wrong optimization if the top input was critical. Since we have slowed down this path. Make sure you are sizing the critical path. 0

Branching Factors X 0CIn X 0CIn In some circuits there is fan-out in the conventional sense One gate drives a number of gates If all gates are on the critical path EF stage = LE * (C j+ /C j ) *B (Branching factor) Total EF for a chain is then the product of LE and B for all the gates on the chain. In the example = /3**0 Electrical fanout per stage = EF stage /(LE*B) So X in the example would be around.6 = ½ * 7 Logical Effort Summary Estimate the path effort EF = Π LE * Π B *Cload/Cin Estimate the optimal number of stages N* = log (EF) Estimate the minimum delay N* + Parasitic delay Determine the actual number and type of gates Fit the required logic in N stages, where N is close to N* This may change slightly the path EF Determine the stage effort New EF /N =f Working from either end, determine gate sizes Cin = Cout*LE*B/f

Some Definitions 3 Decoder Review Decoder has two main jobs: Logic function Using N address bits Needs to select of N wordlines This means the logical effort of the chain will be larger than Equal to LE of an N input AND gate Act as a buffer chain The address line has a large fanout Each address line ultimately needs to drive every AND gate A0 drives ½ of the decoders and A0_b drives the other ½ The wordline capacitance can be large It has M cells on it, and a large wire capacitance Total fanout is proportional to the size of the memory

Decoder Logic N address N lines A0, A0_b both needed Each decoder an AND gate Large fanin Must drive wordline N gates N address lines 5 Example for Class Assume we are building a 56x56 memory (8KB memory) Assume that C add = * C cell Total fanout on each address input is then We still need to build the decoder too! What is the minimum logical effort for 8 input AND gate? Since fanout is large don t need to worry about # of stages We will need lots of inverters anyhow Many possibilities NAND -Inv trees NAND - NOR trees NAND - Inv tree Etc 6

Optimal Static Decoder Logical Effort of building input AND Two NAND gates is /3*/3 =.8 One NAND is -input NAND/-input NOR is /3*5/3 =. Which is best? Depends on the number of levels of logic you need If you need lots of gates, -input gates are often the best Using -input NAND gates An 8-input gate will take 6 levels of gates 8 to outputs, to outputs, to output 7 Number of Stages of Logic For our decoder (assuming input NAND gates) The decoder has a total effort of. (which is /3 cubed) * FO (which is ) Note that using other gates would not change this result very much, so don t sweat which gate you are going to use when you figure out the total effective fanout. For a effective fanout of around per stage This design would need around 7.5 stages which is Log ( 5 ) But how is it going to be put together? How do we size the individual gates? Which wires do we run up the decoder? 8

Predecode Options Two basic choices Can do a - predecode in groups, with a to final gate Final gate has two level of and gates Uses only 6 address wires running across the decoder Final gates are larger Can do a -6 predecode in groups, with a - final gate Uses 3 address lines running across the decoder Final gates are smaller Generally doing a larger predecode is better for two reasons More levels of logic before the wire capacitance Less capacitance switches each cycle (lower power) 9 Decode Path By using a -6 predecode we have move more stages of logic before the long wire This decreases its effect on the circuit, since it naturally give us more stages of buffers before driving the wire Otherwise we would need more stages in the predecode, just to drive the wire 6 6 A0AAA3 C L 56 to 6 predecoder A0 A AA3 0

General Predecode Consider the to 6 decoder: branching effort = for n inputs BE p is approx n- x a a0a a0aaa3 C L logical effort = a0 x x x x 3 3 total path effort PE= (LEp)(BEp) (FOp) =(/3)()()C L /C in This formula has A0 and A0_b as separate inputs Otherwise branching effort would be n Logical Effort of Predecoder This works back from the load on the predecoder for both a - predecoder, and a -6 predecode /3 / f /f to decoder /3 / f /f If this gives you a input capacitance that is too high, you need to add extra buffer stages (/3*) / f is the load here (/3) */ f 3 (/3) */ f to 6 decoder portion of to 6 decoder (/3) *8/f load (6/9*8)/f

Logical Effort in Real Life There are fixed side loads you need to deal with This is the wire capacitance of the predecoder outputs Since these capacitances don t scale with sizing, they don t fit in nicely to the logical effort frame work They also are caused by loading of non-critical gates But they are not a huge issue either The optimal sizes are often pretty large This will cause power and area issues Often you need to back off a little to make things fit better 3 Handling Fixed Side Loads w x y If B=6 and A=0, then w=, x=, y=6 (easy) fixed sideload A B If B=6 and A=8, things get more difficult. First solve the problem without the sideload: gives w=, x=, y=6 Formally we can add the sideload and solve the problem again using the fact that x/w = (A+y)/x and y/x=b/y. These equations can be rewritten in an iterative form and then solved. But after iterating, this produces x=5 y=8. But increasing the size of y makes a very small difference and adds power and area. It is probably not a good idea. So, if the side load is smaller than the gate load, we can size gates beyond the sideload using the standard approach, and size the gates before the sideload using the the sum of the sideload and the previous result.