Where are we? Data Path Design

Similar documents
Where are we? Data Path Design. Bit Slice Design. Bit Slice Design. Bit Slice Plan

Lecture 11: Adders. Slides courtesy of Deming Chen. Slides based on the initial set from David Harris. 4th Ed.

EECS 427 Lecture 8: Adders Readings: EECS 427 F09 Lecture 8 1. Reminders. HW3 project initial proposal: due Wednesday 10/7

EE141-Fall 2010 Digital Integrated Circuits. Announcements. An Intel Microprocessor. Bit-Sliced Design. Class Material. Last lecture.

Digital Integrated Circuits A Design Perspective

Arithmetic Building Blocks

Bit-Sliced Design. EECS 141 F01 Arithmetic Circuits. A Generic Digital Processor. Full-Adder. The Binary Adder

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Hw 6 due Thursday, Nov 3, 5pm No lab this week

Homework 4 due today Quiz #4 today In class (80min) final exam on April 29 Project reports due on May 4. Project presentations May 5, 1-4pm

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design

VLSI Design. [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI DEsign.1

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Full Adder Ripple Carry Adder Carry-Look-Ahead Adder Manchester Adders Carry Select Adder

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits

Digital Integrated Circuits A Design Perspective

Lecture 4. Adders. Computer Systems Laboratory Stanford University

VLSI Design I; A. Milenkovic 1

EE141. Administrative Stuff

Overview. Arithmetic circuits. Binary half adder. Binary full adder. Last lecture PLDs ROMs Tristates Design examples

CSE140: Components and Design Techniques for Digital Systems. Logic minimization algorithm summary. Instructor: Mohsen Imani UC San Diego

Arithmetic Circuits Didn t I learn how to do addition in the second grade? UNC courses aren t what they used to be...

CSE477 VLSI Digital Circuits Fall Lecture 20: Adder Design

Properties of CMOS Gates Snapshot

ISSN (PRINT): , (ONLINE): , VOLUME-4, ISSUE-10,

Review. EECS Components and Design Techniques for Digital Systems. Lec 18 Arithmetic II (Multiplication) Computer Number Systems

CPE/EE 427, CPE 527 VLSI Design I Pass Transistor Logic. Review: CMOS Circuit Styles

Arithmetic Circuits-2

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: Digital Logic

CMOS Digital Integrated Circuits Lec 10 Combinational CMOS Logic Circuits

GALOP : A Generalized VLSI Architecture for Ultrafast Carry Originate-Propagate adders

Arithmetic Circuits-2

Floating Point Representation and Digital Logic. Lecture 11 CS301

Fundamentals of Computer Systems

EECS150. Arithmetic Circuits

Spiral 2 7. Capacitance, Delay and Sizing. Mark Redekopp

EFFICIENT MULTIOUTPUT CARRY LOOK-AHEAD ADDERS

CPE/EE 427, CPE 527 VLSI Design I L18: Circuit Families. Outline

Area-Time Optimal Adder with Relative Placement Generator

Lecture 6: Circuit design part 1

Combinational Logic Design

Logical Effort: Designing for Speed on the Back of an Envelope David Harris Harvey Mudd College Claremont, CA

Based on slides/material by. Topic 3-4. Combinational Logic. Outline. The CMOS Inverter: A First Glance

EE141- Spring 2004 Digital Integrated Circuits

Adders allow computers to add numbers 2-bit ripple-carry adder

UNIT III Design of Combinational Logic Circuits. Department of Computer Science SRM UNIVERSITY

CMPEN 411 VLSI Digital Circuits Spring Lecture 21: Shifters, Decoders, Muxes

Basic Gate Repertoire

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Sciences

EE115C Digital Electronic Circuits Homework #6

B.Supmonchai August 1st, q In-depth discussion of CMOS logic families. q Optimizing gate metrics. q High Performance circuit-design techniques

CSE140: Components and Design Techniques for Digital Systems. Decoders, adders, comparators, multipliers and other ALU elements. Tajana Simunic Rosing

CSE370 HW3 Solutions (Winter 2010)

Dynamic Combinational Circuits. Dynamic Logic

CMPEN 411 VLSI Digital Circuits Spring 2011 Lecture 07: Pass Transistor Logic

Datapath Component Tradeoffs

Slide Set 6. for ENEL 353 Fall Steve Norman, PhD, PEng. Electrical & Computer Engineering Schulich School of Engineering University of Calgary

9. Datapath Design. Jacob Abraham. Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017

C.K. Ken Yang UCLA Courtesy of MAH EE 215B

Arithmetic Circuits How to add and subtract using combinational logic Setting flags Adding faster

Miscellaneous Lecture topics. Mary Jane Irwin [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.]

8. Design Tradeoffs x Computation Structures Part 1 Digital Circuits. Copyright 2015 MIT EECS

Lecture 8: Combinational Circuit Design

8. Design Tradeoffs x Computation Structures Part 1 Digital Circuits. Copyright 2015 MIT EECS

L8/9: Arithmetic Structures

CPE/EE 427, CPE 527 VLSI Design I L07: CMOS Logic Gates, Pass Transistor Logic. Review: CMOS Circuit Styles

Digital Integrated Circuits A Design Perspective

Lecture 4: Implementing Logic in CMOS

Announcements. EE141-Spring 2007 Digital Integrated Circuits. CMOS SRAM Analysis (Read/Write) Class Material. Layout. Read Static Noise Margin

EEC 216 Lecture #3: Power Estimation, Interconnect, & Architecture. Rajeevan Amirtharajah University of California, Davis

Prove that if not fat and not triangle necessarily means not green then green must be fat or triangle (or both).

Binary addition by hand. Adding two bits

S No. Questions Bloom s Taxonomy Level UNIT-I

Logic Synthesis and Verification

UNIT 8A Computer Circuitry: Layers of Abstraction. Boolean Logic & Truth Tables

Appendix A: Digital Logic. Principles of Computer Architecture. Principles of Computer Architecture by M. Murdocca and V. Heuring

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Hardware Design I Chap. 4 Representative combinational logic

Topics. Dynamic CMOS Sequential Design Memory and Control. John A. Chandy Dept. of Electrical and Computer Engineering University of Connecticut

Lecture 6: Logical Effort

Digital EE141 Integrated Circuits 2nd Combinational Circuits

Digital Integrated Circuits A Design Perspective

Logic Synthesis. Late Policies. Wire Gauge. Schematics & Wiring

Pass-Transistor Logic

ALUs and Data Paths. Subtitle: How to design the data path of a processor. 1/8/ L3 Data Path Design Copyright Joanne DeGroat, ECE, OSU 1

ECE429 Introduction to VLSI Design

EE 447 VLSI Design. Lecture 5: Logical Effort

Ch 4. Combinational Logic Technologies. IV - Combinational Logic Technologies Contemporary Logic Design 1

SRC Language Conventions. Class 6: Intro to SRC Simulator Register Transfers and Logic Circuits. SRC Simulator Demo. cond_br.asm.

CMOS logic gates. João Canas Ferreira. March University of Porto Faculty of Engineering

Chapter 2. Introduction. Chapter 2 :: Topics. Circuits. Nodes. Circuit elements. Introduction

Arithmetic Circuits-2

Logic. Basic Logic Functions. Switches in series (AND) Truth Tables. Switches in Parallel (OR) Alternative view for OR

Optimum Prefix Adders in a Comprehensive Area, Timing and Power Design Space

CprE 281: Digital Logic

9/18/2008 GMU, ECE 680 Physical VLSI Design

Possible logic functions of two variables

Static CMOS Circuits. Example 1

CMSC 313 Lecture 18 Midterm Exam returned Assign Homework 3 Circuits for Addition Digital Logic Components Programmable Logic Arrays

Chapter 5 CMOS Logic Gate Design

Transcription:

Where are we? Subsystem Design Registers and Register Files dders and LUs Simple ripple carry addition Transistor schematics Faster addition Logic generation How it fits into the datapath Data Path Design lock-diagram style data path description

it Slice Design ontrol it 3 Data-In Register dder Shifter Multiplexer it 2 it it 0 Data-Out Layout Reality Tile identical processing elements it Slice Design ontrol it 3 Data-In Register dder Shifter Multiplexer it 2 it it 0 Data-Out Tile identical processing elements Layout Reality 2

it Slice Plan Recall planning a DFF to make a register Inputs on top in M2 Outputs on bottom in M2 lock and lock-bar routed horizontally in M D2 Qb2 Q2 D Qb Q D0 Qb0 Q0 Vdd b Vss it Slice Plan Now extend this to a register file D inputs go to all cells an select one register for writing by controlling the clock Q outputs go all the way through the register file Each cell can drive Q from enabled inverter Now you can select one register for reading by selecting D2 which cell is driving its output D D0 b En b En Q2 Q Q0 3

it Slice Plan b En b En b En b En Q0 D0 Q D Q2 D2 it Slice Design ontrol it 3 Data-In Register dder Shifter Multiplexer it 2 it it 0 Data-Out Tile identical processing elements 4

Multi-Port Register Re Re0 Multi-Port Register 5

it Slice Design ontrol it 3 Data-In Register dder Shifter Multiplexer it 2 it it 0 Data-Out Tile identical processing elements Where are power lines? it Slice Design ontrol it 3 Data-In Register dder Shifter Multiplexer it 2 it it 0 Data-Out Tile identical processing elements Where are power lines? asic omb scheme 6

Power Routing is a global chipwide issue Here s another approach Note the Vdd and Gnd pads Global rings with combs for regions of the chip hip-wide View of Power Power Routing is a global chipwide issue Here s another approach Note the Vdd and Gnd pads Global rings with combs for regions of the chip hip-wide View of Power 7

ore power routing ore power routing 8

hip-wide View of Power nother view of the same issue Watch out for routing blockages! Tweak on the Scheme Same basic scheme ut with no internal jumpers Jumpers are restricted to outer loops 9

dders Etc. heck out hapter 0 in your text asic ddition: Full dder in Full adder Sum out 0

oolean Equations in Full adder Sum out Direct Implementation Fig 0.3 in your text 32 transistors

Use the Factored Equations V DD V DD i i V DD X i i S i V DD i o 28 Transistors Fully static, complex gate implementation Getting Rid of Inverters Even ell Odd ell 0 0 2 2 3 3 i,0 o,0 o, o,2 o,3 F F F F S 0 S S 2 S 3 Exploit Inversion Property Note: need 2 different types of cells an improve performance by removing inverters from carry chain 2

etter Static Gate ombine gates and reuse subterms etter Static Gate Sometimes called a mirror adder 3

Mirror dder onsiderations Feed the arry-in to the inner inputs so the internal capacitance is already discharged Make all transistors whose gates are connected to in and carry logic minimum size minimizes branching effort on critical path (carry out) Determine gate widths by Logical Effort reduce effort from to out at the expense of Sum Use relatively large transistors on critical path so that stray wiring cap is a small fraction of overall cap dder Layout Examples from Weste and Eshraghian Standard ell vs. Datapath Definitely worth looking at carefully 4

Datapath Layout little tricky to figure out You may not want to use this exact layout, but it might give you ideas Start by identifying vdd and gnd paths Think about rotating it ccw Think about a taller circuit that matches the bit-pitch of your register Datapath Layout 5

Example Datapath Layout ddition and Subtraction Remember back to your logic design class dd the two s complement to subtract Take two s complement by inverting all the bits and adding one Use the carry-in to add one Use an XOR to invert or not 0 0 Out 0 0 0 0 6

Two s omplement dd/sub side: XOR Gates Slightly tricky gate, ~ + ~ Lots of different schematics 7

nother XOR gate Not too bad if you already have, ~,, ~ floating around If not, you ll need a couple inverters too ~ ~ ~ ~ XOR ~ ~ ~ XNOR ~ Yet nother XOR Gate DVSL (section 6.2.3 in your text) Differential ascode Voltage Switch Logic Make sure that the combinational pull-down networks are complementary Out ~Out Differential Inputs PDN PDN2 8

DVSL XOR/XNOR Out ~Out ~ ~ ~ Generates both XOR/XNOR Still static, but might be slower than others nother DVSL Example Out ~Out D E ~D ~ ~E ~ ~ Pull-down stacks must be complementary 9

DVSL Large XOR Out ~Out Four-input XOR aka odd parity ~D D ~D D ~ ~ ~ ~ ~ DVSL Large XOR Out ~Out Four-input XOR aka odd parity ~D D ~D D ~ ~ ~ ~ ~ 20

DVSL Large XOR Out ~Out Four-input XOR aka odd parity ~D D ~D D ~ ~ ~ ~ ~ Transmission Gate XOR Tiny, clever circuit If is high, N, P act like inverter If is low, is passed to the output through transmission gate 2

Transmission Gate dder V DD V DD nother Version P P i i P P P V DD P S Sum Generation V DD o arry Generation i i Setup i P 22

Yet nother Version n Example Layout Not the same style we re used to seeing 23

More Pass Transistors omplementary Pass Transistor Logic (PL) Slightly faster, but more area S out S out Speeding Up ddition It all comes back to the carry circuit Ripple carry delay goes from low-order to high-order bit This determines the speed of the addition Many many ways to speed up the carry calculation Section 0.2.2 in your text 24

arry Lookahead Sum = P + i Key is that the carry depends ONLY on and, not the carry-in atch is that the gates have large fan-in - arry Lookahead Restated: i = Gi + Pi (i-) 0 = G0 + P0 in = G + P 0 = G + P(G0 + P0 in) = G + P G0 + P P0 in 2 = G2 + P2G2 + P2PG0 + P2PP0in 3 = G + P3G2 + P3P2G + P3P2PG0 + P3P2PP0in Or 3 = G3 + P3(G2 +P2( G + P(G0 + P0 in))) 25

arry Lookahead 0, 0, N-, N-... i,0 P0 i, P i,n - P N- The equations get larger with each stage Usually do lookahead in small blocks (I.e. 4) and the combine in a tree... arry Lookahead Logic 26

Fast arry Lookahead Logic Pseudo-nMOS Uses lots of current! nother Version V DD G 3 G 2 G G 0 i,0 o,3 P 0 P P 2 P 3 27

nother View nother View 4 4 3 3 2 2 in G 4 P 4 G 3 P 3 G 2 P 2 G P G 0 P 0 : itwise PG logic 2: Group PG logic G 3:0 G 2:0 G :0 G 0:0 3 2 0 3: Sum logic 4 out S 4 S 3 S 2 S 28

Ripple arry 4 4 3 3 2 2 in G 4 P 4 G 3 P 3 G 2 P 2 G P G 0 P 0 G 3:0 G 2:0 G :0 G 0:0 3 2 0 4 out S 4 S 3 S 2 S PG Diagram Notation lack cell Gray cell uffer i:k k-:j i:k k-:j i:j i:j i:j i:j G i:k P i:k G k-:j G i:j G i:k P i:k G i:j G k-:j G i:j G i:j P k-:j P i:j P i:j P i:j 29

Ripple arry t = t + ( N ) t + t ripple pg O xor 5 4 3 2 0 9 it Position 8 7 6 5 4 3 2 0 Delay 5:0 4:0 3:0 2:0 :0 0:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 :0 0:0 arry-lookahead dder arry-lookahead adder computes G i:0 for many bits in parallel. Uses higher-valency cells with more than two inputs. 6:3 6:3 2:9 2:9 8:5 8:5 4: 4: out G 6:3 P 6:3 2 G 2:9 P 2:9 8 G 8:5 P 8:5 4 G 4: P 4: + + + + in S 6:3 S 2:9 S 8:5 S 4: 30

L PG Diagram 6 5 4 3 2 0 9 8 7 6 5 4 3 2 0 6:0 5:0 4:0 3:0 2:0 :0 0:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 :0 0:0 Higher-Valency ells i:k k-:l l-:m m-:j i:j G i:k P i:k G k-:l P k-:l G l-:m P l-:m G m-:j P m-:j G i:j P i:j 3

arry-select dder arry-select ompute result for a block based on carry-in of and carry-in of 0, then select the right one arry-select dder Trick for critical paths dependent on late input X Precompute two possible outputs for X = 0, Select proper output when X arrives arry-select adder precomputes n-bit sums For both possible carries into n-bit group 6:3 6:3 2:9 2:9 8:5 8:5 4: 4: + 0 + 0 + 0 out + 2 + 8 + 4 + in 0 0 0 S 6:3 S 2:9 S 8:5 S 4: 32

arry-skip dder ompute the P and G for an entire block If the block generates or kills, don t propagate 6:3 6:3 2:9 2:9 8:5 8:5 4: 4: P 6:3 P 2:9 P 8:5 2 8 4 out 0 + 0 + 0 + 0 P 4: + in S 6:3 S 2:9 S 8:5 S 4: arry-skip PG Diagram 6 5 4 3 2 0 9 8 7 6 5 4 3 2 0 6:0 5:0 4:0 3:0 2:0 :0 0:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 :0 0:0 ( ) tskip = tpg + 2 n + ( k ) to + t For k n-bit groups (N = nk) xor 33

Tree dder If lookahead is good, lookahead across lookahead! Recursive lookahead gives O(log N) delay Many variations on tree adders rent-kung 5 4 3 2 0 9 8 7 6 5 4 3 2 0 5:4 3:2 :0 9:8 7:6 5:4 3:2 :0 5:2 :8 7:4 3:0 5:8 7:0 :0 3:0 9:0 5:0 5:04:03:02:0:00:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 :0 0:0 34

Sklansky 5 4 3 2 0 9 8 7 6 5 4 3 2 0 5:4 3:2 :0 9:8 7:6 5:4 3:2 :0 5:2 4:2 :8 0:8 7:4 6:4 3:0 2:0 5:8 4:8 3:8 2:8 5:04:03:02:0:00:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 :0 0:0 Kogge-Stone 5 4 3 2 0 9 8 7 6 5 4 3 2 0 5:4 4:3 3:2 2: :0 0:9 9:8 8:7 7:6 6:5 5:4 4:3 3:2 2: :0 5:2 4: 3:0 2:9 :8 0:7 9:6 8:5 7:4 6:3 5:2 4: 3:0 2:0 5:8 4:7 3:6 2:5 :4 0:3 9:2 8: 7:0 6:0 5:0 4:0 5:04:03:02:0:00:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 :0 0:0 35

Manchester arry hain Instead of changing the architecture of the adder, use a clever circuit to ripple the carry more effectively lternate Implementation 36

Four it lock Summary dder architectures offer area / power / delay tradeoffs. hoose the best one for your application. rchitecture arry-ripple lassification Logic Levels N- Max Fanout Tracks ells N arry-skip n=4 N/4 + 5 2.25N arry-inc. n=4 N/4 + 2 4 2N rent-kung Sklansky (L-, 0, 0) (0, L-, 0) 2log 2 N log 2 N 2 N/2 + 2N 0.5 Nlog 2 N Kogge-Stone (0, 0, L-) log 2 N 2 N/2 Nlog 2 N 37

Design as Trade-Off t p (nse c) 80.0 60.0 40.0 20.0 static mirror manchester bypass select look-ahead 0.2 rea (mm 2 )0.4 look-ahead select static bypass mirror manchester 0.0 0 0 20 N 0.0 0 0 20 N Do you want speed or size? There s always power to consider too How well does Synopsys do? Design compiler using a 80nm library 38

What should you use? Ripple if timing allows ompact, easy L or carry-skip work well for 8-6 bits For 32, and especially 64 bits tree adders are faster dders designed and tiled by hand will be much smaller (and probably faster) than synthesized adders Logic Functions Use the features of the full adder cell to generate logic functions Lots of other ideas in your text 39

General Logic Generator One Possible MUX Version 40

Remember the ig Picture ontrol it 3 Data-In Register dder Shifter Multiplexer it 2 it it 0 Data-Out Tile identical processing elements We want things to stack up nicely in the datapath Shifters Right nop Left i i i- i- it-slice i Essentially a muxing... operation select the shift you want (section 0.8) 4

arrel Shifter 3 3 Sh 2 2 Sh2 : Data Wire : ontrol Wire Sh3 0 0 Sh0 Sh Sh2 Sh3 Shift any number of bits in one shot lever layout is possible Lots of wiring arrel Shifter 3 3 3 Sh 2 2 3 Sh2 : Data Wire 3 : ontrol Wire Sh3 0 0 2 Sh0 Sh Sh2 Sh3 Shift any number of bits in one shot lever layout is possible Lots of wiring 42

Four by Four arrel Shifter 3 2 0 Sh0 Sh Sh2 Sh3 uffer Note the zig-zag control wire in poly Logarithmic Shifter Sh Sh Sh2 Sh2 Sh4 Sh4 3 3 2 2 0 0 43

Logarithmic Shifter Layout 44