Where are we? Data Path Design. Bit Slice Design. Bit Slice Design. Bit Slice Plan

Similar documents
Where are we? Data Path Design

Lecture 11: Adders. Slides courtesy of Deming Chen. Slides based on the initial set from David Harris. 4th Ed.

EECS 427 Lecture 8: Adders Readings: EECS 427 F09 Lecture 8 1. Reminders. HW3 project initial proposal: due Wednesday 10/7

EE141-Fall 2010 Digital Integrated Circuits. Announcements. An Intel Microprocessor. Bit-Sliced Design. Class Material. Last lecture.

Digital Integrated Circuits A Design Perspective

Arithmetic Building Blocks

Bit-Sliced Design. EECS 141 F01 Arithmetic Circuits. A Generic Digital Processor. Full-Adder. The Binary Adder

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Hw 6 due Thursday, Nov 3, 5pm No lab this week

Homework 4 due today Quiz #4 today In class (80min) final exam on April 29 Project reports due on May 4. Project presentations May 5, 1-4pm

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design

VLSI Design. [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI DEsign.1

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Full Adder Ripple Carry Adder Carry-Look-Ahead Adder Manchester Adders Carry Select Adder

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits

Digital Integrated Circuits A Design Perspective

Lecture 4. Adders. Computer Systems Laboratory Stanford University

VLSI Design I; A. Milenkovic 1

EE141. Administrative Stuff

Overview. Arithmetic circuits. Binary half adder. Binary full adder. Last lecture PLDs ROMs Tristates Design examples

Properties of CMOS Gates Snapshot

CSE140: Components and Design Techniques for Digital Systems. Logic minimization algorithm summary. Instructor: Mohsen Imani UC San Diego

Arithmetic Circuits Didn t I learn how to do addition in the second grade? UNC courses aren t what they used to be...

ISSN (PRINT): , (ONLINE): , VOLUME-4, ISSUE-10,

CSE477 VLSI Digital Circuits Fall Lecture 20: Adder Design

Review. EECS Components and Design Techniques for Digital Systems. Lec 18 Arithmetic II (Multiplication) Computer Number Systems

CMOS Digital Integrated Circuits Lec 10 Combinational CMOS Logic Circuits

CPE/EE 427, CPE 527 VLSI Design I Pass Transistor Logic. Review: CMOS Circuit Styles

Arithmetic Circuits-2

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: Digital Logic

GALOP : A Generalized VLSI Architecture for Ultrafast Carry Originate-Propagate adders

Arithmetic Circuits-2

Floating Point Representation and Digital Logic. Lecture 11 CS301

Fundamentals of Computer Systems

Lecture 6: Circuit design part 1

Combinational Logic Design

EECS150. Arithmetic Circuits

Spiral 2 7. Capacitance, Delay and Sizing. Mark Redekopp

Logical Effort: Designing for Speed on the Back of an Envelope David Harris Harvey Mudd College Claremont, CA

B.Supmonchai August 1st, q In-depth discussion of CMOS logic families. q Optimizing gate metrics. q High Performance circuit-design techniques

EFFICIENT MULTIOUTPUT CARRY LOOK-AHEAD ADDERS

Area-Time Optimal Adder with Relative Placement Generator

CPE/EE 427, CPE 527 VLSI Design I L18: Circuit Families. Outline

Based on slides/material by. Topic 3-4. Combinational Logic. Outline. The CMOS Inverter: A First Glance

EE141- Spring 2004 Digital Integrated Circuits

Adders allow computers to add numbers 2-bit ripple-carry adder

UNIT III Design of Combinational Logic Circuits. Department of Computer Science SRM UNIVERSITY

Basic Gate Repertoire

CMPEN 411 VLSI Digital Circuits Spring Lecture 21: Shifters, Decoders, Muxes

EE115C Digital Electronic Circuits Homework #6

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Sciences

CSE140: Components and Design Techniques for Digital Systems. Decoders, adders, comparators, multipliers and other ALU elements. Tajana Simunic Rosing

CSE370 HW3 Solutions (Winter 2010)

C.K. Ken Yang UCLA Courtesy of MAH EE 215B

Dynamic Combinational Circuits. Dynamic Logic

Lecture 8: Combinational Circuit Design

Lecture 4: Implementing Logic in CMOS

CPE/EE 427, CPE 527 VLSI Design I L07: CMOS Logic Gates, Pass Transistor Logic. Review: CMOS Circuit Styles

CMPEN 411 VLSI Digital Circuits Spring 2011 Lecture 07: Pass Transistor Logic

Slide Set 6. for ENEL 353 Fall Steve Norman, PhD, PEng. Electrical & Computer Engineering Schulich School of Engineering University of Calgary

Datapath Component Tradeoffs

9. Datapath Design. Jacob Abraham. Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017

Prove that if not fat and not triangle necessarily means not green then green must be fat or triangle (or both).

Arithmetic Circuits How to add and subtract using combinational logic Setting flags Adding faster

Miscellaneous Lecture topics. Mary Jane Irwin [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.]

8. Design Tradeoffs x Computation Structures Part 1 Digital Circuits. Copyright 2015 MIT EECS

8. Design Tradeoffs x Computation Structures Part 1 Digital Circuits. Copyright 2015 MIT EECS

L8/9: Arithmetic Structures

Digital Integrated Circuits A Design Perspective

Digital EE141 Integrated Circuits 2nd Combinational Circuits

EEC 216 Lecture #3: Power Estimation, Interconnect, & Architecture. Rajeevan Amirtharajah University of California, Davis

Announcements. EE141-Spring 2007 Digital Integrated Circuits. CMOS SRAM Analysis (Read/Write) Class Material. Layout. Read Static Noise Margin

Digital Integrated Circuits A Design Perspective

Binary addition by hand. Adding two bits

S No. Questions Bloom s Taxonomy Level UNIT-I

CMOS logic gates. João Canas Ferreira. March University of Porto Faculty of Engineering

SRC Language Conventions. Class 6: Intro to SRC Simulator Register Transfers and Logic Circuits. SRC Simulator Demo. cond_br.asm.

Logic Synthesis and Verification

Optimum Prefix Adders in a Comprehensive Area, Timing and Power Design Space

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Appendix A: Digital Logic. Principles of Computer Architecture. Principles of Computer Architecture by M. Murdocca and V. Heuring

UNIT 8A Computer Circuitry: Layers of Abstraction. Boolean Logic & Truth Tables

Hardware Design I Chap. 4 Representative combinational logic

Topics. Dynamic CMOS Sequential Design Memory and Control. John A. Chandy Dept. of Electrical and Computer Engineering University of Connecticut

Static CMOS Circuits. Example 1

Lecture 6: Logical Effort

Chapter 5 CMOS Logic Gate Design

Logic Synthesis. Late Policies. Wire Gauge. Schematics & Wiring

ALUs and Data Paths. Subtitle: How to design the data path of a processor. 1/8/ L3 Data Path Design Copyright Joanne DeGroat, ECE, OSU 1

Pass-Transistor Logic

ECE429 Introduction to VLSI Design

EE 447 VLSI Design. Lecture 5: Logical Effort

Ch 4. Combinational Logic Technologies. IV - Combinational Logic Technologies Contemporary Logic Design 1

Chapter 2. Introduction. Chapter 2 :: Topics. Circuits. Nodes. Circuit elements. Introduction

Arithmetic Circuits-2

COMBINATIONAL LOGIC. Combinational Logic

Logic. Basic Logic Functions. Switches in series (AND) Truth Tables. Switches in Parallel (OR) Alternative view for OR

CprE 281: Digital Logic

9/18/2008 GMU, ECE 680 Physical VLSI Design

Possible logic functions of two variables

Transcription:

Where are we? Data Path Design Subsystem Design Registers and Register Files dders and LUs Simple ripple carry addition Transistor schematics Faster addition Logic generation How it fits into the datapath lock-diagram style data path description it Slice Design ontrol it Slice Design ontrol it 3 it 3 Data-In Register dder Shifter Multiplexer it 2 it it Data-Out Data-In Register dder Shifter Multiplexer it 2 it it Data-Out Layout Reality Tile identical processing elements Tile identical processing elements Layout Reality it Slice Plan Recall planning a DFF to make a register Inputs on top in M2 Outputs on bottom in M2 lock and lock-bar routed horizontally in M D2 Qb2 Q2 D Qb Q D Qb Vdd b Q Vss it Slice Plan Now extend this to a register file D inputs go to all cells an select one register for writing by controlling the clock Q outputs go all the way through the register file Each cell can drive Q from enabled inverter Now you can select one register for reading by selecting which cell is driving its output D2 D Q2 Q D Q b En b En

it Slice Plan b En b En b En b En it Slice Design ontrol Q D it 3 D Q Data-In Register dder Shifter Multiplexer it 2 it it Data-Out D2 Q2 Tile identical processing elements Multi-Port Register Multi-Port Register Re Re it Slice Design ontrol it Slice Design ontrol it 3 it 3 Data-In Register dder Shifter Multiplexer it 2 it it Data-Out Data-In Register dder Shifter Multiplexer it 2 it it Data-Out Tile identical processing elements Where are power lines? Tile identical processing elements Where are power lines? asic omb scheme 2

hip-wide View of Power Power Routing is a global chipwide issue Here s another approach Note the Vdd and Gnd pads Global rings with combs for regions of the chip hip-wide View of Power Power Routing is a global chipwide issue Here s another approach Note the Vdd and Gnd pads Global rings with combs for regions of the chip ore power routing ore power routing hip-wide View of Power nother view of the same issue Watch out for routing blockages! Tweak on the Scheme Same basic scheme ut with no internal jumpers Jumpers are restricted to outer loops 3

dders Etc. asic ddition: Full dder heck out hapter in your text in Full adder out Sum in oolean Equations Full adder Sum out Direct Implementation Fig.3 in your text 32 transistors Use the Factored Equations VDD V DD i i X V DD i i S i V DD i i, Getting Rid of Inverters Even ell Odd ell 2 2 3 3 o, o, o,2 o,3 F F F F S S S 2 S 3 o Exploit Inversion Property 28 Transistors Fully static, complex gate implementation Note: need 2 different types of cells an improve performance by removing inverters from carry chain 4

etter Static Gate etter Static Gate Sometimes called a mirror adder ombine gates and reuse subterms Mirror dder onsiderations Feed the arry-in to the inner inputs so the internal capacitance is already discharged Make all transistors whose gates are connected to in and carry logic minimum size minimizes branching effort on critical path (carry out) Determine gate widths by Logical Effort reduce effort from to out at the expense of Sum Use relatively large transistors on critical path so that stray wiring cap is a small fraction of overall cap dder Layout Examples from Weste and Eshraghian Standard ell vs. Datapath Definitely worth looking at carefully Datapath Layout Datapath Layout little tricky to figure out You may not want to use this exact layout, but it might give you ideas Start by identifying vdd and gnd paths Think about rotating it ccw Think about a taller circuit that matches the bit-pitch of your register 5

Example Datapath Layout ddition and Subtraction Remember back to your logic design class dd the two s complement to subtract Take two s complement by inverting all the bits and adding one Use the carry-in to add one Out Use an XOR to invert or not Two s omplement dd/sub side: XOR Gates Slightly tricky gate, ~ ~ Lots of different schematics nother XOR gate Not too bad if you already have, ~,, ~ floating around If not, you ll need a couple inverters too Yet nother XOR Gate DVSL (section 6.2.3 in your text) Differential ascode Voltage Switch Logic Make sure that the combinational pull-down networks are complementary ~ ~ ~ ~ ~ XOR ~ ~ ~ XNOR Out Differential Inputs PDN PDN2 ~Out 6

DVSL XOR/XNOR nother DVSL Example Out ~Out Out ~Out ~ ~ ~ D E ~D ~ ~E ~ Generates both XOR/XNOR Still static, but might be slower than others ~ Pull-down stacks must be complementary DVSL Large XOR Out Four-input XOR aka odd parity ~Out DVSL Large XOR Out ~Out Four-input XOR aka odd parity ~D D ~D D ~D D ~D D ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ DVSL Large XOR Out Four-input XOR aka odd parity ~Out Transmission Gate XOR ~D D ~D D ~ ~ ~ ~ ~ Tiny, clever circuit If is high, N, P act like inverter If is low, is passed to the output through transmission gate 7

Transmission Gate dder nother Version P V DD V DD P i P i S Sum Generation V DD P P P V DD o arry Generation i i Setup i P Yet nother Version n Example Layout Not the same style we re used to seeing More Pass Transistors omplementary Pass Transistor Logic (PL) Slightly faster, but more area Speeding Up ddition It all comes back to the carry circuit Ripple carry delay goes from low-order to high-order bit This determines the speed of the addition S S out out Many many ways to speed up the carry calculation Section.2.2 in your text 8

arry Lookahead Sum = P i Key is that the carry depends ONLY on and, not the carry-in atch is that the gates have large fan-in - arry Lookahead Restated: i = Gi Pi (i-) = G P in = G P = G P(G P in) = G P G P P in 2 = G2 P2G2 P2PG P2PPin 3 = G P3G2 P3P2G P3P2PG P3P2PPin Or 3 = G3 P3(G2 P2( G P(G P in))) arry Lookahead,, N-, N-... arry Lookahead Logic i, P i, P i,n - PN- The equations get larger with each stage Usually do lookahead in small blocks (I.e. 4) and the combine in a tree... Fast arry Lookahead Logic nother Version VDD G3 G2 G Pseudo-nMOS Uses lots of current! i, P G o,3 P P2 P3 9

nother View nother View 4 4 3 3 2 2 in G 4 P 4 G 3 P 3 G 2 P 2 G P G P : itwise PG logic 2: Group PG logic G 3: G 2: G : G : 3 2 3: Sum logic 4 out S 4 S 3 S 2 S Ripple arry PG Diagram Notation 4 4 3 3 2 2 in lack cell Gray cell uffer G 4 P 4 G 3 P 3 G 2 P 2 G P G P i:k k-:j i:k k-:j i:j i:j i:j i:j G 3: G 2: G : G : 3 2 G i:k P i:k G k-:j G i:j G i:k P i:k G i:j G k-:j G i:j G i:j 4 P k-:j P i:j P i:j P i:j out S 4 S 3 S 2 S Ripple arry arry-lookahead dder t = t ( N ) t t ripple pg O xor 5 4 3 2 9 it Position 8 7 6 5 4 3 2 arry-lookahead adder computes G i: for many bits in parallel. Uses higher-valency cells with more than two inputs. 6:3 6:3 2:9 2:9 8:5 8:5 4: 4: Delay out G 6:3 2 G 2:9 8 G 8:5 4 G 4: P 6:3 P 2:9 P 8:5 P 4: in S 6:3 S 2:9 S 8:5 S 4: 5: 4: 3: 2: : : 9: 8: 7: 6: 5: 4: 3: 2: : :

L PG Diagram Higher-Valency ells 6 5 4 3 2 9 8 7 6 5 4 3 2 i:k k-:ll-:m m-:j i:j G i:k P i:k G k-:l P k-:l G l-:m P l-:m G m-:j G i:j 6: 5: 4: 3: 2: : : 9: 8: 7: 6: 5: 4: 3: 2: : : P m-:j P i:j arry-select dder arry-select dder arry-select ompute result for a block based on carry-in of and carry-in of, then select the right one Trick for critical paths dependent on late input X Precompute two possible outputs for X =, Select proper output when X arrives arry-select adder precomputes n-bit sums For both possible carries into n-bit group 6:3 6:3 2:9 2:9 8:5 8:5 4: 4: out 2 8 4 in S 6:3 S 2:9 S 8:5 S 4: arry-skip dder arry-skip PG Diagram ompute the P and G for an entire block If the block generates or kills, don t propagate 6 5 4 3 2 9 8 7 6 5 4 3 2 6:3 6:3 2:9 2:9 8:5 8:5 4: 4: P 6:3 P 2:9 P 8:5 P 4: 2 8 4 out in 6: 5: 4: 3: 2: : : 9: 8: 7: 6: 5: 4: 3: 2: : : S 6:3 S 2:9 S 8:5 S 4: tskip = tpg 2( n ) ( k ) to txor For k n-bit groups (N = nk)

Tree dder rent-kung If lookahead is good, lookahead across lookahead! Recursive lookahead gives O(log N) delay Many variations on tree adders 5:4 5:2 5 4 3:2 3 2 : :8 9:8 9 8 7 7:6 7:4 6 5:4 5 4 3 3:2 3: 2 : 5:8 7: : 3: 9: 5: 5:4:3:2::: 9: 8: 7: 6: 5: 4: 3: 2: : : Sklansky Kogge-Stone 5 4 3 2 9 8 7 6 5 4 3 2 5 4 3 2 9 8 7 6 5 4 3 2 5:4 4:3 3:2 2: : :9 9:8 8:7 7:6 6:5 5:4 4:3 3:2 2: : 5:4 3:2 : 9:8 7:6 5:4 3:2 : 5:2 4:2 :8 :8 7:4 6:4 3: 2: 5:2 4: 3: 2:9 :8 :7 9:6 8:5 7:4 6:3 5:2 4: 3: 2: 5:8 4:8 3:8 2:8 5:8 4:7 3:6 2:5 :4 :3 9:2 8: 7: 6: 5: 4: 5:4:3:2::: 9: 8: 7: 6: 5: 4: 3: 2: : : 5:4:3:2::: 9: 8: 7: 6: 5: 4: 3: 2: : : Manchester arry hain Instead of changing the architecture of the adder, use a clever circuit to ripple the carry more effectively lternate Implementation 2

Four it lock Summary dder architectures offer area / power / delay tradeoffs. hoose the best one for your application. rchitecture arry-ripple arry-skip n=4 arry-inc. n=4 rent-kung Sklansky Kogge-Stone lassification Logic Max Tracks Levels Fanout N- N/4 5 2 N/4 2 4 (L-,, ) 2log 2 N 2 (, L-, ) log 2 N N/2 (,, L-) log 2 N 2 N/2 ells N.25N 2N 2N.5 Nlog 2 N Nlog 2 N Design as Trade-Off How well does Synopsys do? t p (nse c) 8. 6. 4. 2. sta tic mirror manchester bypass select look-ahead.2 rea (mm 2 ).4 look-ahead select static bypass mirror manchester. 2 N. 2 N Do you want speed or size? There s always power to consider too Design compiler using a 8nm library What should you use? Ripple if timing allows ompact, easy L or carry-skip work well for 8-6 bits For 32, and especially 64 bits tree adders are faster dders designed and tiled by hand will be much smaller (and probably faster) than synthesized adders Logic Functions Use the features of the full adder cell to generate logic functions Lots of other ideas in your text 3

General Logic Generator One Possible MUX Version Remember the ig Picture ontrol Shifters Right nop Left it 3 i i Data-In Register dder Shifter Multiplexer it 2 it it Data-Out i- i- Tile identical processing elements We want things to stack up nicely in the datapath it-slice i Essentially a muxing... operation select the shift you want (section.8) arrel Shifter arrel Shifter 3 3 3 3 3 2 Sh 2 2 Sh 2 3 Sh2 : Data Wire Sh2 : Data Wire : ontrol Wire 3 : ontrol Wire Sh3 Sh3 2 Sh Sh Sh2 Shift any number of bits in one shot lever layout is possible Lots of wiring Sh3 Sh Sh Sh2 Shift any number of bits in one shot lever layout is possible Lots of wiring Sh3 4

Four by Four arrel Shifter Logarithmic Shifter Sh Sh Sh2 Sh2 Sh4 Sh4 3 2 3 3 2 2 Sh Sh Sh2 Sh3 uffer Note the zig-zag control wire in poly Logarithmic Shifter Layout 5