
EE241 - Spring 2010 Advanced Digital Integrated Circuits
Lecture 25: Digital Arithmetic Adders

Announcements
- Homework 4 due today
- Quiz #4 today
- In-class (80 min) final exam on April 29
- Project reports due on May 4 (6 pages, double column)
- Project presentations May 5, 1-4pm (15-minute talk + 5-minute Q&A)

Outline
Last lecture:
- Domino timing
- Other dynamic styles
This lecture:
- Adders

Adders

Arithmetic Circuits
- Chapter 11, Rabaey, 2nd ed.
- Selected journal publications
- Books:
  - Ercegovac and Lang, Digital Arithmetic, Elsevier, 2004
  - "High-Speed VLSI Arithmetic Units: Adders and Multipliers," by V. Oklobdzija, in Chandrakasan et al.

Adders
EE141:
- Ripple carry & implementation
- Carry bypass (skip)
- Carry select
- Carry lookahead (basic)
EE241:
- Conditional sum
- More carry lookahead

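Conditional Sum Adders
Each bit position computes its sum and carry for both possible carry-in values (superscript 0 or 1):
$$s_i^0 = x_i \oplus y_i \qquad s_i^1 = \overline{x_i \oplus y_i}$$
$$c_{oi}^0 = x_i y_i \qquad c_{oi}^1 = x_i + y_i$$
Sklansky, Trans. on Comp., 6/60

Conditional Sum Adders
[Figure: conditional sum adder]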
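To make the conditional-sum idea concrete, here is a minimal bit-level Python sketch, assuming n is a power of two (it models the logic, not a circuit). Each bit starts as a 1-bit block holding its results for carry-in 0 and 1, using the cell equations above; at each level, adjacent blocks are merged, and the lower block's carry acts as the select of a 2-way MUX that picks one of the upper block's two precomputed results. The function name `cond_sum_add` is hypothetical.

```python
def cond_sum_add(a: int, b: int, n: int, cin: int = 0) -> tuple[int, int]:
    """Conditional-sum addition of two n-bit operands (n a power of two).

    Returns (sum, carry_out). Illustrative bit-level model only.
    """
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    # Level 0: one block per bit, keyed by the assumed carry-in (0 or 1).
    # Each entry is (sum_bits, carry_out) under that assumption.
    blocks = []
    for i in range(n):
        x, y = (a >> i) & 1, (b >> i) & 1
        blocks.append({
            0: (x ^ y, x & y),        # s_i^0 = x XOR y,  c_oi^0 = x AND y
            1: ((x ^ y) ^ 1, x | y),  # s_i^1 = x XNOR y, c_oi^1 = x OR y
        })
    width = 1  # bits per block at the current level
    while len(blocks) > 1:
        merged = []
        # blocks[0::2] are the less significant halves, blocks[1::2] the more significant.
        for lo, hi in zip(blocks[0::2], blocks[1::2]):
            pair = {}
            for assumed in (0, 1):
                lo_sum, lo_carry = lo[assumed]
                # 2-way MUX: the lower block's carry selects the upper block's result.
                hi_sum, hi_carry = hi[lo_carry]
                pair[assumed] = (lo_sum | (hi_sum << width), hi_carry)
            merged.append(pair)
        blocks = merged
        width *= 2
    # Only the final selection by the real carry-in remains.
    return blocks[0][cin]
```

For example, `cond_sum_add(200, 100, 8)` returns `(44, 1)`, since 200 + 100 = 300 = 256 + 44. The merge loop runs log2(n) times, one MUX level per merge.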

TG Conditional Sum
[Figure: conditional sum adder built from conditional cells and 2-way MUXes (Rothermel, JSSC 89)]

TG Conditional Sum
- Serial connection of transmission gates
- Chain length = $1 + \log_2 n$
[Figure: signal propagation]

DPL Conditional Sum
[Figure: CLA with conditional carry select]

DPL Conditional Sum
[Figure: block conditional sums]

Carry-Lookahead Adders
- Adder trees
  - Radix of a tree
  - Minimum depth trees
  - Sparse trees
- Logic manipulations
  - Conventional vs. Ling
  - Stack height limiting

Lookahead Adder: Basic Idea
[Figure: N-bit lookahead adder with inputs (A_0, B_0) to (A_{N-1}, B_{N-1}), propagate signals P_0 to P_{N-1}, carries C_{i,0} to C_{i,N-1}, and sums S_0 to S_{N-1}]
$$C_{o,k} = f(A_k, B_k, C_{i,k}) = G_k + P_k C_{i,k}$$

Propagate and Generate Signals
Define 2 (or 3) new variables which ONLY depend on the inputs $a_k$, $b_k$:
- Generate: $g_k = a_k b_k$
- Propagate: $p_k = a_k + b_k$ (could be XOR as well)
- (Delete: $d_k = \overline{a_k}\,\overline{b_k}$)
$$c_{out}(g_k, p_k) = g_k + p_k c_{in} \qquad s(g_k, p_k) = a_k \oplus b_k \oplus c_{in}$$
Can also derive expressions for $s$ and $c_{out}$ based on $d_k$ and $p_k$.

Lookahead Adder: Lookahead Equations
Position k:
$$c_k = g_k + p_k c_{k-1}$$
Position k + 1:
$$c_{k+1} = g_{k+1} + p_{k+1} c_k = g_{k+1} + p_{k+1}(g_k + p_k c_{k-1}) = g_{k+1} + p_{k+1} g_k + p_{k+1} p_k c_{k-1}$$
A carry exists if it is:
- generated in stage k + 1
- generated in stage k and propagated through k + 1
- the incoming carry $c_{k-1}$, propagated through both k and k + 1
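A small Python sketch can check these equations bit by bit. It assumes the OR form of propagate stated above, ripples $c_k = g_k + p_k c_{k-1}$ through an n-bit word, and confirms that the unrolled two-position expression yields the same carries. The helper names `gp_add` and `two_step_lookahead_holds` are hypothetical.

```python
def gp_add(a: int, b: int, n: int, c_in: int = 0):
    """Ripple the carry through the generate/propagate recurrence.

    Returns (g, p, c, s): per-bit generate and OR-propagate lists, the carry
    list c (c[0] is the carry-in, c[k + 1] is the carry out of position k),
    and the n-bit sum s.
    """
    a_bits = [(a >> k) & 1 for k in range(n)]
    b_bits = [(b >> k) & 1 for k in range(n)]
    g = [x & y for x, y in zip(a_bits, b_bits)]  # generate:  g_k = a_k b_k
    p = [x | y for x, y in zip(a_bits, b_bits)]  # propagate: p_k = a_k + b_k (OR form)
    c = [c_in]
    for k in range(n):
        c.append(g[k] | (p[k] & c[k]))          # c_out = g_k + p_k c_in
    # With an OR propagate, the sum still needs the XOR of the inputs.
    s = 0
    for k in range(n):
        s |= (a_bits[k] ^ b_bits[k] ^ c[k]) << k
    return g, p, c, s


def two_step_lookahead_holds(g, p, c) -> bool:
    """Check c_{k+1} = g_{k+1} + p_{k+1} g_k + p_{k+1} p_k c_{k-1} at every position.

    In list terms, the slide's c_{k-1}, c_k, c_{k+1} are c[k], c[k + 1], c[k + 2].
    """
    return all(
        c[k + 2] == (g[k + 1] | (p[k + 1] & g[k]) | (p[k + 1] & p[k] & c[k]))
        for k in range(len(g) - 1)
    )
```

For example, `gp_add(0b1011, 0b0110, 4)` gives `s == 0b0001` and `c[4] == 1`, i.e. 11 + 6 = 17. Using the OR propagate is safe for the carry because whenever $a_k = b_k = 1$ the generate term already produces the carry.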

Lookahead Adder
- Unrolling of the carry recurrence can be continued
- Unrolling to level k results in a two-level AND-OR structure:
  - AND fan-in = k + 1, OR fan-in = k + 1
  - k + 1 transistors in the MOS stack
- This limits k to 2 to 4
- k is later referred to as the radix of an adder

Carry Lookahead Trees
$$C_{o,0} = G_0 + P_0 C_{i,0}$$
$$C_{o,1} = G_1 + P_1 G_0 + P_1 P_0 C_{i,0}$$
$$C_{o,2} = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{i,0} = (G_2 + P_2 G_1) + P_2 P_1 (G_0 + P_0 C_{i,0}) = G_{2:1} + P_{2:1} C_{o,0}$$
where $G_{2:1} = G_2 + P_2 G_1$ and $P_{2:1} = P_2 P_1$.
Can continue building the tree hierarchically.
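The grouping step for $C_{o,2}$ is a pure Boolean identity, so it can be confirmed by brute force. This Python sketch (function name hypothetical) tries all $2^7$ combinations of $G_0 \ldots G_2$, $P_0 \ldots P_2$ and $C_{i,0}$ and compares the rippled carry, the flat two-level AND-OR form, and the hierarchical $G_{2:1} + P_{2:1} C_{o,0}$ form.

```python
from itertools import product


def grouped_carry_matches() -> bool:
    """Exhaustively compare three ways of computing C_{o,2} (2**7 cases)."""
    for G0, P0, G1, P1, G2, P2, Ci0 in product((0, 1), repeat=7):
        # Reference: ripple one position at a time.
        Co0 = G0 | (P0 & Ci0)
        Co1 = G1 | (P1 & Co0)
        Co2 = G2 | (P2 & Co1)
        # Fully unrolled two-level AND-OR form (largest AND term has 4 inputs).
        flat = G2 | (P2 & G1) | (P2 & P1 & G0) | (P2 & P1 & P0 & Ci0)
        # Hierarchical form: group signals for bits 2:1, then combine with C_{o,0}.
        G21, P21 = G2 | (P2 & G1), P2 & P1
        grouped = G21 | (P21 & Co0)
        if not (Co2 == flat == grouped):
            return False
    return True
```

`grouped_carry_matches()` returns `True`. The hierarchical form replaces one wide AND-OR gate with two narrower stages, which is what keeps the MOS stack short.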

Tree Adders
Group generate and propagate signals are combined as
$$(G, P) = (g_m, p_m) \bullet (g_l, p_l) = (g_m + p_m g_l,\; p_m p_l)$$
where $m$ is the more significant group and $l$ the less significant group.
- Start from the input P, G, and continue up the tree: 2-bit groups, then 4-bit groups, ...
- Radix 2
Kogge, Stone, Trans. on Comp., 73

Adder Structure
[Figure: adder structure with carry tree, sum precompute, and sum select]
- Carry tree and sum precompute operate in parallel
- Sum select selects the correct precomputed sum based on the final carry
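The adder structure can be sketched in Python under simplifying assumptions: carry-in is 0, the 4-bit block size is our choice (not taken from the slide) and must be a power of two, and the block-level carry prefix is computed serially for brevity rather than by a specific tree network. `dot` implements $(g_m + p_m g_l, p_m p_l)$; the carry tree only produces one carry per block, while each block's sum is precomputed for both carry-in values and then selected. `dot` and `sum_select_add` are hypothetical names.

```python
def dot(hi, lo):
    """Group operator: combine more-significant (g_m, p_m) with less-significant (g_l, p_l)."""
    g_m, p_m = hi
    g_l, p_l = lo
    return g_m | (p_m & g_l), p_m & p_l


def sum_select_add(a: int, b: int, n: int = 16, block: int = 4) -> tuple[int, int]:
    """Carry tree + sum precompute + sum select (carry-in 0). Returns (sum, carry_out)."""
    if n % block or block & (block - 1):
        raise ValueError("block must be a power of two dividing n")
    # Bitwise (g_i, p_i); the XOR propagate is reused when precomputing sums.
    gp = [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]

    # Carry tree: 2-bit groups, then 4-bit groups, ... up to one (G, P) per block,
    # then the carry into each block (block-level prefix done serially here).
    block_carry = [0]   # block_carry[j] = carry into block j
    prefix = (0, 1)     # identity element of the dot operator
    for start in range(0, n, block):
        level = gp[start:start + block]
        while len(level) > 1:  # level[k] is less significant than level[k + 1]
            level = [dot(level[k + 1], level[k]) for k in range(0, len(level), 2)]
        prefix = dot(level[0], prefix)
        block_carry.append(prefix[0])

    # Sum precompute (independent of the carry tree), then sum select.
    total = 0
    for j, start in enumerate(range(0, n, block)):
        candidates = []
        for assumed in (0, 1):  # the block's sum for carry-in 0 and for carry-in 1
            carry, s = assumed, 0
            for i in range(start, start + block):
                g, p = gp[i]
                s |= (p ^ carry) << i
                carry = g | (p & carry)
            candidates.append(s)
        total |= candidates[block_carry[j]]  # the carry from the tree selects one sum
    return total, block_carry[-1]
```

For example, `sum_select_add(40000, 30000)` returns `(4464, 1)`. Because the carry tree only needs one carry per block, it can be sparser than a full tree; the sum precompute absorbs the remaining bits.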

Adder Optimization
Given the:
- Input capacitance
- Overall fanout (loading capacitance)
- Wiring structure
- Adder topology
optimization can be performed to:
- Minimize the delay subject to a power constraint
- Minimize the power for a given delay constraint

Design Considerations for CLA Adders
- Wire capacitance is determined by the microarchitecture:
  - Inputs come from register files / cache / bypass
  - Carry signals cross a certain number of bitslices
  - Multiplexers
- The adder topology determines the wire capacitance, which is only a weak function of gate sizing
- The capacitance of wires depends on the tree topology and the wiring/shielding methodology
[Figure: 64-bit adder datapath, bit slices 0 to 63, with adder stages 1 to 3 and their wiring, sum select, shifter, and loopback buses; outputs go to the register files / cache]

Specifying the Output Capacitance
- Fanout is dictated by the architecture
- In Itanium, each IEU drives 6 other IEUs, the register files, and the cache through a long bus
- Thus the fanout is larger than 15-20, but it depends on the ratio of the IEU input capacitance to the bus capacitance
- The bus is driven through a buffer, which reduces the adder fanout to close to 1

Specifying the Input Capacitance
Larger $C_{in}$:
- Less impact of internal wires
- Less fanout (less impact of the bus)
- Faster adder
- Power grows linearly with $C_{in}$
Smaller $C_{in}$:
- Larger impact of internal wires
- Larger fanout
- Slower, lower-power adder
Optimum tradeoff:
- For the desired dE/dD (for both the adder and the 6 IEUs), find the optimal $C_g/C_w$
- For example, for dE/dD = 2, $C_g/C_w$ = 2.5 to 3

Carry Tree Considerations
- Number of signals merging at each stage (radix)
- Uniform vs. non-uniform
- Number of logic levels
- Full vs. sparse trees

Tree Adders: Kogge-Stone
[Figure: 16-bit radix-2 Kogge-Stone tree, inputs (A_0, B_0) to (A_15, B_15), outputs S_0 to S_15]
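A functional Python model of the radix-2 Kogge-Stone tree, assuming carry-in 0, shows the uniform structure: at every level, each node at position $i \ge d$ merges with the node $d$ positions below it, and $d$ doubles until it reaches the word width, giving $\log_2 n$ levels. This is a logic-level sketch with hypothetical names (`dot`, `kogge_stone_add`); it says nothing about sizing or wiring.

```python
def dot(hi, lo):
    """(g_m, p_m) . (g_l, p_l) = (g_m + p_m g_l, p_m p_l)."""
    return hi[0] | (hi[1] & lo[0]), hi[1] & lo[1]


def kogge_stone_add(a: int, b: int, n: int = 16) -> tuple[int, int, int]:
    """Radix-2 Kogge-Stone addition (carry-in 0). Returns (sum, carry_out, levels)."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]          # XOR propagate
    gp = [((a >> i) & (b >> i) & 1, p[i]) for i in range(n)]   # (g_i, p_i)
    dist, levels = 1, 0
    while dist < n:
        # Node i >= dist merges with node i - dist (right side uses the previous level),
        # so every node drives at most two nodes in the next level.
        gp = [gp[i] if i < dist else dot(gp[i], gp[i - dist]) for i in range(n)]
        dist *= 2
        levels += 1
    # gp[i][0] is now G_{i:0}, the carry out of bit i.
    s = 0
    for i in range(n):
        carry_in = gp[i - 1][0] if i else 0
        s |= (p[i] ^ carry_in) << i
    return s, gp[n - 1][0], levels
```

For example, `kogge_stone_add(40000, 30000)` returns `(4464, 1, 4)`: four levels for 16 bits, and six for 64 bits.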

Tree Adders: Other Trees
[Figure: 16-bit Ladner-Fischer tree, inputs (A_0, B_0) to (A_15, B_15), outputs S_0 to S_15]

Kogge-Stone vs. Ladner-Fischer
- Uniform vs. progressively increasing fanouts
- Ladner-Fischer is much slower
- It needs internal buffering
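The fanout difference can be counted directly. In this Python sketch each prefix network is a list of levels, and each level is a list of (destination, source) merges. The Ladner-Fischer figure did not survive extraction, so as an assumption it is modeled by the divide-and-conquer minimum-depth construction (often called Sklansky's), in which the top node of each lower half feeds every node of the upper half. Both networks add correctly in 4 levels for 16 bits, but the per-level maximum fanout stays at 1 for Kogge-Stone and doubles at each level for the divide-and-conquer tree. All function names are hypothetical, and fanout here counts only merge loads, not a node's pass-through to its own position.

```python
def dot(hi, lo):
    """(g_m, p_m) . (g_l, p_l) = (g_m + p_m g_l, p_m p_l)."""
    return hi[0] | (hi[1] & lo[0]), hi[1] & lo[1]


def kogge_stone_net(n):
    """(dst, src) merges per level of a radix-2 Kogge-Stone tree."""
    levels, d = [], 1
    while d < n:
        levels.append([(i, i - d) for i in range(d, n)])
        d *= 2
    return levels


def min_depth_net(n):
    """Divide-and-conquer minimum-depth tree (assumed stand-in for the Ladner-Fischer figure)."""
    levels, half = [], 1
    while half < n:
        ops = []
        for start in range(0, n, 2 * half):
            src = start + half - 1  # most significant node of the lower half
            ops += [(dst, src) for dst in range(start + half, min(start + 2 * half, n))]
        levels.append(ops)
        half *= 2
    return levels


def prefix_add(a, b, n, net):
    """Add with a given prefix network (carry-in 0). Returns (sum, carry_out)."""
    gp = [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    p = [x[1] for x in gp]
    for ops in net:
        nxt = list(gp)  # nodes without a merge this level pass through unchanged
        for dst, src in ops:
            nxt[dst] = dot(gp[dst], gp[src])
        gp = nxt
    s = sum((p[i] ^ (gp[i - 1][0] if i else 0)) << i for i in range(n))
    return s, gp[n - 1][0]


def max_fanout(net):
    """Largest number of merges fed by any single node, per level."""
    result = []
    for ops in net:
        counts = {}
        for _, src in ops:
            counts[src] = counts.get(src, 0) + 1
        result.append(max(counts.values()))
    return result
```

For 16 bits, `max_fanout(kogge_stone_net(16))` gives `[1, 1, 1, 1]` and `max_fanout(min_depth_net(16))` gives `[1, 2, 4, 8]`. The growing fanout is why such a tree needs internal buffering, while Kogge-Stone pays instead with more merge cells and wires.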

Tree Adders: Radix 4
[Figure: 16-bit radix-4 Kogge-Stone tree, inputs (a_0, b_0) to (a_15, b_15), outputs S_0 to S_15]

Radix-2 vs. Radix-4
- More logic stages drive a large fanout more easily
- If the fanout is low, radix-4 can be padded with inverters
- Radix-4 has fewer stages and could have a speed advantage when driving low fanouts
- Radix-2 has lower stack heights
- Radix-4 has longer wires (64 bits: crosses 48 bitslices vs. 32 in radix-2); fewer logic stages precede the large wire load
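Two of these trade-offs, stage count and wire span, can be counted with a Python model of a radix-r Kogge-Stone tree. It assumes carry-in 0, the group operator $(g_m + p_m g_l, p_m p_l)$, and, as an assumption about the radix-4 figure, that at merge distance $d$ each node combines with the nodes $d, 2d, \ldots, (r-1)d$ positions below it. For 64 bits, the farthest merge input in the last level is 32 bit slices away in radix-2 and 48 in radix-4, matching the slide. Delay, stack height, and wire capacitance are not modeled; all names are hypothetical.

```python
def dot(hi, lo):
    """(g_m, p_m) . (g_l, p_l) = (g_m + p_m g_l, p_m p_l)."""
    return hi[0] | (hi[1] & lo[0]), hi[1] & lo[1]


def ks_distances(n: int, radix: int) -> list[int]:
    """Merge distance used at each level of a radix-r Kogge-Stone tree."""
    dists, d = [], 1
    while d < n:
        dists.append(d)
        d *= radix
    return dists


def ks_radix_add(a: int, b: int, n: int, radix: int) -> tuple[int, int]:
    """Radix-r Kogge-Stone addition (carry-in 0). Returns (sum, carry_out)."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    gp = [((a >> i) & (b >> i) & 1, p[i]) for i in range(n)]
    for d in ks_distances(n, radix):
        nxt = []
        for i in range(n):
            acc = gp[i]
            # Merge up to `radix` groups ending at i, i - d, i - 2d, ... (more significant first).
            for k in range(1, radix):
                if i - k * d < 0:
                    break
                acc = dot(acc, gp[i - k * d])
            nxt.append(acc)
        gp = nxt
    s = sum((p[i] ^ (gp[i - 1][0] if i else 0)) << i for i in range(n))
    return s, gp[n - 1][0]


def longest_last_level_span(n: int, radix: int) -> int:
    """Bit slices crossed by the farthest merge input in the final level."""
    return (radix - 1) * ks_distances(n, radix)[-1]
```

For 16 bits, radix-2 uses 4 levels and radix-4 uses 2; `longest_last_level_span(64, 2)` is 32 and `longest_last_level_span(64, 4)` is 48.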

Ling Adder
CLA:
$$g_i = a_i b_i \qquad p_i = a_i \oplus b_i$$
$$G_{i:0} = g_i + p_i G_{i-1:0} \qquad S_i = a_i \oplus b_i \oplus G_{i-1:0}$$
Ling's equations:
$$g_i = a_i b_i \qquad t_i = a_i + b_i$$
$$H_{i:0} = g_i + t_{i-1} H_{i-1:0} \qquad S_i = (t_i \oplus H_{i:0}) + g_i t_{i-1} H_{i-1:0}$$
Ling, IBM J. Res. Dev., 5/81

Ling Adder
Conventional radix-4:
$$G_{3:0} = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0$$
Ling's radix-4 (using $t_i g_i = g_i$):
$$H_{3:0} = g_3 + t_2 g_2 + t_2 t_1 g_1 + t_2 t_1 t_0 g_0 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0$$
- Reduces the stack height (or width)
- Reduces input loading
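The Ling equations can be checked numerically. This Python sketch, assuming carry-in 0, builds $H_{i:0}$ with the recurrence above, forms each sum bit with Ling's sum equation, and recovers the carry out as $t_{n-1} H_{n-1:0}$ (the true carry satisfies $G_{i:0} = t_i H_{i:0}$). A second function exhaustively compares both radix-4 forms of $H_{3:0}$ with the conventional $G_{3:0}$ over all 4-bit inputs. Function names are hypothetical.

```python
from itertools import product


def ling_add(a: int, b: int, n: int) -> tuple[int, int]:
    """Add using Ling's pseudo-carry H (carry-in 0). Returns (sum, carry_out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]      # g_i = a_i b_i
    t = [((a >> i) | (b >> i)) & 1 for i in range(n)]    # t_i = a_i + b_i
    H = []
    for i in range(n):
        if i == 0:
            H.append(g[0])                               # H_{0:0} = g_0
        else:
            H.append(g[i] | (t[i - 1] & H[i - 1]))       # H_{i:0} = g_i + t_{i-1} H_{i-1:0}
    s = 0
    for i in range(n):
        # t_{i-1} H_{i-1:0} is the real carry into bit i (0 for bit 0).
        c_in = t[i - 1] & H[i - 1] if i else 0
        s |= ((t[i] ^ H[i]) | (g[i] & c_in)) << i        # S_i = (t_i xor H_{i:0}) + g_i t_{i-1} H_{i-1:0}
    return s, t[n - 1] & H[n - 1]                        # G_{n-1:0} = t_{n-1} H_{n-1:0}


def radix4_identities_hold() -> bool:
    """Compare Ling's radix-4 H_{3:0} forms with the conventional G_{3:0} for all 4-bit inputs."""
    for bits in product((0, 1), repeat=8):
        a, b = bits[:4], bits[4:]
        g = [x & y for x, y in zip(a, b)]
        t = [x | y for x, y in zip(a, b)]
        p = [x ^ y for x, y in zip(a, b)]
        h_unrolled = g[3] | (t[2] & g[2]) | (t[2] & t[1] & g[1]) | (t[2] & t[1] & t[0] & g[0])
        h_short = g[3] | g[2] | (t[2] & g[1]) | (t[2] & t[1] & g[0])
        G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
        if h_unrolled != h_short or (t[3] & h_short) != G:
            return False
    return True
```

`radix4_identities_hold()` returns `True`. The short form of $H_{3:0}$ has at most three literals per AND term, against four in $G_{3:0}$, consistent with the stack-height reduction claimed on the slide.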

Ling vs. CLA
[Figure: domino implementations of the conventional G3 and Ling's H3 gates, clocked by CK, with inputs a0 to a3 and b0 to b3]