VLSI Design. [Adapted from Rabaey's Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI Design

VLSI Design: Adder Design [Adapted from Rabaey's Digital Integrated Circuits, 2002, J. Rabaey et al.]

Major Components of a Computer: Processor (Control and Datapath), Memory, and Devices (Input and Output).

A Generic Digital Processor
[Figure: block diagram of a generic digital processor with Memory, Input-Output, Control, and Datapath blocks.]

Basic Building Blocks
- Datapath: execution units (adder, multiplier, divider, shifter, etc.); register file and pipeline registers; multiplexers, decoders
- Control: finite state machines (PLA, ROM, random logic)
- Interconnect: switches, arbiters, buses
- Memory: caches (SRAMs), TLBs, DRAMs, buffers

Bit-Sliced Design
[Figure: four bit slices (Bit 3 down to Bit 0) under shared Control signals; each slice passes Data-in through a Register, Adder, Shifter, and Multiplexer to Data-out.]
Tile identical processing elements.

The 1-bit Binary Adder
A 1-bit full adder (FA) takes inputs A, B, and C_in and produces outputs S and C_out.
A  B  C_in | C_out  S | carry status
0  0  0    | 0      0 | kill
0  0  1    | 0      1 | kill
0  1  0    | 0      1 | propagate
0  1  1    | 1      0 | propagate
1  0  0    | 0      1 | propagate
1  0  1    | 1      0 | propagate
1  1  0    | 1      0 | generate
1  1  1    | 1      1 | generate
$G = AB$, $P = A \oplus B$, $K = \bar{A}\,\bar{B}$
$S = A \oplus B \oplus C_{in} = P \oplus C_{in}$
$C_{out} = AB + AC_{in} + BC_{in}$ (majority function) $= G + PC_{in}$
How can we use it to build a 64-bit adder?
How can we modify it easily to build an adder/subtractor?
How can we make it better (faster, lower power, smaller)?
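The generate/propagate/kill view of the full adder can be checked with a short Python sketch. The function name `full_adder` and the tuple it returns are illustrative choices, not part of the slides.

```python
from itertools import product

def full_adder(a, b, c_in):
    """Return (c_out, s, status) for one-bit inputs, using G, P and K."""
    g = a & b                  # generate: G = AB
    p = a ^ b                  # propagate: P = A xor B
    k = (1 - a) & (1 - b)      # kill: K = !A !B
    assert g + p + k == 1      # exactly one of the three conditions holds
    s = p ^ c_in               # S = P xor C_in
    c_out = g | (p & c_in)     # C_out = G + P C_in
    status = "generate" if g else "propagate" if p else "kill"
    return c_out, s, status

for a, b, c_in in product((0, 1), repeat=3):
    c_out, s, status = full_adder(a, b, c_in)
    # The arithmetic sum of the three inputs must equal 2*C_out + S.
    assert a + b + c_in == 2 * c_out + s
    print(a, b, c_in, "->", c_out, s, status)
```

The loop prints the eight rows of the truth table together with the carry status of each row.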

One-Bit Full Adder: Share Logic
An observation: almost always, sum = NOT carry.
C_in  A  B | Sum  Cout
0     0  0 | 0    0
0     0  1 | 1    0
0     1  0 | 1    0
0     1  1 | 0    1
1     0  0 | 1    0
1     0  1 | 0    1
1     1  0 | 0    1
1     1  1 | 1    1
$Sum = A \cdot B \cdot C_{in} + (A + B + C_{in}) \cdot \overline{C_{out}}$
The first term includes the 111 case; the factor $(A + B + C_{in})$ excludes the 000 case.
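A brief Python check of the shared-logic observation, written as a sketch with illustrative helper names (`carry`, `sum_from_carry`). It confirms that the sum built from the inverted carry matches the XOR definition, and lists the two input combinations where sum differs from NOT carry.

```python
from itertools import product

def carry(a, b, c_in):
    """Majority function: C_out = AB + A C_in + B C_in."""
    return (a & b) | (a & c_in) | (b & c_in)

def sum_from_carry(a, b, c_in):
    """Sum = A.B.C_in + (A + B + C_in).!C_out, reusing the carry logic."""
    not_c_out = 1 - carry(a, b, c_in)
    return (a & b & c_in) | ((a | b | c_in) & not_c_out)

INPUTS = list(product((0, 1), repeat=3))

# Inputs where the shared-logic sum disagrees with A xor B xor C_in.
mismatches = [x for x in INPUTS if sum_from_carry(*x) != x[0] ^ x[1] ^ x[2]]

# Inputs where "sum = NOT carry" fails.
exceptions = [x for x in INPUTS if (x[0] ^ x[1] ^ x[2]) != 1 - carry(*x)]

print(mismatches, exceptions)
```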

FA Gate Level Implementations
[Figure: two gate-level full-adder implementations, each with inputs A, B, and C_in, internal nodes t0, t1, and t2, and outputs C_out and S.]

Ripple Carry Adder (RCA)
[Figure: four FA cells in a chain. Bit i takes A_i and B_i and produces S_i; the carry ripples from C_0 = C_in through to C_out = C_4.]
$T_{adder} \approx T_{FA}(A,B \to C_{out}) + (N-2)\,T_{FA}(C_{in} \to C_{out}) + T_{FA}(C_{in} \to S)$
$t_{adder} \approx (N-1)\,t_{carry} + t_{sum}$
T = O(N) worst case delay.
Real Goal: Make the fastest possible carry path.
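A minimal bit-level sketch of the ripple-carry adder in Python, assuming little-endian bit lists. The helper names (`to_bits`, `from_bits`, `ripple_delay`) and the unit delays are assumptions made for illustration.

```python
def to_bits(value, n):
    """Little-endian list of the n low-order bits of value."""
    return [(value >> i) & 1 for i in range(n)]

def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))

def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Chain one full adder per bit; return (sum_bits, c_out)."""
    carry = c_in
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))   # carry ripples to the next bit
    return sum_bits, carry

def ripple_delay(n, t_carry=1.0, t_sum=1.0):
    """Worst-case estimate (N-1)*t_carry + t_sum."""
    return (n - 1) * t_carry + t_sum

s_bits, c_out = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s_bits), c_out, ripple_delay(4))
```

Doubling the word length roughly doubles `ripple_delay`, which is the O(N) behavior the slide points out.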

Complementary Static CMOS Full Adder
[Figure: transistor-level schematic of the complementary static CMOS full adder, with a carry stage producing C_o (via internal node X) and a sum stage producing S.]
$C_{out} = AB + BC_{in} + AC_{in} = AB + C_{in}(A + B)$
$SUM = ABC_{in} + \overline{C_{out}}\,(A + B + C_{in})$
28 Transistors

Inversion Property
Inverting all inputs to a FA results in inverted values for all outputs.
$\bar{S}(A, B, C_{in}) = S(\bar{A}, \bar{B}, \bar{C}_{in})$
$\bar{C}_{out}(A, B, C_{in}) = C_{out}(\bar{A}, \bar{B}, \bar{C}_{in})$
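The inversion property can be confirmed exhaustively over the eight input combinations. The following Python sketch uses illustrative names (`full_adder`, `inv`, `inversion_property_holds`).

```python
from itertools import product

def full_adder(a, b, c_in):
    """Return (s, c_out) of a one-bit full adder."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (a & c_in) | (b & c_in)
    return s, c_out

def inv(x):
    """Logical NOT of a single bit."""
    return 1 - x

def inversion_property_holds():
    """True if inverting all inputs inverts both outputs, for every input."""
    for a, b, c_in in product((0, 1), repeat=3):
        s, c_out = full_adder(a, b, c_in)
        s_i, c_out_i = full_adder(inv(a), inv(b), inv(c_in))
        if (s_i, c_out_i) != (inv(s), inv(c_out)):
            return False
    return True

print(inversion_property_holds())
```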

One-Bit Full Adder: Inverted Inputs
An observation: invert inputs => outputs invert.
Exploit this property: get rid of the inverter on the carry critical path.
C_in  A  B | Sum  Cout
0     0  0 | 0    0
0     0  1 | 1    0
0     1  0 | 1    0
0     1  1 | 0    1
1     0  0 | 1    0
1     0  1 | 0    1
1     1  0 | 0    1
1     1  1 | 1    1

Exploiting the Inversion Property
[Figure: 4-bit ripple-carry chain from C_0 = C_in to C_out = C_4, built from alternating inverted and regular FA cells.]
Minimizes the critical path (the carry chain) by eliminating inverters between the FAs (will need to increase the transistor sizing on the carry chain portion of the mirror adder).

Ripple Carry Adder: Inverting Property
[Figure: 4-bit ripple chain with carries C_0 through C_4 and sums S_0 through S_3, built from inverting FA cells.]
The inverting FA cell is similar to FA, but with no inverters on the outputs.
Much faster (1-stage).
Disadvantage: not regular data path.

Mirror Adder
24 + 4 transistors
[Figure: mirror adder schematic annotated with transistor widths. The carry stage shows the kill, generate, 0-propagate, and 1-propagate paths and produces !C_out; the sum stage produces !S.]
$C_{out} = AB + BC_{in} + AC_{in} = AB + C_{in}(A + B)$
$SUM = ABC_{in} + \overline{C_{out}}\,(A + B + C_{in})$
Sizing: Each input in the carry circuit has a logical effort of 2, so the optimal fan-out for each is also 2. Since !C_out drives 2 internal and 2 inverter transistor gates (to form C_in for the next most significant bit adder), the carry circuit should be oversized. PMOS/NMOS ratio of 2.

Mirror Adder Features
- The NMOS and PMOS chains are completely symmetrical with a maximum of two series transistors in the carry circuitry, guaranteeing identical rise and fall transitions if the NMOS and PMOS devices are properly sized.
- When laying out the cell, the most critical issue is the minimization of the capacitances at node !C_out (four diffusion capacitances, two internal gate capacitances, and two inverter gate capacitances). Shared diffusions can reduce the stack node capacitances.
- The transistors connected to C_in are placed closest to the output.
- Only the transistors in the carry stage have to be optimized for optimal speed. All transistors in the sum stage can be minimal size.

A 64-bit Adder/Subtractor
- Ripple Carry Adder (RCA) built out of 64 FAs
- Subtraction: complement all subtrahend bits (XOR gates) and set the low order carry-in
- RCA advantage: simple logic, so small (low cost)
- RCA disadvantage: slow (O(N) for N bits) and lots of glitching (so lots of energy consumption)
[Figure: chain of 64 1-bit FAs. An add/subt control line is XORed with each of B_0 through B_63 and also drives C_0 = C_in; bit i takes A_i and produces S_i, with carries C_1 through C_63 rippling to C_64 = C_out.]
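The adder/subtractor scheme can be sketched in Python as follows. The function name `add_sub`, its `subtract` flag, and the integer-based bit access are illustrative assumptions; the structure (XOR on every B bit, control line into the low-order carry-in) follows the slide.

```python
def add_sub(a, b, subtract, n=64):
    """Return (result, c_out) for a + b or a - b on n-bit words.

    The control bit is XORed with every bit of b and is also fed into
    the low-order carry-in, which forms the two's complement of b.
    """
    ctl = 1 if subtract else 0
    carry = ctl                            # low-order carry-in
    result = 0
    for i in range(n):
        a_i = (a >> i) & 1
        b_i = ((b >> i) & 1) ^ ctl         # XOR gate complements B on subtract
        s_i = a_i ^ b_i ^ carry
        carry = (a_i & b_i) | (carry & (a_i ^ b_i))
        result |= s_i << i
    return result, carry

print(add_sub(100, 58, subtract=False))    # addition
print(add_sub(100, 58, subtract=True))     # subtraction
```

A negative difference comes back in two's complement form, for example `add_sub(5, 7, True, n=8)` returns 254, the 8-bit encoding of -2.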

Carry-Lookahead Adder: Idea
New look: carry propagation
Idea: Try to predict $C_k$ earlier than $T_c \cdot k$. Instead of passing through k stages, compute $C_k$ separately using 1-stage CMOS logic.
Carry propagation: an example
Bit position   7 6 5 4 3 2 1 0
Carry          1 0 0 1 1 1 1
A              0 1 0 0 1 1 0 1
B            + 0 1 0 0 0 1 1 1
Sum            1 0 0 1 0 1 0 0

Carry-Lookahead Adder (CLA): One Bit
What happens to the propagating carry in bit position k?
A  B | C_out
0  0 | 0 (kill)
0  1 | C_in (propagate)
1  0 | C_in (propagate)
1  1 | 1 (generate)
$p = A + B$ (or $A \oplus B$)
$g = A \cdot B$
[Figure: carry gate with kill, 0-propagate, 1-propagate, and generate transistor paths.] [Rab96] p391

CLA: Propagation Equations
If $C_4 = 1$, then either:
- $g_3$: generated at bit pos 3
- $g_2 p_3$: generated at bit pos 2, propagated 3
- $g_1 p_2 p_3$: generated at bit pos 1, propagated 2,3
- $g_0 p_1 p_2 p_3$: generated at bit pos 0, propagated 1,2,3
- $C_{in} p_0 p_1 p_2 p_3$: input carry, propagated 0,1,2,3
$C_4 = g_3 + g_2 p_3 + g_1 p_2 p_3 + g_0 p_1 p_2 p_3 + C_{in} p_0 p_1 p_2 p_3$
Implement $C_4$ as a one-stage CMOS logic: delay = 1 (or is it?)
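As a sanity check, the sum-of-products form of C_4 can be compared with the bit-by-bit carry recurrence and with integer addition for all 4-bit operands. This is a Python sketch; the function names are illustrative, and p is taken as A + B (OR), one of the two definitions the slides allow.

```python
from itertools import product

def c4_lookahead(g, p, c_in):
    """C4 as a flat sum of products; g and p are lists indexed by bit."""
    return (g[3]
            | (g[2] & p[3])
            | (g[1] & p[2] & p[3])
            | (g[0] & p[1] & p[2] & p[3])
            | (c_in & p[0] & p[1] & p[2] & p[3]))

def c4_ripple(g, p, c_in):
    """C4 obtained by rippling the carry through bits 0 to 3."""
    c = c_in
    for i in range(4):
        c = g[i] | (p[i] & c)
    return c

def forms_agree():
    """Check both forms against integer addition for every 4-bit input."""
    for a, b, c_in in product(range(16), range(16), (0, 1)):
        g = [((a >> i) & (b >> i)) & 1 for i in range(4)]
        p = [((a >> i) | (b >> i)) & 1 for i in range(4)]   # p = a + b
        expected = (a + b + c_in) >> 4
        if not (c4_lookahead(g, p, c_in) == c4_ripple(g, p, c_in) == expected):
            return False
    return True

print(forms_agree())
```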

CLA: Static Logic Implementation
[Figure: static CMOS gate producing C_4 from inputs g_0 to g_3, p_0 to p_3, and C_in, with internal nodes labeled d through x and intermediate product terms such as p_3 g_2.]

CLA: Dynamic Logic Implementation
Dynamic gate implementation: $C_4 = g_3 + p_3 (g_2 + p_2 (g_1 + p_1 (g_0 + p_0 C_{in})))$
6 transistors in series
[Figure: dynamic gate clocked by φ, with a series chain of p_3, p_2, p_1, p_0, and C_in transistors and parallel g_3, g_2, g_1, and g_0 transistors, producing C_4.] [WE92] p529 [Hauck]

CLA: Dynamic Logic Implementation
Can we reuse logic? Can we get C_1, C_2 and C_3 from the same circuit?
[Figure: the dynamic C_4 gate with its internal chain nodes marked "C_1?", "C_2?", and "C_3?".]
No! C_1, C_2 and C_3 may be floating (not precharged). Charge sharing problem. [Hauck]

CLA: Dynamic Logic Implementation
[Figure: four separate dynamic gates, each clocked by φ, producing C_1 (from g_0, p_0, C_in), C_2 (from g_1, p_1, g_0, p_0, C_in), C_3, and C_4.] [WE92] p529

CLA: Basic Block (4 Bits) Architecture
Block of 4-bit p, g, C_out
[Figure: inputs A_i, B_i (i = 3 down to 0) feed four p,g cells; their p_i, g_i outputs and C_0 feed a carry generator that produces C_1, C_2, C_3, and C_4; the sums are S_3 through S_0.]

CLA: N-Bit Architecture
Put it all together:
[Figure: 8-bit adder. Inputs A_7..A_0 and B_7..B_0 feed eight p,g cells; two 4-bit carry generators are chained from C_0 through C_4 to C_8; the sums are S_7 through S_0.]

CLA: 12-Bit Example
A = 1101 1001 1010
B = 0111 0110 1101
[Figure: twelve p,g cells (bits 11 down to 0) feed three 4-bit carry generators chained through C_4 and C_8, with C_0 = 0 and carry-out C_12.]
Carry-out and sum of each 4-bit block over time:
Time | C_12 S_11..S_8 | C_8 S_7..S_4 | C_4 S_3..S_0
T=0  | 0 0000         | 0 0000       | 0 0000
T=2  | 1 0100         | 0 1111       | 1 0111
T=3  | 1 0100         | 1 0000       | 1 0111
T=4  | 1 0101         | 1 0000       | 1 0111
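The way the block carries settle in this example can be reproduced with a coarse Python sketch. It is a block-level model only: each 4-bit group is treated as a unit that recomputes its carry-out and sum whenever its carry-in changes, which is an assumption made for illustration rather than a gate-accurate timing model. The names `block`, `nibbles`, and `settle` are hypothetical.

```python
A = 0b1101_1001_1010
B = 0b0111_0110_1101

def block(a, b, c_in):
    """Return (c_out, sum) of one 4-bit block."""
    total = a + b + c_in
    return total >> 4, total & 0xF

def nibbles(x):
    """Split a 12-bit word into three 4-bit groups, least significant first."""
    return [(x >> shift) & 0xF for shift in (0, 4, 8)]

def settle(a, b):
    """Iterate until the block carry-ins stop changing; return every snapshot."""
    a_n, b_n = nibbles(a), nibbles(b)
    carry_in = [0, 0, 0]
    history = []
    while True:
        outs = [block(a_n[i], b_n[i], carry_in[i]) for i in range(3)]
        history.append(outs)
        new_in = [0, outs[0][0], outs[1][0]]   # C_0 = 0, then C_4 and C_8
        if new_in == carry_in:
            return history
        carry_in = new_in

for snapshot in settle(A, B):
    # Print most significant block first, as on the slide.
    print(" | ".join(f"{c} {s:04b}" for c, s in reversed(snapshot)))
```

The three printed snapshots correspond to the block values shown at T = 2, T = 3, and T = 4.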

Summary: Carry Lookahead Adder
CLA compared to ripple-carry adder:
- Faster (4 times?), but delay still linear (w.r.t. # of bits)
- Larger area
  - P, G signal generation
  - Carry generation circuits
  - Carry generation ckt for each bit position (no re-use)
Limitation: cannot go beyond 4 bits of look-ahead
- Large p,g fan-out slows down carry generation
Next: Manchester carry chains
- Tries to reuse logic by pre-charging each carry position

Recap: Carry Look-Ahead
Charge sharing problem
[Figure: the dynamic C_4 gate clocked by φ, with internal nodes marked "C_1?", "C_2?", and "C_3?".]

Fast Carry Chain Design
The key to fast addition is a low latency carry network.
What matters is whether in a given position a carry is
- generated: $G_i = A_i \,\&\, B_i = A_i B_i$
- propagated: $P_i = A_i \oplus B_i$ (sometimes use $A_i + B_i$)
- annihilated (killed): $K_i = \bar{A}_i \,\&\, \bar{B}_i$
Giving a carry recurrence of $C_{i+1} = G_i + P_i C_i$
$C_1$ = ?  $C_2$ = ?  $C_3$ = ?  $C_4$ = ?

Fast Carry Chain Design
The key to fast addition is a low latency carry network.
What matters is whether in a given position a carry is
- generated: $G_i = A_i \,\&\, B_i = A_i B_i$
- propagated: $P_i = A_i \oplus B_i$ (sometimes use $A_i + B_i$)
- annihilated (killed): $K_i = \bar{A}_i \,\&\, \bar{B}_i$
Giving a carry recurrence of $C_{i+1} = G_i + P_i C_i$
$C_1 = G_0 + P_0 C_0$
$C_2 = G_1 + P_1 G_0 + P_1 P_0 C_0$
$C_3 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0$
$C_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0$
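The remark that the propagate signal may be defined with either XOR or OR can be checked numerically. The Python sketch below (function names are illustrative) runs the carry recurrence with both definitions and compares the resulting carries for all 4-bit operands. Note that the sum bit still needs the XOR form, since S_i = A_i xor B_i xor C_i.

```python
def carries(a, b, c0, n, use_xor):
    """Return [C_0, ..., C_n] from the recurrence C_{i+1} = G_i + P_i C_i."""
    cs = [c0]
    for i in range(n):
        a_i, b_i = (a >> i) & 1, (b >> i) & 1
        g = a_i & b_i
        p = (a_i ^ b_i) if use_xor else (a_i | b_i)
        cs.append(g | (p & cs[-1]))
    return cs

def definitions_agree(n=4):
    """True if both propagate definitions give identical carries everywhere."""
    return all(
        carries(a, b, c0, n, True) == carries(a, b, c0, n, False)
        for a in range(1 << n)
        for b in range(1 << n)
        for c0 in (0, 1)
    )

print(definitions_agree())
print(carries(0b0101, 0b0011, 0, 4, use_xor=True))
```

The two definitions differ only when A_i = B_i = 1, and in that case G_i = 1 already forces the carry.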

Manchester Carry Chain
Switches controlled by G_i and P_i
[Figure: one stage of the chain, with nodes !C_i and !C_{i+1}, switches P_i and G_i, and clock clk.]
Total delay of
- time to form the switch control signals G_i and P_i
- setup time for the switches
- signal propagation delay through N switches in the worst case

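Domino Manchester Carry Chain Circuit
[Figure: 4-bit domino Manchester carry chain clocked by clk, with pass transistors P_0 to P_3, pull-down transistors G_0 to G_3, carry input C_{i,0}, carry output C_{i,4}, and numeric size labels on the transistors.]
Chain node values, from bit 0 to bit 3:
$\overline{G_0 + P_0 C_{i,0}}$
$\overline{G_1 + P_1 G_0 + P_1 P_0 C_{i,0}}$
$\overline{G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{i,0}}$
$\overline{G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_{i,0}}$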

Carry-Skip (Carry-Bypass) Adder
[Figure: four FA cells (bits 0 to 3) rippling the carry from C_{i,0}, with a bypass multiplexer controlled by BP that selects the block carry-out C_{o,3}.]
$BP = P_0 P_1 P_2 P_3$ (Block Propagate)
If ($P_0$ & $P_1$ & $P_2$ & $P_3$ = 1) then $C_{o,3} = C_{i,0}$; otherwise the block itself kills or generates the carry internally.
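One carry-skip block can be sketched in Python as below. The function name `skip_block` and its return tuple are illustrative; the sketch is functional only and does not model the delay saved by the bypass.

```python
def skip_block(a, b, c_in, width=4):
    """Return (sum, c_out, bypassed) for one carry-skip block."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    carry, s = c_in, 0
    for i in range(width):
        s |= (p[i] ^ carry) << i
        carry = g[i] | (p[i] & carry)     # internal ripple path
    bp = all(p)                           # block propagate BP = P0 P1 P2 P3
    c_out = c_in if bp else carry         # bypass multiplexer
    return s, c_out, bp

print(skip_block(0b1010, 0b0101, 1))      # every bit propagates: carry skips
print(skip_block(0b1100, 0b0100, 0))      # carry generated inside the block
```

When BP is 1 the rippled carry and the bypassed carry are logically identical; the multiplexer only makes the carry-out available sooner.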

Carry-Skip Chain Implementation
[Figure: carry chain with switches P_0 to P_3 and G_0 to G_3 between the block carry-in C_in and the block carry-out !C_out, plus a bypass path controlled by BP that drives the carry-out.]

4-bit Block Carry-Skip Adder
[Figure: 16-bit adder in four 4-bit blocks (bits 0 to 3, 4 to 7, 8 to 11, 12 to 15); each block has Setup, Carry Propagation, and Sum stages, with C_{i,0} entering the first block.]
Worst-case delay: carry from bit 0 to bit 15 = carry generated in bit 0, ripples through bits 1, 2, and 3, skips the middle two groups (B is the group size in bits), ripples in the last group from bit 12 to bit 15.
$T_{add} = t_{setup} + B\,t_{carry} + ((N/B) - 1)\,t_{skip} + B\,t_{carry} + t_{sum}$

Optimal Block Size and Time
Assuming one stage of ripple ($t_{carry}$) has the same delay as one skip logic stage ($t_{skip}$) and both are 1:
$T_{CSkA} = 1 + B + (N/B - 1) + B + 1 = 2B + N/B + 1$
The five terms are, in order: $t_{setup}$, ripple in block 0, skips, ripple in last block, and $t_{sum}$.
So the optimal block size, B, follows from $dT_{CSkA}/dB = 0$: $B_{opt} = \sqrt{N/2}$
And the optimal time is $T_{CSkA,opt} = 2\sqrt{2N} + 1$
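A small numeric check of the optimum in Python, under the same unit-delay assumption as the slide. The function names are illustrative, and the brute-force search over integer block sizes is an added cross-check rather than part of the derivation.

```python
import math

def t_cska(n, b):
    """Carry-skip delay 2B + N/B + 1 with unit carry and skip delays."""
    return 2 * b + n / b + 1

def optimal(n):
    """Closed-form optimum: (B_opt, T_opt) = (sqrt(N/2), 2*sqrt(2N) + 1)."""
    return math.sqrt(n / 2), 2 * math.sqrt(2 * n) + 1

def best_integer_block(n):
    """Integer block size with the smallest delay, found by brute force."""
    return min(range(1, n + 1), key=lambda b: t_cska(n, b))

print(optimal(32), best_integer_block(32), t_cska(32, 4))
```

For N = 32 the closed form gives B = 4 exactly, so the brute-force search and the formula agree.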

Carry-Skip Adder Extensions
Variable block sizes
- A carry that is generated in, or absorbed by, one of the inner blocks travels a shorter distance through the skip blocks, so can have bigger blocks for the inner carries without increasing the overall delay.
[Figure: carry-skip chain from C_in to C_out with variable block sizes.]
Multiple levels of skip logic
- The second skip level uses the AND of the first level skip signals (BPs).
[Figure: carry-skip chain from C_in to C_out with skip level 1 and skip level 2.]

Carry-Skip Adder Comparisons
[Figure: plot comparing RCA, CSkA (block sizes B = 2 to B = 6), and VSkA for 8, 16, 32, 48, and 64 bits; vertical axis from 0 to 70.]

Carry Select Adder
Precompute the carry out of each block for both carry_in = 0 and carry_in = 1 (can be done for all blocks in parallel) and then select the correct one.
[Figure: one 4-bit block. Inputs A's and B's feed Setup (P's and G's), then a 0-carry propagation chain and a 1-carry propagation chain in parallel; a multiplexer driven by C_in selects C_out and the carries C's, which feed Sum generation to produce S's.]
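A functional Python sketch of carry selection follows. The names `ripple` and `carry_select_add` are illustrative, and each block's two speculative results are computed with integer arithmetic as a stand-in for the 0-carry and 1-carry chains.

```python
def ripple(a, b, c_in, width):
    """Return (sum, c_out) of a width-bit addition."""
    total = a + b + c_in
    return total & ((1 << width) - 1), total >> width

def carry_select_add(a, b, c_in=0, n=16, block=4):
    """Add two n-bit words block by block using precomputed alternatives."""
    mask = (1 << block) - 1
    result, carry = 0, c_in
    for lo in range(0, n, block):
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        s0, c0 = ripple(a_blk, b_blk, 0, block)   # assumes carry_in = 0
        s1, c1 = ripple(a_blk, b_blk, 1, block)   # assumes carry_in = 1
        # Multiplexer: the real incoming carry picks one precomputed pair.
        s_blk, carry = (s1, c1) if carry else (s0, c0)
        result |= s_blk << lo
    return result, carry

print(carry_select_add(0x1234, 0x0F0F))
print(carry_select_add(0xFFFF, 0x0001))
```

In hardware both alternatives for every block are evaluated at the same time, so only the multiplexer chain depends on the incoming carry.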

Carry Select Adder: Critical Path
[Figure: 16-bit carry-select adder in four 4-bit blocks (bits 0 to 3, 4 to 7, 8 to 11, 12 to 15); each block has Setup (P's and G's), a 0-carry chain, a 1-carry chain, a multiplexer, and sum generation, with the multiplexers chained from C_in to C_out.]

Carry Select Adder: Critical Path
[Figure: the 16-bit carry-select adder in four 4-bit blocks (bits 0 to 3, 4 to 7, 8 to 11, 12 to 15), annotated with unit delays along the critical path: +1 for setup, +4 for the carry chain, +1 for each of the four multiplexers, and +1 for sum generation.]
$T_{add} = t_{setup} + B\,t_{carry} + (N/B)\,t_{mux} + t_{sum}$

Square Root Carry Select Adder
[Figure: 20-bit carry-select adder with blocks of increasing size (bits 0 to 1, 2 to 4, 5 to 8, 9 to 13, and 14 to 19); each block has Setup (P's and G's), a 0-carry chain, a 1-carry chain, a multiplexer, and sum generation, with the multiplexers chained from C_in to C_out.]

Square Root Carry Select Adder
[Figure: the 20-bit adder with blocks of 2, 3, 4, 5, and 6 bits (bits 0 to 1, 2 to 4, 5 to 8, 9 to 13, and 14 to 19), annotated with unit delays: +1 for setup, carry chains of +2, +3, +4, +5, and +6 in successive blocks, +1 for each multiplexer, and +1 for sum generation.]
$T_{add} = t_{setup} + 2\,t_{carry} + \sqrt{N}\,t_{mux} + t_{sum}$
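The delay annotations for the linear and square-root organizations can be reproduced with a simple arrival-time model in Python. The model is an assumption made for illustration: every stage has unit delay, both carry chains of a block start after setup, and each multiplexer waits for the later of its block's chains and the previous multiplexer. The function name `carry_select_delay` is hypothetical.

```python
def carry_select_delay(block_sizes, t_setup=1, t_carry=1, t_mux=1, t_sum=1):
    """Critical-path delay of a carry-select adder with the given blocks.

    block_sizes lists block widths from the least significant block up.
    """
    select_ready = 0                       # C_in is available at time 0
    for size in block_sizes:
        chains_ready = t_setup + size * t_carry
        # The mux output is valid once both its data and its select are.
        select_ready = max(chains_ready, select_ready) + t_mux
    return select_ready + t_sum            # final sum generation

print(carry_select_delay([4, 4, 4, 4]))        # linear, 16 bits
print(carry_select_delay([4, 4, 4, 4, 4]))     # linear, 20 bits
print(carry_select_delay([2, 3, 4, 5, 6]))     # square root, 20 bits
```

With growing block sizes each carry chain finishes just as the previous multiplexer output arrives, so no stage waits on the other; that is why the 20-bit square-root arrangement comes out faster than the 20-bit linear one in this model.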

Parallel Prefix Adders (PPAs)
Define the carry operator $\bullet$ on (G, P) signal pairs:
$(G'', P'') \bullet (G', P') = (G, P)$, where $G = G'' + P''G'$ and $P = P''P'$
$\bullet$ is associative, i.e.,
$[(g''', p''') \bullet (g'', p'')] \bullet (g', p') = (g''', p''') \bullet [(g'', p'') \bullet (g', p')]$
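The carry operator and its algebraic properties can be checked exhaustively in Python. The function name `dot` and the convention that the left argument is the more significant pair are choices made for this sketch.

```python
from itertools import product

def dot(left, right):
    """Carry operator: (G'', P'') o (G', P') = (G'' + P''G', P''P')."""
    g2, p2 = left       # more significant pair
    g1, p1 = right      # less significant pair
    return (g2 | (p2 & g1), p2 & p1)

PAIRS = list(product((0, 1), repeat=2))    # every possible (G, P) pair

def is_associative():
    return all(dot(dot(x, y), z) == dot(x, dot(y, z))
               for x, y, z in product(PAIRS, repeat=3))

def is_commutative():
    return all(dot(x, y) == dot(y, x)
               for x, y in product(PAIRS, repeat=2))

print(is_associative(), is_commutative())
```

Associativity is what lets a prefix tree regroup the operations freely; the operand order still matters, for example `dot((1, 0), (0, 0))` differs from `dot((0, 0), (1, 0))`.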

PPA General Structure
Given P and G terms for each bit position, computing all the carries is equal to finding all the prefixes in parallel:
$(G_0, P_0) \bullet (G_1, P_1) \bullet (G_2, P_2) \bullet \dots \bullet (G_{N-2}, P_{N-2}) \bullet (G_{N-1}, P_{N-1})$
Since $\bullet$ is associative, we can group them in any order, but note that it is not commutative.
Structure:
- P_i, G_i logic (1 unit delay)
- C_i parallel prefix logic tree (1 unit delay per level)
- S_i logic (1 unit delay)
Measures to consider:
- number of cells
- tree cell depth (time)
- tree cell area
- cell fan-in and fan-out
- max wiring length
- wiring congestion
- delay path variation (glitching)

Brent-Kung PPA
[Figure: 16-bit Brent-Kung prefix tree with inputs (G_15, P_15) down to (G_0, P_0) and C_in, producing carries C_16 down to C_1.]

Kogge-Stone PPF Adder
[Figure: 16-bit Kogge-Stone prefix tree with inputs (G_15, P_15) down to (G_0, P_0) and C_in, producing carries C_16 down to C_1.]
$T_{add} = t_{setup} + \log_2 N \cdot t_{\bullet} + t_{sum}$
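A behavioral Python sketch of a Kogge-Stone style prefix computation is given below. It is a software model under stated assumptions: the input carry is folded into the bit-0 generate term, the word length n is a power of two, and the function names are illustrative. At each level every position combines with the position `dist` bits below it, and `dist` doubles per level.

```python
def kogge_stone_carries(a, b, c_in=0, n=16):
    """Return ([C_1, ..., C_n], number_of_prefix_levels)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    g[0] |= p[0] & c_in                   # fold C_in into bit 0 (assumption)
    G, P = g[:], p[:]
    dist, levels = 1, 0
    while dist < n:
        new_g, new_p = G[:], P[:]
        for i in range(dist, n):
            # (G, P)[i] = (G, P)[i] o (G, P)[i - dist]
            new_g[i] = G[i] | (P[i] & G[i - dist])
            new_p[i] = P[i] & P[i - dist]
        G, P = new_g, new_p
        dist *= 2
        levels += 1
    return G, levels                      # G[i] is the carry out of bit i

def kogge_stone_add(a, b, c_in=0, n=16):
    """Return (sum, c_out, levels) using the prefix carries."""
    carries, levels = kogge_stone_carries(a, b, c_in, n)
    c = [c_in] + carries
    s = 0
    for i in range(n):
        p_i = ((a >> i) ^ (b >> i)) & 1
        s |= (p_i ^ c[i]) << i            # S_i = P_i xor C_i
    return s, c[n], levels

print(kogge_stone_add(12345, 54321, 1))
```

For n = 16 the loop runs four times, matching the log2 N levels in the delay expression.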

More Adder Comparisons
[Figure: plot comparing RCA, CSkA, VSkA, and KS PPA for 8, 16, 32, 48, and 64 bits; vertical axis from 0 to 70.]

Adder Speed Comparisons
[Figure: plot comparing RCA, MCC, CCSkA, VCSkA, CCSlA, and B&K for 16, 32, and 64 bits; vertical axis from 10 to 70.]

Adder Average Power Comparisons
[Figure: plot comparing RCA, MCC, CCSkA, VCSkA, CCSlA, and B&K for 16, 32, and 64 bits; vertical axis from 0 to 35.]

PDP of Adder Comparisons
[Figure: plot comparing RCA, MCCA, CCSkA, VCSkA, CCSlA, and BKA for 8, 16, 32, 48, and 64 bits; vertical axis from 0 to 100.]
From Nagendra, 1996