HW 6 due Thursday, Nov 3, 5pm. No lab this week.

Transcription:

EE141 Fall 2005, Lecture 18: Adders

Announcements
- HW 6 due Thursday, Nov 3, 5pm
- No lab this week
- Midterm 2
  - Review: Tue, Nov 8, North Gate Hall, Room 105, 6:30-8:30pm
  - Exam: Thu, Nov 10, Morgan, Room 101, 6:30-8:00pm
  - Samples available at the class web site

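Class Material
- Last lecture: dynamic logic
- Today's lecture: dual-rail domino, np-CMOS, adders

Domino Logic
[Figure: two cascaded domino gates. First stage: inputs In1, In2, In3, precharge device M_p, pull-down network (PDN), evaluation device M_e, output Out1. Second stage: inputs Out1, In4, In5, precharge device M_p, keeper M_kp, PDN, evaluation device M_e, output Out2.]
- Evaluation: conditional discharge
- ONLY 0 → 1 transitions during evaluation!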

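Footless Domino
[Figure: chain of footless domino gates with precharge devices M_p. Annotations: outputs Out_1 ... Out_n go 0 → 1, inputs In_1 ... In_n go 1 → 0.]
- The first gate in the chain needs a foot switch
- Precharge is rippling (the next stage has to wait for the propagation delay of the inverter from the previous stage)
- Static power consumption

Differential (Dual Rail) Domino
[Figure: dual-rail domino gate with precharge devices M_p, keepers M_kp, and evaluation device M_e; inputs $A$, $B$, $\overline{A}$, $\overline{B}$; outputs $Out = AB$ and $\overline{Out} = \overline{AB}$.]
- Solves the problem of non-inverting logic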

np-CMOS
[Figure: a clocked PDN stage (inputs In1, In2, In3; devices M_p, M_e; output Out1) drives a clocked PUN stage (inputs In4, In5; devices M_e, M_p; output Out2, which goes to the next PDN).]
- Only 0 → 1 transitions allowed at inputs of PDN
- Only 1 → 0 transitions allowed at inputs of PUN

NORA Logic
[Figure: the same PDN/PUN cascade, with Out1 also going to other PUNs and Out2 to other PDNs.]
- WARNING: Very sensitive to noise!
- P-blocks are slower

Choosing a Logic Style: No Style Fits All Needs
General design considerations
- Robustness (static CMOS, ratioed logic)
- Area (pseudo-NMOS, static CMOS)
- Speed (dynamic, ratioed logic)
- Power (static CMOS, dynamic logic)
Application-specific considerations
- XOR-dominated functions (PTL)
Design tool considerations
- Static CMOS

Adders
[Video clip]

ALUs are Thermal Hotspots!
[Figure: processor thermal map, temperature in °C. Labeled regions: cache; execution core (integer and FP ALUs and MACs). Courtesy: R. Krishnamurthy (Intel)]
- ALUs: performance and peak-current limiters
- Goal: high-performance, energy-efficient design

32-bit ALU Architecture
[Figure: ALU block diagram. Labels: external operands, two 6:1 muxes, mux control, shift control, 5:1 mux, 2:1 mux, adder core, output mux, sum, sign control, loopback bus. Courtesy: R. Krishnamurthy (Intel)]
- Multiple ALUs clustered together in the execution core
- High power density

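Full Adder
[Figure: full-adder block with inputs A, B, Cin and outputs Sum, Cout.]

The Binary Adder
[Figure: full-adder block with inputs A, B, Cin and outputs Sum, Cout.]
$S = A \oplus B \oplus C_i = A\overline{B}\,\overline{C_i} + \overline{A}B\overline{C_i} + \overline{A}\,\overline{B}C_i + ABC_i$
$C_o = AB + BC_i + AC_i$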
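The full-adder behavior (sum as the three-input XOR, carry-out as the majority of the inputs) can be checked exhaustively in a few lines. This is a minimal Python sketch; the function names `full_adder` and `truth_table` are illustrative and do not come from the lecture.

```python
from itertools import product


def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out) for inputs in {0, 1}."""
    s = a ^ b ^ cin                          # S = A xor B xor Ci
    cout = (a & b) | (b & cin) | (a & cin)   # Co = AB + BCi + ACi
    return s, cout


def truth_table():
    """All eight input combinations as rows (A, B, Ci, S, Co)."""
    return [(a, b, c, *full_adder(a, b, c))
            for a, b, c in product((0, 1), repeat=3)]


if __name__ == "__main__":
    print("A B Ci | S Co")
    for a, b, c, s, co in truth_table():
        print(f"{a} {b} {c}  | {s} {co}")
```

Each row satisfies S + 2·Co = A + B + Ci, which is the arithmetic meaning of the two outputs.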

Express Sum and Carry as a Function of P, G, D
Define 3 new variables which ONLY depend on A, B:
- Generate: $G = AB$
- Propagate: $P = A \oplus B$
- Delete: $D = \overline{A}\,\overline{B}$
Can also derive expressions for $S$ and $C_o$ based on $D$ and $P$.
Note that we will sometimes use an alternate definition for Propagate: $P = A + B$.

The Ripple-Carry Adder
[Figure: four full adders (FA) in series with inputs $(A_0, B_0)$ ... $(A_3, B_3)$, carry-in $C_{i,0}$, carries $C_{o,0}$ ($= C_{i,1}$), $C_{o,1}$, $C_{o,2}$, $C_{o,3}$, and sums $S_0$ ... $S_3$.]
- Worst-case delay is linear in the number of bits: $t_d = O(N)$
- $t_{adder} = (N-1)\,t_{carry} + t_{sum}$
- Goal: make the fastest possible carry path circuit
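A short Python sketch of a ripple-carry adder built from the generate, propagate, and delete signals may help connect the definitions to the delay formula. It assumes the XOR definition of propagate, so that the same signal can form the sum as S = P xor Ci and the carry as Co = G + P·Ci; the names `pgd`, `ripple_carry_add`, and `ripple_delay` are illustrative only.

```python
def pgd(a, b):
    """Per-bit signals that depend only on A and B."""
    g = a & b                # generate: G = AB
    p = a ^ b                # propagate: P = A xor B
    d = (1 - a) & (1 - b)    # delete: D = (not A)(not B)
    return p, g, d


def ripple_carry_add(a, b, n, cin=0):
    """Add two n-bit integers bit by bit; returns (sum, carry_out)."""
    carry = cin
    total = 0
    for i in range(n):
        p, g, _d = pgd((a >> i) & 1, (b >> i) & 1)
        total |= (p ^ carry) << i      # S_i = P_i xor C_i
        carry = g | (p & carry)        # C_o,i = G_i + P_i C_i
    return total, carry


def ripple_delay(n, t_carry, t_sum):
    """Worst-case delay: t_adder = (N - 1) t_carry + t_sum."""
    return (n - 1) * t_carry + t_sum


if __name__ == "__main__":
    print(ripple_carry_add(0b1011, 0b0110, 4))   # 11 + 6 = 17 -> (1, 1)
    print(ripple_delay(16, t_carry=1, t_sum=1))  # 16 unit delays
```

The loop makes the linear dependence visible: each bit's carry cannot be computed before the previous one, which is why the carry path is the part worth optimizing.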

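Complementary Static CMOS Full Adder
[Figure: transistor-level schematic with inputs A, B, C_i, internal node X, and outputs S, C_o.]
- 28 transistors

Inversion Property
[Figure: two full-adder (FA) blocks showing that inverting all inputs of a full adder inverts both outputs, S and C_o.]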

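Minimize Critical Path by Reducing Inverting Stages
[Figure: 4-bit ripple-carry adder built from alternating even and odd full-adder (FA) cells, with inputs $(A_0, B_0)$ ... $(A_3, B_3)$, carry-in $C_{i,0}$, carries $C_{o,0}$ ... $C_{o,3}$, and sums $S_0$ ... $S_3$.]
- Exploit the inversion property

Better Structure: The Mirror Adder
[Figure: mirror-adder schematic with outputs C_o and S. Labeled paths: "0"-propagate, kill, "1"-propagate, generate.]
- 24 transistors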

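Mirror Adder Stick Diagram
[Figure: stick diagram of the mirror adder. Labels: C_o, S, GND.]

Manchester Carry Chain
[Figure: two Manchester carry gate circuits with signals P_i, G_i, D_i, clock φ, carry input C_i, and carry output C_o.]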

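Manchester Carry Chain
[Figure: 4-bit Manchester carry chain clocked by φ, with propagate signals P_0 ... P_3, generate signals G_0 ... G_3, carry input C_{i,0}, and carries C_0 ... C_3.]

Manchester Carry Chain Stick Diagram
[Figure: stick diagram with a propagate/generate row and an inverter/sum row. Labels: P_i, G_i, P_{i+1}, G_{i+1}, φ, carry positions i-1, i, i+1, GND.]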

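Domino Manchester Carry Chain
[Figure: domino Manchester carry chain with propagate signals P_0 ... P_3, generate signals G_0 ... G_3, carry input $C_{i,0}$, carry $C_{i,4}$, and transistor sizing annotations from 1 to 6. Labeled outputs: $\overline{G_0 + P_0 C_{i,0}}$ and $\overline{G_1 + P_1 G_0 + P_1 P_0 C_{i,0}}$.]

Carry-Bypass Adder
Also called carry-skip.
[Figure, top: 4-bit ripple chain of full adders (FA) with per-bit signals $P_0 G_0$, $P_1 G_1$, $P_2 G_2$, $P_3 G_3$, carry-in $C_{i,0}$, and carries $C_{o,0}$ ... $C_{o,3}$. Bottom: the same chain with a multiplexer at the output, controlled by $BP = P_0 P_1 P_2 P_3$, that selects between $C_{i,0}$ and the rippled carry to produce $C_{o,3}$.]
Idea: if ($P_0$ and $P_1$ and $P_2$ and $P_3$ = 1), then $C_{o,3} = C_{i,0}$, else "kill" or "generate".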

Carry-Bypass Adder (Cont.)
[Figure: 16-bit carry-bypass adder in four blocks of M bits (bits 0-3, 4-7, 8-11, 12-15). Each block has setup, carry propagation, and sum stages. Annotated delays: t_setup, t_bypass, t_sum.]
$t_{adder} = t_{setup} + M\,t_{carry} + (N/M - 1)\,t_{bypass} + (M-1)\,t_{carry} + t_{sum}$

Carry Ripple vs. Carry Bypass
[Figure: propagation delay t_p versus number of bits N for the ripple adder and the bypass adder; the curves cross at N of about 4 to 8.]
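The bypass idea and its delay formula can be sketched together in Python. This is a behavioral model under stated assumptions, not the circuit from the slides: it uses the XOR definition of propagate (with the OR definition the bypass would be wrong when a block also generates a carry), and the names `bypass_block` and `bypass_delay` are illustrative.

```python
def bypass_block(a, b, cin, m=4):
    """Carry-out of one m-bit block with a bypass multiplexer."""
    carry = cin
    block_p = 1
    for i in range(m):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        p, g = ai ^ bi, ai & bi
        block_p &= p                  # BP = P0 P1 ... P(m-1)
        carry = g | (p & carry)       # rippled carry inside the block
    # Multiplexer: if every bit propagates, forward the incoming carry.
    return cin if block_p else carry


def bypass_delay(n, m, t_setup, t_carry, t_bypass, t_sum):
    """t_setup + M t_carry + (N/M - 1) t_bypass + (M - 1) t_carry + t_sum."""
    if n % m:
        raise ValueError("n must be a multiple of the block size m")
    return (t_setup + m * t_carry + (n // m - 1) * t_bypass
            + (m - 1) * t_carry + t_sum)


if __name__ == "__main__":
    # 16 bits in 4-bit blocks, every delay taken as one unit.
    print(bypass_delay(16, 4, 1, 1, 1, 1))
```

With unit delays the 16-bit, 4-bit-block case evaluates to 12, since the terms are 1 + 4 + 3 + 3 + 1.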

Carry-Select Adder
[Figure: one carry-select block. Setup produces P, G; a "0" carry-propagation chain (carry-in "0") and a "1" carry-propagation chain (carry-in "1") run in parallel; a multiplexer controlled by $C_{o,k-1}$ selects the carry vector and produces $C_{o,k+3}$; sum generation follows.]

Carry-Select Adder: Critical Path
[Figure: 16-bit carry-select adder in four blocks (bits 0-3, 4-7, 8-11, 12-15). Each block has setup, a 0-carry chain, a 1-carry chain, a multiplexer, and sum generation. Carries: $C_{i,0}$, $C_{o,3}$, $C_{o,7}$, $C_{o,11}$, $C_{o,15}$. Sums: $S_{0-3}$, $S_{4-7}$, $S_{8-11}$, $S_{12-15}$.]
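A behavioral Python sketch of the carry-select principle: each block computes its result twice, once for carry-in 0 and once for carry-in 1, and the incoming carry only drives a multiplexer. The block results are computed with integer addition rather than gate-level chains, and the name `carry_select_add` is illustrative.

```python
def carry_select_add(a, b, n=16, m=4, cin=0):
    """Add two n-bit integers in m-bit carry-select blocks.

    Returns (sum, carry_out). Assumes n is a multiple of m.
    """
    mask = (1 << m) - 1
    carry = cin
    result = 0
    for k in range(0, n, m):
        ak = (a >> k) & mask
        bk = (b >> k) & mask
        s0 = ak + bk          # "0" chain: block result assuming carry-in 0
        s1 = ak + bk + 1      # "1" chain: block result assuming carry-in 1
        chosen = s1 if carry else s0      # multiplexer on the real carry
        result |= (chosen & mask) << k    # sum bits of this block
        carry = chosen >> m               # carry into the next block
    return result, carry


if __name__ == "__main__":
    s, cout = carry_select_add(0x0FFF, 0x0001)
    print(hex(s), cout)
```

Both candidate results are available before the carry arrives, so per block only a multiplexer delay lies on the critical path.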

Linear Carry Select
[Figure: 16-bit linear carry-select adder in four 4-bit blocks (bits 0-3, 4-7, 8-11, 12-15), each with setup, 0-carry and 1-carry chains, a multiplexer, and sum generation. Carries: $C_{i,0}$, $C_{o,3}$, $C_{o,7}$, $C_{o,11}$, $C_{o,15}$. Arrival-time annotations in unit delays: (1) after setup, (5) after the carry chains, (6), (7), (8), (9) after the successive multiplexers, (10) after the last sum generation.]
$t_{add} = t_{setup} + M\,t_{carry} + (N/M)\,t_{mux} + t_{sum}$

Square Root Carry Select
[Figure: 20-bit square-root carry-select adder with blocks of increasing size (bits 0-1, 2-4, 5-8, 9-13, 14-19), each with setup, "0" and "1" carry chains, a multiplexer, and sum generation. Arrival-time annotations in unit delays: (1) after setup, (3) to (7) after the carry chains, (4) to (8) after the multiplexers, (9) after the last sum.]
$t_{add} = t_{setup} + M\,t_{carry} + \sqrt{2N}\,t_{mux} + t_{sum}$
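The two carry-select delay expressions are easy to compare numerically. The sketch below assumes the linear form t_setup + M·t_carry + (N/M)·t_mux + t_sum and the square-root form t_setup + M·t_carry + sqrt(2N)·t_mux + t_sum, with every component delay defaulting to one unit; the function names are illustrative.

```python
import math


def linear_select_delay(n, m, t_setup=1, t_carry=1, t_mux=1, t_sum=1):
    """Linear carry select: N/M equal blocks of M bits."""
    return t_setup + m * t_carry + (n / m) * t_mux + t_sum


def sqrt_select_delay(n, m, t_setup=1, t_carry=1, t_mux=1, t_sum=1):
    """Square-root carry select: block sizes grow by one bit per stage,
    starting from M bits, so roughly sqrt(2N) multiplexers are in series."""
    return t_setup + m * t_carry + math.sqrt(2 * n) * t_mux + t_sum


if __name__ == "__main__":
    for n in (16, 32, 64):
        print(n, linear_select_delay(n, 4), round(sqrt_select_delay(n, 2), 1))
```

For N = 16 and M = 4 the linear form gives 10 unit delays, which agrees with the "(10)" annotation at the last sum in the linear carry-select figure. The square-root form is an approximation of the stage count, so it need not match the figure's annotations exactly.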

Adder Delays Comparison
[Figure: t_p (in unit delays, 0 to 50) versus N (0 to 60) for the ripple adder, linear select, and square root select.]

Carry Look-Ahead
$Sum_i = A_i \oplus B_i \oplus Carry_{i-1}$, where $A_i \oplus B_i$ is the partial sum
$Carry_i = A_i B_i + (A_i + B_i)\,Carry_{i-1}$, where $A_i B_i$ is the generate term and $A_i + B_i$ is the propagate term
$Carry_i = G_i + P_i\,Carry_{i-1}$

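Look-Ahead: Basic Idea
[Figure: N bit slices with inputs $(A_0, B_0)$, $(A_1, B_1)$, ..., $(A_{N-1}, B_{N-1})$, carry-ins $C_{i,0}$, $C_{i,1}$, ..., $C_{i,N-1}$, propagate signals $P_0$, $P_1$, ..., $P_{N-1}$, and sums $S_0$, $S_1$, ..., $S_{N-1}$.]
The idea is to eliminate the carry rippling effect:
$C_{o,k} = f(A_k, B_k, C_{o,k-1}) = G_k + P_k C_{o,k-1}$

Look-Ahead: Topology
Expanding the look-ahead equations:
$C_{o,k} = G_k + P_k (G_{k-1} + P_{k-1} C_{o,k-2})$
All the way:
$C_{o,k} = G_k + P_k (G_{k-1} + P_{k-1} (\dots + P_1 (G_0 + P_0 C_{i,0})))$
[Figure: single-gate implementation of $C_{o,3}$ with inputs $G_0$ ... $G_3$, $P_0$ ... $P_3$, and $C_{i,0}$.]
Implementation issues:
- long stack (N+1)
- or multiple stages: still linear delay!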

Logarithmic Look-Ahead Adder
[Figure: an 8-input linear chain producing F with $t_p \sim N$, compared with an 8-input tree producing F with $t_p \sim \log_2(N)$.]
Idea: large stacks limit carry look-ahead to 2-4 bits, so organize carry P and G into recursive trees.

Carry Look-Ahead Trees
$C_{o,0} = G_0 + P_0 C_{i,0}$
$C_{o,1} = G_1 + P_1 G_0 + P_1 P_0 C_{i,0}$
$C_{o,2} = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{i,0} = (G_2 + P_2 G_1) + (P_2 P_1)(G_0 + P_0 C_{i,0}) = G_{2:1} + P_{2:1} C_{o,0}$
Can continue building the tree hierarchically...
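The regrouping of the carry-out of bit 2 into a group generate and group propagate can be verified exhaustively. The Python sketch below assumes the usual carry-merge ("dot") operator, (G, P) ∘ (G', P') = (G + P·G', P·P'); the function names are illustrative.

```python
from itertools import product


def dot(hi, lo):
    """Carry-merge operator on (generate, propagate) pairs.

    hi covers the more significant bits, lo the less significant ones.
    """
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def carry2_flat(g, p, cin):
    """C_o,2 written out as a flat sum of products."""
    return (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
            | (p[2] & p[1] & p[0] & cin))


def carry2_tree(g, p, cin):
    """C_o,2 = G_2:1 + P_2:1 C_o,0, using the group signals."""
    c0 = g[0] | (p[0] & cin)                       # C_o,0 = G_0 + P_0 C_i,0
    g21, p21 = dot((g[2], p[2]), (g[1], p[1]))     # group (G_2:1, P_2:1)
    return g21 | (p21 & c0)


if __name__ == "__main__":
    ok = all(carry2_flat(bits[:3], bits[3:6], bits[6])
             == carry2_tree(bits[:3], bits[3:6], bits[6])
             for bits in product((0, 1), repeat=7))
    print("flat and tree forms agree:", ok)
```

Because the operator is associative, the same merge can be applied to ever larger groups, which is what allows the carries to be computed in a tree with logarithmic depth.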

High-Performance Adders: Kogge-Stone Tree Adder
[Figure: two parallel datapaths, one for even input bits and one for odd input bits: PG Gen., carry-merge stages CM1 to CM5, XOR, producing Sum even and Sum odd. Courtesy: R. Krishnamurthy (Intel)]
Carry-merge: $GG = G_i + P_i G_{i-1}$, $GP = P_i P_{i-1}$
- Generate all 32 carries
- Full-blown binary tree: energy-inefficient
- Number of carry-merge stages = $\log_2(32)$ = 5 stages

Kogge-Stone Adder
[Figure: 32-bit Kogge-Stone adder, bits 31 down to 0, with a PG row, carry-merge gate rows, and an XOR row. Courtesy: R. Krishnamurthy (Intel)]
- Critical path = PG + 5 + XOR = 7 gate stages
- Generate, Propagate fanout of 2, 3
- Maximum interconnect spans 16b
- Energy inefficient
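A bit-level Python model of a radix-2 Kogge-Stone adder shows where the five carry-merge stages for 32 bits come from. This is a sketch under assumptions: it uses the XOR form of propagate so the same signal can produce the sum, it folds the carry-in into the generate of bit 0, and the name `kogge_stone_add` is illustrative rather than taken from the lecture.

```python
def kogge_stone_add(a, b, n=32, cin=0):
    """Radix-2 Kogge-Stone addition of two n-bit integers.

    Returns (sum, carry_out, number_of_carry_merge_stages).
    """
    # PG generation
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]

    gg, gp = g[:], p[:]
    gg[0] = g[0] | (p[0] & cin)       # fold carry-in into bit 0

    # Carry-merge stages with spans 1, 2, 4, ...
    stages = 0
    span = 1
    while span < n:
        new_g, new_p = gg[:], gp[:]
        for i in range(span, n):
            new_g[i] = gg[i] | (gp[i] & gg[i - span])   # GG = G + P G'
            new_p[i] = gp[i] & gp[i - span]             # GP = P P'
        gg, gp = new_g, new_p
        span *= 2
        stages += 1

    # gg[i] is now the carry out of bit i; the final XOR forms the sum.
    carry_in = [cin] + gg[:-1]
    total = 0
    for i in range(n):
        total |= (p[i] ^ carry_in[i]) << i
    return total, gg[-1], stages


if __name__ == "__main__":
    s, cout, stages = kogge_stone_add(0x12345678, 0x0FEDCBA8)
    print(hex(s), cout, stages)
```

Every bit position gets its own carry in each stage, which gives the logarithmic depth but also the large number of carry-merge gates and long wires noted on the slides.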

Tree Adders
[Figure: 16-bit radix-2 Kogge-Stone tree with inputs $(A_0, B_0)$ ... $(A_{15}, B_{15})$ and outputs $S_0$ ... $S_{15}$.]

Example: Domino Adder
[Figure: clocked (Clk) domino gates with inputs $a_i$, $b_i$ for the propagate and generate signals.]
- Propagate: $P_i = a_i + b_i$
- Generate: $G_i = a_i b_i$

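Example: Domino Adder
The dot operator (carry-merge)
[Figure: two domino gates clocked by Clk_k. Propagate gate: output $P_{i:i-2k+1}$ from inputs $P_{i:i-k+1}$ and $P_{i-k:i-2k+1}$. Generate gate: output $G_{i:i-2k+1}$ from inputs $G_{i:i-k+1}$, $P_{i:i-k+1}$, and $G_{i-k:i-2k+1}$.]

Example: Domino Sum
[Figure: domino sum circuit with a keeper, clocks Clk and Clkd, input $G_{i:0}$, conditional sums $S_i^0$ and $S_i^1$, and output Sum.]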

Tree Adders
[Figure: 16-bit radix-4 Kogge-Stone tree with inputs $(a_0, b_0)$ ... $(a_{15}, b_{15})$ and outputs $S_0$ ... $S_{15}$.]

Sparse-Tree Adder Architecture
[Figure courtesy: R. Krishnamurthy (Intel)]
- Generate every 4th carry in parallel
- Side-path: 4-bit conditional sum generator
- 73% fewer carry-merge gates: energy-efficient

Adder Core Critical Path
[Figure: single-rail dynamic sparse-tree path from the adder inputs through PG (clk), GG_1, GG_3 (clk2), GG_7, GG_15 (clk3), and GG_27 to carry C_27, alongside a static sum generator (CM0, CM1, latch, XOR) that produces Sum_31_0 and Sum_31_1 and the final Sum_31. Courtesy: R. Krishnamurthy (Intel)]
- Critical path: 7 gates, same as KS
- Sparse-tree: single-rail dynamic
- Exploit non-criticality of the sum generator: convert to static logic
- Semi-dynamic design

Sparse-Tree Architecture
[Courtesy: R. Krishnamurthy (Intel)]
Performance impact: 20% speedup
- 33-50% reduced G/P fanouts
- 80% reduced wiring complexity
- 30% reduction in maximum interconnect
Power impact: 56% reduction
- 73% fewer carry-merge gates
- 50% reduction in average transistor size

Energy-Delay Space
[Figure: worst-case energy (pJ, 0 to 100) versus delay (ps, 140 to 280) for the dynamic Kogge-Stone design and the semi-dynamic sparse-tree design in 130nm CMOS at 1.2V and 110 °C. Annotations: 56%, 20%, 4GHz. Courtesy: R. Krishnamurthy (Intel)]
- 20% speedup over Kogge-Stone
- 56% worst-case energy reduction

Next Lecture
- Multipliers
- Power