Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Similar documents
Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits

Digital Integrated Circuits A Design Perspective

Bit-Sliced Design. EECS 141 F01 Arithmetic Circuits. A Generic Digital Processor. Full-Adder. The Binary Adder

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design

Hw 6 due Thursday, Nov 3, 5pm No lab this week

Arithmetic Building Blocks

EECS 427 Lecture 8: Adders Readings: EECS 427 F09 Lecture 8 1. Reminders. HW3 project initial proposal: due Wednesday 10/7

VLSI Design. [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI DEsign.1

EE141-Fall 2010 Digital Integrated Circuits. Announcements. An Intel Microprocessor. Bit-Sliced Design. Class Material. Last lecture.

EE141- Spring 2004 Digital Integrated Circuits

CSE477 VLSI Digital Circuits Fall Lecture 20: Adder Design

Homework 4 due today Quiz #4 today In class (80min) final exam on April 29 Project reports due on May 4. Project presentations May 5, 1-4pm

CMPEN 411 VLSI Digital Circuits Spring Lecture 21: Shifters, Decoders, Muxes

VLSI Design I; A. Milenkovic 1

Hardware Design I Chap. 4 Representative combinational logic

Arithmetic Circuits-2

Design of System Elements. Basics of VLSI

Where are we? Data Path Design

L15: Custom and ASIC VLSI Integration

Arithmetic Circuits-2

Where are we? Data Path Design. Bit Slice Design. Bit Slice Design. Bit Slice Plan

Chapter 5 Arithmetic Circuits

CSE140: Components and Design Techniques for Digital Systems. Decoders, adders, comparators, multipliers and other ALU elements. Tajana Simunic Rosing

ISSN (PRINT): , (ONLINE): , VOLUME-4, ISSUE-10,

9. Datapath Design. Jacob Abraham. Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017

Chapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1>

ARITHMETIC COMBINATIONAL MODULES AND NETWORKS

Digital Integrated Circuits A Design Perspective

DIGITAL TECHNICS. Dr. Bálint Pődör. Óbuda University, Microelectronics and Technology Institute

Adders, subtractors comparators, multipliers and other ALU elements

Arithmetic Circuits-2

Lecture 4. Adders. Computer Systems Laboratory Stanford University

CS 140 Lecture 14 Standard Combinational Modules

The Inverter. Digital Integrated Circuits A Design Perspective. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic

EE141. Lecture 28 Multipliers. Lecture #20. Project Phase 2 Posted. Sign up for one of three project goals today

L8/9: Arithmetic Structures

EFFICIENT MULTIOUTPUT CARRY LOOK-AHEAD ADDERS

Adders, subtractors comparators, multipliers and other ALU elements

ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN. Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering

Lecture 7: Logic design. Combinational logic circuits

CSE140: Components and Design Techniques for Digital Systems. Logic minimization algorithm summary. Instructor: Mohsen Imani UC San Diego

Chapter 7. VLSI System Components

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Sciences

CprE 281: Digital Logic

ALUs and Data Paths. Subtitle: How to design the data path of a processor. 1/8/ L3 Data Path Design Copyright Joanne DeGroat, ECE, OSU 1

Menu. 7-Segment LED. Misc. 7-Segment LED MSI Components >MUX >Adders Memory Devices >D-FF, RAM, ROM Computer/Microprocessor >GCPU

Digital Logic. CS211 Computer Architecture. l Topics. l Transistors (Design & Types) l Logic Gates. l Combinational Circuits.

Part II Addition / Subtraction

CSE 140 Lecture 11 Standard Combinational Modules. CK Cheng and Diba Mirza CSE Dept. UC San Diego

Digital Integrated Circuits A Design Perspective

Datapath Component Tradeoffs

EEC 216 Lecture #3: Power Estimation, Interconnect, & Architecture. Rajeevan Amirtharajah University of California, Davis

Midterm Exam Two is scheduled on April 8 in class. On March 27 I will help you prepare Midterm Exam Two.

Lecture 34: Portable Systems Technology Background Professor Randy H. Katz Computer Science 252 Fall 1995

Introduction to CMOS VLSI Design Lecture 1: Introduction

Floating Point Representation and Digital Logic. Lecture 11 CS301

Part II Addition / Subtraction

Integrated Circuits & Systems

Integer Multipliers 1

EECS150 - Digital Design Lecture 24 - Arithmetic Blocks, Part 2 + Shifters

Overview. Arithmetic circuits. Binary half adder. Binary full adder. Last lecture PLDs ROMs Tristates Design examples

Combinational Logic. By : Ali Mustafa

ECE 545 Digital System Design with VHDL Lecture 1. Digital Logic Refresher Part A Combinational Logic Building Blocks

1 Short adders. t total_ripple8 = t first + 6*t middle + t last = 4t p + 6*2t p + 2t p = 18t p

Digital Integrated Circuits A Design Perspective

Name: Answers. Mean: 83, Standard Deviation: 12 Q1 Q2 Q3 Q4 Q5 Q6 Total. ESE370 Fall 2015

CMOS logic gates. João Canas Ferreira. March University of Porto Faculty of Engineering

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING QUESTION BANK

Logic and Computer Design Fundamentals. Chapter 5 Arithmetic Functions and Circuits

Area-Time Optimal Adder with Relative Placement Generator

Introduction to Computer Engineering. CS/ECE 252, Fall 2012 Prof. Guri Sohi Computer Sciences Department University of Wisconsin Madison

Fundamentals of Computer Systems

S No. Questions Bloom s Taxonomy Level UNIT-I

Appendix B. Review of Digital Logic. Baback Izadi Division of Engineering Programs

Digital Integrated Circuits A Design Perspective

Timing Issues. Digital Integrated Circuits A Design Perspective. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolić. January 2003

CMPE12 - Notes chapter 1. Digital Logic. (Textbook Chapter 3)

EECS150 - Digital Design Lecture 25 Shifters and Counters. Recap

Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic. November Digital Integrated Circuits 2nd Sequential Circuits

Design and Implementation of Carry Tree Adders using Low Power FPGAs

CMP 334: Seventh Class

CSEE 3827: Fundamentals of Computer Systems. Combinational Circuits

Chapter 5 CMOS Logic Gate Design

Topics. Dynamic CMOS Sequential Design Memory and Control. John A. Chandy Dept. of Electrical and Computer Engineering University of Connecticut

7 Multipliers and their VHDL representation

Review for Final Exam

Chapter Overview. Memory Classification. Memory Architectures. The Memory Core. Periphery. Reliability. Memory

Lecture 18: Datapath Functional Units

EE241 - Spring 2001 Advanced Digital Integrated Circuits

Combinational Logic. Mantıksal Tasarım BBM231. section instructor: Ufuk Çelikcan

School of EECS Seoul National University

Vectorized 128-bit Input FP16/FP32/ FP64 Floating-Point Multiplier

14:332:231 DIGITAL LOGIC DESIGN

Semiconductor Memories

EE141-Fall 2011 Digital Integrated Circuits

EE115C Digital Electronic Circuits Homework #6

Digital Integrated Circuits A Design Perspective

LOGIC CIRCUITS. Basic Experiment and Design of Electronics. Ho Kyung Kim, Ph.D.

CSE 140 Midterm 3 version A Tajana Simunic Rosing Spring 2015

Transcription:

Digital Integrated Circuits A Design Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic Arithmetic Circuits January, 2003 1

A Generic Digital Processor MEMORY INPUT-OUTPUT CONTROL DATAPATH 2

Building Blocks for Digital Architectures Arithmetic unit - Bit-sliced datapath (adder, multiplier, shifter, comparator, etc.) Memory - RAM, ROM, Buffers, Shift registers Control - Finite state machine (PLA, random logic.) - Counters Interconnect -Switches -Arbiters -Bus 3

An Intel Microprocessor 9-1 Mux 5-1 Mux a CARRYGEN g64 node1 ck1 SUMSEL REG sum sumb to Cache 9-1 Mux 2-1 Mux b SUMGEN + LU s0 s1 LU : Logical Unit 1000um Itanium has 6 integer execution units like this 4

Bit-Sliced Design Control Bit 3 Data-In Register Adder Shifter Multiplexer Bit 2 Bit 1 Bit 0 Data-Out Tile identical processing elements 5

Bit-Sliced Datapath From register files / Cache / Bypass Multiplexers Shifter Adder stage 1 Loopback Bus Loopback Bus Wiring Adder stage 2 Wiring Loopback Bus Bit slice 63 Adder stage 3 Sum Select Bit slice 2 Bit slice 1 Bit slice 0 To register files / Cache 6

Itanium Integer Datapath Fetzer, Orton, ISSCC 02 7

Adders 8

Full-Adder A B Cin Full adder Sum Cout 9

The Binary Adder A B Cin Full adder Sum Cout S = A B C i = ABC i + ABC i + ABC i + ABC i C o = AB + BC i + AC i 10

Express Sum and Carry as a function of P, G, D Define 3 new variable which ONLY depend on A, B Generate (G) = AB Propagate (P) = A B Delete = A B Can also derive expressions for S and C o based on D and P Note that we will be sometimes using an alternate definition for Propagate (P) = A + B 11

Complimentary Static CMOS Full Adder V DD V DD C i A B A B A B A C i X B C i V DD C i A S C i A B B V DD A B C i A C o B 28 Transistors 12

A Better Structure: The Mirror Adder V DD V DD V DD A "0"-Propagate A C i B B A Kill C o A B C i B C i S "1"-Propagate A Generate C i A B B A B C i A B 24 transistors 13

Mirror Adder Stick Diagram V DD A B C i B A C i C o C i A B C o S GND 14

The Mirror Adder The NMOS and PMOS chains are completely symmetrical. A maximum of two series transistors can be observed in the carrygeneration circuitry. When laying out the cell, the most critical issue is the minimization of the capacitance at node C o. The reduction of the diffusion capacitances is particularly important. The capacitance at node C o is composed of four diffusion capacitances, two internal gate capacitances, and six gate capacitances in the connecting adder cell. The transistors connected to C i are placed closest to the output. Only the transistors in the carry stage have to be optimized for optimal speed. All transistors in the sum stage can be minimal size. 15

Transmission Gate Full Adder P V DD A V DD A A P C i C i P S Sum Generation V DD B A P B A P P V DD C o Carry Generation C i C i Setup A C i P 16

One-phase dynamic CMOS adder 17

One-phase dynamic CMOS adder 18

One-phase dynamic CMOS adder 19

The Ripple-Carry Adder A 0 B 0 A 1 B 1 A 2 B 2 A 3 B 3 C i,0 C o,0 C o,1 C o,2 C o,3 FA FA FA FA (= C i,1 ) S 0 S 1 S 2 S 3 Worst case delay linear with the number of bits t d = O(N) t adder = (N-1)t carry + t sum Goal: Make the fastest possible carry path circuit 20

Inversion Property A B A B C i FA C o C i FA C o S S 21

Minimize Critical Path by Reducing Inverting Stages Even cell Odd cell A 0 B 0 A 1 B 1 A 2 B 2 A 3 B 3 C i,0 C o,0 C o,1 C o,2 C o,3 FA FA FA FA S 0 S 1 S 2 S 3 Exploit Inversion Property 22

Carry Look-Ahead Adders 23

Carry-Lookahead Adders 24

Carry-Lookahead Adders 25

Look-Ahead: Topology Expanding Lookahead equations: V DD C o k, = G k + P k ( G k 1 + P k 1 C ok 2 ), G 3 G 2 All the way: C o k, = G k + P k ( G k 1 + P k 1 ( + P 1 ( G 0 + P 0 C i0 ))), C i,0 G 1 G 0 C o,3 P 0 P 1 P 2 P 3 26

Manchester Carry Chain V DD P i φ P i V DD C o G i C i Ci C o G i P i D i φ 27

Manchester Carry Chain φ V DD P 0 P 1 P 2 P 3 C 3 C i,0 G 0 G 1 G 2 G 3 φ C 0 C 1 C 2 C 3 28

Manchester Carry Chain Stick Diagram Propagate/Generate Row V DD P i G i φ P i + 1 G i + 1 φ C i C i - 1 C i + 1 GND Inverter/Sum Row 29

Carry-Bypass Adder C i,0 P 0 G 1 P 0 G 1 P 2 G 2 P 3 G 3 C o,0 C o,1 C o,2 FA FA FA FA C o,3 Also called Carry-Skip P 0 G 1 P 0 G 1 P 2 G 2 P 3 G 3 BP=P o P 1 P 2 P 3 C i,0 C o,0 C o,1 C o,2 FA FA FA FA Multiplexer C o,3 Idea: If (P0 and P1 and P2 and P3 = 1) then C o3 = C 0, else kill or generate. 30

Carry-Bypass Adder (cont.) Bit 0 3 Bit 4 7 Bit 8 11 Bit 12 15 Setup t setup Setup t bypass Setup Setup Carry propagation Carry propagation Carry propagation Carry propagation Sum Sum Sum t sum Sum M bits t adder = t setup + M tcarry + (N/M 1)t bypass + (M 1)t carry + t sum 31

Carry Ripple versus Carry Bypass t p ripple adder bypass adder 4..8 N 32

Carry-Select Adder Setup P,G "0" "0" Carry Propagation "1" "1" Carry Propagation C o,k-1 Multiplexer Co,k+3 Sum Generation Carry Vector 33

Carry Select Adder: Critical Path Bit 0 3 Bit 4 7 Bit 8 11 Bit 12 15 Setup Setup Setup Setup 0 0-Carry 0 0-Carry 0 0-Carry 0 0-Carry 1 1-Carry 1 1-Carry 1 1-Carry 1 1-Carry Multiplexer Multiplexer Multiplexer Multiplexer C i,0 C o,3 C o,7 C o,11 C o,15 Sum Generation Sum Generation Sum Generation Sum Generation S 0 3 S 4 7 S 8 11 S 12 15 34

Linear Carry Select Bit 0-3 Bit 4-7 Bit 8-11 Bit 12-15 Setup Setup Setup Setup (1) "0" (1) "0" Carry "0" "0" Carry "0" "0" Carry "0" "0" Carry C i,0 "1" "1" Carry (5) (5) Multiplexer "1" Carry "1" Carry "1" Carry "1" "1" "1" (5) (5) (5) (6) (7) (8) Multiplexer Multiplexer Multiplexer (9) Sum Generation Sum Generation Sum Generation Sum Generation S 0-3 S 4-7 S 8-11 S 12-15 (10) 35

Square Root Carry Select Bit 0-1 Bit 2-4 Bit 5-8 Bit 9-13 Bit 14-19 Setup Setup Setup Setup (1) "0" Carry "0" (1) "0" "0" Carry "0" "0" Carry "0" "0" Carry C i,0 "1" Carry "1" Carry "1" Carry "1" Carry "1" "1" "1" "1" (3) (3) (4) (5) (6) (4) (5) (6) (7) Multiplexer Multiplexer Multiplexer Multiplexer (7) Mux (8) Sum Generation Sum Generation Sum Generation Sum Generation Sum S 0-1 S 2-4 S 5-8 S 9-13 S 14-19 (9) 36

Adder Delays - Comparison 50 t p (in unit delays) 40 30 20 10 Ripple adder Linear select Square root select 0 0 20 40 N 60 37

O Operator Definizione 38

Properties of the O operator 39

Properties of the O operator 40

Group Generate and Propagate 41

Group Generate and Propagate 42

Group Generate and Propagate 43

Look-Ahead - Basic Idea A 0, B 0 A 1, B 1 A N-1, B N-1 C i,0 P 0 C i,1 P 1 C i, N-1 P N-1 S 0 S 1 S N-1 C o k, = fa ( k, B k, C o k ) = G k + P k C o k 1, 1, 44

Logarithmic Look-Ahead Adder A 0 F A 1 A 2 A 3 A 4 A 5 A 6 A 7 A 0 A 1 t p N A 2 A 3 A 4 A 5 A 6 A 7 F t p log 2 (N) 45

Brent-Kung BLC adder 46

Folding of the inverse tree 47

Folding the inverse tree 48

Dense tree with minimum fan-out 49

Dense tree with simple connections 50

Carry Lookahead Trees C o2, = C o1 C o0, = G 0 + P 0 C i, 0, = G 1 + P 1 G 0 + P 1 P 0 C i, 0 G 2 + P 2 G 1 + P 2 P 1 G 0 + P 2 P 1 P 0 C i0 = ( G 2 + P 2 G 1 ) + ( P 2 P 1 )( G 0 + P 0 C i 0 ) = G 2:1 + P 2:1 C o0 Can continue building the tree hierarchically.,,, 51

Tree Adders (A0, B 0 ) (A1, B 1 ) (A2, B 2 ) (A3, B 3 ) (A4, B 4 ) (A5, B 5 ) (A6, B 6 ) (A7, B 7 ) (A8, B 8 ) (A9, B 9 ) (A10, B 10 ) (A11, B 11 ) (A12, B 12 ) (A13, B 13 ) (A14, B 14 ) (A15, B 15 ) S0 S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15 16-bit radix-2 Kogge-Stone tree 52

Tree Adders (a0, b ) 0 (a1, b ) 1 (a2, b ) 2 (a3, b ) 3 (a4, b ) 4 (a5, b ) 5 (a6, b ) 6 (a7, b ) 7 (a8, b ) 8 (a9, b ) 9 (a10, b ) 10 (a11, b ) 11 (a12, b ) 12 (a13, b ) 13 (a14, b ) 14 (a15, b ) 15 S0 S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15 16-bit radix-4 Kogge-Stone Tree 53

Sparse Trees (a0, b ) 0 (a1, b ) 1 (a2, b ) 2 (a3, b ) 3 (a4, b ) 4 (a5, b ) 5 (a6, b ) 6 (a7, b ) 7 (a8, b ) 8 (a9, b ) 9 (a10, b ) 10 (a11, b ) 11 (a12, b ) 12 (a13, b ) 13 (a14, b ) 14 (a15, b ) 15 S0 S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15 16-bit radix-2 sparse tree with sparseness of 2 54

Tree Adders (A0, B ) 0 (A1, B ) 1 (A2, B ) 2 (A3, B ) 3 (A4, B ) 4 (A5, B ) 5 (A6, B ) 6 (A7, B ) 7 (A8, B ) 8 (A9, B ) 9 (A10, B ) 10 (A11, B ) 11 (A12, B ) 12 (A13, B ) 13 (A14, B ) 14 (A15, B ) 15 S0 S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15 Brent-Kung Tree 55

Example: Domino Adder V DD V DD Clk G i = a i b i Clk P i = a i + b i a i a i b i b i Clk Clk Propagate Generate 56

Example: Domino Adder V DD V DD Clk k P i:i-2k+1 Clk k G i:i-2k+1 P i:i-k+1 P i:i-k+1 G i:i-k+1 P i-k:i-2k+1 G i-k:i-2k+1 Propagate Generate 57

Example: Domino Sum V DD V DD Keeper Clk Clkd Sum Gi:0 Clk S i 0 Clkd Clk Gi:0 S i 1 Clk 58

Adders Summary 59

Multipliers 60

The Binary Multiplication 61

The Binary Multiplication 62

The Binary Multiplication + x 1 0 1 0 1 0 1 0 1 1 1 0 1 0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 1 0 1 0 1 0 1 1 1 0 0 1 1 1 0 Multiplicand Multiplier Partial products Result 63

The Array Multiplier X 3 X 2 X 1 X 0 Y 0 X 3 X 2 X 1 X 0 Y 1 Z 0 HA FA FA HA X 3 X 2 X 1 X 0 Y 2 Z 1 FA FA FA HA X3 X 2 X 1 X 0 Y 3 Z 2 FA FA FA HA Z 7 Z 6 Z 5 Z 4 Z 3 64

The MxN Array Multiplier Critical Path HA FA FA HA FA FA FA HA Critical Path 1 Critical Path 2 FA FA FA HA Critical Path 1 & 2 65

Carry-Save Multiplier HA HA HA HA HA FA FA FA HA FA FA FA HA FA FA HA Vector Merging Adder 66

Multiplier Floorplan X 3 X 2 X 1 X 0 Y 0 Y 1 C S C S C S C S Z 0 HA Multiplier Cell FA Multiplier Cell Y 2 C S C S C S C S Z 1 Vector Merging Cell Y 3 C S C S C S C S Z 2 X and Y signals are broadcasted through the complete array. ( ) C S C S C S C S Z 7 Z 6 Z 5 Z 4 Z 3 67

Wallace-Tree Multiplier Partial products First stage 6 5 4 3 2 1 0 6 5 4 3 2 1 0 Bit position (a) (b) Second stage Final adder 6 5 4 3 2 1 0 6 5 4 3 2 1 0 FA (c) HA (d) 68

Wallace-Tree Multiplier Partial products x 3 y 3 x 3 y 2 x 2 y 2 x 3 y 1 x 1 y 2 x 3 y 0 x 1 y 1 x 2 y 0 x 0 y 1 x 2 y 3 x1 y 3 x 0 y 3 x 2 y 1 x 0 y 2 x 1 y 0 x 0 y 0 First stage HA HA Second stage FA FA FA FA Final adder z 7 z 6 z 5 z 4 z 3 z 2 z 1 z 0 69

Wallace-Tree Multiplier y 0 y 1 y 2 FA C i-1 y 0 y 1 y 2 y 3 y 4 y 5 y 3 FA FA C i FA C i-1 C i C i C i-1 C i-1 y 4 FA C i FA C i-1 C i C i-1 y 5 C i FA FA C S C S 70

Wallace Tree Mult.. Performance 71

Wallace Tree Multiplier Complexity 72

4:2 Adder 73

Eight-input input Tree 74

Architectural comparison of multiplier solutions 75

SPIM Architecture 76

SPIM Pipe Timing 77

SPIM Microphotograph 78

SPIM clock generator circuit 79

Binary Tree Multiplier Performance 80

Binary Tree Multiplier Complexity 81

Multipliers Summary Optimization Goals Different Vs Binary Adder Once Again: Identify Critical Path Other possible techniques - Logarithmic versus Linear (Wallace Tree Mult) - Data encoding (Booth) - Pipelining FIRST GLIMPSE AT SYSTEM LEVEL OPTIMIZATION 82

Multipliers Summary 83

Booth encoding 84

Booth encoding 85

Tree Multiplier with Booth Encoding 86

The f.p. addition algorithm Exponent comparison and swap (if needed) Mantissas aligment Addition Normalization of result Rounding of result 87

The f.p. multiplication algorithm Mantissas multiplication Exponent addition Mantissa normalization and exponent adjusting (if needed) Rounding of result 88

Dividers 89

Iterative Division (Newton-Raphson) 90

Iterative Division (Newton-Raphson) 91

Quadratic Convergence of the Newton Method 92

Properties of the Newton Method Asymptotically quadratic convergence Correction of round-off errors Final multiplication by a generates a round-off problem incompatible with the Standard IEEE-754 93

Iterative Division (Goldschmidt) 94

Iterative division (Goldschmidt) 95

Iterative Division (Goldschmidt) The sequence of x n tends to a/b The convergence is quadratic In its present form, this method is affected by roud-off errors 96

Modified Goldschmidt Algorithm (correction of round-off off errors) 97

The Link between the Newton-Raphson and Goldschmidt Methods 98

The Link between the Newton-Raphson and Goldschmidt Methods 99

Comparison between Newton and Goldschmidt methods The Newton and Goldschmidt methods are essentially equivalent; Both methods exhibit an asymptotically quadratic convergence; Both methods are able to correct roundoff errors; The Goldschmidt methods directly computes the a/b ratio. 100

Shifters 101

The Binary Shifter Right nop Left A i B i A i-1 B i-1 Bit-Slice i... 102

The Barrel Shifter A 3 B 3 Sh1 A 2 B 2 A 1 Sh2 B 1 : Data Wire : Control Wire A 0 Sh3 B 0 Sh0 Sh1 Sh2 Sh3 Area Dominated by Wiring 103

4x4 barrel shifter A 3 A 2 A 1 A 0 Sh0 Sh1 Sh2 Sh3 Width barrel ~ 2 p m M Buffer 104

Logarithmic Shifter Sh1 Sh1 Sh2 Sh2 Sh4 Sh4 A 3 B 3 A 2 B 2 A 1 B 1 A 0 B 0 105

0-77 bit Logarithmic Shifter A 3 Out3 A 2 Out2 A 1 Out1 A 0 Out0 106

ALUs 107

Two-bit MUX 108

Two-bit MUX Truth Table 109

Two-bit Selector Truth Table 110

Carry-chain Truth Table 111

ALU block diagram (Mead-Conway) 112

ALU Operations 113

ALU Operations 114

ALU Operations 115

ALU Operations 116

ALU Operations 117

MIPS-X Instruction Format 118

Pipeline dependencies in MIPS-X 119

Die Photo of MIPS-X 120

MIPS-X Architecture 121

MIPS-X Instruction Cache-miss timing 122

MIPS-X Tag Memory 123

MIPS-X Valid Store Array 124

RAM Sense Amplifier 125

CMOS Dual-port Register Cell 126

Self-timed bit-line write circuit 127

Register bypass logic 128

Schematic of comparator circuit 129

Squash FSM 130

Cache-miss FSM 131