Computer arithmetic. Intensive Computation. Annalisa Massini 2017/2018


References: Hennessy and Patterson, Computer Architecture: A Quantitative Approach, Appendix J.

Half Adder and Full Adder
Adders are usually implemented by combining multiple copies of simple components. The natural components for addition are half adders and full adders. The half adder takes two bits $a$ and $b$ as input and produces a sum bit $s$ and a carry bit $c_{out}$ as output. As logic equations, $s = a\bar{b} + \bar{a}b = a \oplus b$ and $c_{out} = ab$.

The full adder takes three bits $a$, $b$ and $c$ as input and produces a sum bit $s$ and a carry bit $c_{out}$ as output. As logic equations, $s = \bar{a}\bar{b}c + \bar{a}b\bar{c} + a\bar{b}\bar{c} + abc = (a \oplus b) \oplus c$ and $c_{out} = (a \oplus b)c + ab$. The half adder is a (2,2) adder, since it takes two inputs and produces two outputs. The full adder is a (3,2) adder, since it takes three inputs and produces two outputs.
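The following Python sketch is a quick check of these equations. The function names are illustrative and do not come from the slides. Bits are modelled as the integers 0 and 1. The full adder is built from two half adders plus an OR of their carries, which is exactly $c_{out} = (a \oplus b)c + ab$. The loop checks all eight input combinations by confirming that $s + 2c_{out}$ equals the number of input 1s.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """(2,2) adder: returns (s, c_out) with s = a XOR b and c_out = ab."""
    return a ^ b, a & b


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """(3,2) adder: s = (a XOR b) XOR c and c_out = (a XOR b)c + ab."""
    s1, c1 = half_adder(a, b)      # first half adder: a + b
    s, c2 = half_adder(s1, c)      # second half adder: (a XOR b) + c
    return s, c1 | c2              # OR of the two half-adder carries


# Truth-table check: the two outputs encode the count of input 1s.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, c_out = full_adder(a, b, c)
            assert s + 2 * c_out == a + b + c
```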

Ripple-Carry Addition
The principal problem in constructing an adder for n-bit numbers out of smaller pieces is propagating the carries from one piece to the next. The most obvious way to solve this is with a ripple-carry adder, consisting of n full adders.
[Figure: ripple-carry adder; a chain of n full adders with inputs $a_{n-1}b_{n-1}, \ldots, a_0b_0$ and outputs $s_{n-1}, \ldots, s_0$, the carry of each stage feeding the next.]

The time a circuit takes to produce an output is proportional to the maximum number of logic levels through which a signal travels. Determining the exact relationship between logic levels and timings is highly technology dependent.

When comparing adders we will simply compare the number of logic levels in each one. A ripple-carry adder takes two levels to compute $c_1$ from $a_0$ and $b_0$. It then takes two more levels to compute $c_2$ from $c_1$, $a_1$ and $b_1$, and so on, up to $c_n$. So there are a total of $2n$ levels.

Typical values of n are 32 for integer arithmetic and 53 for double-precision floating point. The ripple-carry adder is the slowest adder, but also the cheapest: it can be built with only n simple cells, connected in a simple, regular way.

The ripple-carry adder is relatively slow, taking time $O(n)$. It is still used because in technologies like CMOS the constant factor is very small. Short ripple adders are often used as building blocks in larger adders.
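The serial carry dependence can be made concrete with a minimal bit-level sketch in Python. It assumes operands are passed as unsigned integers and bits are extracted with shifts. `ripple_carry_add` is a hypothetical helper. Stage i cannot produce $s_i$ or $c_{i+1}$ until $c_i$ arrives from stage i-1, and this chain is the source of the $O(n)$ delay.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """One full adder cell: s = a XOR b XOR c, c_out = (a XOR b)c + ab."""
    s = a ^ b ^ c
    c_out = ((a ^ b) & c) | (a & b)
    return s, c_out


def ripple_carry_add(x: int, y: int, n: int, c0: int = 0) -> tuple[int, int]:
    """Add two n-bit unsigned numbers with a chain of n full adders.

    Returns (sum, c_n): the n-bit sum and the carry out of the last stage.
    """
    carry = c0
    total = 0
    for i in range(n):                      # stage i waits for c_i from stage i-1
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry
```

For example, `ripple_carry_add(11, 6, 4)` returns `(1, 1)`: 1011 + 0110 = 1 0001, so the 4-bit sum is 0001 with a carry-out of 1.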

Ripple-Carry Addition for Signed Numbers
The most widely used system for representing integers is two's complement, in which the MSB carries a negative weight. The value of a two's complement number $a_{n-1}a_{n-2}\cdots a_1a_0$ is:
$$-a_{n-1}2^{n-1} + a_{n-2}2^{n-2} + \cdots + a_1 2^1 + a_0 2^0$$
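The weighted sum can be evaluated directly. This short Python sketch (the name `twos_complement_value` is illustrative) takes the bit string MSB first and gives the MSB its negative weight.

```python
def twos_complement_value(bits: str) -> int:
    """Value of a_{n-1} ... a_1 a_0 (given MSB first); the MSB weighs -2^(n-1)."""
    n = len(bits)
    value = -int(bits[0]) * 2 ** (n - 1)          # negative weight of the MSB
    for i in range(1, n):
        value += int(bits[i]) * 2 ** (n - 1 - i)  # ordinary positive weights
    return value
```

For instance, 1011 evaluates to $-8 + 2 + 1 = -5$ and 1111 to $-1$.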

One reason for the popularity of two's complement is that it makes signed addition easy: simply discard the carry-out from the high-order bit. Subtraction is executed as an addition, $A - B = A + (-B)$, recalling that $-X = \bar{X} + 1$ (bitwise complement plus one).

The ripple-carry adder can also be used for subtraction by acting on the second operand B and on $c_0$. If the complement line is 1, operand B is bitwise complemented and $c_0 = 1$.
[Figure: ripple-carry adder whose inputs $b_{n-1}, \ldots, b_0$ are conditionally complemented by the complement line; operand inputs $a_{n-1}, \ldots, a_0$; outputs $s_{n-1}, \ldots, s_0$.]
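Here is a sketch of the adder/subtractor in Python. It assumes each $b_i$ is conditionally inverted by an EX-OR gate driven by the complement line; this is the usual choice, though the slide's figure is not recoverable here. Setting `complement = 1` both inverts B and forces $c_0 = 1$, so the chain computes $A + \bar{B} + 1 = A - B$. The final carry-out is discarded, so results wrap modulo $2^n$. The function name `add_sub` is hypothetical.

```python
def add_sub(a: int, b: int, complement: int, n: int) -> int:
    """n-bit two's complement adder/subtractor built as a ripple-carry chain.

    complement = 0 computes A + B; complement = 1 computes A - B as
    A + (bitwise complement of B) + 1. Returns the signed n-bit result.
    """
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    carry = complement                       # c_0 = 1 when subtracting
    result = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ complement     # assumed EX-OR with the complement line
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | ((ai ^ bi) & carry)
    # carry now holds the carry-out of the high-order bit, which is discarded
    return result - (1 << n) if result >> (n - 1) else result
```

With n = 4, `add_sub(3, 5, 1, 4)` gives -2. `add_sub(7, 1, 0, 4)` wraps to -8, the usual two's complement overflow behaviour.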

Unsigned Multiplication
The simplest multiplier computes the product of two unsigned numbers, $a_{n-1}a_{n-2}\cdots a_0$ and $b_{n-1}b_{n-2}\cdots b_0$, one bit at a time.
[Figure: shift-right multiplier datapath; register P (product, n bits, initially 0) with a carry-out, register A (multiplier, n bits) and register B (multiplicand, n bits).]

Each multiply step has two parts. (i) If the least-significant bit of A is 1, register B, containing $b_{n-1}b_{n-2}\cdots b_0$, is added to P; otherwise, $00\cdots0$ is added to P. The sum is placed back into P.

(ii) Registers P and A are shifted right together. The carry-out of the sum moves into the high-order bit of P, and the low-order bit of P moves into register A. The rightmost bit of A, which is not used in the rest of the algorithm, is shifted out.

Hence, we add the contents of P to either B or 0 (depending on the low-order bit of A), replace P with the sum, and then shift both P and A one bit right. After n steps, the product appears in registers P and A, with A holding the lower-order bits.
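The register-level behaviour can be simulated in Python as a minimal sketch. The register names P, A and B follow the slides. The function name and the use of Python integers as n-bit registers are assumptions. Each iteration performs step (i), keeping the carry-out of the n-bit addition, and then step (ii), the combined right shift of carry, P and A.

```python
def shift_add_multiply(a: int, b: int, n: int) -> int:
    """Unsigned n-bit multiply: A = multiplier, B = multiplicand, P starts at 0."""
    mask = (1 << n) - 1
    P, A, B = 0, a & mask, b & mask
    for _ in range(n):
        # (i) add B or 0 to P, depending on the low-order bit of A
        total = P + (B if A & 1 else 0)     # at most n + 1 bits
        carry_out = total >> n
        P = total & mask
        # (ii) shift carry, P and A right: P's low bit enters A's high bit,
        #      the carry enters P's high bit, A's low bit is discarded
        A = (A >> 1) | ((P & 1) << (n - 1))
        P = (P >> 1) | (carry_out << (n - 1))
    return (P << n) | A                     # P holds the high half, A the low half
```

After the n iterations, P holds the high half of the product and A the low half. For example, `shift_add_multiply(3, 5, 4)` returns 15, with P = 0000 and A = 1111.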

Signed Multiplication
To multiply two's complement numbers, the obvious approach is to convert the operands to be nonnegative, do an unsigned multiplication, and then (if the original operands were of opposite signs) negate the result. This requires extra time and hardware.

A better approach multiplies A and B using the same multiplier hardware. If B is potentially negative but A is nonnegative, only one change is needed to turn the unsigned multiplication algorithm into a two's complement one: when P is shifted, it must be shifted arithmetically.

If A is negative, the method is Booth recoding. It is based on the fact that any string of 1s in a binary number can be written as a difference: $011\cdots1 = 100\cdots0 - 00\cdots01$. For example, $0111_2 = 1000_2 - 0001_2$ (7 = 8 - 1).

We then replace a string of 1s in the multiplier with an initial subtract when we first see a 1, followed by an add for the bit after the last 1. Consider, for example, multiplier 0110 and multiplicand B, scanning the multiplier from the least-significant bit:
Ordinary algorithm: shift (0 in multiplier); add B (1 in multiplier); add B (1 in multiplier); shift (0 in multiplier).
Booth recoding: shift (0 in multiplier); subtract B (first 1 in multiplier); shift (middle of string of 1s); add B (prior step had last 1).
Both sequences produce 6B. The ordinary one computes 2B + 4B, while Booth recoding computes 8B - 2B, because $0110_2 = 1000_2 - 0010_2$.

Hence, to deal with negative values of A, all that is required is to sometimes subtract B from P, instead of adding either B or 0 to P. Rules: if the initial content of A is $a_{n-1}\cdots a_0$, then at the i-th multiply step, step (i) of the multiplication algorithm becomes:
- If $a_i = 0$ and $a_{i-1} = 0$, then add 0 to P.
- If $a_i = 0$ and $a_{i-1} = 1$, then add B to P.
- If $a_i = 1$ and $a_{i-1} = 0$, then subtract B from P.
- If $a_i = 1$ and $a_{i-1} = 1$, then add 0 to P.
For the first step, when $i = 0$, take $a_{i-1}$ to be 0.
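The four rules translate into the following Python sketch of Booth's algorithm on the same P, A and B registers. The sum $P \pm B$ can need one bit more than n, so to sidestep the question of how wide P must be, P is modelled here as an unbounded signed Python integer. A hardware register would have a fixed width. Python's `>>` on integers rounds toward negative infinity, which is exactly an arithmetic right shift. `booth_multiply` is an illustrative name.

```python
def booth_multiply(a: int, b: int, n: int) -> int:
    """Booth multiply of n-bit two's complement numbers: a = multiplier, b = multiplicand."""
    A = a & ((1 << n) - 1)  # multiplier bit pattern a_{n-1} ... a_0
    B = b                   # multiplicand as a signed value
    P = 0                   # high half of the product (unbounded, see lead-in)
    prev = 0                # a_{i-1}, taken to be 0 for the first step
    for _ in range(n):
        ai = A & 1
        if ai == 0 and prev == 1:
            P += B          # bit after the last 1 of a string: add B
        elif ai == 1 and prev == 0:
            P -= B          # first 1 of a string: subtract B
        # (0,0) and (1,1): add 0 to P
        # arithmetic right shift of the pair (P, A); P's low bit moves into A
        A = (A >> 1) | ((P & 1) << (n - 1))
        P >>= 1             # arithmetic shift on Python ints
        prev = ai
    return P * (1 << n) + A  # A now holds the n low-order product bits
```

For example, `booth_multiply(-8, 7, 4)` returns -56. The only nonzero action is a subtraction of B at the last step, because the multiplier 1000 contains a single string of 1s.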

Speeding Up Integer Multiplication
Integer addition is the simplest operation and the most important. Even for programs that don't do explicit arithmetic, addition must be performed to increment the program counter and to calculate addresses. The delay of an N-bit ripple-carry adder is $t_{ripple} = N\,t_{FA}$, where $t_{FA}$ is the delay of a full adder. There are different techniques to increase the speed of integer operations (which also lead to faster floating point), such as the carry-lookahead adder (CLA).

Methods that increase the speed of multiplication can be divided into two classes: those using a single adder and those using multiple adders. In the simple multiplier described earlier, each multiplication step passes through the single adder, so the amount of computation in each step depends on the adder used. If space for many adders is available, multiplication speed can be improved.

Pipelined Arithmetic
Consider the instruction pipelining already described. The processor goes through a repetitive cycle of fetching and processing instructions. In the absence of hazards, the processor is continuously fetching instructions from sequential locations, so the pipeline is kept full and time is saved. Similarly, a pipelined ALU saves time if it is fed a stream of data from sequential locations. A single, isolated operation is not sped up by the pipeline; the speedup is achieved when a vector of operands is presented to the units in the ALU.

Pipelined Addition
For n-bit operands, a pipelined adder consists of n stages of half adders. Registers are inserted at each stage to synchronize the computation. At each clock cycle a new pair of operands is applied to the inputs of the adder.
[Figure: 4-bit pipelined adder; inputs $a_3b_3, \ldots, a_0b_0$ enter an array of half adders (HA) separated by pipeline registers; outputs $s_3, \ldots, s_0$.]

After n clock cycles, the sum of the first pair of operands is obtained. The computing time for a single sum is the same as for the carry-ripple adder. From then on a new sum is obtained at each clock cycle, starting from the (n+1)-th clock cycle.

The number of HAs is $O(n^2)$, whereas the circuit complexity of the carry-ripple adder is $O(n)$. The added circuit complexity pays off if long sequences of numbers are being added.
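The timing claims (first sum after n cycles, then one sum per cycle) can be illustrated with a cycle-level Python model. This is a simplification: each stage is modelled as resolving one bit position with full-adder logic, rather than reproducing the slide's array of half adders, and the final carry-out is dropped. The helper name `pipelined_add_stream` is hypothetical.

```python
from collections import deque


def pipelined_add_stream(pairs, n):
    """Cycle-level model of an n-stage pipelined adder.

    Stage j adds a_j, b_j and the carry handed over by stage j-1, then latches
    (a, b, partial sum, carry) into its pipeline register. Returns
    (cycle, n-bit sum) for each operand pair, in completion order.
    """
    mask = (1 << n) - 1
    waiting = deque(pairs)
    regs = [None] * n                # regs[j]: register at the output of stage j
    results = []
    cycle = 0
    while waiting or any(r is not None for r in regs[:-1]):
        cycle += 1
        new_regs = [None] * n
        for j in range(n):
            if j == 0:
                src = None
                if waiting:          # a new operand pair enters every clock cycle
                    a, b = waiting.popleft()
                    src = (a & mask, b & mask, 0, 0)
            else:
                src = regs[j - 1]    # value latched by the previous stage last cycle
            if src is None:
                continue
            a, b, s, c = src
            aj, bj = (a >> j) & 1, (b >> j) & 1
            s |= (aj ^ bj ^ c) << j
            c = (aj & bj) | ((aj ^ bj) & c)
            new_regs[j] = (a, b, s, c)
        regs = new_regs
        if regs[-1] is not None:     # the last stage has produced a complete sum
            results.append((cycle, regs[-1][2]))
    return results
```

Feeding three pairs into a 4-stage pipeline, `pipelined_add_stream([(3, 5), (6, 6), (1, 2)], 4)` yields `[(4, 8), (5, 12), (6, 3)]`. The first sum leaves at cycle n = 4 and each later one a single cycle after its predecessor. Three non-pipelined additions of the same latency would need 12 cycles.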

Pipelined Unsigned Multiplication
The product of two n-bit operands has length 2n. The result is obtained by executing n-1 sums of the partial products.
[Figure: 4-bit by 4-bit example; the bit products $a_ib_j$ are aligned by weight under product bits 7 to 0 and summed by a pipelined array of half adders (HA) and full adders (FA).]

The inputs to the multiplier are the logical ANDs of pairs of bits, $a_ib_j$. There are 2(n-1) stages of FAs or HAs.

After stage n-1, all bit products (ANDs) have been added, and the last n-1 stages form a pipelined adder. Bit 2n-1 of the result is obtained as the OR of the carries generated by the leftmost HA of each stage.

After 2(n-1) clock cycles, the product of the first pair of operands is obtained. From then on a new result is obtained at each clock cycle, starting from the (2n-1)-th clock cycle.
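The following Python sketch models what the array computes. The bit products $a_ib_j$ form n rows, each row is an n-bit number weighted by $2^i$, and n-1 additions combine them into a 2n-bit product. The helper `result_cycle` encodes the latency stated above: 2(n-1) cycles for the first product and one more cycle per subsequent product. Both names are illustrative. The sketch models the arithmetic and the timing, not the HA/FA wiring.

```python
def array_multiply(a: int, b: int, n: int) -> int:
    """Unsigned n-bit product formed from the AND bit products a_j b_i."""
    rows = [[((a >> j) & 1) & ((b >> i) & 1) for j in range(n)] for i in range(n)]
    acc = sum(bit << j for j, bit in enumerate(rows[0]))
    for i in range(1, n):        # n - 1 sums, one per remaining row
        acc += sum(bit << (i + j) for j, bit in enumerate(rows[i]))
    return acc                   # always fits in 2n bits


def result_cycle(k: int, n: int) -> int:
    """Clock cycle at which the k-th product (k = 1, 2, ...) leaves the pipeline."""
    return 2 * (n - 1) + (k - 1)
```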

Pipelined Signed Multiplication
Signed numbers are sign-extended to the length 2n of the product and used as operands.
[Figure: example with operands extended to six bits, $a_5 \cdots a_0$ and $b_5 \cdots b_0$; the bit products $a_ib_j$ are summed by an array of HAs and FAs.]

Partial products of length 2n are considered (the remaining part is ignored). All stages except the first consist of FAs.
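The sign-extension idea can be checked in Python. Both n-bit operands are extended to 2n bits and multiplied as unsigned numbers, and only the low 2n bits are kept. This works because the product of two n-bit two's complement numbers always fits in 2n bits, and arithmetic modulo $2^{2n}$ preserves it. The names below are illustrative, and the plain shift-and-add loop stands in for the pipelined array.

```python
def sign_extend(x: int, n: int, width: int) -> int:
    """Return the width-bit pattern of the n-bit two's complement value x."""
    x &= (1 << n) - 1
    if x >> (n - 1):                               # negative: replicate the sign bit
        x |= ((1 << width) - 1) ^ ((1 << n) - 1)
    return x


def to_signed(x: int, width: int) -> int:
    """Interpret a width-bit pattern as a two's complement value."""
    return x - (1 << width) if x >> (width - 1) else x


def signed_multiply_by_extension(a: int, b: int, n: int) -> int:
    w = 2 * n
    ea, eb = sign_extend(a, n, w), sign_extend(b, n, w)
    prod = 0
    for i in range(w):                             # unsigned shift-and-add over 2n bits
        if (eb >> i) & 1:
            prod += ea << i
    prod &= (1 << w) - 1                           # keep 2n bits, ignore the rest
    return to_signed(prod, w)
```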

CIRCUIT AREA AND TIME EVALUATION

Circuit Area and Time
To discuss time and area, it is useful to adopt the analytical unit-gate model presented in A. Tyagi, "A reduced-area scheme for carry-select adders", IEEE Trans. Comput., 1993. It uses a simplistic model for gate count and gate delay:
- Each gate except EX-OR counts as one elementary gate.
- An EX-OR gate is counted as two elementary gates, because in static (restoring) CMOS an EX-OR gate is implemented as two elementary gates (NAND).
- The delay through an elementary gate is counted as one gate-delay unit, but an EX-OR gate counts as two gate-delay units.

In this model we ignore the fan-in and fan-out of a gate. This can lead to unfair comparisons for circuits containing gates with a large difference in fan-in or fan-out. For instance, gates in the CLA adder have different fan-in, whereas a carry-ripple adder has no gates with fan-in and fan-out greater than 2. The best comparison for a VLSI implementation is actual area and time; gate-count and gate-delay comparisons may not always be consistent with area-time comparisons.

To simplify, we consider:
- Any gate except the EX-OR counts as one gate for both area and delay: $A_{gate}$ and $T_{gate}$.
- An exclusive-or gate counts as two elementary gates for both area and delay: $A_{EX\text{-}OR} = 2A_{gate}$ and $T_{EX\text{-}OR} = 2T_{gate}$.
- An m-input gate counts as m-1 gates for area and $\log_2 m$ gates for delay: $A_{m\text{-}gate} = (m-1)A_{gate}$ and $T_{m\text{-}gate} = \log_2 m \; T_{gate}$.

A half adder (HA) has:
- delay of 2 unit gates: $T_{HA} = 2T_{gate}$
- area of 3 unit gates: $A_{HA} = 3A_{gate}$
A full adder (FA) has:
- delay of 4 unit gates: $T_{FA} = 4T_{gate} = 2T_{HA}$
- area of 7 unit gates: $A_{FA} = 7A_{gate} = 2A_{HA} + A_{gate}$

A carry-ripple adder for n-bit operands has:
- delay $T_{CR\text{-}adder} = nT_{FA} = 2nT_{HA} = 4nT_{gate}$
- area $A_{CR\text{-}adder} = nA_{FA} = 2nA_{HA} + nA_{gate} = 7nA_{gate}$
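The unit-gate figures can be composed mechanically. This Python sketch (the `Cost` class and the constant names are illustrative) derives the HA and FA costs from the gate costs. It then applies the carry-ripple formulas, which charge the full $T_{FA}$ per stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    """Area and delay in the unit-gate model (units of A_gate and T_gate)."""
    area: int
    delay: int


GATE = Cost(area=1, delay=1)    # any gate except EX-OR
XOR = Cost(area=2, delay=2)     # an EX-OR counts as two elementary gates

# Half adder: s = a XOR b and c_out = ab are computed in parallel.
HA = Cost(area=XOR.area + GATE.area, delay=max(XOR.delay, GATE.delay))

# Full adder: two half adders plus an OR of their carries. The sum path goes
# through two EX-ORs (2 + 2); the carry path EX-OR, AND, OR is also 4.
FA = Cost(area=2 * HA.area + GATE.area, delay=2 * HA.delay)


def carry_ripple_adder(n: int) -> Cost:
    """n full adders in a chain: area n * A_FA and delay n * T_FA."""
    return Cost(area=n * FA.area, delay=n * FA.delay)
```

For n = 32, `carry_ripple_adder(32)` gives an area of 224 $A_{gate}$ and a delay of 128 $T_{gate}$.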