Computer arithmetic
Intensive Computation
Annalisa Massini, 7/8
References
Computer Architecture - A Quantitative Approach, Hennessy and Patterson, Appendix J
Half adder and Full adder
Adders are usually implemented by combining multiple copies of simple components. The natural components for addition are half adders and full adders.
The half adder takes two bits a and b as input and produces a sum bit s and a carry bit c_out as output. As logic equations, $s = a\bar{b} + \bar{a}b$ and $c_{out} = ab$.
The full adder takes three bits a, b and c as input and produces a sum bit s and a carry bit c_out as output. As logic equations, $s = \bar{a}\bar{b}c + \bar{a}b\bar{c} + a\bar{b}\bar{c} + abc = (a \oplus b) \oplus c$ and $c_{out} = (a \oplus b)c + ab$.
The half adder is a (2,2) adder, since it takes two inputs and produces two outputs. The full adder is a (3,2) adder, since it takes three inputs and produces two outputs.
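The logic equations above can be checked with a small bit-level sketch. The function names `half_adder` and `full_adder` are illustrative and not part of the slides; the full adder is built here from two half adders and an OR gate, which is one common realisation of the equations.

```python
def half_adder(a, b):
    """Return (s, c_out) for two input bits a and b."""
    s = a ^ b        # s = a XOR b
    c_out = a & b    # c_out = a AND b
    return s, c_out


def full_adder(a, b, c):
    """Return (s, c_out) for three input bits, using two half adders and an OR."""
    s1, c1 = half_adder(a, b)   # partial sum a XOR b, partial carry ab
    s, c2 = half_adder(s1, c)   # s = (a XOR b) XOR c, c2 = (a XOR b)c
    c_out = c1 | c2             # c_out = ab + (a XOR b)c
    return s, c_out
```

In both cells the outputs satisfy s + 2*c_out = (number of input bits equal to 1), which is what makes them (2,2) and (3,2) counters.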
Ripple-Carry Addition
The principal problem in constructing an adder for n-bit numbers out of smaller pieces is propagating the carries from one piece to the next. The most obvious way to solve this is with a ripple-carry adder, consisting of n full adders.
[Figure: n-bit ripple-carry adder, a chain of full adders with inputs a_{n-1} b_{n-1}, ..., a_1 b_1, a_0 b_0 and outputs s_{n-1}, ..., s_1, s_0]
The time a circuit takes to produce an output is proportional to the maximum number of logic levels through which a signal travels. Determining the exact relationship between logic levels and timings is highly technology dependent.
When comparing adders we will simply compare the number of logic levels in each one. A ripple-carry adder takes two levels to compute c_1 from a_0 and b_0. Then it takes two more levels to compute c_2 from c_1, a_1, b_1, and so on, up to c_n. So, there are a total of 2n levels.
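A minimal sketch of the ripple-carry chain follows, assuming operands are given as lists of bits with the least-significant bit first. The helper names (`to_bits`, `from_bits`, `ripple_carry_add`) are illustrative; the level counter simply adds two levels per cell, following the counting used in the text rather than modelling real gate timing.

```python
def full_adder(a, b, c):
    """One (3,2) cell: return (sum bit, carry-out bit)."""
    s = a ^ b ^ c
    c_out = (a & b) | ((a ^ b) & c)
    return s, c_out


def ripple_carry_add(a_bits, b_bits, c0=0):
    """Add two n-bit operands (LSB first). Return (sum bits, c_n, logic levels)."""
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same number of bits")
    carry = c0
    s_bits = []
    levels = 0
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # carry ripples to the next cell
        s_bits.append(s)
        levels += 2                         # two logic levels per carry
    return s_bits, carry, levels


def to_bits(x, n):
    """Return the n low-order bits of x, LSB first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))
```

For example, adding 11 and 6 with n = 4 leaves the 4-bit sum 1 with a carry-out of 1 (17 = 16 + 1), after 2n = 8 levels.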
Typical values of n are 32 for integer arithmetic and 53 for double-precision floating point. The ripple-carry adder is the slowest adder, but also the cheapest. It can be built with only n simple cells, connected in a simple, regular way.
The ripple-carry adder is relatively slow: it takes time O(n). But it is used because in technologies like CMOS, the constant factor is very small. Short ripple adders are often used as building blocks in larger adders.
Ripple-Carry Addition for Signed Numbers
The most widely used system for representing integers is two's complement, where the MSB is associated with a negative weight. The value of a two's complement number $a_{n-1}a_{n-2}\cdots a_1a_0$ is:
$-a_{n-1}2^{n-1} + a_{n-2}2^{n-2} + \cdots + a_1 2^1 + a_0$
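As a quick illustration of the weighted sum above, the following sketch evaluates a two's complement bit string. The function name and the choice of a most-significant-bit-first string are assumptions made for the example.

```python
def twos_complement_value(bits):
    """Value of a two's complement number given as a string, MSB first."""
    n = len(bits)
    digits = [int(ch) for ch in bits]
    value = -digits[0] * 2 ** (n - 1)       # the MSB has negative weight
    for i, d in enumerate(digits[1:], start=1):
        value += d * 2 ** (n - 1 - i)       # remaining bits have positive weight
    return value
```

With n = 4, the string 1011 evaluates to -8 + 2 + 1 = -5, while 0111 evaluates to 7.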
One reason for the popularity of two's complement is that it makes signed addition easy: simply discard the carry-out from the high-order bit. Subtraction is executed as an addition: $A - B = A + (-B)$, recalling that $-X = \bar{X} + 1$.
The ripple-carry adder can also be used for subtraction, by acting on the second operand B and on the input carry c_0. If the complement line is 1, then operand B is bitwise complemented and c_0 = 1.
[Figure: ripple-carry adder with a complement control line applied to the inputs b_{n-1}, ..., b_1, b_0; inputs a_{n-1}, ..., a_0; outputs s_{n-1}, ..., s_0]
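The adder/subtractor described above can be sketched at word level as follows. The function `add_sub` and its parameter names are illustrative; the n-bit adder is modelled with Python integer addition followed by masking, which discards the carry-out of the high-order bit.

```python
def add_sub(a, b, n, complement):
    """Return the n-bit result of a + b (complement=0) or a - b (complement=1)."""
    mask = (1 << n) - 1
    b_in = (b ^ mask) if complement else b   # bitwise complement of B
    c0 = 1 if complement else 0              # input carry supplies the "+1"
    total = (a & mask) + (b_in & mask) + c0
    return total & mask                      # carry-out is discarded
```

For n = 4, `add_sub(3, 5, 4, 1)` yields the bit pattern 1110, which is -2 in two's complement.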
Unsigned Multiplication
The simplest multiplier computes the product of two unsigned numbers, $a_{n-1}a_{n-2}\cdots a_0$ and $b_{n-1}b_{n-2}\cdots b_0$, one bit at a time. Register P (Product) is initially 0.
[Figure: multiplier hardware with register P (Product, n bits) and its carry-out, register A (Multiplier, n bits), register B (Multiplicand, n bits); P and A shift right]
Each multiply step has two parts:
(i) If the least-significant bit of A is 1, then register B, containing $b_{n-1}b_{n-2}\cdots b_0$, is added to P; otherwise, 00...00 is added to P. The sum is placed back into P.
(ii) Registers P and A are shifted right, with the carry-out of the sum being moved into the high-order bit of P, the low-order bit of P being moved into register A, and the rightmost bit of A (not used in the rest of the algorithm) being shifted out.
Hence, we add the contents of P to either B or 0 (depending on the low-order bit of A), replace P with the sum, and then shift both P and A one bit right. After n steps, the product appears in registers P and A, with A holding the lower-order bits.
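The two-part multiply step can be traced with a short register-level sketch. The variable names P, A and B follow the registers in the text; the function name and the use of Python integers to hold register contents are assumptions for illustration.

```python
def multiply_unsigned(a, b, n):
    """Shift-and-add product of two n-bit unsigned integers a and b."""
    mask = (1 << n) - 1
    P, A, B = 0, a & mask, b & mask
    for _ in range(n):
        # Step (i): add B or 0 to P, depending on the low-order bit of A.
        total = P + (B if A & 1 else 0)        # n+1 bits, including carry-out
        # Step (ii): shift (carry-out, P, A) right by one position.
        A = (A >> 1) | ((total & 1) << (n - 1))  # low bit of P enters A
        P = total >> 1                           # carry-out enters top of P
    return (P << n) | A                          # P: high bits, A: low bits
```

After the n-th step the original multiplier bits have all been shifted out of A, so the concatenation of P and A is the 2n-bit product.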
Signed Multiplication
To multiply two's complement numbers, the obvious approach is to convert the operands to be nonnegative, do an unsigned multiplication, and then (if the original operands were of opposite signs) negate the result. This requires extra time and hardware.
A better approach multiplies A and B using the same hardware as the unsigned multiplier. If B is potentially negative but A is nonnegative, the only change needed to convert the unsigned multiplication algorithm into a two's complement one is that when P is shifted, it is shifted arithmetically.
If A is negative, the method is Booth recoding, which is based on the fact that any sequence of 1s in a binary number can be written as a difference: $011\ldots1 = 100\ldots0 - 1$.
Then, we replace a string of 1s in the multiplier with an initial subtract when we first see a 1, and then later an add for the bit after the last 1.
For a multiplier whose bits, from least to most significant, are 0, 1, 1, 0, the steps are:
Conventional: shift (0 in multiplier), add (1 in multiplier), add (1 in multiplier), shift (0 in multiplier)
Booth: shift (0 in multiplier), sub (first 1 in multiplier), shift (middle of string of 1s), add (prior step had last 1)
Hence, to deal with negative values of A, all that is required is to sometimes subtract B from P, instead of adding either B or 0 to P.
Rules: if the initial content of A is $a_{n-1}\cdots a_0$, then step (i) in the multiplication algorithm becomes:
If $a_i = 0$ and $a_{i-1} = 0$, then add 0 to P
If $a_i = 0$ and $a_{i-1} = 1$, then add B to P
If $a_i = 1$ and $a_{i-1} = 0$, then subtract B from P
If $a_i = 1$ and $a_{i-1} = 1$, then add 0 to P
For the first step, when $i = 0$, take $a_{i-1}$ to be 0.
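The four rules can be exercised with the sketch below. It is a simplified model, not the hardware itself: register P is held as a signed Python integer so that Python's `>>` acts as the arithmetic shift, and the function name `booth_multiply` is illustrative.

```python
def booth_multiply(a, b, n):
    """Product of two n-bit two's complement integers using Booth recoding."""
    mask = (1 << n) - 1
    A = a & mask   # multiplier register as an n-bit pattern
    P = 0          # product register (signed Python int)
    prev = 0       # a_{-1} is taken to be 0 for the first step
    for _ in range(n):
        cur = A & 1
        if cur == 0 and prev == 1:
            P += b             # end of a string of 1s: add B
        elif cur == 1 and prev == 0:
            P -= b             # start of a string of 1s: subtract B
        # otherwise (00 or 11): add 0
        prev = cur
        # Shift (P, A) right; P is shifted arithmetically.
        A = (A >> 1) | ((P & 1) << (n - 1))
        P >>= 1
    return (P << n) | A        # signed 2n-bit result
```

For instance, with n = 4, multiplying -3 (bit pattern 1101) by 5 performs subtract, add, subtract, and a final "add 0", and the registers end up holding -15.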
Speeding Up Integer Multiplication
Integer addition is the simplest operation and the most important. Even for programs that don't do explicit arithmetic, addition must be performed to increment the program counter and to calculate addresses.
The delay of an N-bit ripple-carry adder is $t_{ripple} = N\,t_{FA}$, where $t_{FA}$ is the delay of a full adder.
There are different techniques to increase the speed of integer operations (which in turn lead to faster floating point), such as the carry-lookahead adder (CLA).
Methods that increase the speed of multiplication can be divided into two classes:
single adder
multiple adders
In the simple multiplier we described, each multiplication step passes through the single adder. The amount of computation in each step depends on the adder used. If the space for many adders is available, then multiplication speed can be improved.
Pipelined arithmetic
Consider the instruction pipelining already described. The processor goes through a repetitive cycle of fetching and processing instructions. In the absence of hazards, the processor is continuously fetching instructions from sequential locations: the pipeline is kept full and a savings in time is achieved.
Similarly, a pipelined ALU will save time if it is fed a stream of data from sequential locations. A single, isolated operation is not sped up by a pipeline. The speedup is achieved when a vector of operands is presented to the units in the ALU.
Pipelined Addition
For n-bit operands, a pipelined adder consists of n stages of half adders. Registers are inserted at each stage to synchronize the computation. At each clock cycle a new pair of operands is applied to the inputs of the adder.
[Figure: 4-bit pipelined adder built from HA cells, with inputs a_3 b_3, a_2 b_2, a_1 b_1, a_0 b_0 and outputs s_3, s_2, s_1, s_0]
After n clock cycles, the sum of the first pair of operands is obtained. The computing time for a single sum is the same as that of the carry-ripple adder. A new sum is obtained at each clock cycle starting from the (n+1)-th clock cycle.
The number of HAs is $O(n^2)$, whereas the circuit complexity of the carry-ripple adder is $O(n)$. The added circuit complexity pays off if long sequences of numbers are being added.
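The stage-by-stage behaviour of the half-adder array can be sketched at word level. This is a simplified model under stated assumptions: each loop iteration stands for one pipeline stage, the inter-stage registers and clocking are not modelled, and the half-adder count assumes a triangular layout with n, n-1, ..., 1 cells per stage, which matches the 10 cells visible in the 4-bit figure. The function name is illustrative.

```python
def staged_half_adder_sum(a, b, n):
    """Add two n-bit unsigned integers using n stages of half adders.

    Returns (sum including carry-out bit, number of half adders used).
    """
    mask = (1 << n) - 1
    s, c = a & mask, b & mask   # stage 1 sees the two operands
    carry_out = 0
    ha_count = 0
    for stage in range(n):
        new_s = s ^ c                       # half-adder sum bits
        new_c = (s & c) << 1                # half-adder carries, moved left
        carry_out |= (new_c >> n) & 1       # OR of the leftmost carries
        s, c = new_s, new_c & mask
        ha_count += n - stage               # one fewer cell per stage
    return s | (carry_out << n), ha_count
```

For n = 4 the sketch uses 4 + 3 + 2 + 1 = 10 half adders, that is n(n+1)/2 cells in general, consistent with the $O(n^2)$ count.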
Pipelined Unsigned Multiplication
The product of two n-bit operands has length 2n. The result is obtained by executing n-1 sums.
[Figure: 4-bit example; the bit products a_i b_j are grouped by weight and summed by an array of HA and FA cells that produces product bits 7 to 0]
Inputs to the multiplier are the logical AND of pairs of bits, a_i AND b_j. There are 2(n-1) stages of FA or HA.
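The role of the bit products can be illustrated with a small sketch. It only shows the arithmetic (AND of bit pairs, weights 2^(i+j), and n-1 additions of rows); it does not model the HA/FA cell array or the pipeline registers, and the function names are illustrative.

```python
def bit_products(a, b, n):
    """Return the n x n matrix whose entry [j][i] is a_i AND b_j."""
    return [[((a >> i) & 1) & ((b >> j) & 1) for i in range(n)]
            for j in range(n)]


def product_from_bit_products(matrix):
    """Sum the weighted bit products row by row; return (product, sums used)."""
    rows = [sum(bit << (i + j) for i, bit in enumerate(row))
            for j, row in enumerate(matrix)]
    total = rows[0]
    sums = 0
    for row in rows[1:]:
        total += row      # one addition per remaining row
        sums += 1
    return total, sums
```

With n = 4 there are 16 bit products and 3 row additions, and the result always fits in 2n = 8 bits.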
After stage (n-1) all bit products (AND) have been added. The last (n-1) stages form a pipelined adder. Bit 2n-1 of the result is obtained as the OR of the carries generated by the leftmost HA of each stage.
After 2(n-1) clock cycles, the product of the first pair of operands is obtained. A new result is obtained at each clock cycle starting from the (2n-1)-th clock cycle.
Pipelined Signed Multiplication
Signed numbers are extended to the length 2n of the product and used as operands.
[Figure: pipelined signed multiplier for operands extended to 6 bits; the bit products a_i b_j feed stages of HA and FA cells that produce product bits 5 to 0]
Partial products of length 2n are considered (the remaining part is ignored). All stages but the first consist of FAs.
CIRCUIT AREA AND TIME EVALUATION
Circuit area and time
To discuss time and area, it is useful to adopt the analytical model (unit-gate model) presented in A. Tyagi, "A reduced-area scheme for carry-select adders", IEEE Trans. Comput., 1993.
It is a simplistic model for gate count and gate delay:
Each gate except EX-OR counts as one elementary gate.
An EX-OR gate is counted as two elementary gates, because in static (restoring) CMOS an EX-OR gate is implemented as two elementary gates (NAND).
The delay through an elementary gate is counted as one gate-delay unit, but an EX-OR gate is two gate-delay units.
In this model we are ignoring the fan-in and fan-out of a gate. This can lead to unfair comparisons for circuits containing gates with a large difference in fan-in or fan-out. For instance, gates in the CLA adder have different fan-in, whereas a carry-ripple adder has no gates with fan-in or fan-out greater than 2.
The best comparison for a VLSI implementation is actual area and time. The gate-count and gate-delay comparisons may not always be consistent with the area-time comparisons.
To simplify, we consider:
Any gate (except the EX-OR) counts as one gate for both area and delay: $A_{gate}$ and $T_{gate}$
An exclusive-or gate counts as two elementary gates for both area and delay: $A_{EX\text{-}OR} = 2A_{gate}$ and $T_{EX\text{-}OR} = 2T_{gate}$
An m-input gate counts as m-1 gates for area and $\log_2 m$ gates for delay: $A_{m\text{-}gate} = (m-1)A_{gate}$ and $T_{m\text{-}gate} = \log_2 m \cdot T_{gate}$
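The three counting rules can be written as a small helper, measured in units of A_gate and T_gate. The function name `gate_cost` and the string labels for gate kinds are hypothetical, and only the 2-input EX-OR is modelled since the text gives no rule for wider EX-OR gates.

```python
import math


def gate_cost(kind, m=2):
    """Return (area, delay) in unit gates for a gate of the given kind with m inputs."""
    if m < 2:
        raise ValueError("a gate needs at least two inputs")
    if kind == "xor":
        if m != 2:
            raise ValueError("only the 2-input EX-OR is covered by the model")
        return 2, 2.0                 # EX-OR counts double for area and delay
    return m - 1, math.log2(m)        # m-input gate: (m-1) area, log2(m) delay
```

A 2-input AND therefore costs (1, 1.0), and a 4-input OR costs (3, 2.0).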
A half adder (HA) has:
delay of 2 unit gates: $T_{HA} = 2T_{gate}$
area of 3 unit gates: $A_{HA} = 3A_{gate}$
A full adder (FA) has:
delay of 4 unit gates: $T_{FA} = 4T_{gate} = 2T_{HA}$
area of 7 unit gates: $A_{FA} = 7A_{gate} = 2A_{HA} + A_{gate}$
A carry-ripple adder for n-bit operands has:
delay $T_{CR\text{-}adder} = n\,T_{FA} = 2n\,T_{HA} = 4n\,T_{gate}$
area $A_{CR\text{-}adder} = n\,A_{FA} = 2n\,A_{HA} + n\,A_{gate} = 7n\,A_{gate}$
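The figures above can be reproduced by composing the unit-gate costs. The sketch assumes a half adder made of one EX-OR and one AND, and a full adder made of two half adders plus one OR gate, which is consistent with the stated totals; the function names are illustrative.

```python
def half_adder_cost():
    """(area, delay) of a half adder: one EX-OR (2 units) and one AND (1 unit)."""
    area = 2 + 1     # EX-OR + AND
    delay = 2        # the EX-OR path dominates
    return area, delay


def full_adder_cost():
    """(area, delay) of a full adder built from two half adders and an OR."""
    ha_area, ha_delay = half_adder_cost()
    return 2 * ha_area + 1, 2 * ha_delay


def carry_ripple_cost(n):
    """(area, delay) of an n-bit carry-ripple adder made of n full adders."""
    fa_area, fa_delay = full_adder_cost()
    return n * fa_area, n * fa_delay
```

For n = 32 this gives an area of 7n = 224 unit gates and a delay of 4n = 128 gate delays.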