Arithmetic and Logic Unit First Part

Arquitectura de Computadoras
Arturo Díaz Pérez
Centro de Investigación y de Estudios Avanzados del IPN
adiaz@cinvestav.mx

Typical Operations
Data Movement: load (from memory); store (to memory); memory-to-memory move; register-to-register move; input (from I/O device); output (to I/O device); push, pop (to/from stack)
Arithmetic: integer (binary + decimal) or FP; add, subtract, multiply, divide
Shift: shift left/right, rotate left/right
Logical: not, and, or, set, clear

Operands for ALU instructions
ALU instructions combine operands (e.g. ADD).
Number of explicit operands:
  Two: the destination is also one of the sources
  Three: orthogonal (the destination and both sources are named independently)

MIPS Addressing Modes/Instruction Formats
All instructions are 32 bits wide.
  Register (direct):  op | rs | rt | rd       operand is in a register
  Immediate:          op | rs | rt | immed    operand is the immed field
  Base+index:         op | rs | rt | immed    operand is in memory at register + immed
  PC-relative:        op | rs | rt | immed    operand is in memory at PC + immed
Register Indirect?

MIPS: Register State
32 integer registers
  $0 is hardwired to 0
  $31 is the return address register
  software convention for the other registers
32 single-precision FP registers, or 16 double-precision FP registers
PC and other special registers

MIPS I Operation Overview
Arithmetic/Logical:
  Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU
  AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI
  SLL, SRL, SRA, SLLV, SRLV, SRAV
Memory Access:
  LB, LBU, LH, LHU, LW, LWL, LWR
  SB, SH, SW, SWL, SWR

MIPS arithmetic instructions
Instruction         Example           Meaning             Comments
add                 add $1,$2,$3      $1 = $2 + $3        3 operands; exception possible
subtract            sub $1,$2,$3      $1 = $2 - $3        3 operands; exception possible
add immediate       addi $1,$2,100    $1 = $2 + 100       + constant; exception possible
add unsigned        addu $1,$2,$3     $1 = $2 + $3        3 operands; no exceptions
subtract unsigned   subu $1,$2,$3     $1 = $2 - $3        3 operands; no exceptions
add imm. unsign.    addiu $1,$2,100   $1 = $2 + 100       + constant; no exceptions
multiply            mult $2,$3        Hi, Lo = $2 x $3    64-bit signed product
multiply unsigned   multu $2,$3       Hi, Lo = $2 x $3    64-bit unsigned product
divide              div $2,$3         Lo = $2 / $3,       Lo = quotient, Hi = remainder
                                      Hi = $2 mod $3
divide unsigned     divu $2,$3        Lo = $2 / $3,       unsigned quotient and remainder
                                      Hi = $2 mod $3
move from Hi        mfhi $1           $1 = Hi             used to get a copy of Hi
move from Lo        mflo $1           $1 = Lo             used to get a copy of Lo
Which add for address arithmetic? Which add for integers?

MIPS logical instructions
Instruction           Example           Meaning            Comment
and                   and $1,$2,$3      $1 = $2 & $3       3 reg. operands; logical AND
or                    or $1,$2,$3       $1 = $2 | $3       3 reg. operands; logical OR
xor                   xor $1,$2,$3      $1 = $2 ^ $3       3 reg. operands; logical XOR
nor                   nor $1,$2,$3      $1 = ~($2 | $3)    3 reg. operands; logical NOR
and immediate         andi $1,$2,10     $1 = $2 & 10       logical AND of reg, constant
or immediate          ori $1,$2,10      $1 = $2 | 10       logical OR of reg, constant
xor immediate         xori $1,$2,10     $1 = $2 ^ 10       logical XOR of reg, constant
shift left logical    sll $1,$2,10      $1 = $2 << 10      shift left by constant
shift right logical   srl $1,$2,10      $1 = $2 >> 10      shift right by constant
shift right arithm.   sra $1,$2,10      $1 = $2 >> 10      shift right (sign extend)
shift left logical    sllv $1,$2,$3     $1 = $2 << $3      shift left by variable
shift right logical   srlv $1,$2,$3     $1 = $2 >> $3      shift right by variable
shift right arithm.   srav $1,$2,$3     $1 = $2 >> $3      shift right arith. by variable

Details of the MIPS instruction set
Register zero always has the value zero (even if you try to write it).
Branch/jump-and-link instructions put the return address, PC+4 or PC+8, into the link register (R31); which one depends on the logical vs. physical architecture.
All instructions change all 32 bits of the destination register (including lui, lb, lh), and all read all 32 bits of their sources (add, sub, and, or, ...).
Immediate arithmetic and logical instructions are extended as follows:
  logical immediate operands are zero-extended to 32 bits
  arithmetic immediate operands are sign-extended to 32 bits (including addiu)
The data loaded by the instructions lb and lh are extended as follows:
  lbu, lhu are zero-extended
  lb, lh are sign-extended
Overflow can occur in these arithmetic and logical instructions: add, sub, addi.
It cannot occur in addu, subu, addiu, and, or, xor, nor, shifts, mult, multu, div, divu.

MIPS: Instruction Set Format
Load/store architecture with 3 explicit operands (ALU ops)
Fixed 32-bit instructions
3 instruction formats:
  » R-Type
  » I-Type
  » J-Type
6 instruction set groups:
  » load/store: data movement operations
  » computational: arithmetic, logical, and shift operations
  » jump/branch: including calls and returns
  » coprocessor: FP instructions
  » coprocessor0: memory management and exception handling
  » special: accessing special registers, system calls, breakpoint instructions, etc.

R2000/3000 Instruction Formats
R-type (register), e.g.
  add $8, $17, $18    # $8 = $17 + $18
Bits:    31-26    25-21   20-16   15-11   10-6    5-0
Field:   OpCode   rs      rt      rd      shamt   funct
Value:   0        17      18      8       0       32

R2000/3000 Instruction Formats
I-type (immediate), e.g.
  addi $8, $17, -44     # $8 = $17 - 44
  lw   $8, -44($17)     # $8 = M[$17 - 44]
  beq  $17, $8, label   # if ($8 == $17) go to label
Bits:    31-26    25-21   20-16   15-0
Field:   OpCode   rs      rt      immediate
Value:   op       17      8       -44

R2000/3000 Instruction Formats
J-type (jump), e.g.
  jal label    # call label; $31 = $pc + 8
Bits:    31-26    25-0
Field:   OpCode   target
Value:   3        -44

5 Steps of DLX Datapath
Instruction Fetch; Instruction Decode / Register Fetch; Execute / Address Calculation; Memory Access; Write Back
[Figure: DLX datapath showing the PC, the +4 adder (NPC), instruction memory, IR, register file, A and B latches, the 16-to-32-bit sign extender, input multiplexers, the ALU with its Zero? test, ALU output, data memory, the SMD and LMD registers, and the write-back multiplexer.]

Useful Circuits for Interconnection
Four common and useful MSI circuits are: decoder, demultiplexer, encoder, multiplexer.
Block-level outlines of the MSI circuits:
  decoder:        code -> entity
  encoder:        entity -> code
  multiplexer:    inputs, select -> data
  demultiplexer:  data, select -> outputs

Decoders
Codes are frequently used to represent entities. These codes can be identified (or decoded) using a decoder: given a code, identify the entity.
A decoder converts binary information from n input lines to a maximum of 2^n output lines.
It is known as an n-to-m-line decoder, or simply an n:m or n x m decoder (m <= 2^n).
It may be used to generate 2^n (or fewer) minterms of n input variables.

Decoders
Example: if codes 00, 01, 10, 11 are used to identify four light bulbs, we may use a 2-bit decoder. The 2-bit code X Y feeds a 2x4 decoder whose outputs F0, F1, F2, F3 drive Bulb 0, Bulb 1, Bulb 2, Bulb 3.
This is a 2x4 decoder, which selects an output line based on the 2-bit code supplied.
Truth table:
X Y | F0 F1 F2 F3
0 0 |  1  0  0  0
0 1 |  0  1  0  0
1 0 |  0  0  1  0
1 1 |  0  0  0  1
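
The 2x4 decoder truth table can be reproduced with a short behavioural sketch. The function name decode_2to4 is hypothetical and only illustrates the mapping from a code to a one-hot output; it is not a gate-level model.

```python
def decode_2to4(x: int, y: int) -> list[int]:
    """Return [F0, F1, F2, F3] for the 2-bit code X Y (X is the high bit)."""
    index = (x << 1) | y          # numeric value of the code
    # Exactly one output line is 1: the one whose number matches the code.
    return [1 if i == index else 0 for i in range(4)]


if __name__ == "__main__":
    for x in (0, 1):
        for y in (0, 1):
            print(x, y, decode_2to4(x, y))
```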

Encoder
Encoding is the converse of decoding. Given a set of input lines, of which one has been selected, provide the code corresponding to that line.
An encoder contains 2^n (or fewer) input lines and n output lines. It is implemented with OR gates.
An example: a 4-to-2 encoder takes the lines F0, F1, F2, F3 (selected via switches) and produces the 2-bit code D1 D0.

Encoder
Truth table:
F0 F1 F2 F3 | D1 D0
 1  0  0  0 |  0  0
 0  1  0  0 |  0  1
 0  0  1  0 |  1  0
 0  0  0  1 |  1  1
The remaining 12 input combinations (no line selected, or more than one line selected) are don't-care conditions: D1 D0 = X X.

Encoder
With the help of a K-map (and the don't-care conditions), we obtain:
  D0 = F1 + F3
  D1 = F2 + F3
which corresponds to a simple 4-to-2 encoder built from two OR gates.
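
As a quick check of the two OR equations, the sketch below evaluates them on the four valid one-hot inputs. The function name encode_4to2 is an assumption made for illustration; inputs outside the one-hot set are don't-care cases and are not meaningful here.

```python
def encode_4to2(f0: int, f1: int, f2: int, f3: int) -> tuple[int, int]:
    """Return (D1, D0) using D0 = F1 + F3 and D1 = F2 + F3."""
    d0 = f1 | f3
    d1 = f2 | f3
    return d1, d0   # f0 is unused: selecting line 0 yields code 00


if __name__ == "__main__":
    for line in range(4):
        one_hot = [1 if i == line else 0 for i in range(4)]
        print(one_hot, encode_4to2(*one_hot))
```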

Demultiplexer
Given an input line and a set of selection lines, the demultiplexer directs data from the input to one selected output line.
An example of a 1-to-4 demultiplexer, with data input D and select lines S1 S0:
  Y0 = D.S1'.S0'
  Y1 = D.S1'.S0
  Y2 = D.S1.S0'
  Y3 = D.S1.S0
S1 S0 | Y0 Y1 Y2 Y3
 0  0 |  D  0  0  0
 0  1 |  0  D  0  0
 1  0 |  0  0  D  0
 1  1 |  0  0  0  D

Demultiplexer
The demultiplexer is actually identical to a decoder with enable: S1 S0 drive the inputs of a 2x4 decoder and D drives its enable input E, giving
  Y0 = D.S1'.S0'
  Y1 = D.S1'.S0
  Y2 = D.S1.S0'
  Y3 = D.S1.S0
Exercise: provide the truth table for the above demultiplexer.

Multiplexer
A multiplexer is a device which has
  (i) a number of input lines
  (ii) a number of selection lines
  (iii) one output line
It steers one of 2^n inputs to a single output line, using n selection lines. It is also known as a data selector.

Multiplexer
Truth table for a 4-to-1 multiplexer (4:1 MUX) with inputs I0, I1, I2, I3, select lines S1 S0 and output Y:
I0 I1 I2 I3 | S1 S0 | Y
d0 d1 d2 d3 |  0  0 | d0
d0 d1 d2 d3 |  0  1 | d1
d0 d1 d2 d3 |  1  0 | d2
d0 d1 d2 d3 |  1  1 | d3
Equivalent compact form:
S1 S0 | Y
 0  0 | I0
 0  1 | I1
 1  0 | I2
 1  1 | I3
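
Read row by row, the 4-to-1 truth table corresponds to the sum-of-products Y = I0.S1'.S0' + I1.S1'.S0 + I2.S1.S0' + I3.S1.S0. That expression is not written on the slide; it is derived here from the table, and the function name mux4 is hypothetical.

```python
def mux4(i0: int, i1: int, i2: int, i3: int, s1: int, s0: int) -> int:
    """4-to-1 multiplexer on single bits, written as a sum of products."""
    n1, n0 = 1 - s1, 1 - s0       # complemented select lines S1', S0'
    return (i0 & n1 & n0) | (i1 & n1 & s0) | (i2 & s1 & n0) | (i3 & s1 & s0)


if __name__ == "__main__":
    data = (1, 0, 1, 1)           # arbitrary example values for I0..I3
    for s1 in (0, 1):
        for s0 in (0, 1):
            print(s1, s0, mux4(*data, s1, s0))
```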

Binary Representation
A 32-bit word with bits b31 b30 b29 b28 ... b3 b2 b1 b0 represents the value
  b31*2^31 + b30*2^30 + b29*2^29 + b28*2^28 + ... + b2*2^2 + b1*2^1 + b0*2^0
0000 0000 0000 0000 0000 0000 0000 0000_2 = 0_10
0000 0000 0000 0000 0000 0000 0000 0001_2 = 1_10
0000 0000 0000 0000 0000 0000 0000 0010_2 = 2_10
0000 0000 0000 0000 0000 0000 0000 1011_2 = 11_10

Signed Numbers: Sign+Magnitude
For n-bit numbers, the most significant bit is reserved for the sign; the remaining bits hold the magnitude.
0000 0000 0000 0000 0000 0000 0000 1011_2 = 11_10
1000 0000 0000 0000 0000 0000 0000 1011_2 = -11_10

Signed Numbers
For n-bit numbers, the negation of B in two's complement is 2^n - B (this is one of the alternative ways of negating a two's-complement number).
  -B = (2^n - B)
The same result is obtained by complementing every bit and adding 1:
    0000 0000 0000 0000 0000 0000 0000 1011_2 = 11_10
    1111 1111 1111 1111 1111 1111 1111 0100_2   (every bit complemented)
  +                                       1_2
    1111 1111 1111 1111 1111 1111 1111 0101_2 = -11_10

Signed Numbers
For n-bit numbers, the negation of B in two's complement is 2^n - B (this is one of the alternative ways of negating a two's-complement number).
  -B = (2^n - B)
Negating a negative number works the same way:
    1111 1111 1111 1111 1111 1111 1111 0101_2 = -11_10
    0000 0000 0000 0000 0000 0000 0000 1010_2   (every bit complemented)
  +                                       1_2
    0000 0000 0000 0000 0000 0000 0000 1011_2 = 11_10
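
The two negation examples can be checked with a small sketch. It assumes n-bit values are held as non-negative Python integers (bit patterns), and the function name negate is hypothetical.

```python
def negate(b: int, n: int = 32) -> int:
    """Two's complement negation of an n-bit pattern: complement, then add 1."""
    mask = (1 << n) - 1
    return ((b ^ mask) + 1) & mask    # the final mask keeps the result in n bits


if __name__ == "__main__":
    minus_11 = negate(11)
    print(f"{minus_11:032b}")          # bit pattern for -11
    print(negate(minus_11))            # negating again gives back 11
    print(minus_11 == 2**32 - 11)      # agrees with the 2^n - B definition
```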

Signed Number Systems
Here are all the 4-bit numbers in the different systems.
Positive numbers are the same in all three representations.
Signed magnitude and one's complement have two ways of representing 0. This makes things more complicated.
Two's complement has asymmetric ranges; there is one more negative number than positive numbers. Here, you can represent -8 but not +8.
However, two's complement is preferred because it has only one 0, and its addition algorithm is the simplest.
Decimal   S.M.   1's comp.   2's comp.
   7      0111   0111        0111
   6      0110   0110        0110
   5      0101   0101        0101
   4      0100   0100        0100
   3      0011   0011        0011
   2      0010   0010        0010
   1      0001   0001        0001
   0      0000   0000        0000
  -0      1000   1111        ----
  -1      1001   1110        1111
  -2      1010   1101        1110
  -3      1011   1100        1101
  -4      1100   1011        1100
  -5      1101   1010        1011
  -6      1110   1001        1010
  -7      1111   1000        1001
  -8      ----   ----        1000

Sign extension
In everyday life, decimal numbers are assumed to have an infinite number of 0s in front of them. This helps in lining up numbers.
To subtract 3 from 231, for instance, you can imagine:
    231
  - 003
    228
You need to be careful in extending signed binary numbers, because the leftmost bit is the sign and not part of the magnitude.
If you just add 0s in front, you might accidentally change a negative number into a positive one!
For example, going from 4-bit to 8-bit numbers:
  0101 (+5) should become 0000 0101 (+5).
  But 1100 (-4) should become 1111 1100 (-4).
The proper way to extend a signed binary number is to replicate the sign bit, so the sign is preserved.
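
A minimal sketch of sign-bit replication, assuming values are stored as non-negative bit patterns; sign_extend and its parameter names are hypothetical.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Widen a from_bits-wide two's complement pattern to to_bits bits."""
    sign = (value >> (from_bits - 1)) & 1
    if sign:
        # Set every new bit position (from_bits .. to_bits-1) to 1.
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value                       # a 0 sign bit needs no change


if __name__ == "__main__":
    print(f"{sign_extend(0b0101, 4, 8):08b}")   # +5 widened to 8 bits
    print(f"{sign_extend(0b1100, 4, 8):08b}")   # -4 widened to 8 bits
```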

Two's complement addition
Negating a two's complement number takes a bit of work, but addition is much easier than with the other two systems.
To find A + B, you just have to:
  Do unsigned addition on A and B, including their sign bits.
  Ignore any carry out.
For example, to find 0111 + 1100, or (+7) + (-4):
  First add 0111 + 1100 as unsigned numbers:
      0111
    + 1100
     10011
  Discard the carry out (1).
  The answer is 0011 (+3).

Another two's complement example
Let's try adding two negative numbers: 1101 + 1110, or (-3) + (-2) in decimal.
Adding the numbers gives 11011:
      1101
    + 1110
     11011
Dropping the carry out (1) leaves us with the answer, 1011 (-5).
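
Both worked additions can be replayed with the sketch below. It assumes 4-bit operands given as bit patterns; add_twos and to_signed are hypothetical helper names.

```python
def add_twos(a: int, b: int, n: int = 4) -> tuple[int, int]:
    """Unsigned add of two n-bit patterns; return (n-bit sum, carry out)."""
    total = a + b
    return total & ((1 << n) - 1), total >> n


def to_signed(bits: int, n: int = 4) -> int:
    """Interpret an n-bit pattern as a two's complement value."""
    return bits - (1 << n) if bits >> (n - 1) else bits


if __name__ == "__main__":
    for a, b in ((0b0111, 0b1100), (0b1101, 0b1110)):
        s, carry = add_twos(a, b)
        print(f"{a:04b} + {b:04b} -> carry {carry}, sum {s:04b} ({to_signed(s)})")
```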

Why does this work?
For n-bit numbers, the negation of B in two's complement is 2^n - B (this is one of the alternative ways of negating a two's-complement number).
  A - B = A + (-B) = A + (2^n - B) = (A - B) + 2^n
If A >= B, then (A - B) is a non-negative number, and 2^n represents a carry out of 1. Discarding this carry out is equivalent to subtracting 2^n, which leaves us with the desired result (A - B).
If A < B, then (A - B) is a negative number and we have 2^n - (B - A). This is the two's complement form of -(B - A), which is the desired result (A - B).

Signed overflow
With two's complement and a 4-bit adder, for example, the largest representable decimal number is +7, and the smallest is -8.
What if you try to compute 4 + 5, or (-4) + (-5)?
    0100  (+4)        1100  (-4)
  + 0101  (+5)      + 1011  (-5)
   01001  (-7)       10111  (+7)
We cannot just include the carry out to produce a five-digit result, as for unsigned addition. If we did, (-4) + (-5) would result in +23!
Also, unlike the case with unsigned numbers, the carry out cannot be used to detect overflow.
  In the example on the left, the carry out is 0 but there is overflow.
  Conversely, there are situations where the carry out is 1 but there is no overflow.

Detecting signed overflow
The easiest way to detect signed overflow is to look at all the sign bits.
    0100  (+4)        1100  (-4)
  + 0101  (+5)      + 1011  (-5)
   01001  (-7)       10111  (+7)
Overflow occurs only in the two situations above:
  If you add two positive numbers and get a negative result.
  If you add two negative numbers and get a positive result.
Overflow cannot occur if you add a positive number to a negative number. Do you see why?
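
The sign-bit rule can be sketched as follows, assuming n-bit operands passed as bit patterns in the range 0 to 2^n - 1. The function name add_with_overflow is hypothetical.

```python
def add_with_overflow(a: int, b: int, n: int = 4) -> tuple[int, int, bool]:
    """Return (n-bit sum, carry out, signed overflow flag)."""
    mask = (1 << n) - 1
    total = a + b
    s = total & mask
    sign_a, sign_b, sign_s = a >> (n - 1), b >> (n - 1), s >> (n - 1)
    # Overflow: the operands share a sign and the result's sign differs.
    overflow = sign_a == sign_b and sign_s != sign_a
    return s, total >> n, overflow


if __name__ == "__main__":
    for a, b in ((0b0100, 0b0101), (0b1100, 0b1011), (0b0111, 0b1100)):
        print(f"{a:04b} + {b:04b} ->", add_with_overflow(a, b))
```

The third case in the usage loop, (+7) + (-4), produces a carry out of 1 without overflow, which illustrates why the carry out alone is not a usable overflow signal.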

Refined Requirements
(1) Functional Specification
  inputs: 2 x 32-bit operands A, B; 4-bit mode m
  outputs: 32-bit result S, 1-bit carry c, 1-bit overflow ovf
  operations: add, addu, sub, subu, and, or, xor, nor, slt, sltu
(2) Block Diagram (powerview symbol, VHDL entity)
  [Figure: ALU block with 32-bit inputs A and B, 4-bit mode input m, 32-bit output S, and 1-bit outputs c and ovf.]

Gate-level Design: Half Adder
Design procedure:
1) State the problem. Example: build a half adder to add two bits.
2) Determine and label the inputs and outputs of the circuit. Example: two inputs, X and Y, and two outputs, C and S, where C S is the 2-bit sum (X + Y).
3) Draw the truth table.
X Y | C S
0 0 | 0 0
0 1 | 0 1
1 0 | 0 1
1 1 | 1 0

Gate-level Design: Half Adder
4) Obtain the simplified Boolean functions. Example:
  C = X.Y
  S = X'.Y + X.Y' = X ⊕ Y
X Y | C S
0 0 | 0 0
0 1 | 0 1
1 0 | 0 1
1 1 | 1 0
5) Draw the logic diagram: an AND gate produces C and an XOR gate produces S.
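
The two Boolean functions translate directly into bitwise operators. This is a behavioural sketch with a hypothetical function name, printed in the same column order as the truth table.

```python
def half_adder(x: int, y: int) -> tuple[int, int]:
    """Return (C, S) with C = X.Y and S = X xor Y."""
    return x & y, x ^ y


if __name__ == "__main__":
    print("X Y | C S")
    for x in (0, 1):
        for y in (0, 1):
            c, s = half_adder(x, y)
            print(f"{x} {y} | {c} {s}")
```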

Gate-level Design: Full Adder
A half adder adds up only two bits.
To add two binary numbers, we need to add 3 bits at each position (including the carry).
Example:
    1 1 1      carry
    0 0 1 1    X
  + 0 1 1 1    Y
    1 0 1 0    S
We need a full adder (so called because it can be made from two half adders), with inputs X, Y, Z and outputs C, S, where C S is the 2-bit sum (X + Y + Z).

Gate-level Design: Full Adder
Truth table:
X Y Z | C S
0 0 0 | 0 0
0 0 1 | 0 1
0 1 0 | 0 1
0 1 1 | 1 0
1 0 0 | 0 1
1 0 1 | 1 0
1 1 0 | 1 0
1 1 1 | 1 1
Note: Z is the carry in (to the current position); C is the carry out (to the next position).
K-map for C:
  YZ    00 01 11 10
  X=0    0  0  1  0
  X=1    0  1  1  1
K-map for S:
  YZ    00 01 11 10
  X=0    0  1  0  1
  X=1    1  0  1  0
Using the K-maps, the simplified SOP forms are:
  C = X.Y + X.Z + Y.Z
  S = X'.Y'.Z + X'.Y.Z' + X.Y'.Z' + X.Y.Z

Gate-level Design: Full Adder
Alternative formulae using algebraic manipulation:
  C = X.Y + X.Z + Y.Z
    = X.Y + (X + Y).Z
    = X.Y + ((X ⊕ Y) + X.Y).Z
    = X.Y + (X ⊕ Y).Z + X.Y.Z
    = X.Y + (X ⊕ Y).Z
  S = X'.Y'.Z + X'.Y.Z' + X.Y'.Z' + X.Y.Z
    = X'.(Y'.Z + Y.Z') + X.(Y'.Z' + Y.Z)
    = X'.(Y ⊕ Z) + X.(Y ⊕ Z)'
    = X ⊕ (Y ⊕ Z)   or   (X ⊕ Y) ⊕ Z

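Gate-level Design: Full Adder
Circuit for the above formulae:
  C = X.Y + (X ⊕ Y).Z
  S = (X ⊕ Y) ⊕ Z
The first stage computes (X ⊕ Y) and (X.Y) from X and Y. The second stage combines (X ⊕ Y) with Z to give S, and an OR gate merges the two carry terms into C.
Full adder made from two half adders (+ OR gate).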

Gate-level (SSI) Design: Full Adder
Circuit for the above formulae:
  C = X.Y + (X ⊕ Y).Z
  S = (X ⊕ Y) ⊕ Z
Block diagram: the first half adder takes X and Y and produces Sum = (X ⊕ Y) and Carry = (X.Y). The second half adder takes (X ⊕ Y) and Z and produces Sum = S. An OR gate combines the two Carry outputs into C.
Full adder made from two half adders (+ OR gate).
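
The two-half-adder construction can be sketched and checked against ordinary integer addition. Function names are hypothetical; the model is behavioural, not a timing-accurate circuit.

```python
def half_adder(x: int, y: int) -> tuple[int, int]:
    """Return (carry, sum) for two bits."""
    return x & y, x ^ y


def full_adder(x: int, y: int, z: int) -> tuple[int, int]:
    """Full adder built from two half adders plus an OR gate; returns (C, S)."""
    carry1, partial = half_adder(x, y)      # X.Y and X xor Y
    carry2, s = half_adder(partial, z)      # (X xor Y).Z and the final sum
    return carry1 | carry2, s


if __name__ == "__main__":
    for v in range(8):
        x, y, z = (v >> 2) & 1, (v >> 1) & 1, v & 1
        print(x, y, z, full_adder(x, y, z))
```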

4-bit Parallel Adder
Consider a circuit that adds two 4-bit numbers, X4 X3 X2 X1 and Y4 Y3 Y2 Y1, together with a carry-in C1, to produce a 5-bit result C5 S4 S3 S2 S1 (black-box view of a 4-bit parallel adder).
A 5-bit result is sufficient because the largest result is:
  (1111)_2 + (1111)_2 + (1)_2 = (11111)_2

4-bit Parallel Adder
The truth table for 9 inputs is very big, i.e. 2^9 = 512 entries:
X4 X3 X2 X1  Y4 Y3 Y2 Y1  C1 | C5  S4 S3 S2 S1
 0  0  0  0   0  0  0  0   0 |  0   0  0  0  0
 0  0  0  0   0  0  0  0   1 |  0   0  0  0  1
 0  0  0  0   0  0  0  1   0 |  0   0  0  0  1
 ...
 0  1  0  1   1  1  0  1   1 |  1   0  0  1  1
 ...
 1  1  1  1   1  1  1  1   1 |  1   1  1  1  1
Simplification is very complicated.

4-bit Parallel Adder
An alternative design is possible.
The addition formula for each pair of bits (with carry in),
  C_(i+1) S_i = X_i + Y_i + C_i
has the same function as a full adder:
  C_(i+1) = X_i.Y_i + (X_i ⊕ Y_i).C_i
  S_i = X_i ⊕ Y_i ⊕ C_i

4-bit Parallel Adder
Cascading 4 full adders (FA) via their carries, we get a chain in which the FA for bit i takes X_i, Y_i and C_i and produces S_i and C_(i+1):
  C1 -> FA(X1, Y1) -> C2 -> FA(X2, Y2) -> C3 -> FA(X3, Y3) -> C4 -> FA(X4, Y4) -> C5
Inputs: X4..X1, Y4..Y1, C1. Outputs: C5, S4..S1.

Parallel Adders
Note that the carry is propagated by cascading the carry from one full adder to the next.
It is called a parallel adder because the inputs are presented simultaneously (in parallel). It is also called a ripple-carry adder.
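
A behavioural sketch of the cascade follows. It assumes bit lists are given least significant bit first (X1 first); ripple_carry_add and full_adder are hypothetical names.

```python
def full_adder(x: int, y: int, c: int) -> tuple[int, int]:
    """Return (carry out, sum) using C = X.Y + (X xor Y).C and S = X xor Y xor C."""
    return (x & y) | ((x ^ y) & c), x ^ y ^ c


def ripple_carry_add(xs: list[int], ys: list[int], c1: int = 0) -> tuple[list[int], int]:
    """Add two equal-length bit lists (LSB first); return (sum bits, final carry)."""
    carry = c1
    sums = []
    for x, y in zip(xs, ys):
        carry, s = full_adder(x, y, carry)   # the carry ripples to the next stage
        sums.append(s)
    return sums, carry


if __name__ == "__main__":
    # X = 0101, Y = 1101, C1 = 1, written LSB first.
    print(ripple_carry_add([1, 0, 1, 0], [1, 0, 1, 1], 1))
```

The usage line corresponds to the truth-table row X = 0101, Y = 1101, C1 = 1, whose expected result is C5 S4 S3 S2 S1 = 1 0 0 1 1.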

16-bit Parallel Adder
Larger parallel adders can be built from smaller ones.
Example: a 16-bit parallel adder can be constructed from four 4-bit parallel adders:
  4-bit adder 1: inputs X4..X1, Y4..Y1, carry in C1; outputs S4..S1, carry out C5
  4-bit adder 2: inputs X8..X5, Y8..Y5, carry in C5; outputs S8..S5, carry out C9
  4-bit adder 3: inputs X12..X9, Y12..Y9, carry in C9; outputs S12..S9, carry out C13
  4-bit adder 4: inputs X16..X13, Y16..Y13, carry in C13; outputs S16..S13, carry out C17

But What about Performance?
The critical path of an n-bit ripple-carry adder is n*cp.
[Figure: four 1-bit ALUs chained in a ripple. Stage k takes Ak, Bk and CarryIn k and produces Result k and CarryOut k, and CarryOut k feeds CarryIn k+1, for k = 0..3.]
Design trick: throw hardware at it.

Calculation of Circuit Delays
In general, consider a logic gate with delay t.
If its inputs are stable at times t1, t2, ..., tn, respectively, then the earliest time at which the output will be stable is:
  max(t1, t2, ..., tn) + t
To calculate the delays of all outputs of a combinational circuit, repeat the above rule for all gates.

Calculation of Circuit Delays
As a simple example, consider the full adder circuit where all inputs are available at time 0. (Assume each gate has delay t.)
  X ⊕ Y and X.Y:          max(0,0) + t = t
  S = (X ⊕ Y) ⊕ Z:        max(t,0) + t = 2t
  (X ⊕ Y).Z:              max(t,0) + t = 2t
  C = X.Y + (X ⊕ Y).Z:    max(t,2t) + t = 3t
The outputs S and C experience delays of 2t and 3t, respectively.

Calculation of Circuit Delays
More complex example: a 4-bit parallel adder built from four full adders (FA), with all inputs X4..X1, Y4..Y1 and C1 stable at time 0. The internal carries C2, C3, C4 ripple from one full adder to the next, and the outputs are S4..S1 and C5.

Calculation of Circuit Delays
Analyse the delay for the repeated block, a full adder with inputs X_i, Y_i, C_i and outputs S_i, C_(i+1), where X_i, Y_i are stable at 0t, while C_i is assumed to be stable at mt.
Performing the delay calculation gives:
  X_i ⊕ Y_i and X_i.Y_i:    max(0,0) + t = t
  S_i:                      max(t,mt) + t
  (X_i ⊕ Y_i).C_i:          max(t,mt) + t
  C_(i+1):                  max(t,mt) + 2t

Calculation of Circuit Delays
Calculating:
  When i=1, m=0: S1 = 2t and C2 = 3t.
  When i=2, m=3: S2 = 4t and C3 = 5t.
  When i=3, m=5: S3 = 6t and C4 = 7t.
  When i=4, m=7: S4 = 8t and C5 = 9t.
In general, an n-bit ripple-carry parallel adder will experience
  S_n = ((n-1)*2+2)t
  C_(n+1) = ((n-1)*2+3)t
as its delay times.
The propagation delay of a ripple-carry parallel adder is proportional to the number of bits it handles.
Maximum delay: ((n-1)*2+3)t
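
The stage-by-stage calculation can be automated with a short sketch that applies max(inputs) + t at each gate, working in units of t. The function name ripple_delays is hypothetical, and every gate is assumed to have the same delay, as on the slides.

```python
def ripple_delays(n: int) -> list[tuple[int, int, int]]:
    """Return (i, delay of S_i, delay of C_(i+1)) for i = 1..n, in units of t."""
    m = 0                          # C1 is stable at time 0
    rows = []
    for i in range(1, n + 1):
        p = max(0, 0) + 1          # X_i xor Y_i (and X_i.Y_i) ready at t
        s = max(p, m) + 1          # S_i = (X_i xor Y_i) xor C_i
        c = max(p, m) + 2          # AND with C_i, then OR with X_i.Y_i
        rows.append((i, s, c))
        m = c                      # this carry feeds the next stage
    return rows


if __name__ == "__main__":
    for i, s, c in ripple_delays(4):
        print(f"i={i}: S = {s}t, C = {c}t")
```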

Faster Circuits
Three ways of improving the speed of these circuits:
(i) Use better technology (e.g. ECL gates are faster than TTL gates), BUT
  (a) faster technology is more expensive, needs more power, and allows a lower level of integration;
  (b) there are physical limits (e.g. speed of light, size of atom).
(ii) Flatten gate-level designs into two-level circuits (use sum-of-products/product-of-sums), BUT
  (a) the designs become complicated for large circuits;
  (b) the product/sum terms need MANY inputs!
(iii) Use clever look-ahead techniques, BUT there are additional costs (hopefully reasonable).

Look-Ahead Carry Adder
Consider the full adder with inputs X_i, Y_i, C_i and outputs S_i, C_(i+1), where the intermediate signals are labelled P_i and G_i and defined as:
  P_i = X_i ⊕ Y_i
  G_i = X_i.Y_i
The outputs C_(i+1) and S_i, in terms of P_i, G_i, C_i, are:
  S_i = P_i ⊕ C_i               (1)
  C_(i+1) = G_i + P_i.C_i       (2)
Looking at equation (2):
  G_i = X_i.Y_i is a carry generate signal
  P_i = X_i ⊕ Y_i is a carry propagate signal

Look-Ahead Carry Adder
For a 4-bit ripple-carry adder, the equations to obtain the four carry signals are:
  C_(i+1) = G_i + P_i.C_i
  C_(i+2) = G_(i+1) + P_(i+1).C_(i+1)
  C_(i+3) = G_(i+2) + P_(i+2).C_(i+2)
  C_(i+4) = G_(i+3) + P_(i+3).C_(i+3)
These formulae are deeply nested, as shown here for C_(i+2): C_i is ANDed with P_i, the result is ORed with G_i to give C_(i+1), which is ANDed with P_(i+1), and that result is ORed with G_(i+1).
This is a 4-level circuit for C_(i+2) = G_(i+1) + P_(i+1).C_(i+1).

Look-Ahead Carry Adder
Nested formulae/gates cause the ripple-carry propagation delay.
We can reduce the delay by expanding and flattening the formulae for the carries. For example, for C_(i+2):
  C_(i+2) = G_(i+1) + P_(i+1).C_(i+1)
          = G_(i+1) + P_(i+1).(G_i + P_i.C_i)
          = G_(i+1) + P_(i+1).G_i + P_(i+1).P_i.C_i
New, faster circuit for C_(i+2): a two-level circuit in which the AND terms P_(i+1).P_i.C_i and P_(i+1).G_i are ORed together with G_(i+1).

Look-Ahead Carry Adder
Other carry signals can also be similarly flattened.
  C_(i+3) = G_(i+2) + P_(i+2).C_(i+2)
          = G_(i+2) + P_(i+2).(G_(i+1) + P_(i+1).G_i + P_(i+1).P_i.C_i)
          = G_(i+2) + P_(i+2).G_(i+1) + P_(i+2).P_(i+1).G_i + P_(i+2).P_(i+1).P_i.C_i
  C_(i+4) = G_(i+3) + P_(i+3).C_(i+3)
          = G_(i+3) + P_(i+3).(G_(i+2) + P_(i+2).G_(i+1) + P_(i+2).P_(i+1).G_i + P_(i+2).P_(i+1).P_i.C_i)
          = G_(i+3) + P_(i+3).G_(i+2) + P_(i+3).P_(i+2).G_(i+1) + P_(i+3).P_(i+2).P_(i+1).G_i + P_(i+3).P_(i+2).P_(i+1).P_i.C_i
Notice that the formulae get longer with higher carries.
Also, all carries are two-level sum-of-products expressions, in terms of the generate signals, Gs, the propagate signals, Ps, and the first carry-in, C_i.
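
To see that the flattened expressions produce the same carries as the nested ones, the sketch below computes the four carries of a 4-bit block both ways. Bit lists are assumed to be least significant bit first, and both function names are hypothetical.

```python
def ripple_carries(xs: list[int], ys: list[int], c_in: int) -> list[int]:
    """Carries from the nested form C_(i+1) = G_i + P_i.C_i, one stage at a time."""
    carries, c = [], c_in
    for x, y in zip(xs, ys):
        c = (x & y) | ((x ^ y) & c)
        carries.append(c)
    return carries


def lookahead_carries(xs: list[int], ys: list[int], c_in: int) -> list[int]:
    """Carries of a 4-bit block from the flattened two-level expressions."""
    g = [x & y for x, y in zip(xs, ys)]     # generate signals
    p = [x ^ y for x, y in zip(xs, ys)]     # propagate signals
    c1 = g[0] | (p[0] & c_in)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c_in)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c_in)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c_in))
    return [c1, c2, c3, c4]


if __name__ == "__main__":
    xs, ys = [1, 0, 1, 0], [1, 0, 1, 1]     # X = 0101, Y = 1101, LSB first
    print(ripple_carries(xs, ys, 1), lookahead_carries(xs, ys, 1))
```

In software both functions take similar time; the benefit of the flattened form is in hardware, where every carry depends only on the G, P and first carry-in signals rather than on the previous carry.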

Look-Ahead Carry Adder
We employ the lookahead formulae in this lookahead-carry adder circuit:
[Figure: lookahead-carry adder circuit.]

Look-Ahead Carry Adder
The 74182 IC chip allows a faster lookahead adder to be built.
The maximum propagation delay is 4t (t to get the generate and propagate signals, 2t to get the carries and t for the sum signals), where t is the average gate delay.

Making a subtraction circuit
We could build a subtraction circuit directly, similar to the way we made unsigned adders.
However, by using two's complement we can convert any subtraction problem into an addition problem. Algebraically,
  A - B = A + (-B)
So to subtract B from A, we can instead add the negation of B to A.
This way we can re-use the unsigned adder hardware.

A two's complement subtraction circuit
To find A - B with an adder, we'll need to:
  Complement each bit of B.
  Set the adder's carry in to 1.
The net result is A + B' + 1, where B' + 1 is the two's complement negation of B.
A3, B3 and S3 here are actually sign bits.

Small differences
The only differences between the adder and subtractor circuits are:
  The subtractor has to complement each of B3 B2 B1 B0.
  The subtractor sets the initial carry in to 1, instead of 0.
It's not too hard to make one circuit that does both addition and subtraction.

An adder-subtractor circuit
XOR gates let us selectively complement the B input:
  X ⊕ 0 = X
  X ⊕ 1 = X'
When Sub = 0, the XOR gates output B3 B2 B1 B0 and the carry in is 0. The adder output will be A + B + 0, or just A + B.
When Sub = 1, the XOR gates output B3' B2' B1' B0' and the carry in is 1. Thus, the adder output will be a two's complement subtraction, A - B.
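
A behavioural sketch of the combined circuit, assuming 4-bit operands given as bit patterns. The function name add_sub is hypothetical, and the adder itself is modelled with Python integer addition rather than individual full adders.

```python
def add_sub(a: int, b: int, sub: int, n: int = 4) -> int:
    """Return the n-bit result of A + B (sub = 0) or A - B (sub = 1)."""
    mask = (1 << n) - 1
    b_in = b ^ (mask if sub else 0)    # each bit of B is XORed with Sub
    total = a + b_in + sub             # Sub also drives the carry in
    return total & mask                # the carry out is discarded


if __name__ == "__main__":
    print(f"{add_sub(0b0011, 0b0100, 0):04b}")   # 3 + 4
    print(f"{add_sub(0b0111, 0b0100, 1):04b}")   # 7 - 4
    print(f"{add_sub(0b0011, 0b0101, 1):04b}")   # 3 - 5, a negative result
```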

Subtraction summary
A good representation for negative numbers makes subtraction hardware much easier to design.
  Two's complement is used most often (although signed magnitude shows up sometimes, such as in floating-point systems).
  Using two's complement, we can build a subtractor with minor changes to the adder from last week.
  We can also make a single circuit which can both add and subtract.
Overflow is still a problem, but signed overflow is very different from unsigned overflow.
Sign extension is needed to properly lengthen negative numbers.
We will use most of the ideas we've seen so far to build an ALU, an important part of a processor.

Homework 4
Computer Organization and Design: The Hardware/Software Interface. Third Edition. David A. Patterson and John L. Hennessy. Morgan Kaufmann Publishers. USA. 2005.
Solve the following exercises:
  Chapter 1. Exercises: 1.47, 1.48, 1.50, 1.51, 1.52, 1.53, 1.54
  Chapter 2. Exercises: 2.6, 2.29, 2.30, 2.31, 2.32, 2.33, 2.37, 2.49, 2.51
Send a pdf file.
Due date: October 6th, 2008.