CMP 334: Seventh Class Performance HW 5 solution Averages and weighted averages (review) Amdahl's law Ripple-carry adder circuits Binary addition Half-adder circuits Full-adder circuits Subtraction, negative numbers, signed arithmetic A B = A + B For next class: HW 6; read A.3-4, 2.1-5, 3.1-2
HW 5: Performance Problems 1) Computer A has a 5 GHz clock and executes program P in 3 seconds with an average CPI (cycles per instruction) of 3.. How many instructions does it execute for program P? 2) Computer B has a 2 GHz clock. It executes P with the same number of instructions as computer A with an average CPI 1.. How long does it take to execute P? 3) Compare the performance of computer B and computer A on program P. Which is faster? by how much? 4) A new compiler for computer A compiles program P so that it executes only half as many instructions. Unfortunately, the CPI for computer A on these instructions is 4.. How long does it take to execute the newly compiled program? 5) Compare the performance of computer B (with the old compiler) to computer A (with the new compiler) on program P. Which is faster? by how much?
Performance Equations Performance inverse of execution time performance: P x 1 T x relative performance: P x P y = T y T x CPU time equation T CPU (execution ) = # instructions execution Amdahl's law # cycles instruction # seconds cycle T new = fraction affected T old improvement + fraction not affected T old
Processor Performance Equations T X = # instructions X CPI X cycletime X T X = # instructions X CPI X clockrate X P X P Y = T Y T X = # instructions Y CPI Y cycletime Y # instructions X CPI X cycletime X P X P Y = T Y T X = # instructions Y CPI Y clockrate X # instructions X CPI X clockrate Y
HW 5.1 Instruction Count Computer A has a 5 GHz clock and executes program P in 3 seconds with an average CPI (cycles per instruction) of 3.. How many instructions does it execute for program P? T A = # instructions A CPI A clockrate A 3 seconds = # instructions A 3. cycles / instruction 5 GHz # instructions A = 3 seconds 5 19 cycles / second 3. cycles / instruction # instructions A = 5 1 9 instructions
HW 5.2 Execution Time Computer B has a 2 GHz clock. It executes P with the same number of instructions as computer A (5 1 9 instructions) with an average CPI 1.. How long does it take to execute P? T B T B T B T B = # instructions B CPI B clockrate B = 5 19 instructions 1. cycles / instruction 2 GHz = 5 19 instructions 1 cycles / instruction 2 1 9 cycles / second = 25 seconds
HW 5.3 Relative Performance Compare the performance of computer B and computer A on program P. Which is faster? by how much? P B P A = T A T B = 3 seconds 25 seconds = 1.2 B is 1.2 times faster than A.
HW 5.4 Execution Time A new compiler for computer A compiles program P so that it executes only half as many instructions. Unfortunately, the CPI for computer A on these instructions is 4.. How long does it take to execute the newly compiled program? T A ' = # instructions A' CPI A ' clockrate A' 1 2 # instructions A 4. cycles / instruction T A ' = clockrate A T A ' =.5 5 19 instructions 4. cycles / instruction 5 1 9 cycles / second T A ' = 1 seconds = 2 seconds 5
HW 5.5 Relative Performance Compare the performance of computer B (with the old compiler) to computer A' (A with the new compiler) on program P. Which is faster? by how much? P A' P B = T B T A' = 25 seconds 2 seconds = 1.25 A' is 1.25 times faster than B. P A' P A = T A T A' = 3 seconds 2 seconds = 1.5 A' is 1.5 times faster than A.
Averages and Weighted Averages Given values: {v 1, v 2, v N } & weights: {w 1, w 2, w N } average: v N i=1 N v i total weight: N W i = 1 w i normalized weight: q i w i W N ( i = q i = 1 ) weighted average: i=1 N N i=1 w i v i w i = N i=1 w i v i W N = i=1 w i W v i N = i=1 q i v i
Typical Instruction Statistics Instruction types, frequencies, and execution times 5% ALU instructions 5 CPI 3% Memory instructions 2% Load 8 CPI 1% Store 6 CPI 2% Branch instructions 1 CPI.5% Special instructions
Average Cycles Per Instruction (Weighted) average CPI = q ALU T ALU + q Load T Load + q Store T Store + q Branch T Branch =.5 5 +.2 8 +.1 6 +.2 1 = 2.5 + 1.6 +.6 + 2. = 6.7 cycles approximation: 2 / 6.7 3 Execution time fraction by instruction type ALU 2.5 / 6.7 ~ 37.5% Load 1.6 / 6.7 ~ 24.% Store.6 / 6.7 ~ 9.% Branch 2. / 6.7 ~ 3.%
CPU Time Equation T CPU (execution ) = # instructions execution # cycles instruction # seconds cycle If T CPU (execution ) 2 seconds, cycle time = 1 9 seconds 2 seconds # instructions 6.7 1 9 seconds 2 # instructions 3 19 9 6.7 1 instruction time = # seconds instruction = # cycles instruction # seconds cycle
Performance Equations Performance inverse of execution time performance: P x 1 T x relative performance: P x P y = T y T x CPU time equation T CPU (execution ) = # instructions execution Amdahl's law # cycles instruction # seconds cycle T new = fraction affected T old improvement + fraction not affected T old
Amdahl's Law old time (12) affected (8) unaffected (4) SpeedUp (1.6) improved (5) unaffected (4) new time (9) T old = affected + unaffected T new = improved + unaffected SpeedUp = affected / improved Overall SpeedUp = P new /P old = T old /T new (fraction affected) F a = affected / T old (fraction unaffected) F a = unaffected / T old
Improving Race Car Performance miles miles/hours cruising 9 9 other 1 5 Race time = 9/9 + 1/5 = 12 hours Change #1: 1.11 x improvement in cruising speed Change #2: 2. x improvement in other speed
Change # 1 T old = 12 hours fa =.9 (fraction affected = affected/total = fa =.1 (fraction unaffected = unaffected/total = su = 1.11 (speedup for affected) 9 miles ) 1 miles 1 miles 1 miles ) T new = fa T old su + fa T old T new =.9 12 1.11 +.1 12 9.73 + 1.2 = 1.93 hours
Change # 2 T old = 12 hours fa =.1 (fraction affected = affected/total = fa =.9 (fraction unaffected = unaffected/total = su = 2. (speedup for affected) 1 miles ) 1 miles 9 miles 1 miles ) T new = fa T old su + fa T old T new =.1 12 2 +.9 12.6 + 1.8 = 11.4 hours
Change # 1 wrong! T old = 12 hours fa =.9 (fraction affected = affected/total = fa =.1 (fraction unaffected = unaffected/total = su = 1.11 (speedup for affected) 9 miles ) 1 miles 1 miles 1 miles ) T new = fa T old su + fa T old T new =.9 12 1.11 +.1 12 9.73 + 1.2 = 1.93 hours
Change # 2 wrong! T old = 12 hours fa =.1 (fraction affected = affected/total = fa =.9 (fraction unaffected = unaffected/total = su = 2. (speedup for affected) 1 miles ) 1 miles 9 miles 1 miles ) T new = fa T old su + fa T old T new =.1 12 2 +.9 12.6 + 1.8 = 11.4 hours
Change # 1 correct T old = 12 hours fa =.833 (fraction 1 hours affected = affected/total = 12 hours = 5 ) 6 fa =.167 (fraction unaffected = unaffected/total = 2 hours 12 hours = 1 ) 6 su = 1.11 (speedup for affected) T new = fa T old su T new.833 12 1.11 + fa T old +.167 12 9 + 2 = 11 hours
Change # 2 correct T old = 12 hours fa =.167 (fraction affected = affected/total = 2 hours 12 hours = 1 ) 6 fa =.833 (fraction unaffected = unaffected/total = su = 2. T new (speedup for affected) = fa T old su T new.167 12 2 + fa T old 1 hours 12 hours = 5 6 ) +.833 12 1. + 1 = 11 hours
Average Cycles Per Instruction (Weighted) average CPI = q ALU T ALU + q Load T Load + q Store T Store + q Branch T Branch =.5 5 +.2 8 +.1 6 +.2 1 = 2.5 + 1.6 +.6 + 2. = 6.7 cycles approximation: 2 / 6.7 3 Execution time fraction by instruction type ALU 2.5 / 6.7 ~ 37.5% Load 1.6 / 6.7 ~ 24.% Store.6 / 6.7 ~ 9.% Branch 2. / 6.7 ~ 3.%
CPU Time Equation T CPU (execution ) = # instructions execution # cycles instruction # seconds cycle If T CPU (execution ) 2 seconds, cycle time = 1 9 seconds 2 seconds # instructions 6.7 1 9 seconds 2 # instructions 3 19 9 6.7 1 instruction time = # seconds instruction = # cycles instruction # seconds cycle
Amdahl's Law 1 T new = fraction affected T old improvement + fraction not affected T old Improvement X reduces ALU instruction CPI from 5 to 4 T X = fraction affected 2 sec improvement + fraction not affected 2 sec
Amdahl's Law 1 (wrong!) T new = fraction affected T old improvement + fraction not affected T old Improvement X reduces ALU instruction CPI from 5 to 4 T X = fraction affected 2 sec improvement + fraction not affected 2 sec T X =.5 2 5 4 +.5 2 sec = 8 +1 sec = 18 sec
( Amdahl's Law 1 T new = fraction affected T old improvement + fraction not affected T old Improvement X reduces ALU instruction CPI from 5 to 4 T X = fraction affected 2 sec improvement + fraction not affected 2 sec T X = 2.5 6.7 2 5 4 + 4.2 6.7 2) sec ( 7.5 1.25 + 12.6 ) sec = 18.6 sec
( Amdahl's Law 2 T new = fraction affected T old improvement + fraction not affected T old Improvement Y reduces Load instruction CPI from 8 to 4 T Y = fraction affected 2 sec improvement + fraction not affected 2 sec T Y = 1.6 6.7 2 8 4 + 5.1 6.7 2)sec ( 4.8 2 + 15.3 ) sec = 17.7 sec
( Amdahl's Law 3 T new = fraction affected T old improvement + fraction not affected T old Improvement Z reduces Store instruction CPI from 6 to 2 T Z = fraction affected 2 sec improvement + fraction not affected 2 sec T Z =.6 6.7 2 6 2 + 6.1 6.7 2)sec ( 1.8 3 + 18.3 ) sec = 18.9 sec
( Amdahl's Law 4 T new = fraction affected T old improvement + fraction not affected T old Improvement W reduces Branch instruction CPI from 1 to 5 T W = fraction affected 2 sec improvement + fraction not affected 2 sec T W = 2. 6.7 2 1 5 + 4.7 6.7 2)sec ( 6 2 + 14.1 ) sec = 17.1 sec
Amdahl's Law old time (12) affected (8) unaffected (4) SpeedUp (1.6) improved (5) unaffected (4) new time (9) T old = affected + unaffected T new = improved + unaffected SpeedUp = affected / improved Overall SpeedUp = P new /P old = T old /T new (fraction affected) F a = affected / T old (fraction unaffected) F a = unaffected / T old
Amdahl's Law Overall SpeedUp performance: P x 1 T x relative performance: P x P y = T y T x P X P old = T old T X = 2 18.6 1.75 P Y P old = T old T Y = 2 17.7 1.13 P Z P old = T old T Z = 2 18.9 1.58 P W P old = T old T W = 2 17.1 1.17
Amdahl's Law Overall SpeedUp P new P old = T old T new P new P old = fa T old su T old + fa T old P new P old = 1 fa su + fa
Human Addition (Binary) + 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Binary Addition + 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Binary Addition + 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Binary Addition + 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Binary Addition + 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Binary Addition + 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Binary Addition + 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Binary Addition + 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Binary Addition + 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
64-Bit Computer Addition a s b.... 64-bit adder.. a 63 b 63 s 63 C
4-Bit Computer Addition a b 1-bit adder s a 1 b 1 a 2 b 2 1-bit adder 4-bit adder half 1-bit adder s 1 s 2 a 3 b 3 1-bit adder s 3 c
4-Bit Ripple Carry Adder a b 1-bit adder s a 1 b 1 a 2 b 2 1-bit adder 4-bit adder half 1-bit adder s 1 s 2 a 3 b 3 1-bit adder s 3 c
4-Bit Ripple Carry Adder a b 1-bit adder s a 1 b 1 1-bit adder s 1 a 2 b 2 half 1-bit adder s 2 a 3 b 3 1-bit adder s 3 c
4-Bit Ripple Carry Adder a b s a 1 b 1 full adder s 1 a 2 b 2 full adder s 2 a 3 b 3 full adder s 3 c
1-Bit Computer Addition (take 1) a s b c
Half Adder Truth Table a b c s c = ab s = ab + ab 1 1 1 1 1 1 1
Half Adder Circuit a s b c
1-Bit Computer Addition (take 2) c a b full adder s i c'
Full Adder # a b c c' s 1 1 1 2 1 1 3 1 1 1 4 1 1 5 1 1 1 6 1 1 1 7 1 1 1 1 1 s' = abc + abc + abc + s' = abc c' = abc + abc + abc + c' = abc + abc + abc c' = abc + abc + abc + c' = abc + abc + abc c' = bc + ac + ab
Full Adder # a b c c' s 1 1 1 2 1 1 3 1 1 1 4 1 1 5 1 1 1 6 1 1 1 7 1 1 1 1 1 s' = abc + abc + abc + s' = abc c' = abc + abc + abc + c' = abc + abc + abc c' = abc + abc + abc + c' = abc + abc + abc c' = bc + ac + ab
Full Adder # a b c c' s 1 1 1 2 1 1 3 1 1 1 4 1 1 5 1 1 1 6 1 1 1 7 1 1 1 1 1 s' = abc + abc + abc + s' = abc c' = abc + abc + abc + c' = abc + abc + abc c' = abc + abc + abc + c' = abc + abc + abc c' = bc + ac + ab
Full Adder # a b c c' s 1 1 1 2 1 1 3 1 1 1 4 1 1 5 1 1 1 6 1 1 1 7 1 1 1 1 1 s' = abc + abc + abc + s' = abc c' = abc + abc + abc + c' = abc + abc + abc c' = abc + abc + abc + c' = abc + abc + abc c' = bc + ac + ab
s' = abc + abc + abc + abc c' = ab + ac + bc c a b full adder r c'
s' = abc + abc + abc + abc c' = ab + ac + bc c a b r c' full adder
Full Adder?? c a b s full adder c'
Full Adder?? c a s b x z y full adder c'
Full Adder Implementation?? # a b c x y z c' s c 1 1 1 2 1 1 1 3 1 1 1 1 1 4 1 1 1 5 1 1 1 1 1 6 1 1 1 1 7 1 1 1 1 1 1 a b full adder x y u v c s 1 1 1 1 1 1 1 z c' c = uv s = uv + uv s
Full Adder Implementation! # a b c x y z c' s c 1 1 1 2 1 1 1 3 1 1 1 1 1 4 1 1 1 5 1 1 1 1 1 6 1 1 1 1 7 1 1 1 1 1 1 a b full adder x y u v c s 1 1 1 1 1 1 1 z c' c = uv s = uv + uv s
A Full Adder c a b s i Full Adder c'
A Full Adder c a b s i Full Adder c'
Initial adder half or full? a b half? s a 1 b 1 full adder s 1 a 2 b 2 full adder s 2 a 3 b 3 full adder s 3 c
Full Initial adder? Con Pro Superfluous wires and gates General simplicity Avoid special cases where ever practical Simplifies addition of big integers Big Integer: N = x k J k +... + x 2 J 2 + x 1 J 1 + x J Where J 2 32 Like base 1 but with sixteen billion billion fingers Simplifies subtraction
4-Bit Ripple Carry Adder c -1 a b half full adder s a 1 b 1 half full adder adder s 1 a 2 b 2 half full adder s 2 a 3 b 3 full adder s 3 c 3
4-Bit Ripple Carry Adder c -1 a b half full adder 4-bit adder s a 1 b 1 half full adder adder s 1 a 2 b 2 half full adder s 2 a 3 b 3 full adder s 3 c 3
4-Bit Ripple Carry Adder c -1 a b half full adder s a 1 b 1 a 2 b 2 half full adder adder 4-bit adder half full adder s 1 s 2 a 3 b 3 full adder s 3 c 3
8-Bit Ripple Carry Adder c -1 a b a 1 b 1 a 2 b 2 a 3 b 3 a 4 b 4 a 5 b 5 a 6 b 6 a 7 b 7 half full adder full adder 4-bit adder 8-bit adder half full adder half full c 3 half full adder full adder 4-bit adder half full adder half full s s 1 s 2 s 3 s 4 s 5 s 6 s 7 c 7
64-Bit Ripple Carry Adder c -1 a s b.... 64-bit adder.. a 63 b 63 s 63 c 63
Fixed Width Binary Addition + 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Fixed Width Binary Addition + 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Fixed Width Binary Addition + 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Fixed Width Binary Addition + 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Fixed Width Binary Addition + 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Fixed Width Binary Addition + 1 1 1 1 1 1 1 1 1 1 1 Carry out 1 1 1 1 1 1
Fixed Width Binary Addition + 1 1 1 1 1 1 1 1 1 1 1 Carry out Carry in 1 1 1 1 1 1 1
Non-Negative Numbers Subtraction A B A ~ B A B (ordinary arithmetic) A < B 1) B ~ B = 2) (A ~ B) + C = (A + C) ~ B ~B ~ B = 2 n B = B + 1 ~B pseudoinverse of B A ~ B A + B + 1
Fixed Width Binary Addition 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Borrow Carry in 1 1 1 1 1 1 1
Negative Number Representation Alternatives 1. Sign-magnitude 2. Bias How would it help Complicates arithmetic 3. 1's complement Too many zeros 4. 2's complement
Negative Number Representation Unsigned numbers:.. 2 N - 1 Signed number alternatives 1. Sign-magnitude: 2 N-1 1.. 2 N-1 1 How would this help? 2. Bias: bias.. 2 N 1 bias Complicates arithmetic (a-bias + b-bias) = (a + b)-bias)-bias 3. 1's complement: 2 N-1 1.. 2 N-1 1 + and - 4. 2's complement: 2 N-1.. 2 N-1 1
Negative Number Representation # binary sign magnitude bias (8) 1's complement 2's complement + -8 1 1 + 1-7 1 1 2 1 + 2-6 2 2 3 11 + 3-5 3 3 4 1 + 4-4 4 4 5 11 + 5-3 5 5 6 11 + 6-2 6 6 7 111 + 7-1 7 7 8 1 - -7-8 9 11-1 1-6 -7 A 11-2 2-5 -6 B 111-3 3-4 -5 C 11-4 4-3 -4 D 111-5 5-2 -3 E 111-6 6-1 -2 F 1111-7 7 - -1
1's Complement Arithmetic The 9's complement, d, of a decimal digit d is 9 d The 9's complement, X, of a 4-digit X is 9999 X The 1's complement, X, of X is X + 1 X = X + 1 = 9999 X +1 = 1 X convention: X is positive and Y is negative iff X < 5 Y < 1 Y Y and X Y = X + Y unless Overflow sign(x) = sign(y) sign(x + Y) Overflow sign(x) sign(y) = sign(x Y)
1 = 8999 1's Complement Example 1 = 8999 + 1 = 9 3 + 1 = 12 2 = 3 1-2 1 2 = 9799 + 1 = 2 + 1 = 2-3 1 3 = 9699 + 1 = 3 + 1 = 3 1 + 3 = 98 = 2 1 3 overflow 4 + 2 = 6 = 5999 + 1 = 41 + 1-41
2's Complement Arithmetic The 1's complement, b, of a binary bit b is 1 d The 1's complement, X, of a 4-bit X is 1111 X The 2's complement, X, of X is X + 1 X = X + 1 = 1111 X +1 = 1 X convention: X is positive and Y is negative iff X < 2 N-1 Y < 2 N Y Y and X Y = X + Y unless Overflow sign(x) = sign(y) sign(x + Y) Overflow sign(x) sign(y) = sign(x Y)
Unsigned Integers......... 1 = 2 N... 11111111 = 2 N 1......... 1 = 2 N-1... 1111111 = 2 N-1 1......... 11... 1 = 4 = 2 2... 11 = 3 = 2 2 1... 1 = 2 = 2 1... 1 = 1 = 2 = 2 1 1... = = 2 1... 1111111111 = 1 = 2... 111111111 = 2 = 2 1 = 2 1... 111111111 = 3 = 2 1 1... 11111111 = 4 = 2 2... 111111111 = 5 = 2 2 1......... 111 = 2 N-1... 111111111 = 2 N-1 1......... 11 = 2 N... 111111111 = 2 N 1...... N Bit Integers (N = 8) Signed Integers
Negative Number Representation # binary sign magnitude bias (-7) 1's complement 2's complement + -7 1 1 + 1-6 1 1 2 1 + 2-5 2 2 3 11 + 3-4 3 3 4 1 + 4-3 4 4 5 11 + 5-2 5 5 6 11 + 6-1 6 6 7 111 + 7-7 7 8 1-1 -7-8 9 11-1 2-6 -7 A 11-2 3-5 -6 B 111-3 4-4 -5 C 11-4 5-3 -4 D 111-5 6-2 -3 E 111-6 7-1 -2 F 1111-7 8 - -1
Basic Processor Model registers SYSTEM BUS A MUX A bus C MUX B MUX B bus ALU C Bus PC IR
Basic Processor Model registers SYSTEM BUS Combinational Core A MUX A bus C MUX B MUX B bus ALU C Bus PC IR
Building Blocks Arithmetic / Logical Unit AND, OR, and NOT gates Inverters, Decoders, Multiplexers Inputs (operands): A and B buses Output (result): C bus Logical Bitwise: A, A & B, A B, A ^ B, A B, A B,... Arithmetic A + B, A B, A ٠ B, A div B, A mod B Comparison A < B, A = B, A B, etc. and X <, X =, X, etc.
TINY Arithmetic / Logical Unit Building Blocks AND, OR, and NOT gates Inverters, Decoders, Multiplexers Inputs (operands): A and B buses Output (result): C bus Logical Bitwise: A, A & B, A B, A ^ B, A B, A B,... Arithmetic A + B, A B, A ٠ B, A div B, A mod B Comparison A < B, A = B, A B, etc. and X <, X =, X, etc.
Muxes, Buses, and ALU ALU inputs (operands): A and B buses ALU output (result): C bus Logical Bitwise: A, A & B, A B, A ^ B, A B, A B,... Arithmetic A + B, A B, A ٠ B, A div B, A mod B Comparison A < B, A = B, A B, etc. and X <, X =, X, etc. Combinational building blocks AND, OR, and NOT gates Inverters, Decoders, Multiplexers
Inverters, Decoders, Multiplexer Inverter: select data input or its negation 1 data input 1 selector input 1 output Decoder: select unique output to be 1 (true) N selector inputs 2 N outputs Multiplexer: select unique data input to be output 2 N data inputs N selector inputs 1 output
The TINY Computer A bus B bus ALU C Bus B MUX A MUX C MUX SYSTEM BUS MAR x x x x x x x x x x x x x x x xffff MDR Z N C O registers $1 $2 $3 $4 $5 $6 $7 $8 $9 $A $B $C $D $E $F PC IR
The TINY Computer A bus B bus ALU C Bus B MUX A MUX C MUX SYSTEM BUS MAR x x x x x x x x x x x x x x x xffff MDR Z N C O registers $1 $2 $3 $4 $5 $6 $7 $8 $9 $A $B $C $D $E $F PC IR