CS470: Computer Architecture
Yashwant K. Malaiya, Professor
malaiya@cs.colostate.edu
[Figure: AMD Quad Core]
Architecture Layers
- Building blocks: gates, flip-flops
- Functional blocks: combinational, sequential
- Instruction set architecture: assembly/machine level
- Implementation using blocks
- Systems: processor + memory hierarchy
Optimization Objectives
- Reduce cost, complexity and power
  - Combinational & sequential minimization
- Enhance performance
  - Faster technology
  - More parallelism (of different types)
- Higher performance with lower cost
  - Memory hierarchy
- Reliability
  - Testing and verification
  - Redundancy
- What else?
Combinational Circuits
- Gates & Boolean algebra
- Functional blocks: decoder, MUX, adder
- Minimization
- Programmable logic
- Interlocking building blocks
- Propagation delays
Logic Design: Outline
- Gates, Boolean algebra and truth tables
- Combinational logic and functional blocks (MUXes, decoders, adders, PLAs)
- Flip-flops, registers and memories
- Timing analysis
- Finite state machines
- Use of software packages for simulation
Basic hardware building blocks
OR and NOR
Inputs: 2 or more
A B | OR  NOR
0 0 |  0   1
0 1 |  1   0
1 0 |  1   0
1 1 |  1   0
OR: Output = $A + B$
NOR: Output = $\overline{A + B}$

AND and NAND
Inputs: 2 or more
A B | AND  NAND
0 0 |  0    1
0 1 |  0    1
1 0 |  0    1
1 1 |  1    0
AND: Output = $A \cdot B$
NAND: Output = $\overline{A \cdot B}$
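A minimal Python sketch of the four gates, assuming 2-input versions and the integers 0 and 1 as logic values (the helper names are illustrative). It regenerates the OR, NOR, AND and NAND output columns in the row order 00, 01, 10, 11:

```python
from itertools import product

# 2-input versions of the gates; logic values are the integers 0 and 1
GATES = {
    "OR":   lambda a, b: a | b,
    "NOR":  lambda a, b: 1 - (a | b),   # complement of OR
    "AND":  lambda a, b: a & b,
    "NAND": lambda a, b: 1 - (a & b),   # complement of AND
}

def truth_table(name):
    """Output column for inputs (A, B) = 00, 01, 10, 11."""
    gate = GATES[name]
    return [gate(a, b) for a, b in product((0, 1), repeat=2)]

for name in GATES:
    print(name, truth_table(name))
```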
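Boolean Algebra
[Figure: each identity drawn as an AND or OR gate with inputs x and 0, 1 or $\overline{x}$]
$x \cdot 0 = 0$    $x \cdot 1 = x$    $x \cdot \overline{x} = 0$
$x + 0 = x$    $x + 1 = 1$    $x + \overline{x} = 1$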
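Boolean Algebra (2)
- Commutative: $A + B = B + A$, $A \cdot B = B \cdot A$
- Associative: $A + (B + C) = (A + B) + C$, $A \cdot (B \cdot C) = (A \cdot B) \cdot C$
- Distributive: $A \cdot (B + C) = A \cdot B + A \cdot C$, $A + (B \cdot C) = (A + B) \cdot (A + C)$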
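Because every variable takes only the values 0 and 1, the identities and laws above can be checked exhaustively. A small Python sketch, with `&` as AND, `|` as OR and a hypothetical `NOT` helper for the complement:

```python
from itertools import product

def NOT(x):
    return 1 - x

def check_laws():
    """Exhaustively check the identities and laws; raises AssertionError on a failure."""
    # Single-variable identities
    for x in (0, 1):
        assert (x & 0) == 0 and (x & 1) == x and (x & NOT(x)) == 0
        assert (x | 0) == x and (x | 1) == 1 and (x | NOT(x)) == 1
    # Commutative, associative and distributive laws over all 8 (A, B, C) combinations
    for a, b, c in product((0, 1), repeat=3):
        assert (a | b) == (b | a) and (a & b) == (b & a)
        assert (a | (b | c)) == ((a | b) | c) and (a & (b & c)) == ((a & b) & c)
        assert (a & (b | c)) == ((a & b) | (a & c))
        assert (a | (b & c)) == ((a | b) & (a | c))
    return True

print(check_laws())
```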
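Boolean Algebra (3)
DeMorgan's Law:
$\overline{A \cdot B} = \overline{A} + \overline{B}$
$\overline{A + B} = \overline{A} \cdot \overline{B}$
It extends to more variables, e.g. $\overline{A \cdot B \cdot C} = \overline{A} + \overline{B} + \overline{C}$.
Truth-table check of $\overline{A + B} = \overline{A} \cdot \overline{B}$, with $C = A + B$:
A B | $\overline{A}$ $\overline{B}$ | $\overline{A} \cdot \overline{B}$ (AND) | C
0 0 | 1 1 | 1 | 0
0 1 | 1 0 | 0 | 1
1 0 | 0 1 | 0 | 1
1 1 | 0 0 | 0 | 1
The AND column is always the complement of C.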
Some Useful Identities
- $AB + A\overline{B} = A$, since $AB + A\overline{B} = A(B + \overline{B}) = A$
- $A + AB = A$, since $A + AB = A(1 + B) = A$
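A Python sketch that rebuilds the DeMorgan truth table (columns A, B, NOT A, NOT B, NOT A AND NOT B, and C = A + B) and checks DeMorgan's law and the two useful identities on every input. `NOT`, `demorgan_table` and `check_identities` are illustrative names; in the comments, A' means NOT A:

```python
from itertools import product

def NOT(x):
    return 1 - x

def demorgan_table():
    """Rows (A, B, A', B', A'.B', C) with C = A + B."""
    return [(a, b, NOT(a), NOT(b), NOT(a) & NOT(b), a | b)
            for a, b in product((0, 1), repeat=2)]

def check_identities():
    """Raise AssertionError if any identity fails; return True otherwise."""
    for a, b, na, nb, and_of_nots, c in demorgan_table():
        assert and_of_nots == NOT(c)              # (A + B)' = A'.B'
        assert NOT(a & b) == (na | nb)            # (A.B)' = A' + B'
        assert ((a & b) | (a & nb)) == a          # AB + AB' = A
        assert (a | (a & b)) == a                 # A + AB = A
    for a, b, c in product((0, 1), repeat=3):     # three-variable DeMorgan
        assert NOT(a & b & c) == (NOT(a) | NOT(b) | NOT(c))
    return True

for row in demorgan_table():
    print(row)
print(check_identities())
```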
Decoder
Example: A = 0, B = 1 gives outputs 0 1 0 0.
Inputs A,B | Outputs
00 | 1 0 0 0
01 | 0 1 0 0
10 | 0 0 1 0
11 | 0 0 0 1
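A 2-to-4 decoder sketched in Python, assuming A is the more significant input bit (as in the table, where AB = 01 activates the second output). The function name is illustrative:

```python
def decoder_2to4(a, b):
    """Exactly one of the four outputs is 1: output i is 1 when the binary number AB equals i."""
    index = 2 * a + b
    return [1 if i == index else 0 for i in range(4)]

print(decoder_2to4(0, 1))  # [0, 1, 0, 0]
```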
MUX
Multiplexer: selects one of its inputs and connects it to the output.
[Figures: 2-to-1 MUX with inputs A, B and select S; 4-input MUX with inputs A, B, C, D and a 2-bit select S]
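A Python sketch of the MUX, assuming S = 0 selects A and S = 1 selects B in the 2-to-1 case. The 4-input MUX is built here from three 2-to-1 MUXes, which is one possible construction, not necessarily the one in the figure:

```python
def mux2(a, b, s):
    """2-to-1 MUX as a gate-level expression: out = S'.A + S.B."""
    return ((1 - s) & a) | (s & b)

def mux4(inputs, s1, s0):
    """4-input MUX: the 2-bit select (s1 s0) picks inputs[0..3].
    s0 chooses within each pair; s1 chooses between the pairs."""
    low = mux2(inputs[0], inputs[1], s0)
    high = mux2(inputs[2], inputs[3], s0)
    return mux2(low, high, s1)

print(mux4([0, 0, 1, 0], 1, 0))  # 1 (select 10 picks inputs[2])
```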
Tri-State Lines
- Tri-state: 0, 1 and high Z (disconnected)
- Used for implementing buses
Tri-state buffer with input A and control C:
Output Y = A if C = 1; high Z if C = 0
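A Python sketch of tri-state buffers sharing a bus, with `None` standing in for the high-Z state. Treating more than one enabled driver as an error is a modeling choice made here; in hardware it is a bus conflict that the control logic must avoid. The names are illustrative:

```python
HIGH_Z = None  # models the disconnected (high-impedance) state

def tristate(a, c):
    """Tri-state buffer: output Y = A when control C = 1, high Z when C = 0."""
    return a if c == 1 else HIGH_Z

def resolve_bus(driver_outputs):
    """Value on a bus shared by several tri-state buffers.
    Returns HIGH_Z if no buffer drives it; raises on contention."""
    active = [y for y in driver_outputs if y is not HIGH_Z]
    if len(active) > 1:
        raise ValueError("bus contention: more than one driver enabled")
    return active[0] if active else HIGH_Z

print(resolve_bus([tristate(1, 0), tristate(0, 1), tristate(1, 0)]))  # 0
```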
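Boolean Functions
A B C | S
0 0 0 | 0
0 0 1 | 1
0 1 0 | 1
0 1 1 | 0
1 0 0 | 1
1 0 1 | 0
1 1 0 | 0
1 1 1 | 1
$S = \overline{A}\,\overline{B}C + \overline{A}B\overline{C} + A\overline{B}\,\overline{C} + ABC$
[Figure: circuit with inputs A, B, C and output S]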
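Each row with S = 1 contributes one product term (a minterm) to the sum-of-products expression. A Python sketch that derives this canonical form from an output column, assuming the rows are listed in binary order and writing a complemented variable with a trailing apostrophe (A' for NOT A); the function name is illustrative:

```python
from itertools import product

def minterms_sop(outputs, names=("A", "B", "C")):
    """Canonical sum-of-products: one minterm for every row whose output is 1.
    outputs[i] is the function value for the input row whose binary value is i."""
    terms = []
    for i, bits in enumerate(product((0, 1), repeat=len(names))):
        if outputs[i]:
            terms.append("".join(v if bit else v + "'" for v, bit in zip(names, bits)))
    return " + ".join(terms)

S = [0, 1, 1, 0, 1, 0, 0, 1]   # the S column from the table
print(minterms_sop(S))  # A'B'C + A'BC' + AB'C' + ABC
```

Applied to the full-adder carry column `[0, 0, 0, 1, 0, 1, 1, 1]`, the same function returns `A'BC + AB'C + ABC' + ABC`, with C standing for the carry-in.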
Simplification
Some rules for simplification:
- $A + A = A$, $A \cdot A = A$ [Prove them]
- $AB + A\overline{B} = A$ [Use for joining or breaking]. Proof: $AB + A\overline{B} = A(B + \overline{B}) = A$
- $A + AB = A$ [Use for absorption]. Proof: $A + AB = A(1 + B) = A$
Karnaugh maps
- Objective: minimize literals
- Based on set theory
- Visual representation of algebraic functions
- Allow algorithmic minimization of Boolean functions in sum-of-products form
Note: $ABC + AB\overline{C} = AB(C + \overline{C}) = AB$. Thus $ABC$ and $AB\overline{C}$ are two pieces of $AB$.
Minterms: for n variables there are $2^n$ minterms, one for each row of the truth table.
Minterms
A B C_in | C_out
0 0 0 | 0
0 0 1 | 0
0 1 0 | 0
0 1 1 | 1
1 0 0 | 0
1 0 1 | 1
1 1 0 | 1
1 1 1 | 1
$C_{out} = \overline{A}BC_{in} + A\overline{B}C_{in} + AB\overline{C_{in}} + ABC_{in}$
This involves four minterms.
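Combining Minterms
$F(a,b,c) = \overline{a}\,\overline{b}\,\overline{c} + \overline{a}\,\overline{b}c + \overline{a}b\overline{c} + \overline{a}bc$
- Minterms $\overline{a}\,\overline{b}\,\overline{c} + \overline{a}\,\overline{b}c$ combine to give $\overline{a}\,\overline{b}$
- Minterms $\overline{a}b\overline{c} + \overline{a}bc$ combine to give $\overline{a}b$
- Terms $\overline{a}\,\overline{b}$ and $\overline{a}b$ combine to give $\overline{a}$
Two terms are adjacent if they differ in only one variable, complemented in one and uncomplemented in the other. They combine to drop that variable.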
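The combining rule can be sketched in Python by writing each product term as a string with one character per variable: 1 for the variable, 0 for its complement, and - once it has been dropped (a common encoding, assumed here; the function name is illustrative):

```python
def combine(t1, t2):
    """Combine two adjacent product terms, or return None if they are not adjacent.
    Adjacent: exactly one position differs, with 0 in one term and 1 in the other."""
    diffs = [i for i, (x, y) in enumerate(zip(t1, t2)) if x != y]
    if len(diffs) != 1 or "-" in (t1[diffs[0]], t2[diffs[0]]):
        return None
    i = diffs[0]
    return t1[:i] + "-" + t1[i + 1:]   # drop the differing variable

# F(a,b,c) = a'b'c' + a'b'c + a'bc' + a'bc  (variable order a, b, c)
ab0 = combine("000", "001")   # a'b'c' + a'b'c -> a'b'
ab1 = combine("010", "011")   # a'bc' + a'bc   -> a'b
print(ab0, ab1, combine(ab0, ab1))  # 00- 01- 0--   (F = a')
```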
Visualization of Boolean Functions
Each box is a minterm; adjacent minterms can be combined.
2-variable maps (x: lower half, y: right half):
        y=0                             y=1
x=0     $\overline{x}\,\overline{y}$    $\overline{x}y$
x=1     $x\overline{y}$                 $xy$
Example:
        y=0  y=1
x=0      0    1
x=1      1    1
$F = x + y$
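3-variable Kmaps
Columns are BC in the order 00, 01, 11, 10; rows are A.
       BC: 00 01 11 10
A = 0       1  0  1  1
A = 1       1  0  0  1
$F(A,B,C) = \overline{C} + \overline{A}B$
[Figure: a second map containing don't cares (x)] $F(A,B,C) = A\overline{C} + \overline{A}B$
- All 1s must be covered.
- Don't cares (x) can be taken as either 0 or 1.
- Columns are arranged so that logically adjacent terms are visually adjacent.
- Sometimes the solution is not unique.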
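The K-map column order 00, 01, 11, 10 is a Gray code: neighboring codes, including the wrap-around from the last to the first, differ in exactly one bit. A short Python sketch that generates this order for n bits using the standard reflected-binary formula i XOR (i >> 1):

```python
def gray_code(n):
    """Reflected binary Gray code for n bits, as strings."""
    return [format(i ^ (i >> 1), f"0{n}b") for i in range(2 ** n)]

print(gray_code(2))  # ['00', '01', '11', '10']
```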
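3-variable Kmaps
Two minimal solutions for the same map (columns BC, rows A):
       BC: 00 01 11 10
A = 0       1  1  0  1
A = 1       0  1  1  1
$F(A,B,C) = \overline{B}C + \overline{A}\,\overline{C} + AB$
$F(A,B,C) = \overline{A}\,\overline{B} + B\overline{C} + AC$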
4-variable Kmaps / Design
[Figures: two 4-variable K-maps (rows AB, columns CD, each in the order 00, 01, 11, 10), with partially completed expressions for F(A,B,C,D) left as design exercises]
Combinational Logic Optimization
Design steps:
1. Get the truth table.
2. Do minimization (K-map, Quine-McCluskey, etc.) as applicable.
3. Get the Boolean expression.
4. Get the logic diagram.
Automated methods: computer-based implementation. Example: http://electronics-course.com/karnaugh-map
Multi-output circuits:
- Many functions have multiple outputs; they are often implemented as PLAs.
- Objective: minimize product terms.
- Only adjacent product terms with the same output combinations can be combined.
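As an illustration of an automated method, here is a Python sketch of the first phase of Quine-McCluskey: adjacent terms are merged repeatedly, and terms that never merge are the prime implicants. Terms use the encoding 1 / 0 / - (variable, complement, dropped). Selecting a minimum set of prime implicants that covers all minterms is a second phase, not shown here; the function names are illustrative:

```python
from itertools import combinations

def combine(t1, t2):
    """Merge two adjacent terms (strings over '0', '1', '-'); None if not adjacent."""
    diffs = [i for i, (x, y) in enumerate(zip(t1, t2)) if x != y]
    if len(diffs) != 1 or "-" in (t1[diffs[0]], t2[diffs[0]]):
        return None
    i = diffs[0]
    return t1[:i] + "-" + t1[i + 1:]

def prime_implicants(minterms, n):
    """Quine-McCluskey phase 1 for an n-variable function given by its minterm numbers."""
    terms = {format(m, f"0{n}b") for m in minterms}
    primes = set()
    while terms:
        merged, used = set(), set()
        for t1, t2 in combinations(sorted(terms), 2):
            c = combine(t1, t2)
            if c is not None:
                merged.add(c)
                used.update((t1, t2))
        primes |= terms - used     # terms that could not be merged any further
        terms = merged
    return primes

print(sorted(prime_implicants([0, 1, 2, 5, 6, 7], 3)))
# ['-01', '-10', '0-0', '00-', '1-1', '11-']
```

For the 3-variable map with 1s at minterms 0, 1, 2, 5, 6 and 7, it returns six prime implicants; each of the two minimal solutions for that map uses three of them.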
Full Adder
$A_i$ $B_i$ $C_i$ | $C_{i+1}$ $S_i$
0 0 0 | 0 0
0 0 1 | 0 1
0 1 0 | 0 1
0 1 1 | 1 0
1 0 0 | 0 1
1 0 1 | 1 0
1 1 0 | 1 0
1 1 1 | 1 1
[Figure: full adder block with inputs $A_i$, $B_i$, $C_i$ and outputs $C_{i+1}$, $S_i$]
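A Python sketch of the full adder that matches the truth table: the sum is 1 when an odd number of inputs are 1, and the carry-out is the majority function, written here as the minimized SOP AB + AC + BC rather than the four minterms. The function name is illustrative:

```python
def full_adder(a, b, c_in):
    """One-bit full adder; returns (carry_out, sum)."""
    s = a ^ b ^ c_in                               # odd number of 1s
    c_out = (a & b) | (a & c_in) | (b & c_in)      # at least two 1s
    return c_out, s

for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, full_adder(a, b, c))
```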
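4-bit Adder
[Figure: four full adders in a chain; inputs A3 B3, A2 B2, A1 B1, A0 B0; carry-in 0; carries C1, C2, C3 between stages; carry-out C4; sums S3, S2, S1, S0]
Note that the propagation delays add up from stage to stage, because the carry must ripple through every full adder.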
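A Python sketch of the ripple-carry chain. The delay model is a simplification assumed here: each stage adds one fixed carry delay (`stage_delay`), so an n-bit adder takes n stage delays in the worst case. The function names are illustrative:

```python
def full_adder(a, b, c_in):
    """One-bit full adder; returns (carry_out, sum)."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (a & c_in) | (b & c_in)
    return c_out, s

def ripple_carry_add(a_bits, b_bits, c0=0, stage_delay=1):
    """Add two equal-length bit lists given least significant bit first.
    Returns (sum_bits, carry_out, worst_case_delay)."""
    carry, sum_bits = c0, []
    for a, b in zip(a_bits, b_bits):
        carry, s = full_adder(a, b, carry)   # the carry ripples into the next stage
        sum_bits.append(s)
    return sum_bits, carry, len(a_bits) * stage_delay

# 11 + 6 = 17: 1011 + 0110 = 1 0001 (bit lists are LSB first)
print(ripple_carry_add([1, 1, 0, 1], [0, 1, 1, 0]))  # ([1, 0, 0, 0], 1, 4)
```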
Two-level logic: SOP form
A combinational function can be implemented as a two-level circuit.
Sum-of-products (SOP) form, AND-OR: $F(A,B,C) = \overline{B}\,\overline{C} + \overline{A}B$
A B C | F
0 0 0 | 1
0 0 1 | 0
0 1 0 | 1
0 1 1 | 1
1 0 0 | 1
1 0 1 | 0
1 1 0 | 0
1 1 1 | 0
K-map (columns BC, rows A):
       BC: 00 01 11 10
A = 0       1  0  1  1
A = 1       1  0  0  0
Two-level logic: POS form
Product-of-sums (POS) form, OR-AND. Steps:
1. Minimize $\overline{F}$ (i.e., the 0s) in SOP form.
2. Complement both sides.
3. Use DeMorgan's law on the RHS.
$\overline{F}(A,B,C) = \overline{B}C + AB$
$F(A,B,C) = (B + \overline{C})(\overline{A} + \overline{B})$
A B C | F
0 0 0 | 1
0 0 1 | 0
0 1 0 | 1
0 1 1 | 1
1 0 0 | 1
1 0 1 | 0
1 1 0 | 0
1 1 1 | 0
K-map (columns BC, rows A):
       BC: 00 01 11 10
A = 0       1  0  1  1
A = 1       1  0  0  0
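A quick Python check that the SOP and POS forms both reproduce the truth table; `NOT`, `f_sop`, `f_pos` and `TRUTH` are illustrative names, and A' in the comments means NOT A:

```python
from itertools import product

def NOT(x):
    return 1 - x

# Truth table: F is 1 for ABC = 000, 010, 011, 100
TRUTH = {(0, 0, 0): 1, (0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 1,
         (1, 0, 0): 1, (1, 0, 1): 0, (1, 1, 0): 0, (1, 1, 1): 0}

def f_sop(a, b, c):
    # AND-OR form: F = B'C' + A'B
    return (NOT(b) & NOT(c)) | (NOT(a) & b)

def f_pos(a, b, c):
    # OR-AND form: F = (B + C')(A' + B'), from F' = B'C + AB via DeMorgan
    return (b | NOT(c)) & (NOT(a) | NOT(b))

for a, b, c in product((0, 1), repeat=3):
    assert f_sop(a, b, c) == f_pos(a, b, c) == TRUTH[(a, b, c)]
print("SOP and POS forms match the truth table")
```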
Programmable Logic Arrays (PLAs)
A B C | D E F
0 0 0 | 0 0 0
0 0 1 | 1 0 0
0 1 0 | 1 0 0
0 1 1 | 1 1 0
1 0 0 | 1 0 0
1 0 1 | 1 1 0
1 1 0 | 1 1 0
1 1 1 | 1 0 1
PLAs
Same truth table as on the previous slide.
[Figures: (1) the logical equivalent of the PLA; (2) a drawing that is closer to the physical layout]
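A Python sketch of how a PLA evaluates this truth table, assuming product terms written as strings over 0, 1 and - (don't care). The AND plane lists the product terms; the OR plane says which terms feed each output's OR gate. The 000 row is omitted because all its outputs are 0. The names are illustrative:

```python
# AND plane: one product term per row that sets some output (here, full minterms over A B C)
AND_PLANE = ["001", "010", "011", "100", "101", "110", "111"]
# OR plane: the product terms connected to each output
OR_PLANE = {
    "D": {"001", "010", "011", "100", "101", "110", "111"},
    "E": {"011", "101", "110"},
    "F": {"111"},
}

def term_value(term, inputs):
    """A product term is 1 when every specified literal matches ('-' = don't care)."""
    return all(t == "-" or int(t) == x for t, x in zip(term, inputs))

def pla(inputs):
    active = {t for t in AND_PLANE if term_value(t, inputs)}
    return {out: int(bool(active & terms)) for out, terms in OR_PLANE.items()}

print(pla((0, 1, 1)))  # {'D': 1, 'E': 1, 'F': 0}
```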
PLA Minimization
Main objective: minimize product terms.
1. Some product terms do not need to be implemented (e.g., a row whose outputs are all 0).
2. Two product terms can be combined if:
   - they have exactly the same values for all outputs, and
   - they differ in only one variable, complemented in one and uncomplemented in the other.
It is best to start with product terms in minterm form.
PLA Minimization Ex1
Before:
A B C | D E F
0 0 0 | 0 0 0
0 0 1 | 1 0 1
0 1 0 | 0 1 1
0 1 1 | 1 1 0
1 0 0 | 1 1 1
1 0 1 | 1 1 0
1 1 0 | 1 1 1
1 1 1 | 1 1 1
After (rows 110 and 111 have identical outputs and differ only in C, so they merge into 11x):
A B C | D E F
0 0 0 | 0 0 0
0 0 1 | 1 0 1
0 1 0 | 0 1 1
0 1 1 | 1 1 0
1 0 0 | 1 1 1
1 0 1 | 1 1 0
1 1 x | 1 1 1
PLA Minimization Ex2
Before (x = don't care):
A B C | D E F
0 0 0 | 0 0 0
0 0 1 | 1 0 1
0 1 0 | 0 1 1
0 1 1 | 1 1 x
1 0 0 | 1 1 x
1 0 1 | 1 1 x
1 1 0 | 1 1 x
1 1 1 | 1 1 x
After:
A B C | D E F
0 0 0 | 0 0 0
0 0 1 | 1 0 1
0 1 0 | 0 1 1
x 1 1 | 1 1 x   (minterms 3, 7 (?))
1 x x | 1 1 x   (minterms 4, 5, 6, 7)
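A Python sketch of the merging rule for PLA rows, with inputs and outputs as strings and x marking a don't care. Requiring the output strings to be identical (including any x) is a simplification made here; the function name is illustrative:

```python
def merge_rows(row1, row2):
    """Merge two PLA rows given as (inputs, outputs), or return None.
    Rule: identical outputs, and inputs differing in exactly one variable (0 vs 1)."""
    (in1, out1), (in2, out2) = row1, row2
    if out1 != out2:
        return None
    diffs = [i for i, (x, y) in enumerate(zip(in1, in2)) if x != y]
    if len(diffs) != 1 or {in1[diffs[0]], in2[diffs[0]]} != {"0", "1"}:
        return None
    i = diffs[0]
    return (in1[:i] + "x" + in1[i + 1:], out1)

# Ex1: rows 110 and 111 both produce 111 and merge into 11x
print(merge_rows(("110", "111"), ("111", "111")))  # ('11x', '111')
```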
Field Programmable PLA
- Can be programmed in the field by specifying the AND-array and OR-array connections.
- The FPLA concept developed into FPGAs, which can contain logic blocks of various functionality.
ROMs: Fully decoded with $2^n$ addresses
Address  | Contents
A2 A1 A0 | D2 D1 D0
0  0  0  | 0  0  0
0  0  1  | 1  0  0
0  1  0  | 1  0  0
0  1  1  | 1  1  0
1  0  0  | 1  0  0
1  0  1  | 1  1  0
1  1  0  | 1  1  0
1  1  1  | 1  0  1
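A ROM is a lookup table indexed by the full address, so every one of the $2^n$ words is stored. A Python sketch holding the eight words from the table, with D2 as the most significant bit of each word (an encoding chosen here; the names are illustrative):

```python
# One 3-bit word (D2 D1 D0) per address A2 A1 A0
ROM = [0b000, 0b100, 0b100, 0b110, 0b100, 0b110, 0b110, 0b101]

def rom_read(a2, a1, a0):
    """Fully decoded read: the address selects exactly one stored word."""
    word = ROM[(a2 << 2) | (a1 << 1) | a0]
    return (word >> 2) & 1, (word >> 1) & 1, word & 1   # (D2, D1, D0)

print(rom_read(1, 1, 1))  # (1, 0, 1)
```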
Programmable ROM
- ROM: programmed in the last manufacturing step
- Programmable ROM: PROM with fuses
- Erasable: UV-light erasable PROM (EPROM); electrically erasable PROM (EEPROM)
- Flash memories: based on EEPROM!
Address vs Data
- Some lines/registers contain data.
- Some lines/registers contain addresses used for selection.
Full Adder
Same truth table as on the earlier Full Adder slide.
Could have minimized it.
Arrays of Logic Elements
Cascadable Logic blocks
[Figure: four full adders cascaded; inputs A3 B3, A2 B2, A1 B1, A0 B0; carry-in 0; carries C1, C2, C3 between stages; sums S3, S2, S1, S0]
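Selecting Data Using MUX
To bus: multiple sources, one destination. Use a MUX with appropriate select lines.
[Figure: datapath fragment with PC, PCMUX and MARMUX (2-bit selects), a +1 incrementer, an adder, and the connection to the bus]
2/1/2016 Comp Org YKM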
Selecting Data Using MUX 2
Assume:
- MUX select: 0 selects C, 1 selects B
- ALU control: 00 ADD, 01 AND, 10 NOT A, 11 pass A
[Figure: C and B feed a MUX; the MUX output and input A feed an ALU with a 2-bit control]
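A Python sketch of this MUX + ALU arrangement. It assumes the MUX output is the ALU's second operand and A the first, and a 16-bit data width; neither detail is stated on the slide. The names are illustrative:

```python
MASK = 0xFFFF  # assumed 16-bit data width

def mux(sel, c, b):
    """Select 0 passes C, select 1 passes B."""
    return c if sel == 0 else b

def alu(op, a, b):
    """2-bit ALU control: 00 ADD, 01 AND, 10 NOT A, 11 pass A."""
    if op == 0b00:
        return (a + b) & MASK
    if op == 0b01:
        return a & b
    if op == 0b10:
        return ~a & MASK
    return a  # 0b11: pass A through

def datapath(a, b, c, mux_sel, alu_op):
    # The MUX chooses the ALU's second operand from C or B
    return alu(alu_op, a, mux(mux_sel, c, b))

print(hex(datapath(a=0x0003, b=0x0004, c=0x0010, mux_sel=0, alu_op=0b00)))  # 0x13
```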
Delays in Combinational Circuits
Each gate responds after some delay, termed the propagation delay (pd). (Diagram on the board.)
pd depends on:
- Technology (transistor size, etc.)
- Load on a gate: fan-out, interconnection length
pd cannot be exactly controlled, so we need to wait until all transitions have settled down, i.e., for the longest combinational path.
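Waiting until all transitions have settled means finding the longest path through the circuit. A Python sketch over a hypothetical netlist (gate names and delays in ps are made up for illustration), where each signal settles at its gate delay plus the latest arrival among its inputs:

```python
# Hypothetical netlist: gate -> (propagation delay in ps, fan-in signals).
# Primary inputs (A, B, C) have no entry and are assumed stable at time 0.
GATES = {
    "g1": (20, ["A", "B"]),
    "g2": (20, ["B", "C"]),
    "g3": (30, ["g1", "g2"]),
    "g4": (25, ["g3", "C"]),
}

def arrival_time(signal):
    """Time at which a signal settles: its gate delay plus its latest input."""
    if signal not in GATES:
        return 0  # primary input
    delay, inputs = GATES[signal]
    return delay + max(arrival_time(s) for s in inputs)

print(arrival_time("g4"))  # 75: longest path A/B -> g1/g2 -> g3 -> g4
```

For large circuits the recursion should be memoized or replaced by a single pass in topological order, since shared subcircuits are otherwise recomputed.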
Delays: Example
- Delays can depend on the number of levels of logic.
- Other factors, like load, also matter.
- Units: nanoseconds (1 ns = $10^{-9}$ s) and picoseconds (1 ps = $10^{-12}$ s).
Synchronized updating: clock
- Combinational elements hold no state: ALUs, multipliers, multiplexers, etc.
- State elements are clocked devices: flip-flops, etc.
- In edge-triggered clocking, state elements are updated only on the (rising) edge of the clock pulse.
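A Python sketch of an edge-triggered register: the output Q changes only when the clock goes from 0 to 1, and holds its value at all other times regardless of D. The class name is illustrative:

```python
class EdgeTriggeredRegister:
    """State element updated only on a rising clock edge (0 -> 1)."""

    def __init__(self, value=0):
        self.q = value
        self._last_clk = 0

    def tick(self, clk, d):
        if self._last_clk == 0 and clk == 1:   # rising edge: capture D
            self.q = d
        self._last_clk = clk
        return self.q                          # otherwise Q holds its value

reg = EdgeTriggeredRegister()
trace = [reg.tick(clk, d) for clk, d in [(0, 5), (1, 5), (1, 7), (0, 7), (1, 7)]]
print(trace)  # [0, 5, 5, 5, 7]
```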
Clock frequency
- Unit: GHz
- Clock frequency (in cycles/s, or Hz) is the inverse of the clock period (in s), and the period must be at least the longest propagation delay.
- A 2 GHz clock implies a period of 0.5 ns, or 500 ps.
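The arithmetic behind the 2 GHz example, written out. The second line assumes the clock period is set exactly equal to the longest propagation delay, i.e. the best case with no other timing overheads:

$$T = \frac{1}{f} = \frac{1}{2 \times 10^{9}\,\text{Hz}} = 0.5 \times 10^{-9}\,\text{s} = 0.5\,\text{ns} = 500\,\text{ps}$$

$$f_{\max} = \frac{1}{t_{pd,\max}}, \qquad t_{pd,\max} = 500\,\text{ps} \;\Rightarrow\; f_{\max} = \frac{1}{500 \times 10^{-12}\,\text{s}} = 2\,\text{GHz}$$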
Power Consumption
- Dynamic component: depends on frequency, the number of nodes that switch, and voltage
- Quiescent component: small steady current
More later.
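A common first-order model of the two components, stated here as an assumption since the details come later: $\alpha$ is the fraction of nodes that switch per cycle, $C$ the switched capacitance, $V$ the supply voltage, $f$ the clock frequency and $I_q$ the quiescent current. Some texts include a factor of 1/2 in the dynamic term, depending on how $\alpha$ is defined.

$$P \approx \underbrace{\alpha\, C\, V^{2} f}_{\text{dynamic}} + \underbrace{I_{q}\, V}_{\text{quiescent}}$$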