Fundamentals of Computer Systems Review for the Final Stephen A. Edwards Columbia University Summer 25
The Final 2 hours 8 problems Closed book Simple calculators are OK, but unnecessary One double-sided 8.5 sheet of your own notes Anything discussed in class is fair game Much like homework assignments Problems will range from easy to difficult; do the easy ones first.
Boolean Logic Axioms and Simplification Implicants, Minterms, etc. De Morgan s Theorem Karnaugh Maps Combinational Logic Decoders Multiplexers Timing and Glitches Adders Sequential Logic Bistables; SR Latch; D Latch D Flip-Flops Synchronous Digital Logic Shift Registers Counters Finite State Machines Moore and Mealy Machines The Snail Example The TLC: One-Hot Encoding CMOS Logic Gates The Inverter The CMOS NAND Gate The CMOS NOR Gate A CMOS AND-OR-INVERT Gate General Static CMOS Gates Memories ROMs, EPROMs, and FLASH The SRAM Cell Dynamic RAM Cell PLAs and FPGAs
MIPS Architecture/Assembly programming Computational, Load/Store, & Control-flow Instrs. Instruction Encoding Pseudoinstructions Higher-level constructs; subroutines and recursion MIPS Microarchitecture/Datapaths Single-Cycle The datapath for lw, sw, R-type, and branch The controller: instruction decoding Processor Performance Multi-cycle Constructing the datapath The FSM controller Performance Analysis Pipelined Basic pipelined datapath and control Hazards: forwarding, stalling, and flushing Performance Analysis
The Memory Hierarchy: Caches Memory hierarchy to make it fast & cheap Temporal and Spatial Locality Memory performance; hit rate Direct-mapped caches n-way set associative caches Fully associative caches
The Axioms of (Any) Boolean Algebra A Boolean Algebra consists of A set of values A An and operator An or operator A not operator A false value A A true value A Axioms a b = b a a b = b a a (b c) = (a b) c a (b c) = (a b) c a (a b) = a a (a b) = a a (b c) = (a b) (a c) a (b c) = (a b) (a c) a a = a a = We will use the first non-trivial Boolean Algebra: A = {, }. This adds the law of excluded middle: if a then a = and if a then a =.
Simplifying a Boolean Expression You are a New Yorker if you were born in New York or were not born in New York and lived here ten years. Axioms x ( ( x) y ) a b = b a a b = b a a (b c) = (a b) c a (b c) = (a b) c a (a b) = a a (a b) = a a (b c) = (a b) (a c) a (b c) = (a b) (a c) a a = a a = Lemma: x = x (x x) = x (x y) if y = x = x
Simplifying a Boolean Expression You are a New Yorker if you were born in New York or were not born in New York and lived here ten years. Axioms x ( ( x) y ) = ( x ( x) ) (x y) a b = b a a b = b a a (b c) = (a b) c a (b c) = (a b) c a (a b) = a a (a b) = a a (b c) = (a b) (a c) a (b c) = (a b) (a c) a a = a a = Lemma: x = x (x x) = x (x y) if y = x = x
Simplifying a Boolean Expression You are a New Yorker if you were born in New York or were not born in New York and lived here ten years. Axioms x ( ( x) y ) = ( x ( x) ) (x y) = (x y) a b = b a a b = b a a (b c) = (a b) c a (b c) = (a b) c a (a b) = a a (a b) = a a (b c) = (a b) (a c) a (b c) = (a b) (a c) a a = a a = Lemma: x = x (x x) = x (x y) if y = x = x
Simplifying a Boolean Expression You are a New Yorker if you were born in New York or were not born in New York and lived here ten years. Axioms x ( ( x) y ) = ( x ( x) ) (x y) = (x y) = x y Lemma: a b = b a a b = b a a (b c) = (a b) c a (b c) = (a b) c a (a b) = a a (a b) = a a (b c) = (a b) (a c) a (b c) = (a b) (a c) a a = a a = x = x (x x) = x (x y) if y = x = x
Alternate Notations for Boolean Logic Operator Math Engineer Schematic Copy x X X or X X Complement x X X X AND x y XY or X Y X Y XY OR x y X + Y X Y X + Y
Definitions Literal: a Boolean variable or its complement E.g., X X Y Y Implicant: A product of literals E.g., X XY XYZ Minterm: An implicant with each variable once E.g., XYZ XYZ X YZ Maxterm: A sum of literals with each variable once E.g., X + Y + Z X + Y + Z X + Y + Z
Sum-of-minterms and Product-of-maxterms Two mechanical ways to translate a function s truth table into an expression: X Y Minterm Maxterm F X Y X + Y X Y X + Y X Y X + Y X Y X + Y The sum of the minterms where the function is : F = X Y + X Y The product of the maxterms where the function is : F = (X + Y)(X + Y)
Minterms and Maxterms: Another Example The minterm and maxterm representation of functions may look very different: X Y Minterm Maxterm F X Y X + Y X Y X + Y X Y X + Y X Y X + Y The sum of the minterms where the function is : F = X Y + X Y + X Y The product of the maxterms where the function is : F = X + Y
The Menagerie of Gates Buffer Inverter AND NAND OR NOR XOR XNOR + +
De Morgan s Theorem (a b) = ( a) ( b) (a b) = ( a) ( b) AB = A + B = A + B = A B =
Karnaugh Map for Seg. a W X Y Z a X X X X X X Z X X X X X Y W The Karnaugh Map Sum-of-Products Challenge Cover all the s and none of the s using as few literals (gate inputs) as possible. Few, large rectangles are good. Covering X s is optional.
Karnaugh Map for Seg. a W X Y Z a X X X X X X Z X X X X X Y W The minterm solution: cover each with a single implicant. a = W X Y Z + W X Y Z + W X Y Z + W X Y Z + W X Y Z + W X Y Z + W X Y Z + W X Y Z 8 4 = 32 literals 4 inv + 8 AND4 + OR8
Karnaugh Map for Seg. a W X Y Z a X X X X X X Z X X X X X Y Merging implicants helps Recall the distributive law: AB + AC = A(B + C) W a = W X Y Z + W Y + W X Z + W X Y 4 + 2 + 3 + 3 = 2 literals 4 inv + AND4 + 2 AND3 + AND2 + OR4
Karnaugh Map for Seg. a W X Y Z a X X X X X X Z X X X X X Y W Missed one: Remember this is actually a torus. a = X Y Z + W Y + W X Z + W X Y 3 + 2 + 3 + 3 = literals 4 inv + 3 AND3 + AND2 + OR4
Karnaugh Map for Seg. a W X Y Z a X X X X X X Z X X X X X Y W Taking don t-cares into account, we can enlarge two implicants: a = X Z + W Y + W X Z + W X 2 + 2 + 3 + 2 = 9 literals 3 inv + AND3 + 3 AND2 + OR4
Karnaugh Map for Seg. a W X Y Z a X X X X X X Z X X X X X Y W Can also compute the complement of the function and invert the result. Covering the s instead of the s: a = W X Y Z + X Y Z + W Y 4 + 3 + 2 = 9 literals 5 inv + AND4 + AND3 + AND2 + OR3
Karnaugh Map for Seg. a W X Y Z a X X X X X X Z X X X X X Y W To display the score, PONG used a TTL chip with this solution in it: (3) (2) OUTPUT a OUTPUT b
Decoders Input: n-bit binary number Output: -of-2 n one-hot code 2-to-4 in out 3-to-8 decoder in out in 4-to-6 decoder out
The 7438 3-to-8 Decoder 5 Y A 4 Y Select Inputs B 2 3 Y2 C 3 2 Y3 Y4 Data Outputs Y5 G 6 9 Y6 Enable Inputs G2A 4 7 Y7 G2B 5
General n-bit Decoders Implementing a function with a decoder: Every minterm E.g., F = AC + BC I n. I 2 I I n I 2 I I n I 2 I. I n I 2 I I n I 2 I I n I 2 I I n I 2 I C B A F C B A 7 6 5 4 3 2 F
The Two-Input Multiplexer B A S Y S B A Y B A S S A B Y S B A Y X X X X S Y A B
The Four-Input Mux D C B A 3 2 Y D C S 2 S B Y S 2 S Y A B C D A S 2 S
General 2 n -input muxes I 2 n. I I 2 n S n S 2 S Y I 2 n I. Y Y = I S n S 2 S + I S n S 2 S + I 2 S n S 2 S +. I 2 n 2S n S 2 S + I 2 n S n S 2 S I 2 n n-to-2 n decoder S n S 2 S
Using a Mux to Implement an Arbitrary Function F = AC + BC C B A F
Using a Mux to Implement an Arbitrary Function F = AC + BC Apply each value in the truth table: C B A F 7 6 5 4 3 2 C B A F
Using a Mux to Implement an Arbitrary Function F = AC + BC Apply each value in the truth table: C B A F 7 6 5 4 3 2 C B A F
Using a Mux to Implement an Arbitrary Function F = AC + BC Apply each value in the truth table: C B A F 7 6 5 4 3 2 C B A F
Using a Mux to Implement an Arbitrary Function F = AC + BC C B A F
Using a Mux to Implement an Arbitrary Function F = AC + BC C B A F
Using a Mux to Implement an Arbitrary Function F = AC + BC Can always remove a select and feed in,, S, or S. C B A F C B F A A 3 2 C B Y
Using a Mux to Implement an Arbitrary Function F = AC + BC Can always remove a select and feed in,, S, or S. C B A F C B F A A A A 3 2 C B Y
Using a Mux to Implement an Arbitrary Function F = AC + BC Can always remove a select and feed in,, S, or S. C B A F C B F A A A A 3 2 C B Y
Using a Mux to Implement an Arbitrary Function F = AC + BC Can always remove a select and feed in,, S, or S. C B A F C B F A A A A 3 2 C B Y
Using a Mux to Implement an Arbitrary Function F = AC + BC Can always remove a select and feed in,, S, or S. C B A F C B F A A B B In this case, the function just happens to be a mux: (not always the case!) A A 3 2 C B Y B A Y C
The Simplest Timing Model In Out t p Each gate has its own propagation delay t p. When an input changes, any changing outputs do so after t p. Wire delay is zero.
A More Realistic Timing Model In Out t p(max) t p(min) It is difficult to manufacture two gates with the same delay; better to treat delay as a range. Each gate has a minimum and maximum propagation delay t p(min) and t p(max). Outputs may start changing after t p(min) and stablize no later than t p(min).
Critical Paths and Short Paths A B C D Y How slow can this be?
Critical Paths and Short Paths A B C D Y How slow can this be? The critical path has the longest possible delay. t p(max) = t p(max, AND) + t p(max, OR) + t p(max, AND)
Critical Paths and Short Paths A B C D Y How fast can this be? The shortest path has the least possible delay. t p(min) = t p(min, AND)
Glitches A glitch is when a single input change can cause multiple output changes. B A B C A C B B A B BC A B + BC A B + BC C A
Glitches A glitch is when a single input change can cause multiple output changes. B A B C A C B B A B BC A B + BC A B + BC C A
Glitches A glitch is when a single input change can cause multiple output changes. B A B C A C B B A B BC A B + BC A B + BC C A
Glitches A glitch is when a single input change can cause multiple output changes. B A B C A C B B A B BC A B + BC A B + BC C A
Glitches A glitch is when a single input change can cause multiple output changes. B A B C A C B B A B BC A B + BC A B + BC C A
Glitches A glitch is when a single input change can cause multiple output changes. B A B C A B + BC + AC C A Adding such redundancy only works for single input changes; glitches may be unavoidable when multiple inputs change.
Arithmetic: Addition Adding two one-bit numbers: A and B Produces a two-bit result: A B C S C S Half Adder (carry and sum) A B C S
Full Adder In general, you need to add three bits: C o + + = + + = + + = + + = + + = + + = C i A B C o S C i A B C i A B C o S S
A Four-Bit Ripple-Carry Adder A B A 3 B 3 A 2 B 2 A B A B C o FA C i FA FA FA FA S S 4 S 3 S 2 S S
A Two s Complement Adder/Subtractor To subtract B from A, add A and B. Neat trick: carry in takes care of the + operation. B 3 A 3 B 2 A 2 B A B A SUBTRACT/ADD FA FA FA FA S 4 S 3 S 2 S S
Overflow in Two s-complement Representation When is the result too positive or too negative? + 2 2 + + + + + + + + + +
Overflow in Two s-complement Representation When is the result too positive or too negative? + 2 2 + % + % + The result does not fit when the top two carry bits differ. A n Bn A n B n + + + S n S n + + + + % Overflow
Ripple-Carry Adders are Slow A3 B3 A2 B2 A B A B C C4 S3 S2 S S The depth of a circuit is the number of gates on a critical path. This four-bit adder has a depth of 8. n-bit ripple-carry adders have a depth of 2n.
Carry Generate and Propagate The carry chain is the slow part of an adder; carry-lookahead adders reduce its depth using the following trick: C A B K-map for the carry-out function of a full adder For bit i, C i+ = A i B i + A i C i + B i C i = A i B i + C i (A i + B i ) = G i + C i P i Generate G i = A i B i sets carry-out regardless of carry-in. Propagate P i = A i + B i copies carry-in to carry-out.
Carry Lookahead Adder Expand the carry functions into sum-of-products form: C i+ = G i + C i P i C = G + C P C 2 = G + C P = G + (G + C P )P = G + G P + C P P C 3 = G 2 + C 2 P 2 = G 2 + (G + G P + C P P )P 2 = G 2 + G P 2 + G P P 2 + C P P P 2 C 4 = G 3 + C 3 P 3 = G 3 + (G 2 + G P 2 + G P P 2 + C P P P 2 )P 3 = G 3 + G 2 P 3 + G P 2 P 3 + G P P 2 P 3 + C P P P 2 P 3
Bistable Elements Equivalent circuits; right is more traditional. Two stable states:
SR Latch R S S R
SR Latch R S S R R S Set
SR Latch R S S R R S Hold, State
SR Latch R S S R R S Reset
SR Latch R S S R R S Hold, State
SR Latch R S S R R S Huh?
SR Latch R S S R R S Set
SR Latch R S S R R S Hold, State
SR Latch R S S R R S Huh?
SR Latch R X S R S X R S Undefined
D Latch D C D C inputs outputs C D X
Positive-Edge-Triggered D Flip-Flop D C M Master D C Slave D D D C S C C C D C M D C S transparent opaque
Positive-Edge-Triggered D Flip-Flop D C M Master D C Slave D D D C S C C C D C M D C S transparent opaque
Positive-Edge-Triggered D Flip-Flop D C M Master D C Slave D D D C S C C C D C M D C S transparent opaque opaque transparent
Positive-Edge-Triggered D Flip-Flop D C M Master D C Slave D D D C S C C C D C M D C S transparent opaque opaque transparent
Positive-Edge-Triggered D Flip-Flop D C M Master D C Slave D D D C S C C C D C M D C S transparent opaque transparent opaque transparent opaque
Positive-Edge-Triggered D Flip-Flop D C M Master D C Slave D D D C S C C C D C M D C S transparent opaque transparent opaque opaque transparent opaque transparent
D Flip-Flop with Enable D D E C C E D X X X X X D E C What s wrong with this solution? D
Asynchronous Preset/Clear PRE D CLR CLK D PRE CLR
The Synchronous Digital Logic Paradigm Gates and D flip-flops only INPUTS OUTPUTS Each flip-flop driven by the same clock STATE C L Every cyclic path contains at least one flip-flop CLOCK NEXT STATE
Timing in Synchronous Circuits D C L CLK t c CLK D t c : Clock period. E.g., ns for a MHz clock
Timing in Synchronous Circuits D C L CLK Sufficient Hold Time? t p(min,ff) t p(min,cl) CLK D Hold time constraint: how soon after the clock edge can D start changing? Min. FF delay + min. logic delay
Timing in Synchronous Circuits D C L CLK t p(max,ff) Sufficient Setup Time? t p(max,cl) CLK D Setup time constraint: when before the clock edge is D guaranteed stable? Max. FF delay + max. logic delay
CLK 2 Clock Skew: What Really Happens C L D CLK CLK 2 CLK Sufficient Hold Time? CLK t skew D t p(min,ff) t p(min,cl) CLK 2 arrives late: clock skew reduces hold time
CLK 2 Clock Skew: What Really Happens C L D CLK CLK 2 CLK Sufficient Setup Time? CLK t skew D t p(max,ff) t p(max,cl) CLK arrives early: clock skew reduces setup time
Cool Sequential Circuits: Shift Registers A 2 3 CLK A 2 3 X X X X X X X X X X
Universal Shift Register L 3 2 D 3 2 D 3 2 2 D 2 3 2 3 D 3 R S S CLK S S 3 2 R 3 2 D 3 D 2 D D 3 2 2 L S S Operation Shift right Load Hold Shift left
Cool Sequential Circuits: Counters Cycle through sequences of numbers, e.g.,
The 74LS63 Synchronous Binary Counter
Moore and Mealy Machines Inputs Next State Logic Next State CLK Current State Output Logic Outputs The Moore Form: Outputs are a function of only the current state.
Moore and Mealy Machines Inputs Next State Logic Next State CLK Current State Output Logic Outputs The Mealy Form: Outputs may be a function of both the current state and the inputs. A mnemonic: Moore machines often have more states.
Mealy Machines are the Most General Inputs Outputs Current State C L CLK Next State Another, equivalent way of drawing Mealy Machines This is exactly the synchronous digital logic paradigm
Moore vs. Mealy FSMs Alyssa P. Hacker has a snail that crawls down a paper tape with s and s on it. The snail smiles whenever the last four digits it has crawled over are. Design Moore and Mealy FSMs of the snail s brain.
State Transition Diagrams: Looking for S S S2 S3 S4 Moore Machine: States indicate output / / / / S / S / S2 / S3 / Mealy Machine: Arcs indicate input/output
Moore Machine Next State S A S S S S S S S S S2 S2 S3 S2 S2 S3 S S3 S4 S4 S S4 S2 Output S Y S S S2 S3 S4
Moore Machine A Next State S A S Output S Y S 2 CLK S 2 Y S CLK S CLK S S
Mealy Machine S A S Y S S S S S S S S2 S2 S3 S2 S2 S3 S S3 S
Mealy Machine A S A S Y S CLK S CLK Y S S
State Transition Diagram for the TLC C + L/T S/T S/T HG H : G F : R FY H : R F : Y Inputs: C: Car sensor S: Short Timeout L: Long Timeout CL/T C + L/T HY H : Y F : R FG H : R F : G S/T S/T C L/T Outputs: T: Timer Reset H: Highway color F: Farm road color S C S L T S HG X X HG HG X X HG HG X HY HY X X HY HY X X FG FG X FG FG X X FY FG X X FY FY X X FY FY X X HG S H F HG G R HY Y R FG R G FY R Y
State and Output Encoding S C S L T S HG X X HG HG X X HG HG X HY HY X X HY HY X X FG FG X FG FG X X FY FG X X FY FY X X FY FY X X HG S H F HG G R HY Y R FG R G FY R Y A one-hot encoding: HG HY FG FY G Y R
State and Output Encoding S C S L T S X X X X X X X X X X X X X X X X X X S H F T = S CL + S S + S 2 (C + L) + S 3 S S 3 = S 2(C + L) + S 3 S S 2 = S S + S 2 (C + L) S = S CL + S S S = S (CL) + S 3 S H R = S 2 + S 3 H Y = S H G = S F R = S + S F Y = S 3 F G = S 2
State and Output Encoding C L S3 FY T = S CL + S S + S 2 (C + L) + S 3 S S 3 = S 2(C + L) + S 3 S HR S 2 = S S + S 2 (C + L) S2 FG S = S CL + S S S = S (CL) + S 3 S H R = S 2 + S 3 S HY H Y = S H G = S FR F R = S + S S S HG F Y = S 3 F G = S 2
The CMOS Inverter A Y A 3V p-fet Y n-fet An inverter is built from two MOSFETs: An n-fet connected to ground A p-fet connected to the power supply V
The CMOS Inverter A 3V A Y 3V Off V Y On V When the input is near the power supply voltage ( ), the p-fet is turned off; the n-fet is turned on, connecting the output to ground ( ). n-fets are only good at passing s
The CMOS Inverter A V A Y 3V On 3V Y Off V When the input is near ground ( ), the p-fet is turned on, connecting the output to the power supply ( ); the n-fet is turned off. p-fets are only good at passing s
The CMOS NAND Gate A B Y A Y Two-input NAND gate: two n-fets in series; two p-fets in parallel B
The CMOS NAND Gate A B Y A B Y Both inputs : Both p-fets turned on Output pulled high
The CMOS NAND Gate A B A B Y Y One input, the other : One p-fet turned on Output pulled high One n-fet turned on, but does not control output
The CMOS NAND Gate A B A B Y Y Both inputs : Both n-fets turned on Output pulled low Both p-fets turned off
The CMOS NOR Gate A B A B Y Y Two-input NOR gate: two n-fets in parallel; two p-fets in series. Not as fast as the NAND gate because n-fets are faster than p-fets
A CMOS AND-OR-INVERT Gate A C Y A B C D Y B D
Static CMOS Gate Structure Inputs p-fet pull-up network Y Pull-up and Pull-down networks must be complementary; exactly one should be connected for each input combination. n-fet pull-down network Series connection in one should be parallel in the other
Read-Only Memories: Combinational Functions A k. A 2 A A 2 k n ROM D n. D D row column A 6 A 5 A 4 A 3 A 2 A A 28 ROM D General ROM: 2 k words n bits per word Example: Space Race ROM
Implementing ROMs / Z: not connected Bitline 2 Bitline Bitline Wordline Wordline A A 2-to-4 Decoder 2 Wordline 2 Add. Data 3 Wordline 3 D 2 D D
Implementing ROMs / Z: not connected Bitline 2 Bitline Bitline Wordline Wordline A A 2-to-4 Decoder 2 Wordline 2 Add. Data 3 Wordline 3 D 2 D D
Implementing ROMs / Z: not connected A A 2-to-4 Decoder 2 Add. Data 3 D 2 D D
Implementing ROMs / Z: not connected A A 2-to-4 Decoder 2 Add. Data 3 D 2 D D
CMOS Mask-Programmed ROMs Add. Data ROM programmed by selectively connecting drain wires Active-high wordlines
EPROMs and FLASH use Floating-Gate MOSFETs
Static Random-Access Memory Cell Bit line Bit line Word line
Dynamic RAM Cell Bit line Word line
Our Old Pal, the Space Race ROM 2 3 4 5 means 6 7 8 9 and 2 means 3 4 5 A3 A2 A A D D D2 D3 D4 D5 D6 D7
Our Old Pal, the Space Race ROM 2 3 4 5 6 7 8 9 2 3 The decoder or AND plane In a RAM or ROM, computes every minterm Pattern is not programmable 4 5 A3 A2 A A D D D2 D3 D4 D5 D6 D7
Our Old Pal, the Space Race ROM 2 3 4 5 6 7 8 9 2 3 The decoder or OR plane One term for every output Pattern is programmable = the contents of the ROM 4 5 A3 A2 A A D D D2 D3 D4 D5 D6 D7
Simplifying the Space Race ROM A A 2 A 3 A Essential minterms mean don t expand these
Our New PAL, the Space Race ROM 32 32 2 32 3 32 4 3 5 32 6 3 2 7 32 8 3 9 2 2 32 2 3 2 3 3 2 4 5 D = 32 D = 32 D 2 = 32 + 32 D 3 = 32 + 3+ 32 + 32 D 4 = 3 2 + 32 + 32 + 32 D 5 = 3 + 2 + 2+ 32 + 32 D 6 = 3 2 + 32 D 7 = 3 2 + 32 Saved two ANDs A3 A2 A A D D D2 D3 D4 D5 D6 D7
I I I I A (t I/ I/ I/ I/ I/ A 22V PAL: Programmable AND/Fixed OR CLK/I First Fuse Numbers 2 3 4 5 396 44 88 924 452 496 22 256 286 Increments 4 8 2 6 2 24 28 32 36 4 P = 588 R = 589 P = 58 R = 58 P = 582 R = 583 P = 584 R = 585 Macrocell Macrocell Macrocell Macrocell Macrocell P = 586 R = 587 23 22 2 2 9
Field-Programmable Gate Arrays (FPGAs) 6 RAM programmable switch LUT LE LE LE Switch Block SB SB LE LE LE Switch Box: 6 programmable switches