EE4-Fall 20 Digital Integrated Circuits Lecture 5 Memory decoders Administrative Stuff Homework #6 due today Project posted Phase due next Friday Project done in pairs 2
Last Lecture Last lecture Logical effort Intro to memory Today s lecture SRAM operation Memory decoders Reading (2.2.-2.2.3, 6) 3 SRAM 4 2
6-transistor CMOS SRAM Cell WL V DD M 2 M 4 Q M Q M 5 6 M M 3 BL BL 5 SRAM Column WL0 WL2 WL3 BL BL_B 6 3
SRAM Array Layout 7 65nm 22 nm SRAM Layout Thin cell Access Transistor 0.092 m 2 cell in 22nm from Intel (IDF 09) Pull down Pull up 8 4
Write SRAM Operation 0 Hold 0 9 SRAM Operation Read 0 Reading the cell should not destroy the stored value 0 5
CMOS SRAM Analysis (Read) WL BL V DD M 4 Q = 0 M 5 V Q = M 6 BL V DD M V DD V DD C bit C bit I I D5 D 2 DD Tn V V V W V Wv C C V V V 5 sat ox n ox DD Tn VDD VTn V crit, nl5 L 2 CMOS SRAM Analysis (Read).2 Voltage Rise ( V) 0.8 0.6 0.4 0.2 0 0 0.5.2.5 2 Cell Ratio (CR) 2.5 3 2 6
CMOS SRAM Analysis (Write) V DD WL M 4 Q = 0 M 6 M 5 Q = M V DD BL = BL = 0 2 W6 V V V Wv C C V V V DD Tp Q 4 sat ox n ox DD Tn Q VDD VTp crit, pl4 L6 2 3 CMOS SRAM Analysis (Write) PR W / L 4 W / L 6 4 7
Read Static Noise Margin (SNM) VDD VL VR PR AXR Read SNM NR VR (V) 0.5 90nm simulation 0 0 0.5 VL (V) Read SNM is the contention between the two sides of the cell under read conditions. Write Stability Write Noise Margin (WNM) VDD PR VL VR AXR NR Write margin is the distance between the two curves 8
Write Stability BL/WL Write Margins Voltage (V).2 0.8 0.6 0.4 0.2 0-0.2 BL WM 0.00E+00 2.00E-08 4.00E-08 6.00E-08 8.00E-08.00E-07 Time (s) Highest BL voltage under which write is possible when BLC is kept precharged Voltage (V).2 0.8 0.6 0.4 0.2 0-0.2 WM WL 0.00E+00 2.00E-08 4.00E-08 6.00E-08 8.00E-08.00E-07 Time (s) Difference between VDD and lowest WL voltage under which write is possible when both bit-lines are precharged Write Stability Write Current (N-Curve) Minimum current looking into the storage node 9
Decoders N words S 0 S S 2 M bits Word 0 Word Word 2 Storage cell Decoder A 0 A S 0 M bits Word 0 Word Word 2 Storage cell S N2 2 S N2 Word N2 2 Word N2 A K2 K = log 2 N Word N2 2 Word N2 Input-Output (M bits) Input-Output (M bits) Intuitive architecture for N x M memory Too many select signals: N words == N select signals Decoder reduces the number of select signals K = log 2 N 9 Row Decoders Collection of 2 M complex logic gates Organized in regular and dense fashion (N)AND Decoder NOR Decoder 20 0
Decoder Design Example Look at decoder for 256x256 memory block (8KBytes) 2 Problem Setup Goal: Build fastest, lowest possible power decoder with static CMOS logic What we know Need 256 AND gates, each one of them drives one word line N=8 22
Possible AND8 Build 8-input NAND gate using 2-input gates and inverters Is this the best we can do? Is this better than using fewer NAND4 gates? 23 Problem Setup Goal: Build fastest possible decoder with static CMOS logic What we know Basically need 256 AND gates, each one of them drives one word line N=8 24 2
Problem Setup () Each wordline has 256 cells connected to it C WL = 256*C cell + C wire Ignore wire for now (include it in project) Assume that decoder input capacitance is C address =4*C cell 25 Problem Setup (2) Each address drives 2 8 /2 AND gates A0 drives ½ of the gates, A0_b the other ½ of the gates 26 3
Problem Setup (3) Total fanout on each address wire is: 256C 2 8 Ccell C F B 28 2 2 C C C load cell 7 3 2 in 4 cell 2 cell 27 Decoder Fan-Out F of 2 3 means that we will want to use more than log 4 (2 3 ) = 6.5 stages to implement the AND8 Need many stages anyways So what is the best way to implement the AND gate? Will see next that it s the one with the most stages and least complicated gates 28 4
8-Input AND LE=0/3 LE = 0/3 P = 8 + LE=2 5/3 LE = 0/3 P = 4 + 2 LE=4/3 5/3 4/3 LE = 80/27 P = 2 + 2 + 2 + 29 8-Input AND Using 2-input NAND gates 8-input gate takes 6 stages Total LE is (4/3) 3 2.4 So PE is 2.4*2 3 optimal N of ~7. 30 5
Decoder So Far 256 8-input AND gates Each built out of tree of NAND gates and inverters Issue: Every address line has to drive 28 gates (and wire) right away Can t build gates small enough - Forces us to add buffers just to drive address inputs 3 Look Inside Each AND8 Gate 32 6
Predecoders Use a single gate for each of the shared terms E.g., from A 0, A 0, A, and A, generate four signals: A 0 A, A 0 A, A 0 A, A 0 A In other words, we are decoding smaller groups of address bits first And using the predecoded outputs to do the rest of the decoding 33 Predecoder and Decoder A 0 A A 2 A 3 A 4 A 5 34 7
Predecoder/Decoder Layout WL0 WL SRAM Array WL63 a5 a4 a3 a5 a4 a3 a2 a a0 35 Predecode Options Larger predecode usually better: More stages before the long wires Decreases their effect on the circuit Fewer long wires switch Lower power Easier to fit 2-input gate into cell pitch 6 A0 6 A0AA2A3 A A2A3 C L 256 4 to 6 predecoder 36 8
What We Now Know Given decoder structure, input capacitance, final load Can size the entire chain using LE for minimum delay Is this the best we can do in terms of power too? Not necessarily probably want to reduce sizes (especially on final decoder inputs) Is there anything else we can do to improve energy even further? 37 Next Lecture Back to logic 38 9