EECS141 1 Hw 8 Posted Last one to be graded Due Friday April 30 Hw 6 and 7 Graded and available Project Phase 2 Graded Project Phase 3 Launch Today EECS141 2 1
6 5 4 3 2 1 0 1.5 2 2.5 3 3.5 4 Frequency [GHz] Performance Histogram EECS141 3 4 3 2 1 0 1.5 2 2.5 3 3.5 4 Frequency [GHz] Performance - pruned Power [W] 3.5E-03 3.0E-03 2.5E-03 2.0E-03 1.5E-03 1.0E-03 5.0E-04 0.0E+00 0.0E+00 1.0E+09 2.0E+09 3.0E+09 Frequency [Hz] EECS141 4 2
Max: 98 Avg: 78.9 StDev: 12.9 6 5 4 3 2 1 Median: 81 0 50 60 70 80 90 100 EECS141 5 Design implementation of 256-bit code Create floorplan to estimate wirelength Optimize circuit such that energy is minimized for max delay of 10 nsec Determine also Tdmin and Emin for that design Present results in poster Energy (T d,min, E max ) (T d, E) (T d,max, E min ) Delay EECS141 6 3
Template: Group A: Group B: Same but period is 4 Group C: Same but period is 8 Group D: Same but period is 16 EECS141 7! EECS141 8 4
! EECS141 9 Energy? Joule 10ns Optimization target Delay The extremes EECS141 10 5
Last lecture Multipliers Today s lecture Memory cells Reading (Ch 12) EECS141 11 12 EECS141 12 6
Read-Write Memory Non-Volatile Read-Write Memory Read-Only Memory Random Access Non-Random Access EPROM E 2 PROM Mask-Programmed Programmable (PROM) SRAM FIFO FLASH DRAM LIFO Shift Register CAM EECS141 13 STATIC (SRAM) Data stored as long as supply is applied Larger (6 transistors/cell) Fast Differential (usually) DYNAMIC (DRAM) Periodic refresh required Smaller (1-3 transistors/cell) Slower Single Ended EECS141 14 7
Conceptual: linear array Each box holds some data But this does not lead to a nice layout shape Too long and skinny Create a 2-D array Decode Row and Column address to get data EECS141 15 M bits M bits N words S 0 S 1 S 2 S N-2 S N -1 Word 0 Word 1 Word 2 Word N-2 Word N-1 Storage cell A 0 A 1 A K-1 K = log 2 N D e c o d e r S 0 Word 0 Word 1 Word 2 Word N-2 Word N-1 Storage cell Input-Output ( M bits) Input-Output ( M bits) Intuitive architecture for N x M memory Too many select signals: N words == N select signals Decoder reduces the number of select signals K = log N 2 EECS141 16 8
EECS141 17 If D is high, D_b will be driven low Which makes D stay high Positive feedback EECS141 18 9
EECS141 19 Access transistor must be able to overpower the feedback EECS141 20 10
EECS141 21 Complementary data values are written (read) from two sides EECS141 22 11
WL M 2 M 4 M 5 Q Q M 6 M 1 M 3 BL BL EECS141 23 Read 1 0 Q_b will get pulled up when WL first goes high Reading the cell should not destroy the stored value EECS141 24 12
WL BL M 5 Q = 0 Q = 1 M 4 M 6 BL M 1 C bit C bit EECS141 25 1.2 1 Voltage Rise (V) 0.8 0.6 0.4 0.2 0 0 0.5 1 1.2 1.5 2 Cell Ratio (CR) 2.5 3 EECS141 26 13
WL M 4 M 5 Q = 0 Q = 1 M 6 M 1 BL = 1 BL = 0 EECS141 27 EECS141 28 14
Obtained by breaking the feedback between the inverters SNM EECS141 29 BL BLB Compact cell Bitlines: M2 Wordline: bootstrapped in M3 GND WL EECS141 30 15
ST/Philips/Motorola Access Transistor Pull down Pull up EECS141 31 EECS141 32 16
EECS141 33 WL R L R L M 3 Q Q M 4 BL M 1 M 2 BL Static power dissipation -- Want R L large Bit lines precharged to to address t p problem EECS141 34 17
BL 1 BL 2 WWL RWL WWL M 3 RWL M 1 X M 2 X - V T C S BL 1 BL 2 - V T D V No constraints on device ratios Reads are non-destructive Value stored at node X when writing a 1 = V WWL -V Tn EECS141 35 BL 1 BL 2 BL2 BL1 GND WWL RWL RWL M3 M 3 M2 M 1 X M 2 C S WWL M1 EECS141 36 18
Write: C S is charged or discharged by asserting WL and BL. Read: Charge redistribution takes places between bit line and storage capacitance Δ V BL V PRE V BIT V C S = V = ------------ PRE C S + C BL Voltage swing is small; typically around 250 mv. EECS141 37 Capacitor Metal word line n + Poly Poly n + Inversion layer induced by plate bias Cross-section SiO 2 Field Oxide Diffused bit line Polysilicon gate Layout Polysilicon plate M 1 word line Uses Polysilicon-Diffusion Capacitance Expensive in Area EECS141 38 19
EECS141 39 Word line Insulating Layer Cell plate Capacitor dielectric layer Cell Plate Si Capacitor Insulator Refilling Poly Transfer gate Isolation Storage electrode Storage Node Poly Si Substrate 2nd Field Oxide Trench Cell EECS141 Stacked-capacitor Cell 40 20
BL BL BL 1 WL WL WL BL BL BL 0 WL WL WL GND Diode ROM MOS ROM 1 MOS ROM 2 EECS141 41 Pull-up devices WL [0] WL [1] GND WL [2] GND WL [3] BL [0] BL [1] BL [2] BL [3] EECS141 42 21
Cell (9.5λ x 7λ) Programmming using the Active Layer Only Polysilicon Metal1 Diffusion Metal1 on Diffusion EECS141 43 Cell (11λ x 7λ) Programmming using the Contact Layer Only Polysilicon Metal1 Diffusion Metal1 on Diffusion 44 EECS141 44 22
Pull-up devices BL [0] BL [1] BL [2] BL [3] WL [0] WL [1] WL [2] WL [3] All word lines high by default with exception of selected row EECS141 45 Cell (8λ x 7λ) Programmming using the Metal-1 Layer Only No contact to VDD or GND necessary; drastically reduced cell size Loss in performance compared to NOR ROM Polysilicon Diffusion Metal1 on Diffusion EECS141 46 23
Cell (5λ x 6λ) Programmming using Implants Only Polysilicon Threshold-altering implant Metal1 on Diffusion 47 EECS141 47 Source Floating gate Gate Drain D t ox G n + Substrate p t ox n +_ S Device cross-section Schematic symbol EECS141 48 24
20 V 0 V 5 V 10 V 5 V 20 V - 5 V 0 V - 2.5 V 5 V S D S D S D Avalanche injection Removing programming voltage leaves charge trapped Programming results in higher V T. EECS141 49 Floating gate Gate I Source Drain 20 30 nm -10 V V GD 10 V n 1 n Substrate 1 p 10 nm FLOTOX transistor Fowler-Nordheim I-V characteristic EECS141 50 25
BL WL Absolute threshold control is hard Unprogrammed transistor might be depletion 2 transistor cell EECS141 51 Flash Courtesy Intel EPROM EECS141 52 26
Decoders Sense Amplifiers Input/Output Buffers Control / Timing Circuitry EECS141 53 Collection of 2 M complex logic gates Organized in regular and dense fashion (N)AND Decoder NOR Decoder EECS141 54 27
Look at decoder for 256x256 memory block (8KBytes) EECS141 55 Goal: Build fastest possible decoder with static CMOS logic What we know Basically need 256 AND gates, each one of them drives one word line N=8 EECS141 56 28
Each word line has 256 cells connected to it Total output load is 256*C cell + C wire Assume that decoder input capacitance is C address =4*C cell Each address drives 2 8 /2 AND gates A0 drives ½ of the gates, A0_b the other ½ of the gates Neglecting C wire, the fan-out on each one of the 16 address wires is: B EECS141 57 FB of at least 2 13 means that we will want to use more than log 4 (2 13 ) = 6.5 stages to implement the AND8 Need many stages anyways So what is the best way to implement the AND gate? Will see next that it s the one with the most stages and least complicated gates EECS141 58 29
LE=10/3 1 ΠLE = 10/3 P = 8 + 1 LE=2 5/3 ΠLE = 10/3 P = 4 + 2 LE=4/3 5/3 4/3 1 ΠLE = 80/27 P = 2 + 2 + 2 + 1 EECS141 59 Using 2-input NAND gates 8-input gate takes 6 stages Total LE is (4/3) 3 2.4 So PE is 2.4*2 13 optimal N of ~7.1 EECS141 60 30
256 8-input AND gates Each built out of tree of NAND gates and inverters Issue: Every address line has to drive 128 gates (and wire) right away Can t build gates small enough - Forces us to add buffers just to drive address inputs EECS141 61 EECS141 62 31
Use a single gate for each of the shared terms E.g., from A 0, A 0, A 1, and A 1, generate four signals: A 0 A 1, A 0 A 1, A 0 A 1, A 0 A 1 In other words, we are decoding smaller groups of address bits first And using the predecoded outputs to do the rest of the decoding EECS141 63 EECS141 64 32
Multi-stage implementation improves performance WL 1 WL 0 A 0 A 1 A 0 A 1 A 0 A 1 A 0 A 1 A 2 A 3 A 2 A 3 A 2 A 3 A 2 A 3 A 1 A 0 A 0 A 1 A 3 A 2 A 2 A 3 EECS141 65 Precharge devices GND GND WL 3 WL 3 WL 2 WL 2 WL 1 WL 0 WL 1 WL 0 φ A 0 A 0 A 1 A 1 A 0 A 0 A 1 A 1 φ 2-input NOR decoder 2-input NAND decoder EECS141 66 33
BL 0 BL 1 BL 2 BL 3 A 0 S 0 S 1 S 2 A 1 S 3 2-input NOR decoder Advantages: speed (t pd does not add to overall memory access time) Only one extra transistor in signal path Disadvantage: Large transistor count D EECS141 67 BL 0 BL 1 BL 2 BL 3 A 0 A 0 A 1 A 1 Number of devices drastically reduced Delay increases quadratically with # of sections; prohibitive for large decoders Solutions: buffers progressive sizing combination of tree and pass transistor approaches EECS141 68 D 34
t p = C Δ ---------------- V I av make Δ V as small as possible large Idea: Use Sense Amplifer small EECS141 69 M 3 M 4 y Out bit M 1 M 2 bit SE M 5 Directly applicable to SRAMs EECS141 70 35
EECS141 71 EQ BL BL SE SE Initialized in its meta-stable point with EQ Once adequate voltage gap created, sense amp enabled with SE Positive feedback quickly forces output to a stable operating point. EECS141 72 36