SRAM Design
Chapter Overview Classification Architectures The Core Periphery Reliability
Semiconductor Classification RWM NVRWM ROM Random Access Non-Random Access EPROM E 2 PROM Mask-Programmed Programmable (PROM) SRAM FIFO FLASH DRAM LIFO Shift Register CAM
Architecture: Decoders M bits M bits N Words S 0 S 1 S 2 S N-2 S N_1 Word 0 Word 1 Word 2 Word N-2 Word N-1 Storage Cell A 0 A 1 A K-1 Decoder S 0 Word 0 Word 1 Word 2 Word N-2 Word N-1 Storage Cell Input-Output (M bits) Input-Output (M bits) N words => N select signals Too many select signals Decoder reduces # of select signals K = log 2 N
Array-Structured Architecture Problem: ASPECT RATIO or HEIGHT >> WIDTH 2 L-K Bit Line Storage Cell A K A K+1 A L-1 Row Decoder Word Line Sense Amplifiers / Drivers M.2 K Amplify swing to rail-to-rail amplitude A 0 A K-1 Column Decoder Selects appropriate word Input-Output (M bits)
Hierarchical Architecture Row Address Column Address Block Address Control Circuitry Block Selector Global Amplifier/Driver I/O Global Data Bus Advantages: 1. Shorter wires within blocks 2. Block address activates only 1 block => power savings
Timing: Definitions Read Cycle READ Read Access Read Access Write Cycle WRITE Data Valid Write Access DATA Data Written
Timing: Approaches MSB LSB Address Bus Row Address Column Address RAS CAS Address Bus Address Address transition initiates memory operation RAS-CAS timing DRAM Timing Multiplexed Adressing SRAM Timing Self-timed
Read-Write Memories (RAM) STATIC (SRAM) Data stored as long as supply is applied Large (6 transistors/cell) Fast Differential DYNAMIC (DRAM) Periodic refresh required Small (1-3 transistors/cell) Slower Single Ended
6-transistor CMOS SRAM Cell WL M2 M4 M5 Q Q M6 M1 M3
CMOS SRAM Analysis (Write) WL M4 M5 Q = 0 Q = 1 M6 M1 = 1 = 0 k nm6 V, ( DD V Tn ) V 2 DD V ---------- ---------- DD = k pm4, 2 8 ( V Tp ) V 2 DD V ---------- ---------- DD 2 8 (W/L) n,m6 0.33 (W/L) p,m4 k nm5, 2 2 ------------ V V DD DD ---------- V Tn ---------- 2 k 2 2 n, M1 ( V Tn )---------- ----------- 2 8 = (W/L) n,m5 10 (W/L) n,m1
CMOS SRAM Analysis (Read) WL M4 M5 Q = 0 Q = 1 M6 M1 C bit C bit k nm5, --------------- V V DD DD ----------- V 2 2 Tn ----------- 2 V 2 = k 2 nm1, ( V Tn )----------- ----------- DD 2 8 (W/L) n,m5 10 (W/L) n,m1 (supercedes read constraint)
6T-SRAM Layout M2 M4 Q Q M1 M3 M5 M6 GND WL
Resistance-load SRAM Cell WL R L R L M3 Q Q M4 M1 M2 Static power dissipation -- Want R L large Bit lines precharged to to address t p problem
Periphery Decoders Sense Amplifiers Input/Output Buffers Control / Timing Circuitry
Row Decoders Collection of 2 M complex logic gates Organized in regular and dense fashion (N)AND Decoder NOR Decoder
Dynamic Decoders Precharge devices GND GND WL 3 WL 3 WL 2 WL 2 WL 1 WL 1 WL 0 WL 0 φ A 0 A 0 A 1 A 1 A 0 A 0 A 1 A 1 φ Dynamic 2-to-4 NOR decoder 2-to-4 MOS dynamic NAND Decoder Propagation delay is primary concern
A NAND decoder using 2-input predecoders WL 1 WL 0 A 0 A 1 A 0 A 1 A 0 A 1 A 0 A 1 A 2 A 3 A 2 A 3 A 2 A 3 A 2 A 3 A 1 A 0 A 0 A 1 A 3 A 2 A 2 A 3 Splitting decoder into two or more logic layers produces a faster and cheaper implementation
4 input pass-transistor based column decoder 0 1 2 3 A 0 A 1 2 input NOR decoder S 0 S 1 S 2 S 3 D dvantage: speed (t pd does not add to overall memory access time) only 1 extra transistor in signal path sadvantage: large transistor count
4-to-1 tree based column decoder 0 1 2 3 A 0 A 0 A 1 A 1 D Number of devices drastically reduced Delay increases quadratically with # of sections; prohibitive for large decoders Solutions: buffers progressive sizing combination of tree and pass transistor approaches
Decoder for circular shift-register WL 0 WL 1 WL 2 φ φ φ φ φ φ R φ φ R φ φ R φ φ...
Sense Amplifiers t p = C V ---------------- I av make V as small as possible large small Idea: Use Sense Amplifer small transition s.a. input output
Differential Sensing - SRAM PC y M3 M4 y EQ x M1 M2 SE M5 x x SE x WL i (b) Doubled-ended Current Mirror Amplifier SRAM cell i Diff. x Sense x Amp y y D D x y SE y x (a) SRAM sensing scheme. (c) Cross-Coupled Amplifier
Latch-Based Sense Amplifier EQ SE SE Initialized in its meta-stable point with EQ Once adequate voltage gap created, sense amp enabled with SE Positive feedback quickly forces output to a stable operating point.
Single-to-Differential Conversion WL x Diff. x cell S.A. + _ V ref y y How to make good V ref?
Address Transition Detection A 0 DELAY t d ATD ATD A 1 DELAY t d A N-1 DELAY t d...
Reliability and Yield
Open Bit-line Architecture Cross Coupling EQ WL 1 WL 0 WL D WL D WL 0 WL 1 C W C W C C C C Sense Amplifier C C C C
Folded-Bitline Architecture WL 1 WL 1 WL 0 WL 0 WL D WL D C W C x y... C C C C C C EQ Sense Amplifier C x y C W
Transposed-Bitline Architecture " C cross (a) Straightforward bitline routing. SA " C cross SA (b) Transposed bitline architecture.
Alpha-particles α-particle WL n + SiO 2 1 particle ~ 1 million carriers
Yield Yield curves at different stages of process maturity (from [Veendrick92])
Redundancy Redundant columns Redundant rows Array Row Address Row Decoder : Fuse Bank Column Decoder Column Address
Semiconductor Trends Size as a function of time: x 4 every three years
Semiconductor Trends Increasing die size factor 1.5 per generation Combined with reducing cell size factor 2.6 per generation
Semiconductor Trends Technology feature size for different SRAM generations