Semiconductor Memories

Digital Integrated Circuits: A Design Perspective
Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolic
Semiconductor Memories
December 20, 2002

Chapter Overview
Memory Classification
Memory Architectures
The Memory Core
Periphery
Reliability
Case Studies

Semiconductor Memory Classification
Read-Write Memory, random access: SRAM, DRAM
Read-Write Memory, non-random access: FIFO, LIFO, Shift Register, CAM
Non-Volatile Read-Write Memory: EPROM, E²PROM, FLASH
Read-Only Memory: Mask-Programmed, Programmable (PROM)

Memory Timing: Definitions
[Figure: timing diagram defining the read cycle and read access time (READ), the write cycle and write access time (WRITE), and the points at which data is valid and data is written (DATA).]

Memory Architecture: Decoders
Intuitive architecture for an N x M memory: N words (Word 0 ... Word N-1) of M bits each, with one select signal (S_0 ... S_{N-1}) per word and an M-bit input-output.
Too many select signals: N words == N select signals.
A decoder reduces the number of select signals: address bits A_0 ... A_{K-1} select one word, with K = log2 N.
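The relation K = log2 N can be illustrated with a small behavioural sketch. The function names `address_bits` and `decode` are hypothetical and chosen only for this example; the sketch models the logic function of a decoder, not its circuit.

```python
def address_bits(n_words: int) -> int:
    """Number of address lines K needed to select one of n_words words."""
    if n_words < 1:
        raise ValueError("n_words must be positive")
    # Integer form of ceil(log2(n_words)), avoiding floating-point rounding.
    return (n_words - 1).bit_length()


def decode(address: int, n_words: int) -> list[int]:
    """Return the one-hot select signals S_0 ... S_{N-1} for an address."""
    if not 0 <= address < n_words:
        raise ValueError("address out of range")
    return [1 if i == address else 0 for i in range(n_words)]


if __name__ == "__main__":
    print(address_bits(1024))  # address lines for a 1024-word memory
    print(decode(2, 4))        # select signals for word 2 of 4
```

With the decoder inside the memory, only K address lines cross the memory boundary instead of N select lines.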

Array-Structured Memory Architecture
Problem: ASPECT RATIO, or HEIGHT >> WIDTH.
[Figure: array of storage cells with 2^(L-K) rows (word lines) and M·2^K columns (bit lines). Address bits A_K ... A_{L-1} feed the row decoder; A_0 ... A_{K-1} feed the column decoder, which selects the appropriate word. Sense amplifiers / drivers amplify the swing to rail-to-rail amplitude. Input-Output: M bits.]
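A minimal sketch of how an L-bit address splits into row and column fields in this organization, assuming (as in the figure) that the low K bits go to the column decoder and the remaining L-K bits go to the row decoder. The function names are hypothetical.

```python
def split_address(address: int, total_bits: int, column_bits: int) -> tuple[int, int]:
    """Split an L-bit address into (row, column) fields.

    total_bits is L and column_bits is K; the low K bits select the column.
    """
    if not 0 <= column_bits <= total_bits:
        raise ValueError("column_bits must be between 0 and total_bits")
    if not 0 <= address < (1 << total_bits):
        raise ValueError("address does not fit in total_bits")
    column = address & ((1 << column_bits) - 1)
    row = address >> column_bits
    return row, column


def array_shape(total_bits: int, column_bits: int, word_bits: int) -> tuple[int, int]:
    """Rows and columns of the cell array: 2^(L-K) rows by M * 2^K columns."""
    rows = 1 << (total_bits - column_bits)
    columns = word_bits * (1 << column_bits)
    return rows, columns


if __name__ == "__main__":
    print(split_address(0b101101, total_bits=6, column_bits=2))
    print(array_shape(total_bits=10, column_bits=3, word_bits=8))
```

Choosing K so that the two dimensions returned by `array_shape` are close to each other is what brings the array toward a square aspect ratio.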

Hierarchical Memory Architecture
[Figure: memory split into Block 0 ... Block i ... Block P-1, driven by row, column, and block addresses; control circuitry, block selector, global amplifier/driver, and global data bus connect the blocks to the I/O.]
Advantages:
1. Shorter wires within blocks
2. Block address activates only 1 block => power savings

Block Diagram of 4 Mbit SRAM
[Figure: 32 array blocks of 128 K each (Block 0 ... Block 31) with global, sub-global, and local row decoders; clock generator; X-, Y-, and Z-address buffers; predecoder and block selector; bit-line load; transfer gate; column decoder; sense amplifier and write driver; CS, WE buffer; I/O buffer; x1/x4 controller.]
From [Hirose90]
Digital Integrated Circuits 2nd, Memories

Contents-Addressable Memory
[Figure: CAM array of 2^9 words x 64 bits with 2^9 validity bits and a priority encoder; address decoder fed by a 9-bit address; I/O buffers for 64-bit data; commands, comparand, mask, control logic, and R/W.]

Read-Write Memories (RAM)
STATIC (SRAM): data stored as long as supply is applied; large (6 transistors/cell); fast; differential.
DYNAMIC (DRAM): periodic refresh required; small (1-3 transistors/cell); slower; single ended.

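6-transistor CMOS SRAM Cell
[Figure: cross-coupled inverters (M1/M2 and M3/M4) between V_DD and ground store Q and its complement; access transistors M5 and M6, gated by WL, connect the storage nodes to BL and its complement.]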

CMOS SRAM Analysis (Read)
[Figure: simplified cell during a read, showing M1, M4, M5, and M6, storage nodes at 0 and 1, WL, and both bit lines precharged to V_DD with capacitance C_bit.]

CMOS SRAM Analysis (Read)
[Plot: voltage rise (V), 0 to 1.2, versus cell ratio (CR), 0 to 3.]

CMOS SRAM Analysis (Write)
[Figure: simplified cell during a write, showing M1, M4, M5, and M6, storage nodes at 0 and 1, WL at V_DD, and the bit lines driven to 1 and 0.]

CMOS SRAM Analysis (Write)

6T-SRAM Layout
[Figure: cell layout showing M1-M6, storage nodes Q and its complement, the V_DD and GND rails, WL, and the two bit lines.]

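Resistance-load SRAM Cell
[Figure: cell with load resistors R_L to V_DD, pull-down transistors M1 and M2, access transistors M3 and M4 gated by WL, storage nodes Q and its complement, and the two bit lines.]
Static power dissipation: want R_L large.
Bit lines precharged to V_DD to address the t_p problem.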

SRAM Characteristics

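Sense Amplifiers
$t_p = \frac{C \cdot \Delta V}{I_{av}}$
With C large and I_av small, make ΔV as small as possible.
Idea: use a sense amplifier.
[Figure: sense amplifier (s.a.) with a small transition at its input and a larger transition at its output.]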

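Differential Sense Amplifier
[Figure: differential amplifier with input transistors M1 and M2 driven by bit and its complement, load transistors M3 and M4, enable transistor M5 gated by SE, internal node y, and output Out.]
Directly applicable to SRAMs.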

Differential Sensing SRAM
[Figure: (a) SRAM sensing scheme: bit lines precharged (PC) and equalized (EQ) to V_DD, SRAM cell i selected by WL_i, and a differential sense amplifier with inputs x and its complement, enabled by SE, producing the output. (b) Two-stage differential amplifier built from stages with M1-M5, enabled by SE.]

Address Transition Detection
[Figure: address inputs A_0, A_1, ..., A_{N-1}, each with a DELAY t_d element, generating the ATD signal and its complement.]

4 Mbit SRAM: Hierarchical Word-line Architecture
[Figure: global word line, sub-global word line, and local word lines driving the memory cells of Block 0, Block 1, Block 2, ..., gated by block group select and block select signals.]

Bit-line Circuitry
[Figure: bit-line load controlled by block select and ATD (BEQ), memory cell on the local word line, bit-line pair B/T, column decode (CD) gates onto the I/O lines, and sense amplifier.]

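Sense Amplifier (and Waveforms)
[Figure: sense-amplifier circuit on the I/O lines with block select (BS), sense-amplifier enable (SA), and equalization (SEQ); waveforms for Address, ATD, BEQ, SEQ, SA, the I/O lines, and DATA between V_dd and GND, including a data-cut phase.]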

Data Retention in SRAM
[Plot: leakage current I_leakage (A), from about 100n to 1.30u, versus V_DD from 0 to 1.80 V, for 0.18 µm CMOS and 0.13 µm CMOS; the curves differ by a factor of 7.]
SRAM leakage increases with technology scaling.

Suppressing Leakage in SRAM
[Figure: two schemes applied to an array of SRAM cells. Inserting extra resistance: low-threshold sleep transistors between V_DD and V_DD,int and at V_SS,int. Reducing the supply voltage: V_DD,int switched between V_DD and V_DDL by the sleep signal.]

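3-Transistor DRAM Cell
[Figure: cell with write transistor M1 (gated by WWL, connected to BL1), storage transistor M2 with storage node X and capacitance C_S, and read transistor M3 (gated by RWL, connected to BL2); waveforms show WWL, RWL, X at V_DD - V_T, BL1 at V_DD, and BL2 dropping by ΔV from V_DD - V_T.]
No constraints on device ratios.
Reads are non-destructive.
Value stored at node X when writing a 1 = V_WWL - V_Tn.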

3T-DRAM Layout
[Figure: cell layout showing M1, M2, M3, the bit lines BL1 and BL2, GND, RWL, and WWL.]

1-Transistor DRAM Cell
[Figure: cell with access transistor M1 gated by WL, storage capacitor C_S at node X, and bit line BL with capacitance C_BL; waveforms for Write 1 and Read 1 show X reaching V_DD - V_T and BL precharged to V_DD/2 before sensing.]
Write: C_S is charged or discharged by asserting WL and BL.
Read: charge redistribution takes place between the bit line and the storage capacitance:
$\Delta V = V_{BL} - V_{PRE} = (V_{BIT} - V_{PRE}) \frac{C_S}{C_S + C_{BL}}$
The voltage swing is small, typically around 250 mV.
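The charge-redistribution relation can be evaluated numerically. The sketch below uses illustrative values that are assumptions, not figures from the slides: C_S = 50 fF, C_BL = 200 fF, a precharge of 1.25 V, and stored levels of 0 V and 1.9 V. The function name `bitline_swing` is hypothetical.

```python
def bitline_swing(v_bit: float, v_pre: float, c_s: float, c_bl: float) -> float:
    """Bit-line voltage change after charge redistribution.

    delta_v = (v_bit - v_pre) * c_s / (c_s + c_bl)
    v_bit: voltage stored on the cell capacitor before the read
    v_pre: bit-line precharge voltage
    c_s, c_bl: storage and bit-line capacitance (same unit)
    """
    if c_s <= 0 or c_bl < 0:
        raise ValueError("capacitances must be positive")
    return (v_bit - v_pre) * c_s / (c_s + c_bl)


if __name__ == "__main__":
    # Assumed example values, chosen only for illustration.
    c_s, c_bl, v_pre = 50e-15, 200e-15, 1.25
    print("reading a 0:", bitline_swing(0.0, v_pre, c_s, c_bl), "V")
    print("reading a 1:", bitline_swing(1.9, v_pre, c_s, c_bl), "V")
```

Because C_BL is normally several times C_S, only a fraction of the stored voltage difference appears on the bit line, which is why the swing stays in the range of a few hundred millivolts.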

DRAM Cell Observations
1T DRAM requires a sense amplifier for each bit line, due to charge redistribution read-out.
DRAM memory cells are single ended, in contrast to SRAM cells.
The read-out of the 1T DRAM cell is destructive; read and refresh operations are necessary for correct operation.
Unlike the 3T cell, the 1T cell requires the presence of an extra capacitance that must be explicitly included in the design.
When writing a 1 into a DRAM cell, a threshold voltage is lost. This charge loss can be circumvented by bootstrapping the word lines to a higher value than V_DD.

1-T DRAM Cell
[Figure: cross-section showing the metal word line, polysilicon gate and plate, n+ regions, SiO2, field oxide, and the inversion layer induced by the plate bias; layout showing the diffused bit line, polysilicon gate, polysilicon plate, M1, and the word line.]
Uses polysilicon-diffusion capacitance.
Expensive in area.

SEM of poly-diffusion capacitor 1T-DRAM

Advanced 1T DRAM Cells
[Figure: trench cell (cell plate Si, capacitor insulator, refilling poly, storage node poly, 2nd field oxide, Si substrate) and stacked-capacitor cell (word line, insulating layer, cell plate, capacitor dielectric layer, transfer gate, isolation, storage electrode).]

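Open Bit-line Architecture: Cross Coupling
[Figure: sense amplifier with equalization (EQ) between two bit-line halves, each with word lines WL_0, WL_1, a dummy word line WL_D, cell capacitors C, bit-line capacitance C_BL, and word-line-to-bit-line coupling capacitance C_WBL.]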

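Folded-Bitline Architecture
[Figure: bit line and its complement routed side by side into one sense amplifier with equalization (EQ); word lines WL_0, WL_1, dummy word lines WL_D, cell capacitors C, bit-line capacitance C_BL, coupling capacitance C_WBL, and sense nodes x and y.]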

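Transposed-Bitline Architecture
[Figure: (a) straightforward bit-line routing, with cross-coupling capacitance C_cross between a bit-line pair and the neighbouring bit lines, feeding the sense amplifier (SA); (b) transposed bit-line architecture, in which the pair is twisted along its length.]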

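Latch-Based Sense Amplifier (DRAM)
[Figure: cross-coupled latch between the bit line and its complement, with equalization (EQ) and sense-enable (SE) devices connected to V_DD and ground.]
Initialized in its meta-stable point with EQ.
Once an adequate voltage gap is created, the sense amp is enabled with SE.
Positive feedback quickly forces the output to a stable operating point.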

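Open bitline architecture with dummy cells
[Figure: latch sense amplifier enabled by SE, with equalization (EQ), between the left bit line BLL and the right bit line BLR; word lines L_0, L_1 and R_0, R_1 select cells with capacitance C_S; dummy word lines L and R select one dummy cell on each side.]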

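Sense Amp Operation
[Plot: bit-line voltage V_BL versus t, starting at V_PRE; after the word line is activated the bit line shifts by ΔV(1), and after the sense amp is activated it is driven to V(1) or V(0).]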

DRAM Read Process with Dummy Cell
[Plots: bit-line voltages (V), 0 to 3, versus t (ns), 0 to 3, when reading 0 and when reading 1; control signals EQ, WL, and SE versus t (ns).]

Charge-Redistribution Amplifier
[Figure: concept, with M1 gated by V_ref between node V_L (capacitance C_large) and node V_S (capacitance C_small), plus M2 and M3; transient response showing V_in, V_S, and V_L (V), 0 to 2.5, versus time (nsec), 0 to 3, for V_ref = 3 V.]

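Charge-Redistribution Amplifier: EPROM
[Figure: EPROM array cell M1 on WL and BL (capacitance C_BL), column decoder device M2 gated by WLC (capacitance C_col), cascode device M3 gated by V_casc, and load M4 gated by SE from V_DD, driving Out (capacitance C_out).]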

Single-to-Differential Conversion
[Figure: cell on WL and BL driving input x of a differential sense amplifier whose other input is V_ref, producing the output.]
How to make a good V_ref?

Static CAM Memory Cell
[Figure: array of CAM cells on Bit lines and their complements with Word and Match lines; cell schematic with transistors M1-M9, storage node S and its complement, internal node int, and a wired-NOR match line.]

CAM in Cache Memory
[Figure: CAM array and SRAM array with address decoder, hit logic, input drivers, and sense amps / input drivers; signals Address, Tag, Hit, R/W, and Data.]

Periphery
Decoders
Sense Amplifiers
Input/Output Buffers
Control / Timing Circuitry

Row Decoders
Collection of 2^M complex logic gates, organized in regular and dense fashion.
(N)AND Decoder
NOR Decoder
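The two decoder styles compute the same one-hot function, which follows from De Morgan's law: the AND of the matching address literals equals the NOR of their complements. A behavioural sketch, with hypothetical function names and the address given as a list of bits, least significant bit first:

```python
def and_decoder(address: list[int]) -> list[int]:
    """Word line i is the AND of the literals that match row index i."""
    k = len(address)
    word_lines = []
    for i in range(1 << k):
        literals = [a if (i >> b) & 1 else 1 - a for b, a in enumerate(address)]
        word_lines.append(int(all(literals)))
    return word_lines


def nor_decoder(address: list[int]) -> list[int]:
    """Word line i is the NOR of the complemented literals for row index i."""
    k = len(address)
    word_lines = []
    for i in range(1 << k):
        # Each entry is 0 exactly when the corresponding address bit matches row i.
        inverted = [1 - a if (i >> b) & 1 else a for b, a in enumerate(address)]
        word_lines.append(int(not any(inverted)))
    return word_lines


if __name__ == "__main__":
    address = [1, 0]  # A_0 = 1, A_1 = 0, i.e. row 1
    print(and_decoder(address))
    print(nor_decoder(address))
```

Both functions return the same list for every address; which form is used in a circuit is a question of speed, area, and logic family rather than of function.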

Hierarchical Decoders
Multi-stage implementation improves performance.
[Figure: NAND decoder using 2-input pre-decoders; the address pairs (A_0, A_1) and (A_2, A_3) and their complements are predecoded, and the predecoded lines are combined to drive word lines WL_0, WL_1, ...]
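A behavioural sketch of the predecoding idea for a 4-bit address: each pair of address bits is decoded once into four lines, and every word line then needs only a 2-input gate. The function names are hypothetical, and signals are modelled as active-high for readability, which is an assumption about polarity rather than a description of the NAND circuit in the figure.

```python
def predecode_pair(a_low: int, a_high: int) -> list[int]:
    """2-to-4 predecoder: one-hot over the four combinations of two address bits."""
    index = a_low | (a_high << 1)
    return [int(j == index) for j in range(4)]


def hierarchical_decode(a0: int, a1: int, a2: int, a3: int) -> list[int]:
    """16 word lines, each the AND of one line from each predecoder."""
    low = predecode_pair(a0, a1)    # shared by all word lines
    high = predecode_pair(a2, a3)   # shared by all word lines
    return [low[i & 3] & high[i >> 2] for i in range(16)]


if __name__ == "__main__":
    # Address 9 = 0b1001: A_0 = 1, A_1 = 0, A_2 = 0, A_3 = 1.
    print(hierarchical_decode(1, 0, 0, 1))
```

The final-stage gates have two inputs instead of four, and the predecoded lines are shared, which is where the performance benefit of the multi-stage form comes from.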

Dynamic Decoders
[Figure: dynamic 2-input NOR decoder and 2-input NAND decoder; word lines WL_0 to WL_3 with precharge devices clocked by φ, and address inputs A_0, A_1 and their complements.]

4-input pass-transistor based column decoder
[Figure: a 2-input NOR decoder on A_0 and A_1 generates select signals S_0 to S_3, each gating one pass transistor between a bit line (BL_0 to BL_3) and the data line D.]
Advantages: speed (t_pd does not add to overall memory access time); only one extra transistor in the signal path.
Disadvantage: large transistor count.

4-to-1 tree based column decoder
[Figure: bit lines BL_0 to BL_3 merged onto the data line D through a binary tree of pass transistors gated by A_0, A_1 and their complements.]
Number of devices drastically reduced.
Delay increases quadratically with the number of sections; prohibitive for large decoders.
Solutions: buffers; progressive sizing; combination of tree and pass transistor approaches.
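A small sketch of the tree organization: it counts the pass transistors in a binary tree selecting one of 2^K bit lines (two devices per merge, so 2^K + ... + 4 + 2 in total) and walks the tree to show which bit line reaches the data line. The function names are hypothetical, and address bits are given least significant bit first.

```python
def tree_decoder_devices(k: int) -> int:
    """Pass transistors in a binary tree selecting 1 of 2**k bit lines."""
    return sum(2 ** level for level in range(1, k + 1))


def tree_series_devices(k: int) -> int:
    """Pass transistors in series between the selected bit line and D."""
    return k


def tree_select(bit_lines: list, address: list[int]):
    """Follow the tree: each address bit halves the set of candidate bit lines."""
    if len(bit_lines) != 1 << len(address):
        raise ValueError("need 2**k bit lines for k address bits")
    candidates = list(bit_lines)
    for a in address:
        candidates = candidates[a::2]  # odd entries if a == 1, even entries if a == 0
    return candidates[0]


if __name__ == "__main__":
    print(tree_decoder_devices(2), tree_series_devices(2))
    print(tree_select(["BL0", "BL1", "BL2", "BL3"], [0, 1]))
```

The device count grows only linearly with the number of bit lines, but the K series devices in the signal path are what make the delay grow quickly for large decoders.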

Decoder for circular shift-register
[Figure: chain of stages clocked by φ and its complement, with reset inputs R, one stage per word line (WL_0, WL_1, WL_2, ...).]

DRAM Bank Architecture

DRAM Timing

Synchronous DRAM (SDRAM) timing

Conventional DRAM modules

Direct RDRAM Architecture

RDRAM Architecture
[Figure: bus carrying clocks and the data bus; row and column packet decoders and demultiplexers; a mux/demux network connecting the memory array to the data bus.]

RAMBUS Microarchitecture

RDRAM Read Cycle

RDRAM Interleaved Memory transactions

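Voltage Regulator
[Figure: regulator in which V_REF controls M_drive between V_DD and the output V_DL; equivalent model with an amplifier (inputs marked - and +, biased by V_bias) comparing V_REF and V_DL and driving M_drive.]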

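Charge Pump
[Figure: pump with CLK driving node A, capacitor C_pump to node B, transistors M1 and M2, and load C_load at V_load; waveforms show V_B swinging between V_DD - V_T and 2V_DD - V_T, and V_load starting from 0 V.]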

Reliability and Yield

Sensing Parameters in DRAM
[Plot: C_D (fF), C_S (fF), Q_S (fC), V_smax (mV), and V_DD (V) on a logarithmic axis versus memory capacity (bits/chip), from 4K to 64G.]
$Q_S = C_S V_{DD} / 2$
$V_{smax} = Q_S / (C_S + C_D)$
From [Itoh01]

Noise Sources in 1T DRAM
[Figure: noise sources acting on a cell: substrate, adjacent bit line (C_cross), word-line-to-bit-line coupling (C_WBL), α-particles, leakage, and the C_S electrode.]

Alpha-particles (or Neutrons)
[Figure: α-particle track through the n+ region and substrate of a cell (WL, BL, V_DD, SiO2), generating electron-hole pairs along its path.]
1 particle ~ 1 million carriers

Yield
Yield curves at different stages of process maturity (from [Veendrick92])

Redundancy
[Figure: memory array with redundant rows and redundant columns; row decoder fed by the row address, column decoder fed by the column address, and a fuse bank.]

Error-Correcting Codes
Example: Hamming codes.
With, e.g., B3 wrong, the parity checks evaluate to 1, 1, 0, which encodes the position 3.
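A minimal sketch of single-error correction with a Hamming(7,4) code, assuming the usual bit ordering P1 P2 B3 P4 B5 B6 B7, in which each parity bit covers the positions whose index contains that power of two. The function names are hypothetical.

```python
def hamming_encode(b3: int, b5: int, b6: int, b7: int) -> list[int]:
    """Return the 7-bit word [P1, P2, B3, P4, B5, B6, B7]."""
    p1 = b3 ^ b5 ^ b7
    p2 = b3 ^ b6 ^ b7
    p4 = b5 ^ b6 ^ b7
    return [p1, p2, b3, p4, b5, b6, b7]


def syndrome(word: list[int]) -> int:
    """Position (1-7) of a single flipped bit, or 0 if all checks pass."""
    p1, p2, b3, p4, b5, b6, b7 = word
    s1 = p1 ^ b3 ^ b5 ^ b7
    s2 = p2 ^ b3 ^ b6 ^ b7
    s4 = p4 ^ b5 ^ b6 ^ b7
    return s1 + 2 * s2 + 4 * s4


def correct(word: list[int]) -> list[int]:
    """Flip the bit named by the syndrome, if any."""
    fixed = list(word)
    position = syndrome(fixed)
    if position:
        fixed[position - 1] ^= 1
    return fixed


if __name__ == "__main__":
    word = hamming_encode(1, 0, 1, 1)
    corrupted = list(word)
    corrupted[2] ^= 1  # flip B3
    print(syndrome(corrupted))
    print(correct(corrupted) == word)
```

Flipping B3 fails the first two checks and passes the third, so the syndrome reads 3, matching the example on the slide.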

Redundancy and Error Correction

Sources of Power Dissipation in Memories
$I_{DD} = \sum C_i \Delta V_i f + \sum I_{DCP}$
[Figure: chip between V_DD and V_SS with an array of n rows and m columns. Selected row: m·i_act; non-selected rows: m(n - 1)·i_hld; row decoder: n·C_DE·V_INT·f; column decoder: m·C_DE·V_INT·f; periphery: C_PT·V_INT·f and I_DCP.]
From [Itoh00]

Data Retention in DRAM
[Plot: currents I_ACT, I_AC, and I_DC (A), from 10^-6 to 10^1, versus capacity (bit) from 16M to 64G, with the corresponding operating voltage (3.3 V down to 0.8 V) and extrapolated threshold voltage at 25 °C (0.53 V down to 0.13 V). Cycle time: 150 ns; T = 75 °C; S = 97 mV/dec.]
From [Itoh00]

Case Studies
Programmable Logic Array
SRAM
Flash Memory

PLA versus ROM
Programmable Logic Array: a structured approach to random logic.
Two-level logic implementation: NOR-NOR (product of sums) or NAND-NAND (sum of products).
IDENTICAL TO ROM!
Main difference: a ROM is fully populated; a PLA has one element per minterm.
Note: the importance of PLAs has drastically reduced:
1. slow
2. better software techniques (multi-level logic synthesis)
But
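The two-level structure can be sketched behaviourally as an AND-plane that forms product terms followed by an OR-plane that combines them. This is a logical model only; it does not represent the NOR-NOR or NAND-NAND circuit realization, and the data layout and the example functions f0 and f1 are assumptions made for illustration.

```python
def evaluate_pla(inputs: list[int], and_plane: list[dict], or_plane: list[list[int]]) -> list[int]:
    """Evaluate a two-level PLA.

    and_plane: one dict per product term, mapping input index -> required value.
    or_plane: one list per output, naming the product terms that feed it.
    """
    products = [
        all(inputs[index] == value for index, value in term.items())
        for term in and_plane
    ]
    return [int(any(products[t] for t in terms)) for terms in or_plane]


if __name__ == "__main__":
    # Hypothetical example: f0 = x0*x1 + not(x2), f1 = x0*x1.
    and_plane = [{0: 1, 1: 1}, {2: 0}]
    or_plane = [[0, 1], [0]]
    for x in ([1, 1, 1], [0, 0, 0], [0, 1, 1]):
        print(x, evaluate_pla(x, and_plane, or_plane))
```

Only the product terms that are actually needed appear in `and_plane`, and the term x0*x1 is shared by both outputs; a ROM with the same inputs would instead store an entry for every input combination.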

Programmable Logic Array
Pseudo-NMOS PLA
[Figure: AND-plane driven by inputs X_0, X_1, X_2 and their complements, and OR-plane producing outputs f_0 and f_1, with pull-up loads to V_DD and pull-down devices to GND.]

Dynamic PLA
[Figure: AND-plane clocked by φ_AND and OR-plane clocked by φ_OR, with inputs X_0, X_1, X_2 and their complements and outputs f_0 and f_1, between V_DD and GND.]

Clock Signal Generation for self-timed dynamic PLA
[Figure: (a) clock signals φ, φ_AND, and φ_OR, with precharge time t_pre and evaluation time t_eval; (b) timing generation circuitry using a dummy AND row to derive φ_AND and φ_OR from φ.]

PLA Layout
[Figure: layout of the AND-plane and OR-plane with V_DD, GND, and φ rails, inputs x_0, x_1, x_2 and their complements, outputs f_0 and f_1, and pull-up devices on both planes.]

1 Gbit Flash Memory
[Figure: two 512 Mb memory arrays (bit lines BL0 to BL16895 and BL16896 to BL33791), each with Block0 to Block1023 and word-line drivers on both sides; each block has select lines SGD and SGS and word lines WL0 to WL31. A bit-line control circuit (BLT0, BLT1), sense latches ((1024 + 32) x 8), and data caches ((1024 + 32) x 8) connect each array to the I/O.]
From [Nakamura02]

Writing Flash Memory
[Plots: evolution of thresholds (number of memory cells versus V_t of memory cells, 0 V to 4 V, result of 4 times program) and final distribution (number of cells, 10^0 to 10^8, versus V_t of memory cells, 0 V to 4 V). Verify level = 0.8 V; word-line level = 4.5 V; read level (4.5 V).]
From [Nakamura02]

125 mm² 1 Gbit NAND Flash Memory
[Die photo: 10.7 mm x 11.7 mm chip showing the charge pump, the 2 kB page buffer and cache, and the cell array of 32 word lines x 1024 blocks with 16896 bit lines.]
From [Nakamura02]

125 mm² 1 Gbit NAND Flash Memory
Technology: 0.13 µm p-sub CMOS triple-well; 1 poly, 1 polycide, 1 W, 2 Al
Cell size: 0.077 µm²
Chip size: 125.2 mm²
Organization: 2112 x 8b x 64 page x 1k block
Power supply: 2.7 V - 3.6 V
Cycle time: 50 ns
Read time: 25 µs
Program time: 200 µs / page
Erase time: 2 ms / block
From [Nakamura02]
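The organization figures can be cross-checked with a little arithmetic. The sketch below multiplies out 2112 x 8b x 64 pages x 1k blocks and derives a page-program throughput from the 200 µs program time. The split of each 2112-byte page into 2048 data bytes plus 64 spare bytes is an assumption, suggested by the (1024 + 32) x 8 latch organization but not stated on the slide.

```python
BYTES_PER_PAGE = 2112     # from the organization line
PAGES_PER_BLOCK = 64
BLOCKS = 1024
ASSUMED_DATA_BYTES_PER_PAGE = 2048  # assumption: the other 64 bytes are spare


def total_bits() -> int:
    """All cells implied by the stated organization."""
    return BYTES_PER_PAGE * 8 * PAGES_PER_BLOCK * BLOCKS


def data_bits() -> int:
    """User-data cells under the assumed data/spare split."""
    return ASSUMED_DATA_BYTES_PER_PAGE * 8 * PAGES_PER_BLOCK * BLOCKS


def program_throughput_bytes_per_s(program_time_s: float = 200e-6) -> float:
    """Bytes written per second if pages are programmed back to back."""
    return BYTES_PER_PAGE / program_time_s


if __name__ == "__main__":
    print(total_bits(), "bits in total")
    print(data_bits(), "data bits under the assumed split")
    print(program_throughput_bytes_per_s() / 1e6, "MB/s page-program throughput")
```

Under the assumed split the data capacity is exactly 2^30 bits, consistent with the 1 Gbit label, and one page of 2112 x 8 bits matches the 16896 bit lines quoted for the array.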

Semiconductor Memory Trends (up to the '90s)
Memory size as a function of time: x4 every three years

Semiconductor Memory Trends (updated)
From [Itoh01]

Trends in Memory Cell Area
From [Itoh01]

Semiconductor Memory Trends Technology feature size for different SRAM generations

Read-Only Memory Cells
[Figure: three cell types, each shown storing a 1 and a 0 on bit line BL with word line WL: Diode ROM, MOS ROM 1 (connected to V_DD), and MOS ROM 2 (connected to GND).]

MOS OR ROM
[Figure: 4 x 4 array with word lines WL[0] to WL[3], bit lines BL[0] to BL[3], cell transistors connected to V_DD, and pull-down loads biased by V_bias.]

MOS NOR ROM
[Figure: 4 x 4 array with pull-up devices to V_DD on bit lines BL[0] to BL[3], word lines WL[0] to WL[3], and cell transistors connected to shared GND lines.]

MOS NOR ROM Layout
Cell (9.5λ x 7λ)
Programming using the active layer only.
[Figure legend: Polysilicon, Metal1, Diffusion, Metal1 on Diffusion.]

MOS NOR ROM Layout
Cell (11λ x 7λ)
Programming using the contact layer only.
[Figure legend: Polysilicon, Metal1, Diffusion, Metal1 on Diffusion.]

MOS NAND ROM
[Figure: 4 x 4 array with pull-up devices to V_DD on bit lines BL[0] to BL[3] and series-connected cell transistors gated by word lines WL[0] to WL[3].]
All word lines are high by default, with the exception of the selected row.
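A behavioural sketch of reading one bit line of a NAND ROM under the word-line convention stated above. It assumes that a missing transistor behaves as a permanent connection in the series chain and that the bit line stays high unless the whole chain conducts. The function name and data layout are hypothetical.

```python
def nand_rom_read(column_cells: list[bool], selected_row: int) -> int:
    """Value seen on one bit line when selected_row is read.

    column_cells[r] is True where a transistor is present at row r.
    Returns 1 if the bit line stays high, 0 if the chain pulls it low.
    """
    if not 0 <= selected_row < len(column_cells):
        raise ValueError("selected_row out of range")
    # All word lines are high except the selected one.
    word_lines = [0 if r == selected_row else 1 for r in range(len(column_cells))]
    # A present transistor conducts only when its word line is high;
    # an absent transistor is modelled as a short and always conducts.
    chain_conducts = all(
        (not present) or level == 1
        for present, level in zip(column_cells, word_lines)
    )
    return 0 if chain_conducts else 1


if __name__ == "__main__":
    column = [True, False, True, False]
    print([nand_rom_read(column, row) for row in range(4)])
```

In this model a transistor at the selected row breaks the chain and the bit line reads 1, while a missing transistor leaves the chain intact and the bit line reads 0.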

MOS NAND ROM Layout
Cell (8λ x 7λ)
Programming using the Metal-1 layer only.
No contact to V_DD or GND necessary; drastically reduced cell size.
Loss in performance compared to NOR ROM.
[Figure legend: Polysilicon, Diffusion, Metal1 on Diffusion.]

NAND ROM Layout
Cell (5λ x 6λ)
Programming using implants only.
[Figure legend: Polysilicon, Threshold-altering implant, Metal1 on Diffusion.]

Equivalent Transient Model for MOS NOR ROM
[Figure: model for NOR ROM, with the word line (WL) represented by r_word and c_word per cell and the bit line (BL) by C_bit, with a pull-up to V_DD.]
Word line parasitics: wire capacitance and gate capacitance; wire resistance (polysilicon).
Bit line parasitics: resistance not dominant (metal); drain and gate-drain capacitance.

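Equivalent Transient Model for MOS NAND ROM
[Figure: model for NAND ROM, with the word line (WL) represented by r_word and c_word per cell and the bit line (BL) by r_bit and c_bit per cell plus load C_L, with a pull-up to V_DD.]
Word line parasitics: similar to NOR ROM.
Bit line parasitics: resistance of cascaded transistors dominates; drain/source and complete gate capacitance.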

Decreasing Word Line Delay
(a) Driving the word line from both sides: a WL driver at each end of the polysilicon word line.
(b) Using a metal bypass: a metal word line strapped to the polysilicon word line every K cells.
(c) Use silicides.
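The effect of these measures can be estimated with the standard distributed-RC approximation, in which the 50% delay of a uniform line is about 0.38·r·c·n² for n cells with per-cell resistance r and capacitance c. This formula and all numeric values below are assumptions brought in for illustration; they are not taken from the slides.

```python
def distributed_rc_delay(r_per_cell: float, c_per_cell: float, n_cells: float) -> float:
    """Approximate 50% delay of a uniform RC line driven from one end."""
    return 0.38 * r_per_cell * c_per_cell * n_cells ** 2


def delay_driven_from_both_sides(r_per_cell: float, c_per_cell: float, n_cells: int) -> float:
    """Worst case is the middle cell, so each driver sees half the line."""
    return distributed_rc_delay(r_per_cell, c_per_cell, n_cells / 2)


if __name__ == "__main__":
    # Assumed example values: 20 ohm and 2 fF per cell, 512 cells per word line.
    r, c, n = 20.0, 2e-15, 512
    single = distributed_rc_delay(r, c, n)
    double = delay_driven_from_both_sides(r, c, n)
    silicided = distributed_rc_delay(r / 5, c, n)  # assumed 5x lower sheet resistance
    print(f"one driver:   {single * 1e9:.3f} ns")
    print(f"two drivers:  {double * 1e9:.3f} ns")
    print(f"silicided:    {silicided * 1e9:.3f} ns")
```

Because the delay grows with the square of the line length, halving the effective length cuts the delay by about four, while a silicide lowers r and scales the delay linearly.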

Precharged MOS NOR ROM
[Figure: NOR ROM array with word lines WL[0] to WL[3], bit lines BL[0] to BL[3], shared GND lines, and precharge devices to V_DD clocked by φ_pre.]
The PMOS precharge device can be made as large as necessary, but the clock driver becomes harder to design.

Non-Volatile Memories
The Floating-Gate Transistor (FAMOS)
[Figure: device cross-section with source, drain (n+), floating gate, and gate over a p substrate, separated by oxide layers of thickness t_ox; schematic symbol with terminals G, D, and S.]

Floating-Gate Transistor Programming
[Figure: three device states with annotated gate, source (S), drain (D), and floating-gate voltages, ranging from -5 V to 20 V.]
Hot-carrier injection.
Removing the programming voltage leaves charge trapped.
Programming results in a higher V_T.

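A Programmable-Threshold Transistor
[Plot: I_D versus V_GS for the 0-state and the 1-state, separated by a threshold shift ΔV_T; at V_GS = V_WL one state is ON and the other is OFF.]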

FLOTOX EEPROM
[Figure: FLOTOX transistor cross-section with floating gate, gate, source and drain (n+) on a p substrate, with 20-30 nm and 10 nm oxide thicknesses; Fowler-Nordheim I-V characteristic, current I versus V_GD between -10 V and 10 V.]

EEPROM Cell
[Figure: cell on BL and WL with V_DD.]
Absolute threshold control is hard.
An unprogrammed transistor might be depletion.
=> 2-transistor cell

Flash EEPROM
[Figure: cross-section with control gate, floating gate, thin tunneling oxide, n+ source and n+ drain on a p-substrate; erasure takes place at the source and programming at the drain.]
Many other options

Cross-sections of NVM cells
[Figure: micrographs of a Flash cell and an EPROM cell.]
Courtesy Intel

Basic Operations in a NOR Flash Memory: Erase
[Figure: cell array with word lines WL_0, WL_1 and bit lines BL_0, BL_1 during erase, with terminals G, S, and D annotated with 0 V, 12 V, and open.]

Basic Operations in a NOR Flash Memory: Write
[Figure: cell array with word lines WL_0, WL_1 and bit lines BL_0, BL_1 during a write, with terminals G, S, and D annotated with 12 V, 6 V, and 0 V.]

Basic Operations in a NOR Flash Memory: Read
[Figure: cell array with word lines WL_0, WL_1 and bit lines BL_0, BL_1 during a read, with terminals G, S, and D annotated with 5 V, 1 V, and 0 V.]

NAND Flash Memory
[Figure: unit cell with word line (poly), gate, ONO, floating gate (FG), gate oxide, and source line (diffusion layer).]
Courtesy Toshiba

NAND Flash Memory
[Figure: array view showing the select transistor, word lines, active area, STI, bit line contact, and source line contact.]
Courtesy Toshiba

Characteristics of State-of-the-art NVM

Read-Write Memories (RAM) STATIC (SRAM) Data stored as long as supply is applied Large (6 transistors/cell) Fast Differential DYNAMIC (DRAM) Periodic refresh required Small (1-3 transistors/cell) Slower Single Ended

6-transistor CMOS SRAM Cell WL V DD M 2 M 4 Q M Q M 5 6 M 1 M 3 BL BL

CMOS SRAM Analysis (Read) WL BL V DD M 4 Q = 0 Q = 1 M 6 M 5 BL V DD M 1 V DD V DD C bit C bit

V o l t a g e r i s e [ V ] CMOS SRAM Analysis (Read) 1.2 1 Voltage Rise (V) 0.8 0.6 0.4 0.2 0 0 0.5 11.2 1.5 2 Cell Ratio (CR) 2.5 3

CMOS SRAM Analysis (Write) V DD WL M 4 Q = 0 M 6 M 5 Q = 1 M 1 V DD BL = 1 BL = 0

CMOS SRAM Analysis (Write)

6T-SRAM Layout V DD M2 M4 Q Q M1 M3 M5 M6 GND WL BL BL

Resistance-load SRAM Cell V DD WL R L R L M 3 Q Q M 4 BL M 1 M 2 BL Static power dissipation -- Want R L large Bit lines precharged to V DD to address t p problem

SRAM Characteristics

3-Transistor DRAM Cell BL1 BL2 WWL RWL WWL M 3 RWL M 1 X M 2 X V DD 2 V T C S BL 1 V DD BL 2 V DD 2 V T DV No constraints on device ratios Reads are non-destructive Value stored at node X when writing a 1 = V WWL -V Tn

3T-DRAM Layout BL2 BL1 GND RWL M3 M2 WWL M1

1-Transistor DRAM Cell WL BL WL Write 1 Read 1 M 1 X GND V DD 2 V T C S BL V DD V DD /2 V sensing DD /2 C BL Write: C S is charged or discharged by asserting WL and BL. Read: Charge redistribution takes places between bit line and storage capacitance V = VBL V PRE = V BIT V PRE C S ------------ C S + C BL Voltage swing is small; typically around 250 mv.

DRAM Cell Observations 1T DRAM requires a sense amplifier for each bit line, due to charge redistribution read-out. DRAM memory cells are single ended in contrast to SRAM cells. The read-out of the 1T DRAM cell is destructive; read and refresh operations are necessary for correct operation. Unlike 3T cell, 1T cell requires presence of an extra capacitance that must be explicitly included in the design. When writing a 1 into a DRAM cell, a threshold voltage is lost. This charge loss can be circumvented by bootstrapping the word lines to a higher value than V DD

Sense Amp Operation V BL V(1) V PRE DV(1) Sense amp activated Word line activated V(0) t

1-T T DRAM Cell Capacitor Metal word line Poly n + n + Inversion layer Poly induced by plate bias Cross-section SiO 2 Field Oxide Diffused bit line Polysilicon gate Layout Polysilicon plate M 1 word line Uses Polysilicon-Diffusion Capacitance Expensive in Area

SEM of poly-diffusion capacitor 1T-DRAM

Advanced 1T DRAM Cells Word line Insulating Layer Cell plate Capacitor dielectric layer Cell Plate Si Capacitor Insulator Refilling Poly Transfer gate Isolation Storage electrode Storage Node Poly 2nd Field Oxide Si Substrate Trench Cell Stacked-capacitor Cell

Static CAM Memory Cell Bit Word CAM Bit Bit CAM Bit Bit M4 M8 M6 M9 M7 M5 Bit Word CAM CAM Word Match S M3 int M2 S M1 Wired-NOR Match Line

A d d r e s s D e c o d e r H i t L o g i c CAM in Cache Memory CAM ARRAY SRAM ARRAY Input Drivers Sense Amps / Input Drivers Address Tag Hit R/W Data

Periphery Decoders Sense Amplifiers Input/Output Buffers Control / Timing Circuitry

Row Decoders Collection of 2 M complex logic gates Organized in regular and dense fashion (N)AND Decoder NOR Decoder

Hierarchical Decoders Multi-stage implementation improves performance WL 1 WL 0 A 0 A 1 A 0 A 1 A 0 A 1 A 0 A 1 A 2 A 3 A 2 A 3 A 2 A 3 A 2 A 3 A 1 A 0 A 0 A 1 A 3 A 2 A 2 A 3 NAND decoder using 2-input pre-decoders

Dynamic Decoders Precharge devices GND GND V DD WL 3 V DD WL 3 WL 2 V DD WL 2 WL 1 WL 0 V DD WL 1 WL 0 V DD φ A 0 A 0 A 1 A 1 A 0 A 0 A 1 A 1 φ 2-input NOR decoder 2-input NAND decoder

2 - i n p u t N O R d e c o d e r 4-input pass-transistor based column decoder BL 0 BL 1 BL 2 BL 3 A 0 S 0 S 1 S 2 A 1 S 3 Advantages: speed (t pd does not add to overall memory access time) Only one extra transistor in signal path Disadvantage: Large transistor count D

4-to-11 tree based column decoder BL 0 BL 1 BL 2 BL 3 A 0 A 0 A 1 A 1 Number of devices drastically reduced Delay increases quadratically with # of sections; prohibitive for large decoders Solutions: buffers progressive sizing combination of tree and pass transistor approaches D

Decoder for circular shift-register V DD V DD V DD V DD V DD V DD WL 0 R f f f f R WL 1 f f f f R WL 2 f f f f V DD

Sense Amplifiers t p = C V ---------------- I av make V as small as possible large small Idea: Use Sense Amplifer small transition s.a. input output

Differential Sense Amplifier V DD M 3 M 4 y Out bit M 1 M 2 bit SE M 5 Directly applicable to SRAMs

Differential Sensing SRAM V DD PC V DD BL EQ BL y M 3 V DD M 4 V DD 2 y WL i x M 1 M 2 2 x x 2 x SE M 5 SE SRAM cell i SE x Diff. Sense 2 x Amp V DD y Output Output (a) SRAM sensing scheme SE (b) two stage differential amplifier

Latch-Based Sense Amplifier (DRAM) EQ BL BL V DD SE SE Initialized in its meta-stable point with EQ Once adequate voltage gap created, sense amp enabled with SE Positive feedback quickly forces output to a stable operating point.

Charge-Redistribution Amplifier V ref V L M 1 V S M 2 M 3 C large C small Concept 2.5 2.0 1.5 Transient Response V in V S V 1.0 V L 0.5 0.0 0.0 1.00 2.00 time (nsec) V ref 5 3V 3.00

Charge-Redistribution Amplifier EPROM V DD SE M 4 Load Out V casc M 3 C out Cascode device WLC M 2 C col Column decoder BL WL M 1 C BL EPROM array

Single-to to-differential Conversion BL Cell WL x Diff. S.A. 2 x 1 2 V ref Output How to make a good V ref?

Open bitline architecture with dummy cells EQ L L 1 L 0 V DD R 0 SE R 1 L BLL BLR C S C S C S SE C S C S C S Dummy cell Dummy cell

V V V DRAM Read Process with Dummy Cell 3 3 2 2 BL BL 1 BL 1 BL 0 0 1 2 3 0 0 1 2 3 t (ns) t (ns) reading 0 reading 1 3 EQ WL 2 SE 1 0 0 1 2 3 t (ns) control signals

Voltage Regulator V DD V REF M drive V DL Equivalent Model V bias V REF - + M drive V DL

Charge Pump V DD 2V DD 2 V T CLK A M1 B V B V DD 2 V T 0 V C pump M2 V load C load V load 0 V

DRAM Timing

m u x / d e m u x n e t w o r k RDRAM Architecture Bus Clocks Data bus k k3 l memory array Column Row demux demux packet dec. packet dec.

Address Transition Detection V DD A 0 DELAY t d ATD ATD A 1 DELAY t d A N 2 1 DELAY t d

Reliability and Yield

V, V, C, Q, C s m a x D S S D D Sensing Parameters in DRAM 1000 C D(1F) V smax (mv) 100 C S(1F) Q S(1C) 10 V DD (V) Q S 5 C S V DD /2 V smax 5 Q S /(C S 1 C D ) 4K 64K 1M 16M 256M 4G 64G Memory Capacity (bits /chip) From [Itoh01]

Noise Sources in 1T DRam BL substrate Adjacent BL C WBL a-particles WL leakage C S electrode C cross

Open Bit-line Architecture Cross Coupling EQ BL WL 1 WL 0 C WBL WL D WL D WL 0 WL 1 C WBL BL C BL C C C Sense Amplifier C C C C BL

Folded-Bitline Architecture WL 1 WL 1 WL 0 WL 0 WL D C WBL WL D BL C BL x y C C C C C C EQ Sense Amplifier BL C BL x y C WBL

Transposed-Bitline Architecture BL 9 BL BL BL 99 C cross (a) Straightforward bit-line routing SA BL 9 BL BL BL 99 C cross SA (b) Transposed bit-line architecture

Alpha-particles particles (or Neutrons) a-particle WL V DD BL n 1 1 1 2 2 2 1 2 1 2 1 2 1 SiO 2 1 Particle ~ 1 Million Carriers

Yield Yield curves at different stages of process maturity (from [Veendrick92])

Redundancy
Figure: memory array with redundant rows and redundant columns; row decoder (row address), column decoder (column address), and fuse bank.

Error-Correcting Codes
Example: Hamming codes
With e.g. B3 wrong, the parity checks read 1, 1, 0, which gives 3, the position of the faulty bit.
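The slide's parity equations did not survive extraction, so the following is a minimal sketch of a standard Hamming(7,4) code rather than a copy of the slide. It assumes the conventional layout P1, P2, B3, P4, B5, B6, B7 (parity bits at positions 1, 2 and 4, even parity); the function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode data bits [B3, B5, B6, B7] into the word [P1, P2, B3, P4, B5, B6, B7]."""
    b3, b5, b6, b7 = data
    p1 = b3 ^ b5 ^ b7   # covers positions 1, 3, 5, 7
    p2 = b3 ^ b6 ^ b7   # covers positions 2, 3, 6, 7
    p4 = b5 ^ b6 ^ b7   # covers positions 4, 5, 6, 7
    return [p1, p2, b3, p4, b5, b6, b7]


def hamming74_syndrome(word):
    """Return the 1-based position of a single-bit error, or 0 if all checks pass."""
    p1, p2, b3, p4, b5, b6, b7 = word
    s1 = p1 ^ b3 ^ b5 ^ b7
    s2 = p2 ^ b3 ^ b6 ^ b7
    s4 = p4 ^ b5 ^ b6 ^ b7
    return s1 + 2 * s2 + 4 * s4


def hamming74_correct(word):
    """Return a copy of word with a single-bit error (if any) flipped back."""
    pos = hamming74_syndrome(word)
    fixed = list(word)
    if pos:
        fixed[pos - 1] ^= 1
    return fixed


word = hamming74_encode([1, 0, 1, 1])
bad = list(word)
bad[2] ^= 1                      # corrupt B3 (position 3)
print(hamming74_syndrome(bad))   # prints 3
```

B3 takes part in the first two checks only, so those two fail and the third passes; read as a binary number with the first check as the least significant bit, that is 3.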

Redundancy and Error Correction

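Sources of Power Dissipation in Memories
$I_{DD} = \sum C_i \Delta V_i f + \sum I_{DCP}$
Figure: chip between V_DD and V_SS with the current contributions marked:
ARRAY, selected: $m \, i_{act}$
ARRAY, non-selected: $m (n - 1) \, i_{hld}$
ROW DEC: $n C_{DE} V_{INT} f$
COLUMN DEC: $m C_{DE} V_{INT} f$
PERIPHERY: $C_{PT} V_{INT} f$ and $I_{DCP}$
From [Itoh00]
Digital Integrated Circuits, 2nd ed.: Memories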
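As a rough sketch, the contributions labeled on this slide can be added up to estimate I_DD. This assumes the terms simply sum and that the array has n rows and m columns, as the labels suggest. The function name and the numeric values in the example are hypothetical, not figures from [Itoh00].

```python
def memory_supply_current(m, n, i_act, i_hld, c_de, c_pt, v_int, f, i_dcp):
    """Estimate I_DD (amperes) by summing the contributions marked on the slide.

    m, n:   columns and rows of the array (assumed interpretation)
    i_act:  current of one selected (active) cell
    i_hld:  holding current of one non-selected cell
    c_de:   decoder capacitance per output
    c_pt:   total periphery capacitance
    v_int:  internal supply voltage
    f:      operating frequency
    i_dcp:  static (DC) periphery current
    """
    array_selected = m * i_act                 # m cells on the selected row
    array_idle = m * (n - 1) * i_hld           # all remaining cells
    decoders = (n + m) * c_de * v_int * f      # row (n) plus column (m) decoders
    periphery = c_pt * v_int * f + i_dcp       # switched plus static periphery
    return array_selected + array_idle + decoders + periphery


# Hypothetical example: 1024 x 1024 array, 100 MHz, 1.8 V internal supply
i_dd = memory_supply_current(
    m=1024, n=1024, i_act=50e-6, i_hld=1e-9,
    c_de=100e-15, c_pt=50e-12, v_int=1.8, f=100e6, i_dcp=1e-3,
)
print(f"I_DD is about {i_dd * 1e3:.1f} mA")
```

The non-selected term scales with m(n - 1), so even a tiny per-cell holding current matters as capacity grows.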

Data Retention in SRAM
Figure: I_leakage (A), from 100n to 1.30u, versus V_DD (0 to 1.8 V) for 0.13 µm CMOS and 0.18 µm CMOS; the two curves differ by a factor of 7.
SRAM leakage increases with technology scaling

Suppressing Leakage in SRAM
Figure (inserting extra resistance): SRAM cells between internal rails V_DD,int and V_SS,int, connected to the supplies through sleep-controlled low-threshold transistors.
Figure (reducing the supply voltage): SRAM cells on V_DD,int, switched between V_DD and V_DDL by the sleep signal.

Data Retention in DRAM
Figure: current (A), from 10^-6 to 10^1, for I_ACT, I_AC and I_DC versus capacity (16M to 64G bit), with matching scales for operating voltage (3.3 V down to 0.8 V) and extrapolated threshold voltage at 25 °C (0.53 V down to 0.13 V).
Conditions: cycle time 150 ns, T = 75 °C, S = 97 mV/dec.
From [Itoh00]

Case Studies
Programmable Logic Array
SRAM
Flash Memory

PLA versus ROM
Programmable Logic Array: a structured approach to random logic
Two-level logic implementation:
NOR-NOR (product of sums)
NAND-NAND (sum of products)
IDENTICAL TO ROM!
Main difference:
ROM: fully populated
PLA: one element per minterm
Note: the importance of PLAs has drastically reduced:
1. slow
2. better software techniques (multi-level logic synthesis)
But
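The "fully populated" versus "one element per minterm" distinction can be made concrete with a small sketch. It models a two-level sum-of-products PLA as an AND-plane (product terms) feeding an OR-plane, then expands the same functions into a ROM-style table with one entry per input combination. The functions f0 and f1 and all names below are made-up examples, not the functions drawn on the slides.

```python
from itertools import product

# AND-plane: one entry per product term, mapping input index -> required value.
# Inputs that are not listed are "don't care" for that term.
AND_PLANE = [
    {0: True, 1: True},              # x0 AND x1
    {2: False},                      # NOT x2
    {0: False, 1: False, 2: True},   # NOT x0 AND NOT x1 AND x2
]

# OR-plane: for each output, the product terms that are ORed together.
OR_PLANE = {"f0": [0, 1], "f1": [1, 2]}


def pla_eval(inputs):
    """Evaluate all outputs for a tuple of booleans (x0, x1, x2)."""
    terms = [all(inputs[i] == v for i, v in term.items()) for term in AND_PLANE]
    return {name: any(terms[k] for k in rows) for name, rows in OR_PLANE.items()}


def rom_table(n_inputs):
    """Fully populated equivalent: one stored word per input combination."""
    return {bits: pla_eval(bits) for bits in product([False, True], repeat=n_inputs)}


rom = rom_table(3)
print(len(AND_PLANE), "PLA product rows versus", len(rom), "ROM words")
```

Both structures compute the same outputs; the ROM needs 2^n word lines for n inputs, while the PLA needs only as many rows as there are product terms.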

Programmable Logic Array: Pseudo-NMOS PLA
Figure: AND-plane and OR-plane with V_DD and GND rails; inputs X_0, X_1, X_2 and their complements; outputs f_0, f_1.

Dynamic PLA
Figure: AND-plane clocked by φ_AND and OR-plane clocked by φ_OR, with V_DD and GND rails; inputs X_0, X_1, X_2 and their complements; outputs f_0, f_1.

Clock Signal Generation for Self-timed Dynamic PLA
Figure (a), clock signals: φ, φ_AND and φ_OR, with the precharge and evaluation intervals t_pre and t_eval.
Figure (b), timing generation circuitry: φ_AND and φ_OR derived from φ using dummy AND rows.

PLA Layout
Figure: layout of the AND-plane and OR-plane with V_DD, GND and clock φ; inputs x_0, x_1, x_2 and their complements; outputs f_0, f_1; pull-up devices on both planes.

4 Mbit SRAM: Hierarchical Word-line Architecture
Figure: global word line, sub-global word line and local word lines; block group select and block select signals; memory cells in Block 0, Block 1, Block 2, ...

Bit-line Circuitry
Figure: bit-line load (block select, ATD, BEQ), local WL and memory cell on the bit-line pair B/T, column decode (CD) switches to the I/O line pair, and the sense amplifier.

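Sense Amplifier (and Waveforms)
Figure (circuit): sense amplifier stages between the I/O line pair and DATA, controlled by block select (BS), SEQ and their complements.
Figure (waveforms): Address, ATD, BEQ, SEQ, I/O lines (Vdd to GND), SA and its complement (Vdd to GND), DATA, BS, and the data-cut interval.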

1 Gbit Flash Memory
Figure: two 512 Mb memory arrays (bit lines BL0 to BL16895 and BL16896 to BL33791), each with Block0 to Block1023 and word line drivers on both sides; each block has SGD, WL31 to WL0 and SGS; BLT0, BLT1 and the bit line control circuit; per array, sense latches (1024 + 32) × 8 and data caches (1024 + 32) × 8 connected to I/O.
From [Nakamura02]

Writing Flash Memory
Figure (evolution of thresholds): number of memory cells versus Vt of memory cells (0 V to 4 V), showing the result of programming 4 times; verify level = 0.8 V, word-line level = 4.5 V.
Figure (final distribution): number of cells (10^0 to 10^8) versus Vt of memory cells (0 V to 4 V), with the read level (4.5 V) marked.
From [Nakamura02]

125 mm² 1 Gbit NAND Flash Memory
Figure: die photo, 10.7 mm × 11.7 mm, showing the charge pump, the 2 kB page buffer and cache, and the array of 32 word lines × 1024 blocks with 16896 bit lines.
From [Nakamura02]

125 mm² 1 Gbit NAND Flash Memory
Technology: 0.13 µm p-sub CMOS triple-well; 1 poly, 1 polycide, 1 W, 2 Al
Cell size: 0.077 µm²
Chip size: 125.2 mm²
Organization: 2112 × 8 b × 64 page × 1k block
Power supply: 2.7 V to 3.6 V
Cycle time: 50 ns
Read time: 25 µs
Program time: 200 µs / page
Erase time: 2 ms / block
From [Nakamura02]
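The organization figures can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers from these slides; the variable names are illustrative, and reading the page as 2048 data bytes plus 64 spare bytes is an assumption based on the 2 kB page buffer, not something the slides state.

```python
BYTES_PER_PAGE = 2112       # "2112 x 8 b"
BITS_PER_BYTE = 8
PAGES_PER_BLOCK = 64        # "64 page"
BLOCKS = 1024               # "1k block"
PROGRAM_TIME_S = 200e-6     # 200 us per page

bits_per_page = BYTES_PER_PAGE * BITS_PER_BYTE
total_bits = bits_per_page * PAGES_PER_BLOCK * BLOCKS

# Assumption: 2048 data bytes per page, the rest is spare area.
spare_bytes_per_page = BYTES_PER_PAGE - 2048

# Sustained program throughput if pages are written back to back.
program_bytes_per_s = BYTES_PER_PAGE / PROGRAM_TIME_S

print(bits_per_page)                    # bits per page, equal to the bit-line count
print(total_bits / 2**30)               # capacity in Gbit (2**30 bits)
print(program_bytes_per_s / 1e6)        # MB/s
```

The 16896 bits per page match the 16896 bit lines quoted on the die-photo slide, and the total comes out slightly above 2^30 bits because of the extra bytes in each page.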

Semiconductor Memory Trends (up to the '90s)
Memory size as a function of time: ×4 every three years
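A ×4 increase every three years is the same as a doubling every 18 months. The short sketch below expresses the trend as a formula; the function name and the starting capacity in the example are hypothetical, chosen only to show the arithmetic.

```python
import math


def projected_capacity(bits_at_start, years):
    """Capacity after `years`, assuming growth of x4 every three years."""
    return bits_at_start * 4 ** (years / 3.0)


# Time to double under the same trend: 4 ** (t / 3) = 2  ->  t = 3 * ln 2 / ln 4
doubling_time_years = 3.0 * math.log(2) / math.log(4)

# Hypothetical example: starting from a 64 Kbit part, six years later
print(projected_capacity(64 * 1024, 6) / 2**20, "Mbit")
print(doubling_time_years, "years to double")
```

Six years is two generations, so the example grows by a factor of 16, from 64 Kbit to 1 Mbit.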

Semiconductor Memory Trends (updated)
From [Itoh01]

Trends in Memory Cell Area
From [Itoh01]

Semiconductor Memory Trends
Figure: technology feature size for different SRAM generations.

Magnetic Memories (MRAM) Magnetic Moment vs Magnetic Field

Magnetoresistance vs Magnetic Field

Spin-Valve Memory cell

Switching threshold locus of the spin valve (Asteroid Curve)

Magnetic Tunnel Junction (MTJ) Memory Cell

Cell array organization of an MTJ memory cell

Ferroelectric Memories: Polarization vs Electric Field

Ferroelectric Memories: 2T2C cell

Ferroelectric memories: 1T1C cell

Phase Change Memory (PCM)
