ECE-470 Digital Design II Memory Test Motivation Semiconductor memories are about 35% of the entire semiconductor market Memories are the most numerous IPs used in SOC designs Number of bits per chip continues to increase exponentially and fault sensitivity increases; faults become more complex Less charge stored per memory cell, cells are smaller and closer cell coupling faults Traditional tests require long test-times, which increases a lot with the increase of the memory size Test cost per memory chip must not increase significantly 1 2 Memory Cells Per Chip Test Time in Seconds (Memory Size: n Bits) Size n n Number of Test Algorithm Operations n x log 2 n n 3/2 n 2 1 Mb 4 Mb 16 Mb 64 Mb 256 Mb 1 Gb 2 Gb 0.06 0.25 1.01 4.03 16.11 64.43 128.9 1.26 5.54 24.16 104.7 451.0 1932.8 3994.4 64.5 515.4 1.2 hr 9.2 hr 73.3 hr 586.4 hr 1658.6 hr 18.3 hr 293.2 hr 4691.3 hr 75060.0 hr 1200959.9 hr 19215358.4 hr 76861433.7 hr 3 4 Fault Types Permanent: System is broken and stays broken the same way indefinitely Transient: Fault temporarily affects the system behavior, and then the system reverts to the good machine -- time dependency, caused by environmental condition Intermittent: times causes a failure, sometimes does not Permanent faults: Failure Mechanisms Missing/added electrical connection Broken component (IC mask defect or siliconto-metal connection) Burnt-out chip wire Corroded connection between chip & package Chip logic error (Pentium division bug) 5 6 1
Failure Mechanisms (Continued) Transient Faults: Cosmic Ray An particle (ionized Helium atom) Air pollution (causes wire short/open) Humidity (temporary short) Temperature (temporary logic error) Pressure (temporary wire open/short) Vibration (temporary wire open) Power Supply Fluctuation (logic error) Electromagnetic Interference (coupling) Static Electrical Discharge (change state) Ground Loop (misinterpreted logic value) 7 Failure Mechanisms (Continued) Intermittent Faults: Loose Connections Aging Components (changed logic delays) Hazards and Races in critical timing paths (bad design) Resistor, Capacitor, Inductor variances (timing faults) Physical Irregularities (narrow wire -- high resistance) Electrical Noise (memory state changes) 8 Physical Failure Mechanisms Memory Test Levels Corrosion Electromigration Bonding Deterioration -- Au package wires inter-diffuse with Al chip pads Ionic Contamination -- Na + diffuses through package and into FET gate oxide oying -- Al migrates from metal layers into Si substrate Radiation and Cosmic Rays -- 8 MeV, collides with Si lattice, generates n - p pairs, causes soft memory error Chip, Array, & Board 9 10 March Test Notation r -- Read a memory location w -- Write a memory location r0 -- Read a 0 from a memory location r1 -- Read a 1 from a memory location w0 -- Write a 0 to a memory location w1 -- Write a 1 to a memory location -- Write a 1 to a cell containing 0 March Test Notation (Continued) -- Complement the cell contents -- Increasing memory addressing -- Decreasing memory addressing -- Either increasing or decreasing -- Write a 0 to a cell containing 1 11 12 2
More March Test Notation -- Any write operation <... > -- Denotes a particular fault,... <I / F > -- I is the fault sensitizing condition, F is the faulty cell value <I1,..., In-1; In / F> -- Denotes a fault covering n cells I1,..., In-1 are fault sensitization conditions in cells 1 through n-1 for cell n In gives sensitization condition for cell n If In is empty, write In / F as F A MATS+ March Test M0: { March element (w0) } for cell := 0 to n-1 (or any other order) do write 0 to A [cell]; M1: { March element (r0, w1) } for cell := 0 to n-1 do read A [cell]; { Expected value = 0} write 1 to A [cell]; M2: {March element (r1, w0) } for cell := n-1 down to 0 do read A [cell]; { Expected value = 1 } write 0 to A [cell]; 13 14 Part 1: Fault Modeling Behavioral (black-box) Model: State machine modeling all memory content combinations -- Intractable Functional (gray-box) Model: Used Logic Gate Model: Not used Inadequately models transistors & capacitors Electrical Model: Very expensive Geometrical Model: Layout Model Used with Inductive Fault Analysis Functional Model We do not use the logic gate model for memories because memory cells are designed using transistors and capacitors Simplified functional model used when not interested in diagnosis Geometric model implies complete knowledge of the chip layout Testing time will increase 15 16 Functional Model Simplified Functional Model Address Refresh Address latch Column decoder Refresh logic Read/Write logic Row decoder Memory Cell Array Write driver Memory Cell Array Address Decoder Sense amplifiers Data out Data register Data in 17 18 R/W and CE 3
Simplified Functional Model Functional Faults Address Address Decoder Memory Cell Array Read/Write Logic Data 19 Fault TF NPSF A B C D E F G H I J K L M N O P Functional Fault Cell stuck Driver stuck Read/write line stuck Chip-select line stuck Data line stuck Open circuit in data line Short circuit between data lines Crosstalk between data lines Address line stuck Open circuit in address line Shorts between address lines Open circuit in decoder Wrong address access Multiple simultaneous address access Cell can be set to 0 (1) but not to 1 (0) Neighborhood pattern sensitive cell interaction 20 Reduced Functional Faults Stuck-at Faults () Necessary condition for detection: For each cell, must read a 0 and a 1 Notation: < /0> (< /1>) Fault TF NPSF Functional Fault Stuck-at fault Coupling fault Transition fault Neighborhood Pattern Sensitive fault 21 22 Transition Faults (TF) Cell fails to make 0 1 or 1 0 write transition Necessary condition to detect and locate: Each cell must undergo a transition and a transition, and be read after each, before undergoing any further transitions Notation: < /0>, < /1> < /0> transition fault 23 Coupling Faults () Coupling Fault: Transition in bit j causes unwanted change in bit i 2-Coupling Fault: Involves 2 cells, special case of k- Coupling Fault Must restrict k cells to make practical Inversion and Idempotent s: special cases of 2- Coupling Faults Bridging and State Coupling Faults: Involve any # of cells, caused by logic level Dynamic Coupling Fault (dyn): Read or write on j forces i to 0 or 1 24 4
Neighborhood Pattern Sensitive Coupling Faults (NPSF) Content of cell i (or the ability to of cell i to change) is influenced by the contents of all other memory cells (which may be a pattern of 0 s or 1 s or a pattern of transitions) Three types 1. Active/dynamic NPSF (ANPSF) 2. Passive NPSF (PNPSF) 3. Static NPSF (SNPSF) Base-cell: cell under test Neighborhood: Immediate cluster of cells whose pattern makes the base-cell fail Deleted neighborhood: neighborhood without the base cell 1. Active NPSF (APNSF): Type 1 Base cell changes when one deleted neighborhood cell transitions Condition for detection and location: Each base cell must be read in state 0 and state 1, for all possible deleted neighborhood pattern changes C i,j <d 0, d 1, d 3, d 4 ; b> C i,j <0,, 1, 1; 0> and C i,j <0,, 1, 1; > 25 26 More complex 1. Active NPSF: Type 2 Used when diagonal couplings are significant, and do not necessarily cause horizontal/vertical coupling 2. Passive NPSF: PNSPF A certain neighborhood pattern prevents the base cell from changing Condition for detection and location: Each base cell must be written and read in state 0 and in state 1, for all deleted neighborhood pattern permutations / 0 ( /1) -- Base cell fault effect indicating that base cannot change from 0 (or 1) 27 28 3. Static NPSF: SNPSF Static: Base cell forced into a particular state when deleted neighborhood contains a particular pattern Differs from active: need not have a transition to sensitize an SNPSF Condition for detection and location: Each base cell must be read in state 0 and in state 1, for all deleted neighborhood pattern permutations C i,j < 0, 1, 0, 1; - / 0> and C i,j < 0, 1, 0, 1; - / 1> 29 Address Decoder Faults (ADFs) Represents an address decoding error under assumption: Behavior is the same during both read and write Multiple ADFs must be tested for Decoders have CMOS stuck-open faults 30 5
Theorem 9.2 A March test satisfying conditions 1 and 2 below detects all address decoder faults... Means any # of read or write operations Before condition 1, must have wx element x can be 0 or 1, but must be consistent in test Condition March element Part 2: Memory Testing March tests: family of tests called marches Neighborhood tests Inductive Fault Analysis 1 2 (rx,, wx ) (rx,, wx) 31 32 March Tests Preferred method for RAM testing Using external tester or, Through BIST O(n) time complexity Cannot be used for testing NPSFs Algorithm MATS MATS+ MATS++ MARCH X MARCH C- MARCH A MARCH Y MARCH B Irredundant March Tests Description { (w0); (r0, w1); (r1) } { (w0); (r0, w1); (r1, w0) } { (w0); (r0, w1); (r1, w0, r0) } { (w0); (r0, w1); (r1, w0); (r0) } { (w0); (r0, w1); (r1, w0); (r0, w1); (r1, w0); (r0) } { (w0); (r0, w1, w0, w1); (r1, w0, w1); (r1, w0, w1, w0); (r0, w1, w0) } { (w0); (r0, w1, r1); (r1, w0, r0); (r0) } { (w0); (r0, w1, r1, w0, r0, w1); (r1, w0, w1); (r1, w0, w1, w0); (r0, w1, w0) } 33 34 Irredundant March Tests Summary MATS+ Example: Cell (2,1) SA0 Fault Algorithm MATS MATS+ MATS++ MARCH X MARCH C- MARCH A MARCH Y MARCH B TF in id dyn S Linked Faults MATS+: { M0: (w0); M1: (r0, w1); M2: (r1, w0) } 35 36 6
MATS+ Example: Cell (2,1) SA1 Fault MATS+: { M0: (w0); M1: (r0, w1); M2: (r1, w0) } 37 Testing Neighborhood Pattern-Sensitive Faults We assume that read operation is fault free Example: ANPSF - condition for detection and location: Each base cell must be read in state 0 and state 1, for all possible deleted neighborhood patterns Important to minimize the number of writes for short tests: use Hamiltonian sequences Testing neighborhoods simultaneously: tiling or two-group methods 38 Basic Algorithms: Fault Detection and Location 1. write base-cells with 0; 2. loop apply a pattern; { it could change the base-cell from 0 to 1 } read base-cell; // LOCATION end; read base-cells; // DETECTION 3. write base-cells with 1; 4. loop apply a pattern; { it could change the base-cell from 1 to 0 } read base-cell; // LOCATION end; read base-cells; // DETECTION 39 Testing for Layout-related Faults DRAMs may be repaired (consecutive addresses may not be adjacent) and prior testing approaches do not consider that High density memories (GB) may have new kinds of defects not covered by NPSF tests Use inductive fault analysis to model a single defect per memory cell 40 Testing for Layout-related Faults Inductive Fault Analysis: 1 -- Generate defect sizes, location, layers based on fabrication line model 2 -- Place defects on layout model 3 -- Extract defective cell schematic & electrical parameters 4 -- Evaluate cell testing using a Monte Carlo based analysis tool such as VLASIC Summary Multiple fault models are essential Combination of tests is essential: March -- SRAM and DRAM NPSF -- DRAM DC Parametric -- Both AC Parametric -- Both Inductive Fault Analysis is now required 41 42 7