Outline - BIST
- Why BIST?
- Memory BIST
- Logic BIST: pattern generator & response analyzer
- Scan-based BIST architecture
Why Built-In Self Test?
TYPES
- On-Line Self-Test (Concurrent Checking)
- Functional Self-Test (system (micro)code-based)
- Structural Self-Test (pseudo-random/-exhaustive, deterministic): Regular Structure BIST, Logic BIST, ...
OBJECTIVES OF STRUCTURAL SELF-TEST
- Reduction in IC/module manufacturing cost (memory/throughput)
- Need for autonomous test at board/module/system levels
- Diagnostics
- Burn-in

Built-In Self Test
- Stimulus source: stored program (1s & 0s, algorithm) or pseudo-random (LFSR)
- Circuit Under Test
- Response capture & compare: stored-program compare, data compaction (CRC signatures), pass/fail compare against the defect-free response
- Self-test controller: clocks, mode control (including fencing), initialization, response sampling
Memory BIST (Shared)
- RAMBIST CONTROLLER drives the RAM collar (Addr, DataIn, Wen, Clk, DataOut) via the 1149.1 TAP
- Parameters: controller (global/local), algorithm, parallel testing (multiples), diagnostics, load/unload protocol

Memory BIST Collar
- # ports = # independent addresses; each port is read-only, write-only, or read-write
- Multiport write contention resolved by circuit design or address-sequencing stepper logic
- LOCAL RAMBIST SEQUENCER with pattern logic drives Addr and DataIn, gates Wen and Clk through OR gates, and compares DataOut for Pass/Fail
Ex: RAMBIST Algorithm (6N)
6N RAM test algorithm:
- For address 0 to max do: W(0)
- For address 0 to max do: R(0), W(1), increment the address
- For address max to 0 do: R(1), W(0), R(0), decrement the address
Each location receives six operations: hence the name of the algorithm, 6N.
For a byte-oriented RAM, the algorithm is repeated four times with different write data backgrounds:
- First sequence:  W(0) = W(00000000), W(1) = W(11111111)
- Second sequence: W(0) = W(00001111), W(1) = W(11110000)
- Third sequence:  W(0) = W(00110011), W(1) = W(11001100)
- Fourth sequence: W(0) = W(01010101), W(1) = W(10101010)

Test Time Complexity (100 MHz)
Size    N       10N     NlogN   N^1.5   N^2
1M      0.01s   0.1s    0.2s    11s     3h
16M     0.16s   1.6s    3.9s    11m     33d
64M     0.66s   6.6s    17s     1.5h    1.43y
256M    2.62s   26s     1.23m   12h     23y
1G      10.5s   1.8m    5.3m    4d      366y
4G      42s     7m      22.4m   32d     57c
16G     2.8m    28m     1.6h    255d    915c
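The three passes of the 6N march above can be sketched in a few lines of Python. The `march_6n` helper and the RAM-as-list model are illustrative, not from the slides: a fault-free RAM passes, and an injected stuck-at fault fails.

```python
def march_6n(read, write, size, zero=0, one=1):
    """6N march: up W(0); up (R0, W1); down (R1, W0, R0). True if no mismatch."""
    ok = True
    for a in range(size):                 # pass 1: initialize every cell to 0
        write(a, zero)
    for a in range(size):                 # pass 2: verify 0, write 1, ascending
        ok &= read(a) == zero
        write(a, one)
    for a in reversed(range(size)):       # pass 3: verify 1, write 0, verify 0
        ok &= read(a) == one
        write(a, zero)
        ok &= read(a) == zero
    return ok
```

For a fault-free 16-word RAM modeled as a list, `march_6n(mem.__getitem__, mem.__setitem__, 16)` returns True; wrapping the write so one cell is stuck at 0 makes it return False.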
RAM Test Algorithm
- A test algorithm (or simply test) is a finite sequence of test elements.
- A test element contains a number of memory operations (access commands):
  - Data pattern (background) specified for the Read operation.
  - Address (sequence) specified for the Read and Write operations.
- A march test algorithm is a finite sequence of march elements.
- A march element is specified by an address order and a number of Read/Write operations.

March Tests
March X - for AF, SAF, TF, & CFin:
  { any(w0); up(r0,w1); down(r1,w0); any(r0) }
March C [Marinescu 1982] - for AF, SAF, TF, & all CFs (redundant):
  { any(w0); up(r0,w1); up(r1,w0); any(r0); down(r0,w1); down(r1,w0); any(r0) }
March C- [Goor 1991] - also for AF, SAF, TF, & all CFs (irredundant):
  { any(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); any(r0) }
(up: ascending address order; down: descending; any: either order)
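Because a march test is just a list of (address-order, operation-list) pairs, a generic executor is easy to sketch. The element list below is March C- as given on this slide; the data-structure encoding is my own.

```python
# A march element is (direction, ops); an op is ('r', expected) or ('w', value).
MARCH_C_MINUS = [
    ('up',   [('w', 0)]),
    ('up',   [('r', 0), ('w', 1)]),
    ('up',   [('r', 1), ('w', 0)]),
    ('down', [('r', 0), ('w', 1)]),
    ('down', [('r', 1), ('w', 0)]),
    ('up',   [('r', 0)]),             # final element may use either order
]

def run_march(elements, mem):
    """Apply a march test to a word-addressable memory; False on any mismatch."""
    for direction, ops in elements:
        addrs = range(len(mem)) if direction == 'up' else reversed(range(len(mem)))
        for a in addrs:
            for kind, val in ops:
                if kind == 'w':
                    mem[a] = val
                elif mem[a] != val:   # read returned unexpected data
                    return False
    return True
```

`run_march(MARCH_C_MINUS, [0] * 8)` passes on a fault-free memory; a memory object whose read of one cell is stuck at 1 is caught by the first read element.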
Coverage of March Tests
        MATS++  March X  March Y  March C-
SAF     1       1        1        1
TF      1       1        1        1
AF      1       1        1        1
SOF     -       -        1        - *
CFin    .75     1        1        1
CFid    .375    .5       .5       1
CFst    .5      .625     .625     1
* Extended March C- (11N) has a 100% coverage of SOF.

Testing Word-Oriented RAM
- Background bit is replaced by background word.
- MATS++: { any(wA); up(rA,wA'); down(rA',wA,rA) }, where A is the background word and A' its complement.
- Conventional method is to use log2(m)+1 different backgrounds for m-bit words.
- m=8: 00000000, 01010101, 00110011, and 00001111.
- Apply the test algorithm log2(m)+1 = 4 times, so complexity is 4*6N/8 = 3N.
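The log2(m)+1 backgrounds can be generated by doubling the run length of alternating bits; for m = 8 this reproduces exactly the four words listed above. The helper name is mine.

```python
from math import log2

def backgrounds(m):
    """All-zero word plus alternating words with run lengths 1, 2, 4, ..., m/2."""
    bgs = ['0' * m]
    k = 1
    while k < m:                      # runs of k zeros then k ones, repeated
        bgs.append(''.join('1' if (i // k) % 2 else '0' for i in range(m)))
        k *= 2
    return bgs                        # len(bgs) == log2(m) + 1
```

`backgrounds(8)` yields `['00000000', '01010101', '00110011', '00001111']`.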
Logic BIST - Stimulus Generation
There are many ways to generate the tests. The simplest categorization is by the type of testing used:
1. Exhaustive testing
2. Pseudo-random testing
3. Pre-stored testing

Logic BIST - Response Analysis
1. Parity checking
2. Transition counting
3. Syndrome generation or ones counting
4. Signature analysis

Test Pattern Generator for BIST
(a) Exhaustive test: use a counter and apply all possible patterns (2^n patterns) to the circuit under test.
(b) Random test: use a linear-feedback shift register (LFSR) to apply random patterns to the CUT.
Ex: TPG for random testing - a 3-stage LFSR (shift D1 -> D2 -> D3) loaded with an initial value (seed). This 3-stage LFSR can generate a test sequence of length 2^3 - 1 = 7:
D1 D2 D3
1  0  0
1  1  0
1  1  1
0  1  1
1  0  1
0  1  0
0  0  1
(then repeats from 1 0 0)
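The state table can be reproduced by simulating the register directly. The feedback function D1' = D1 xor D3 is inferred from the state table (the slide draws it but does not state it), so treat it as an assumption of this sketch.

```python
def lfsr3_sequence(seed=(1, 0, 0)):
    """Enumerate states of the 3-stage LFSR: shift D1->D2->D3, feedback D1' = D1 ^ D3."""
    seen, s = [], seed
    while s not in seen:              # stop when the sequence starts to repeat
        seen.append(s)
        d1, d2, d3 = s
        s = (d1 ^ d3, d1, d2)
    return seen
```

Starting from seed 100 this visits all 7 nonzero states in the order of the table above, then repeats.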
Maximum-Length LFSR
- An m-stage LFSR can generate a test sequence of length 2^m - 1.
- Such sequences are called maximum-length sequences; such an LFSR is called a maximum-length LFSR.
- When only a fraction of the 2^m - 1 patterns can be applied (because m is too large), an LFSR is better than a counter:
Cycle  Counter  LFSR
1      000      100
2      001      110
3      010      111
4      011      011
5      100      101
6      101      010
7      110      001
8      111      (repeats)
The sequence of the LFSR is more random: every bit is random.

An LFSR Can Be Expressed by its Characteristic Polynomial f(x)
  f(x) = x^5 + x^3 + 1    <->    a_n = a_(n-3) + a_(n-5)
- The characteristic polynomial of a maximum-length LFSR is called a primitive polynomial. Several listings of such polynomials exist in the literature.
- Given a CUT with m inputs, pick a primitive polynomial of degree m and construct the corresponding LFSR as a TPG.
Ref: Built-In Test for VLSI, Paul H. Bardell et al., John Wiley & Sons, 1987 (up to degree 300).
Ref: Essentials of Electronic Testing, M. L. Bushnell, V. Agrawal, Kluwer, 2000 (pp. 620, up to degree 100).
Characteristics of M-L LFSR
The state diagram contains two components: one contains the all-zero state (0000), the other contains the remaining 2^m - 1 states.
Cycle  LFSR (m=3)
1      100
2      110
3      111
4      011
5      101
6      010
7      001
- For every bit, the # of 1s differs from the # of 0s by one.
- The # of transitions between 1 and 0 in one period is (p+1)/2 = 2^(m-1).

Characteristics of M-L LFSR (Cont'd)
Autocorrelation between different bits. The autocorrelation function is defined as:
  C(i,j) = (1/(2^m - 1)) * sum_{n=1..2^m-1} b_i(n) * b_j(n)
where b_i(t) = +1 where a_i(t) = 0, and b_i(t) = -1 where a_i(t) = 1.
The autocorrelation function of every M-L LFSR of period p = 2^m - 1 is:
  C(i,i) = 1
  C(i,j) = -1/p for i != j
Ex. m=3: C(1,2) = C(1,3) = C(2,3) = -1/7
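The C(i,j) = -1/7 claim for m=3 can be checked numerically over the seven states of the table above, using exact rational arithmetic; the state list and the b = 1 - 2a encoding follow the slide's definitions.

```python
from fractions import Fraction

# The seven states of the m=3 maximum-length LFSR from the table above.
STATES = [(1, 0, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1), (1, 0, 1), (0, 1, 0), (0, 0, 1)]

def autocorr(i, j):
    """C(i,j) with b = +1 for bit value 0 and b = -1 for bit value 1."""
    b = lambda a: 1 - 2 * a
    return Fraction(sum(b(s[i]) * b(s[j]) for s in STATES), len(STATES))
```

`autocorr(i, i)` is 1 for every bit, and every cross-correlation comes out to exactly -1/7, as claimed.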
Linear Dependency
Ref: Rajski & Tyszer, BIST for SoC, 1999 FTCS Tutorial

Selection of LFSR as RPTG
- Degree: large enough so the state will not repeat; large enough to reduce linear dependencies
- Type: primitive; avoid trinomials (increased linear dependencies)
- Seed value: select through fault simulation
Definitions - Random Pattern Testability for Logic BIST
- Detection probability q_i of fault f_i: the probability that a randomly selected input vector will detect the fault.
- Error latency EL_i of fault f_i: the number of random input vectors applied to a circuit until f_i is detected.
Theorem: EL of a fault has a geometric distribution, i.e.,
  Pr{EL_i = j} = (1 - q_i)^(j-1) * q_i
Cumulative detection probability:
  F_ELi(t) = Pr{EL_i <= t} = sum_{j=1..t} (1 - q_i)^(j-1) * q_i = 1 - (1 - q_i)^t,  t >= 1
Mean:
  M_i = E(EL_i) = sum_{j>=1} j * (1 - q_i)^(j-1) * q_i = 1/q_i
Variance:
  Var(EL_i) = E(EL_i^2) - E^2(EL_i) = (1 - q_i)/q_i^2

Required Random Test Length as a Function of Detection & Escape Prob.
- Escape probability of a fault f_i: the probability that the fault will go undetected after the application of t random input vectors. Similarly, the escape probability of a fault set {f_1, f_2, ..., f_m} is the probability that at least one member of the fault set will be left undetected after application of t random input vectors.
- The random test length required to detect a fault f_i with escape probability no larger than a given threshold e_i can be obtained as
  T_i = ceil[ ln(e_i) / ln(1 - q_i) ]
(Note: Pr{escape} = 1 - F_ELi(t) = (1 - q_i)^t)
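These closed forms translate directly into code. The numeric values in the usage note are examples of my own, not from the slide.

```python
from math import ceil, log

def mean_latency(q):
    """E[EL] = 1/q for a geometric error latency with detection probability q."""
    return 1 / q

def required_length(q, escape):
    """Smallest t with (1 - q)^t <= escape, i.e. T = ceil(ln e / ln(1 - q))."""
    return ceil(log(escape) / log(1 - q))
```

For example, a fault with q = 0.001 needs roughly 6,900 random vectors before its escape probability drops below 0.001, while its mean latency is only 1,000 vectors; the gap is why random-pattern-resistant faults dominate test length.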
A Testability Analysis Method - COP (F. Brglez, 1984)
- C_s: the probability of signal s having value 1
- O_s: the probability of signal s being observed at a PO
AND gate X = AND(a,b):   C_X = C_a * C_b;                O_a = O_X * C_b
OR gate X = OR(a,b):     C_X = 1 - (1 - C_a)*(1 - C_b);  O_a = O_X * (1 - C_b)
Inverter b = NOT(a):     C_b = 1 - C_a;                  O_a = O_b
Fanout a -> X, Y:        C_X = C_Y = C_a;                O_a = 1 - (1 - O_X)*(1 - O_Y)
- Compute C_s from PIs toward POs
- Compute O_s from POs toward PIs

Estimate of Circuit Random Pattern Testability
Estimate of detection probability of a stuck-at fault:
  Pd_s/0 = C_s * O_s        for a stuck-at-0 fault at s
  Pd_s/1 = (1 - C_s) * O_s  for a stuck-at-1 fault at s
An estimate of circuit testability*:
  U = (1/|F|) * sum_{i in F} (1/Pd_i)
* R. Lisanke et al., "Testability-Driven Random Test Pattern Generation," IEEE TCAD, Nov. 1987.
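Applying the COP rules to a toy netlist — y = OR(AND(a, b), c) with y a primary output and uniformly random inputs — gives the detection-probability estimates below. The netlist is invented for illustration.

```python
# COP on y = OR(w, c), w = AND(a, b); PIs a, b, c have C = 0.5; y is the PO.
Ca = Cb = Cc = 0.5
Cw = Ca * Cb                        # AND rule: 0.25
Cy = 1 - (1 - Cw) * (1 - Cc)        # OR rule: 0.625
Oy = 1.0                            # y is a primary output
Ow = Oy * (1 - Cc)                  # w observed through the OR only when c = 0
Oc = Oy * (1 - Cw)
Oa = Ow * Cb                        # a observed through the AND only when b = 1
pd_w_sa0 = Cw * Ow                  # stuck-at-0 at w
pd_a_sa1 = (1 - Ca) * Oa            # stuck-at-1 at a
```

Both sample faults come out at Pd = 0.125, i.e. a mean error latency of 8 random vectors by the formulas of the previous slide.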
Test Response Compression
TPG -> Circuit Under Test -> output response -> Compressor -> Signature
The signature & its collection algorithm should meet the following guidelines:
1. The algorithm must be simple enough to be implemented as part of the built-in test circuitry.
2. The implementation must be fast enough to remove it as a limiting factor in test time.
3. The compression method should minimize the loss of information. Specifically, it should minimize the loss of evidence of a fault indicated by a wrong response from the circuit under test.

Use of LFSRs for Polynomial Division
Suppose we are interested in modulo-2 division:
  P(x)/G(x) = (x^7 + x^3 + x)/(x^5 + x^3 + x + 1)
The longhand division can be conducted in terms of the detached coefficients only:

              101         Q(x) = x^2 + 1
         ----------
101011 ) 10001010
         101011
         --------
         00100110
           101011
         --------
         00001101 = R     R(x) = x^3 + x^2 + 1

This division process can be mechanized using an LFSR.
LFSR Implementing Polynomial Division
LFSR implementing division by f(x) = x^5 + x^3 + x + 1: a 5-stage shift register x^0, x^1, x^2, x^3, x^4, with the serial input feeding x^0 and the bit shifted out of x^4 XORed back into x^0, x^1, and x^3.
When a shift occurs, x^5 is replaced by x^3 + x + 1: whenever a quotient coefficient (the x^5 term) is shifted off the right-most stage, x^3 + x + 1 is added to the register (or subtracted from the register, since addition is the same as subtraction modulo 2). Effectively, the dividend has been divided by x^5 + x^3 + x + 1.

Using the LFSR for Polynomial Division
- The LFSR is initialized to zero.
- The message word (or dividend) P(x) is serially streamed into the LFSR input, high-order coefficient first.
- The content of the LFSR after the last message bit is the remainder R(x) from the division of the message polynomial by the divisor G(x):
  P(x) = Q(x)G(x) + R(x)
- The shifted-out bit stream forms the quotient Q(x).
An Example
Example: P(x) = x^7 + x^3 + x, input = 10001010
time  input  x0 x1 x2 x3 x4  out (x^5)
0     -      0  0  0  0  0   -
1     1      1  0  0  0  0   0
2     0      0  1  0  0  0   0
3     0      0  0  1  0  0   0
4     0      0  0  0  1  0   0
5     1      1  0  0  0  1   0
6     0      1  0  0  1  0   1
7     1      1  1  0  0  1   0
8     0      1  0  1  1  0   1
Remainder R = x^3 + x^2 + 1 (register contents); Quotient Q = x^2 + 1 (shifted-out bits)

LFSR As a Signature Analyzer
- Any data, such as the test response resulting from a circuit, can be compressed into a signature by an LFSR.
- The signature is the remainder from the division process; the LFSR is called a signature analyzer.
  P(x) = Q(x) * G(x) + R(x)
  G(x): divisor (LFSR polynomial); R(x): signature
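A bit-level simulation of this divider reproduces the table: positions 0, 1, and 3 receive the feedback because x^5 is replaced by x^3 + x + 1. The function name and list encoding are mine.

```python
def lfsr_divide(bits, taps=(0, 1, 3), degree=5):
    """Serial division by G(x) = x^5 + x^3 + x + 1, high-order coefficient first.
    Returns (remainder as [x^0..x^4], quotient bits in shift-out order)."""
    state = [0] * degree
    quotient = []
    for b in bits:
        fb = state[-1]                   # coefficient of x^5 being shifted out
        quotient.append(fb)
        state = [b] + state[:-1]         # shift one position toward higher powers
        for t in taps:
            state[t] ^= fb               # fold x^5 back in as x^3 + x + 1
    return state, quotient
```

Feeding 10001010 (P(x) = x^7 + x^3 + x) leaves the register holding x^3 + x^2 + 1 and shifts out ...101, i.e. Q(x) = x^2 + 1, matching the longhand division and the table.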
Aliasing in Signature Analysis
If P(x) is the polynomial of the correct data, any P'(x) = P(x) + M(x)G(x) will have the same signature as P(x), for any M(x).
Example: P(x) = x^7 + x^3 + x, G(x) = x^5 + x^3 + x + 1, signature R(x) = x^3 + x^2 + 1
  P'(x)  = P(x) + G(x)   = x^7 + x^5 + 1
  P''(x) = P(x) + x*G(x) = x^7 + x^6 + x^4 + x^3 + x^2
P'(x) and P''(x) have the same signature x^3 + x^2 + 1.
- Aliasing: the condition in which a faulty circuit with an erroneous response produces the same signature as the good circuit.
- Aliasing probability is usually used to measure the quality of a data compressor.

Aliasing Prob. of Using an LFSR as a Data Compressor
P(x) = Q(x)G(x) + R(x)
- For an input string m bits long, P(x)'s degree is (m-1). There are 2^m different polynomials with degree equal to or less than (m-1); among them, 2^m - 1 polynomials represent possible wrong bit streams.
- For a divisor polynomial G(x) of degree r, there are 2^(m-r) different Q(x)'s that result in a polynomial of degree equal to or less than (m-1). So there are 2^(m-r) - 1 wrong m-bit streams that map into the same signature as the correct bit stream.
- Aliasing prob. P(M) = (2^(m-r) - 1)/(2^m - 1); for large m, P(M) ~ 1/2^r.
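The aliasing example can be checked with GF(2) polynomials encoded as integer bitmasks (an encoding chosen here for brevity): both corrupted streams leave the same remainder as the good one.

```python
def gf2_mod(p, g):
    """Remainder of p(x) mod g(x) over GF(2); polynomials as integer bitmasks."""
    while p.bit_length() >= g.bit_length():
        p ^= g << (p.bit_length() - g.bit_length())   # cancel the leading term
    return p

P = 0b10001010                   # P(x) = x^7 + x^3 + x
G = 0b101011                     # G(x) = x^5 + x^3 + x + 1
good = gf2_mod(P, G)             # signature x^3 + x^2 + 1
bad1 = gf2_mod(P ^ G, G)         # P'(x)  = P(x) + G(x)
bad2 = gf2_mod(P ^ (G << 1), G)  # P''(x) = P(x) + x*G(x)
```

All three remainders equal 0b1101 (x^3 + x^2 + 1): the two erroneous streams alias to the good signature, exactly as the slide argues.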
Multiple-Input Signature Register (MISR)
- For multiple-output circuits, the overhead of a single-input signature analyzer on every output would be high. A multiple-input signature register (MISR) is used instead: CUT outputs I0..I4 are XORed into successive stages of one LFSR.
- It can be proved that the aliasing prob. of an MISR is:
  (2^(m-1) - 1)/(2^(r+m-1) - 1) ~ 1/2^r
  r: number of stages in the MISR; m: length of the data to be compressed.

Scan-Based BIST Architectures
Test-Per-Scan
- Tests applied after filling up the scan chains
- Example: STUMPS [Bardell et al. 1982]
- Lower area and performance overhead; longer test application time
Test-Per-Clock
- Tests applied and responses compressed every clock cycle
- Examples: BILBO [Koenemann et al. 1979], circular BIST [Krasniewski et al. 1989]
- Short test application time; higher area overhead and performance degradation
Hybrid
- Example: PSBIST [Lin et al. 1993]
Example of Test-Per-Scan
STUMPS: Bardell et al., 1982.
A parallel random pattern generator feeds scan inputs SI_1..SI_n of scan chains SCAN 1..SCAN n; scan outputs SO_1..SO_n feed a multiple-input signature register.

Example of Test-Per-Clock
Circular BIST: Krasniewski et al., 1989.
Input boundary scan -> combinational circuit -> output boundary scan; the BIST FFs form a shift register closed into a circle under the CBIST controller (modes BS0/BS1), each BIST FF combining its functional input I_i with the previous FF output Q_(i-1) to form D_i.
Example of Hybrid Architecture: PSBIST
PI mux -> combinational or feedback-free sequential circuit -> PO; an LFSR with a phase shifter (PS) feeds the scan chains (SC), and a MISR compacts the responses.
Scan chains are observed per scan, but POs are observed per clock.
Ref: C.-J. Lin, et al., "Integration of Partial Scan and Built-In Self-Test," JETTA, 1995

General BIST Issues
- No X-state propagation to observation points
- Structural dependencies for scan-based BIST; solution: use a phase shifter
- Random pattern resistance; solutions: inserting test points, using additional deterministic tests
Use of Phase Shifter to Guarantee Channel Separation
- A phase shifter is an XOR network; if carefully designed, it guarantees a minimum channel separation.
Ref: Rajski & Tyszer, BIST for SoC, 1999 FTCS Tutorial

Random Pattern (RP) Resistance
- Fault coverage vs. # patterns: high gain initially, then the curve saturates.
- 10-30% of faults are typically random pattern resistant.
Test Point Insertion: Inserting an Observation Point
- An observation point influences the region in its fanin cone. Before insertion, node e is hard to observe; after insertion, e drives the observation point directly.

Test Point Insertion: Inserting a Control Point
- A control point influences the region in its fanout cone. Before insertion, node e is hard to set to 1; after insertion, an added gate G with test input r forces e.
PSBIST With Test Points
PI mux -> combinational or feedback-free sequential circuit -> PO, with added connections to the control points and from the observation points; the LFSR with phase shifter (PS) feeds the scan chains (SC), and the MISR compacts responses.

Timing-Driven Test Point Insertion
- The timing-driven test point selection technique* automatically selects control/observation points with the greatest random-testability improvements that are not on critical paths.
- Under the PSBIST architecture, timing-driven partial scan + timing-driven test point insertion offer a low-performance-penalty DFT solution for timing-critical circuits.
Ref: Cheng and Lin, "Timing-Driven Test Point Insertion for Full-Scan and Partial-Scan BIST," Int'l Test Conf., 1995.
Estimate of Circuit Random Pattern Testability
Detection probability of a stuck-at fault:
  Pd_s/0 = C_s * O_s        for a stuck-at-0 fault at s
  Pd_s/1 = (1 - C_s) * O_s  for a stuck-at-1 fault at s
An estimate of circuit testability*:
  U = (1/|F|) * sum_{i in F} (1/Pd_i)
* R. Lisanke et al., "Testability-Driven Random Test Pattern Generation," IEEE TCAD, Nov. 1987.

COP Testability Measures (F. Brglez, 1984)
- C_s: the probability of signal s having value 1
- O_s: the probability of signal s being observed at a PO
AND gate X = AND(a,b):   C_X = C_a * C_b;                O_a = O_X * C_b
OR gate X = OR(a,b):     C_X = 1 - (1 - C_a)*(1 - C_b);  O_a = O_X * (1 - C_b)
Inverter b = NOT(a):     C_b = 1 - C_a;                  O_a = O_b
Fanout a -> X, Y:        C_X = C_Y = C_a;                O_a = 1 - (1 - O_X)*(1 - O_Y)
- Compute C_s from PIs toward POs; compute O_s from POs toward PIs
A Simple Algorithm for Selecting Test Points
while (FC < desired_FC) & (#_of_test_points < Max_number) {
    Compute slacks* for all nodes in the circuit;
    For each node s with a slack > threshold,
        compute U_s assuming a test point at s;
    Insert the test point at the s that has the lowest U_s;
    Fault simulation using random vectors;
}
* Slack: the difference between the required arrival time and the actual arrival time.
Problem: exhaustively simulating all nodes causes high complexity. Several solutions are available to reduce the complexity.

Typical Test Application Scheme - One Capture Per Scan
Scan cell: a MUX selects DATA or SCAN_IN under MODE_SW and feeds a D-FF clocked by CLK.
Scan for L cycles, then capture for 1 cycle (MODE_SW toggles between scan and capture while CLK runs).
Scan-Based BIST Does Not Have To Be 1-Capture-Per-Scan!!
- Two captures per scan: scan for L clock cycles, capture for 2 clock cycles.
- K captures per scan: scan for L clock cycles, capture for k clock cycles.

Potential Advantages of Multiple Captures After Each Scan
- Tests are less random: multiple captures provide tests with different signal probability profiles.
- An example: inputs A, B, C and pseudo scan input PSI (a scan FF) feed a gate producing F, which is captured back into the scan FF.
Signal Probability Profile
At the first capture cycle: A, B, C, and PSI each have signal probability 0.5; F has signal probability .9375.
At the second capture cycle: the scan FF now holds F, so PSI has signal probability .9375, and F has signal probability .8828.
- Easier to observe: A s/1, B s/1, C s/1
- Easier to activate: F s/1
- Harder to activate: PSI s/1

A General Test Application Scheme for Scan-Based BIST [Tsai, Cheng and Bhawmik, DAC 99]
- Divide the testing into several sessions.
- Each test session has a unique number (k) of capture cycles per scan.
- Each test session detects a subset of faults.
- Find the number of test sessions and the corresponding number of capture cycles for each test session to maximize the overall fault coverage.
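The .9375 and .8828 values match a 4-input NAND driving F, so that gate is assumed in this sketch (the slide's figure does not name it):

```python
# Assumed circuit: F = NAND(A, B, C, PSI); each input is pseudo-random (P(1) = 0.5).
def p_nand_one(probs):
    """Probability that a NAND output is 1: 1 - product of input 1-probabilities."""
    p_all_one = 1.0
    for p in probs:
        p_all_one *= p
    return 1.0 - p_all_one           # output is 0 only when every input is 1

p1 = p_nand_one([0.5, 0.5, 0.5, 0.5])   # first capture cycle
p2 = p_nand_one([0.5, 0.5, 0.5, p1])    # second capture: PSI now holds F
```

This reproduces the profile shift: p1 = 0.9375 and p2 = 0.8828125, i.e. F s/1 stays easy to activate while PSI s/1 gets harder, which is the point of mixing capture depths across sessions.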
Fault Coverage Curve (s38417): multiple-capture vs. single-capture.

Logic BIST Summary
A circuit with logic BIST has:
- core logic with scan
- a pseudo-random pattern generator (LFSR)
- a response compactor: multiple-input signature register (MISR)
- a BIST controller (shift counter and pattern counter)
- test points for improving random pattern testability
BIST architectures can be classified into:
- test-per-scan (such as STUMPS)
- test-per-clock (such as circular BIST)
- hybrid (such as PSBIST)
Multiple captures after each scan sequence for PSBIST may improve test quality without additional hardware overhead.
System-on-Chip: Heterogeneity and Programmability
- Increasing heterogeneity: more transistors doing different things! Digital, analog, memory, software, high-speed bus.
- Increasing programmability: almost all SoCs have some programmable cores (processor, DSP, FPGA).
- High NRE results in fewer design starts.
- Domain-specific: more applications for a single design.
- Programmability vs. power/performance spectrum: general purpose -> domain specific -> application specific.

Fewer, But More Programmable Designs
Embedded-Software-Based Self-Testing For Programmable Chips
- Reuse of on-chip programmable components for test: use embedded processors as a general computing platform for self-testing; processor/DSP/FPGA cores perform on-chip test generation, measurement, response analysis, and even diagnosis.
- Self-test a processor using its instruction set for high structural fault coverage, bridging high-level functional test and low-level physical defects.
- Use the tested processor/DSP to test buses, interfaces, and other components, including analog and mixed-signal components.
- View test as an application of a programmable SoC!

Embedded SW Self-Testing
DSP, CPU, on-chip memory, and an IP core are connected by an on-chip bus with a bus arbiter, through VCI bus-interface master/target wrappers; the test program and response signatures reside in on-chip memory.
Low-cost tester flow: load the test program at low speed; self-test at operational speed; unload the response signature at low speed.
Benefits: low-cost tester; high-quality at-speed test; low test overhead; non-intrusive (test in normal operational mode; no violation of power consumption); more accurate speed-binning.
Ref: Krstic, et al., DAC 02