Short Papers. Test Data Compression and Decompression Based on Internal Scan Chains and Golomb Coding


IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 21, NO. 6, JUNE 2002

Short Papers

Test Data Compression and Decompression Based on Internal Scan Chains and Golomb Coding

Anshuman Chandra and Krishnendu Chakrabarty

Abstract: We present a data compression method and decompression architecture for testing embedded cores in a system-on-a-chip (SOC). The proposed approach makes effective use of Golomb coding and the internal scan chain(s) of the core under test and provides significantly better results than a recent compression method that uses Golomb coding and a separate cyclical scan register (CSR). The major advantages of Golomb coding of test data include very high compression, analytically predictable compression results, and a low-cost and scalable on-chip decoder. The use of the internal scan chain for decompression obviates the need for a CSR, thereby reducing hardware overhead considerably. In addition, the novel interleaving decompression architecture allows multiple cores in an SOC to be tested concurrently using a single ATE I/O channel. We demonstrate the effectiveness of the proposed approach by applying it to the ISCAS'89 benchmark circuits.

Index Terms: Automatic test equipment (ATE), decompression architecture, embedded core testing, precomputed test sets, response vectors, system-on-a-chip testing, test set encoding, testing time, variable-to-variable-length codes.

I. INTRODUCTION

Embedded cores are becoming commonplace in large system-on-a-chip (SOC) designs [1]. Along with the benefits of higher integration and shorter time to market, intellectual property (IP) cores pose several difficult test challenges. The volume of test data for an SOC is growing rapidly as IP cores become more complex and an increasing number of these cores are integrated in a chip. In order to effectively test these systems, each core must be adequately exercised with a set of precomputed test patterns provided by the core vendor.
However, the input/output (I/O) channel capacity, speed and accuracy, and data memory of automatic test equipment (ATE) are severely limited. The testing time for an SOC depends on the test data volume, the time required to transfer the data to the cores, and the rate at which it is transferred (measured by the cores' test data bandwidth and ATE channel capacity). Lower testing time increases production capacity as well as reduces test cost and time to market for an SOC. New techniques are therefore needed for decreasing test data volume in order to overcome memory bottlenecks and to reduce testing time.

An attractive approach for reducing test data volume for SOCs is based on the use of data compression techniques [2]–[4]. In this approach, the precomputed test set $T_D$ provided by the core vendor is compressed (encoded) to a much smaller test set $T_E$ and stored in the ATE memory. An on-chip decoder is used for pattern decompression to generate $T_D$ from $T_E$ during pattern application. Test data compression using statistical coding of test sequences for synchronous sequential (nonscan) circuits was presented in [2] and for full-scan circuits in [3]. While the compression method in [2] is restricted to sequential circuits with a large number of flip-flops and relatively few primary inputs, the work presented in [3] does not conclusively demonstrate that statistical coding provides greater compression than standard ATPG compaction methods for full-scan circuits [5], [6].

Manuscript received December 1, 2000; revised July 10, 2001. This work was supported in part by the National Science Foundation under Grant CCR-9875324. This paper was presented in part in Proc. Design, Automation and Test in Europe (DATE) Conference, pp. 145-149, Munich, Germany, March 2001. This paper was recommended by Associate Editor R. Aitken. The authors are with the Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708 USA (e-mail: achandra@ee.duke.edu). Publisher Item Identifier S 0278-0070(02)04700-0.
Test data can be more efficiently compressed by taking advantage of the fact that the number of bits changing between successive test patterns in a test sequence is generally very small. This observation was used in [4], where a difference vector sequence $T_{\mathrm{diff}}$ determined from $T_D$ was compressed using run-length coding. A drawback of the compression method described in [4] is that it relies on variable-to-fixed-length codes, which are less efficient than more general variable-to-variable-length codes [7], [8]. Furthermore, it is inefficient for cores with internal scan chains that are used to capture test responses; in these circuits, separate CSRs must be added to the SOC, thereby increasing hardware overhead. A more efficient compression and decompression method was used in [9], where $T_{\mathrm{diff}}$ was compressed using variable-to-variable-length Golomb codes. However, this approach requires separate CSRs and is therefore also inefficient for cores that use the same internal scan chains for applying test patterns and capturing test responses.

In this companion paper to [9], we present an improved test data compression and decompression method for IP cores in an SOC. The proposed approach makes effective use of Golomb codes and the internal scan chain(s) of the core under test. No separate CSR is required for pattern decompression. The difference sequence $T^R_{\mathrm{diff}}$ is derived from the given precomputed test set $T_D$ using the fault-free responses $R$ of the core under test to $T_D$. Golomb coding is then applied to $T^R_{\mathrm{diff}}$. The resulting encoded test set $T_E$ is much smaller than the original precomputed test set $T_D$. We apply our compression approach to test sets for the ISCAS'89 benchmark circuits and show that $T_E$ is not only considerably smaller than the smallest test sets obtained using ATPG compaction [5], but it is also significantly smaller than the compressed test sets obtained using Golomb coding in [9].
The proposed compression approach for reducing test data volume is especially suitable for an SOC containing IP cores since it does not require gate-level models for the embedded cores. Precomputed test sets can be directly encoded without any fault simulation or subsequent test generation. This is in contrast to other recent techniques, such as LFSR-based reseeding for BIST [10] and scan broadcast [11], which require structural models for fault simulation and test generation. The mixed-mode BIST technique in [10] relies on fault simulation for identifying hard faults and test generation to determine test cubes for these faults. The scan broadcast technique in [11] also requires test generation.

We extend the decompression architecture of [9] to an interleaving scheme that allows multiple cores to be tested in parallel with a single ATE I/O channel. We also present analytical results for test data compression and testing time. Finally, we show that test data compression not only reduces the volume of test data but also allows a slower tester to be used without any penalty on testing time.

0278-0070/02$17.00 © 2002 IEEE

II. COMPRESSION METHOD AND TEST ARCHITECTURE

We first review Golomb coding and its application to test data compression [9]. The first step in the encoding procedure is to select the Golomb code parameter $m$. The choice of $m$ has received a lot of attention in the information theory literature; for certain distributions of the input data stream ($T^R_{\mathrm{diff}}$ in our case), the group size $m$ can be optimally determined. For example, if the input data stream is random with 0-probability $p$, then $m$ should be chosen such that $p^m \approx 0.5$ [8]. However, since the difference vectors for precomputed test sets do not satisfy the randomness assumption, the best value of $m$ for test data compression must be determined experimentally. Nevertheless, it has been shown in [9] that the best value of $m$ can be approximated analytically.

Once the group size $m$ is determined, the runs of zeros in the precomputed test set are mapped to groups of size $m$ (each group corresponding to a run length). The number of such groups is determined by the length of the longest run of zeros in the precomputed test set. The set of run lengths $\{0, 1, 2, \ldots, m-1\}$ forms group $A_1$; the set $\{m, m+1, m+2, \ldots, 2m-1\}$, group $A_2$; etc. In general, the set of run lengths $\{(k-1)m, (k-1)m+1, (k-1)m+2, \ldots, km-1\}$ comprises group $A_k$ [8]. To each group $A_k$, we assign a group prefix of $(k-1)$ 1s followed by a zero. We denote this by $1^{(k-1)}0$. If $m$ is chosen to be a power of two, i.e., $m = 2^N$, each group contains $2^N$ members and a $\log_2 m$-bit sequence (tail) uniquely identifies each member within the group. Thus, the final code word for a run length $L$ that belongs to group $A_k$ is composed of two parts, a group prefix and a tail. The prefix is $1^{(k-1)}0$ and the tail is a sequence of $\log_2 m$ bits. It follows from the definition of $A_k$ that $(k-1) = \lfloor L/m \rfloor$, i.e., $k = \lfloor L/m \rfloor + 1$, and the tail is the binary representation of $L \bmod m$. The encoding process is illustrated in Fig. 1 for $m = 4$.

Fig. 1. An example of Golomb coding for $m = 4$.

The next step in the compression procedure is to derive the difference vector set $T_{\mathrm{diff}}$ from $T_D$, where $T_D = \{t_1, t_2, t_3, \ldots, t_n\}$ is the (ordered) precomputed test set. The ordering is determined using a heuristic procedure described later. $T_{\mathrm{diff}}$ is defined as follows:

$$T_{\mathrm{diff}} = \{d_1, d_2, \ldots, d_n\} = \{t_1,\; t_1 \oplus t_2,\; t_2 \oplus t_3,\; \ldots,\; t_{n-1} \oplus t_n\}$$

where a bit-wise exclusive-or operation is carried out between patterns $t_i$ and $t_j$. This assumes that the CSR starts in the all-zero state. (Other starting states can be considered similarly.) The details of the Golomb coding procedure are presented in the companion paper [9] and are hence omitted here.

The proposed method differs from [9] in that no separate CSR is used; instead the internal scan chain is used for pattern decompression and the fault-free responses of the core under test are used to generate a difference vector set $T^R_{\mathrm{diff}}$. Given an (ordered) precomputed test set $T_D$, the set of corresponding fault-free responses $R = \{r_1, r_2, \ldots, r_n\}$ is used to generate the test patterns. The difference vector set $T^R_{\mathrm{diff}}$ is now given by

$$T^R_{\mathrm{diff}} = \{d_1, d_2, \ldots, d_n\} = \{t_1,\; r_1 \oplus t_2,\; r_2 \oplus t_3,\; \ldots,\; r_{n-1} \oplus t_n\}$$

where $r_i$ is the fault-free response of the core under test to pattern $t_i$. A test architecture based on the use of $T^R_{\mathrm{diff}}$ is shown in Fig. 2.

As observed in [9], test data compression is more effective if $T_D$ consists of test cubes containing don't-care bits. In order to determine $T^R_{\mathrm{diff}}$ in such cases, we need to assign appropriate binary values to the don't-care bits and perform logic simulation to obtain the corresponding fault-free responses. (In general, the simulation model for the core provided by the core vendor can be used to obtain the fault-free responses.) First, we set all don't-care bits in $t_1$, the first pattern in $T_D$, to zeros and use the logic simulation engine of FSIM [12] to generate the fault-free response $r_1$. The ordering algorithm described below is then used to generate the successive test patterns.
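The encoding reviewed in this section can be illustrated with a short Python sketch. It assumes $m$ is a power of two, represents bit streams as strings of "0" and "1", and takes the fault-free responses as given inputs rather than simulating the core. The function names (`golomb_encode_run`, `golomb_encode`, `difference_sequence`) are hypothetical and are not taken from the paper or from [9].

```python
def golomb_encode_run(run_length: int, m: int) -> str:
    """Code word for a run of `run_length` zeros terminated by a single 1."""
    if m < 2 or m & (m - 1):
        raise ValueError("this sketch assumes m is a power of two, m >= 2")
    tail_bits = m.bit_length() - 1              # log2(m) for a power of two
    prefix = "1" * (run_length // m) + "0"      # (k - 1) ones, then the separator 0
    tail = format(run_length % m, f"0{tail_bits}b")
    return prefix + tail


def golomb_encode(bits: str, m: int) -> str:
    """Encode a 0/1 string, assumed to end in 1, as concatenated code words."""
    if not bits.endswith("1"):
        raise ValueError("the stream is assumed to end in a one")
    runs = bits.split("1")[:-1]                 # one run of zeros per terminating 1
    return "".join(golomb_encode_run(len(run), m) for run in runs)


def difference_sequence(patterns, responses):
    """d_1 = t_1 and d_(i+1) = r_i XOR t_(i+1), for equal-length bit strings.

    `responses[i]` is assumed to be the fault-free response to `patterns[i]`.
    """
    diffs = [patterns[0]]
    for prev_response, pattern in zip(responses, patterns[1:]):
        diffs.append(
            "".join("1" if a != b else "0" for a, b in zip(prev_response, pattern))
        )
    return "".join(diffs)
```

For $m = 4$, a run of nine zeros falls in group $A_3$, so `golomb_encode_run(9, 4)` yields the prefix `110` followed by the tail `01`.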
The problem of determining the best ordering is equivalent to the NP-complete traveling salesman problem. Therefore, a greedy algorithm is used to generate an ordering and the corresponding $T^R_{\mathrm{diff}}$. Suppose a partial ordering $t_1 t_2 \cdots t_i$ has already been determined for the patterns in $T_D$. To determine $t_{i+1}$, we first determine $r_i$ using FSIM and then calculate the Hamming distance $HD(r_i, t_j)$ between $r_i$ and all patterns $t_j$ that have not been placed in the ordered list. We define $HD(r_i, t_j)$ as the number of bit positions for which $r_i$ and $t_j$ have different (specified) binary values. We select the pattern $t_j$ for which $HD(r_i, t_j)$ is minimum and add it to the ordered list, denoting it by $t_{i+1}$. All don't-care bits in $t_{i+1}$ are set to the corresponding specified bit in $r_i$. In this way, a fully specified test pattern is obtained and the smallest number of ones is added to the difference vector sequence. We continue this process until all test patterns in $T_D$ are placed in the ordered list. Fig. 3 illustrates the procedure for obtaining $T^R_{\mathrm{diff}}$ from $T_D$.

For most cores, the number of inputs $|I_{\mathrm{core}}|$ driven by the scan cells is not equal to the number of outputs $|O_{\mathrm{core}}|$ that feed the scan chain. ($I_{\mathrm{core}}$ and $O_{\mathrm{core}}$ refer to the sets of inputs driven by the scan chain and outputs feeding the scan chain, respectively.) Consider the following two cases.

Case 1 $|I_{\mathrm{core}}| > |O_{\mathrm{core}}|$: Assume without loss of generality that the outputs in $O_{\mathrm{core}}$ drive scan elements that are located at the beginning of the scan chain. Let $t_i = \langle \tilde{t}_{i,1}, \tilde{t}_{i,2}, \ldots, \tilde{t}_{i,n} \rangle$ and $r_i = \langle \tilde{r}_{i,1}, \tilde{r}_{i,2}, \ldots, \tilde{r}_{i,k} \rangle$ denote the $i$th test pattern and the $i$th fault-free response, respectively. The encoding procedure is modified as follows to generate the difference vector $d_{i+1} = \langle \tilde{d}_{i+1,1}, \tilde{d}_{i+1,2}, \ldots, \tilde{d}_{i+1,n} \rangle$, where

$$\begin{aligned}
\tilde{d}_{i+1,1} &= \tilde{t}_{i+1,1} \oplus \tilde{r}_{i,1} \\
\tilde{d}_{i+1,2} &= \tilde{t}_{i+1,2} \oplus \tilde{r}_{i,2} \\
&\;\;\vdots \\
\tilde{d}_{i+1,k} &= \tilde{t}_{i+1,k} \oplus \tilde{r}_{i,k} \\
\tilde{d}_{i+1,k+1} &= \tilde{t}_{i+1,k+1} \oplus \tilde{t}_{i,k+1} \\
&\;\;\vdots \\
\tilde{d}_{i+1,n} &= \tilde{t}_{i+1,n} \oplus \tilde{t}_{i,n}.
\end{aligned}$$

Fig. 2. Test architecture based on Golomb coding and the use of internal scan chains.

Fig. 3. An example to illustrate the procedure for deriving $T^R_{\mathrm{diff}}$.

Case 2 $|I_{\mathrm{core}}| < |O_{\mathrm{core}}|$: In this case, additional zeros must be inserted into the difference vector sequence as follows:

$$\begin{aligned}
\tilde{d}_{i+1,1} &= \tilde{t}_{i+1,1} \oplus \tilde{r}_{i,1} \\
\tilde{d}_{i+1,2} &= \tilde{t}_{i+1,2} \oplus \tilde{r}_{i,2} \\
&\;\;\vdots \\
\tilde{d}_{i+1,k} &= \tilde{t}_{i+1,k} \oplus \tilde{r}_{i,k} \\
\tilde{d}_{i+1,k+1} &= 0 \\
&\;\;\vdots \\
\tilde{d}_{i+1,n} &= 0.
\end{aligned}$$

An on-chip decoder decompresses the encoded test set $T_E$ and produces $T^R_{\mathrm{diff}}$. The exclusive-or gate and the internal scan chain are used to generate the test patterns from the difference vectors. As discussed in the companion paper [9], the decoder can be efficiently implemented by a $\log_2 m$-bit counter and a finite-state machine (FSM). The synthesized decode FSM circuit contains only four flip-flops and 34 combinational gates [9]. For any circuit whose test set is compressed using $m = 4$, the given logic is the only additional hardware required other than the two-bit counter.

III. ANALYSIS OF TEST APPLICATION TIME AND TEST DATA COMPRESSION

In this section, we analyze the testing time for a single scan chain when Golomb coding is employed with the test architecture shown in Fig. 2. From the state diagram of the Golomb decoder [9], we note the following.

1) Each 1 in the prefix part takes $m$ cycles for decoding.
2) Each separator 0 takes one cycle.
3) The tail part takes a maximum of $m$ cycles and a minimum of $\alpha = \log_2 m + 1$ cycles.

Let $n_c$ be the total number of bits in $T_E$ and $r$ be the number of ones in $T^R_{\mathrm{diff}}$. $T_E$ contains $r$ tail parts, $r$ separator zeros, and the number of prefix ones in $T_E$ equals $n_c - r(1 + \log_2 m)$. Therefore, the maximum and minimum testing times ($T_{\max}$ and $T_{\min}$, respectively), measured by the number of cycles, are given by

$$T_{\max} = m\,(n_c - r(1 + \log_2 m)) + r + mr = m n_c - r(m \log_2 m - 1)$$

$$T_{\min} = m\,(n_c - r(1 + \log_2 m)) + r + \alpha r = m n_c - r\bigl(m(1 + \log_2 m) - (1 + \alpha)\bigr).$$

Therefore, the difference between $T_{\max}$ and $T_{\min}$ is given by

$$\delta T = T_{\max} - T_{\min} = r(m - \log_2 m - 1).$$

We will make use of this result in Section IV.

A major advantage of Golomb coding is that on-chip decoding can be carried out at scan clock frequency $f_{\mathrm{scan}}$ while $T_E$ can be fed to the core under test with external clock frequency $f_{\mathrm{ext}} < f_{\mathrm{scan}}$. This allows us to use slower testers without increasing the test application time. The external and scan clocks must be synchronized, e.g., using the scheme described in [13], and $f_{\mathrm{scan}} = m f_{\mathrm{ext}}$, where the Golomb code parameter $m$ is usually a power of 2. This allows the bits of $T^R_{\mathrm{diff}}$ to be generated by the decoder at the frequency of $f_{\mathrm{scan}}$.

We now present an analysis of testing time using $f_{\mathrm{scan}} = m f_{\mathrm{ext}}$ and compare the testing time for our method with that of external testing in which ATPG-compacted patterns are applied using an external tester. Let the ATPG-compacted test set contain $p$ patterns and let the length of the scan chain be $n$ bits. Therefore, the size of the ATPG-compacted test set is $pn$ bits and the testing time $T_{\mathrm{ATPG}}$ equals $pn$ external clock cycles. Next, suppose the difference vector $T^R_{\mathrm{diff}}$ obtained from the uncompacted test set contains $r$ ones and its Golomb-coded test set $T_E$ contains $n_c$ bits. The maximum number of scan clock cycles required for applying the test patterns using the Golomb coding scheme is $T_{\max} = m n_c - r(m \log_2 m - 1)$. Now, the maximum testing time $\tau$ (seconds) when Golomb coding is used is given by

$$\tau = \frac{T_{\max}}{f_{\mathrm{scan}}} = \frac{m n_c - r(m \log_2 m - 1)}{f_{\mathrm{scan}}}$$

and the testing time $\tau'$ (seconds) for external testing with ATPG-compacted patterns is given by

$$\tau' = \frac{pn}{f_{\mathrm{ext}}} = \frac{pnm}{f_{\mathrm{scan}}}.$$

If testing is to be accomplished in $\tau^{*}$ seconds using Golomb coding, the scan clock frequency $f_{\mathrm{scan}}$ must equal $T_{\max}/\tau^{*}$, i.e.,

$$f_{\mathrm{scan}} = \frac{m n_c - r(m \log_2 m - 1)}{\tau^{*}}.$$

This is achieved using a slow external tester operating at frequency $f_{\mathrm{ext}} = f_{\mathrm{scan}}/m$. On the other hand, if only an external test is used with the $p$ ATPG-compacted patterns, the required external tester clock frequency $f'_{\mathrm{ext}}$ equals $pn/\tau^{*}$. Let us take the ratio of $f'_{\mathrm{ext}}$ to $f_{\mathrm{ext}}$:

$$\frac{f'_{\mathrm{ext}}}{f_{\mathrm{ext}}} = \frac{pn/\tau^{*}}{f_{\mathrm{scan}}/m} = \frac{pn}{n_c - r \log_2 m + r/m}.$$

Experimental results presented in Section V show that $f'_{\mathrm{ext}}$ is much greater than $f_{\mathrm{ext}}$, thereby demonstrating that the use of Golomb coding allows us to decrease the volume of test data and use a slower tester without increasing testing time.

We next analyze the amount of compression that is achieved using Golomb coding of a precomputed test set $T_D$. The following three lemmas will lead to the main result in Theorem 1. As in [9], we assume without loss of generality here that the difference sequence always ends in one.

Lemma 1: Let $T_D$ be the given precomputed test set, and let $T^R_{\mathrm{diff}}$ be the bit stream derived from $T_D$ and the set of fault-free responses. Let the number of don't cares in $T_D$ be $n_x$. The number of zeros in $T^R_{\mathrm{diff}}$ is at least $n_x$.

Proof: The lemma follows from the fact that every don't care in $T_D$ can be mapped to a zero in $T^R_{\mathrm{diff}}$, while ones and zeros in $T_D$ must be selectively mapped to ones or zeros in $T^R_{\mathrm{diff}}$, depending on the fault-free response.

Lemma 2 [9]: If an $n$-bit data stream $S$ containing $r$ ones is encoded using Golomb code with parameter $m$, an upper bound on the length $G_S$ of the encoded sequence is given by

$$G_S \le \frac{n}{m} + r \log_2 m + r\left(1 - \frac{1}{m}\right).$$

Lemma 3: Let $S$ be any binary sequence and let $S^{*}$ be a binary sequence derived from $S$ by replacing one or more ones in it by zeros. Let $S_E$ ($S_E^{*}$) be the Golomb-coded sequence corresponding to $S$ ($S^{*}$). Then $\mathrm{len}(S_E) > \mathrm{len}(S_E^{*})$, where $\mathrm{len}(S_E)$ and $\mathrm{len}(S_E^{*})$ are the number of bits in $S_E$ and $S_E^{*}$, respectively.
Proof: Suppose we complement a 1 in $S$ that separates two runs of zeros of length $l_1$ and $l_2$ ($l_1, l_2 \ge 0$), respectively, to obtain $S^{*}$. We now have a run of $(l_1 + l_2 + 1)$ zeros in $S^{*}$. The number of bits $N$ required to encode the two runs of zeros of length $l_1$ and $l_2$ is given by

$$N = \left\lfloor \frac{l_1}{m} \right\rfloor + 1 + \log_2 m + \left\lfloor \frac{l_2}{m} \right\rfloor + 1 + \log_2 m = \left\lfloor \frac{l_1}{m} \right\rfloor + \left\lfloor \frac{l_2}{m} \right\rfloor + 2\log_2 m + 2.$$

Similarly, the number of bits $N^{*}$ required to encode the single run of $(l_1 + l_2 + 1)$ zeros in $S^{*}$ is given by

$$N^{*} = \left\lfloor \frac{l_1 + l_2 + 1}{m} \right\rfloor + \log_2 m + 1.$$

This implies that

$$\begin{aligned}
\mathrm{len}(S_E) - \mathrm{len}(S_E^{*}) = N - N^{*} &= \left\lfloor \frac{l_1}{m} \right\rfloor + \left\lfloor \frac{l_2}{m} \right\rfloor - \left\lfloor \frac{l_1 + l_2 + 1}{m} \right\rfloor + \log_2 m + 1 \\
&\ge \left\lfloor \frac{l_1}{m} \right\rfloor + \left\lfloor \frac{l_2}{m} \right\rfloor - \frac{l_1 + l_2 + 1}{m} + \log_2 m + 1 \\
&\ge \left(\frac{l_1}{m} - 1\right) + \left(\frac{l_2}{m} - 1\right) - \frac{l_1 + l_2 + 1}{m} + \log_2 m + 1 \\
&= \log_2 m - 1 - \frac{1}{m}.
\end{aligned}$$

This implies that $\mathrm{len}(S_E) - \mathrm{len}(S_E^{*}) > 0$ if $m > 2$. For the special case of $m = 2$, we note that

$$\begin{aligned}
\mathrm{len}(S_E) - \mathrm{len}(S_E^{*}) &= \left\lfloor \frac{l_1}{2} \right\rfloor + \left\lfloor \frac{l_2}{2} \right\rfloor - \left\lfloor \frac{l_1 + l_2 + 1}{2} \right\rfloor + \log_2 2 + 1 \\
&\ge \frac{l_1 - 1}{2} + \frac{l_2 - 1}{2} - \frac{l_1 + l_2 + 1}{2} + \log_2 2 + 1 = 0.5.
\end{aligned}$$

Therefore, complementing a single one to a zero always decreases the length of the Golomb-coded sequence. This argument can be easily extended using transitivity to show that $\mathrm{len}(S_E) > \mathrm{len}(S_E^{*})$ whenever one or more ones in $S$ are changed to zeros to obtain $S^{*}$.

We now present a bound on the amount of compression that is obtained via Golomb coding of $T^R_{\mathrm{diff}}$. The proof of the theorem follows from Lemmas 1–3.

Theorem 1: Let $T_D$ be the given precomputed test set, and let $T^R_{\mathrm{diff}}$ be the $n$-bit data stream derived from $T_D$ and the set of fault-free responses. Let the number of don't cares in $T_D$ be $n_x$. If $T^R_{\mathrm{diff}}$ is encoded using Golomb code with parameter $m$, an upper bound on the length $G$ of the encoded sequence is given by

$$G \le \frac{n}{m} + (n - n_x) \log_2 m + (n - n_x)\left(1 - \frac{1}{m}\right).$$

Theorem 1 provides an easy-to-compute bound on the size of the encoded test set $T_E$. This bound depends only on the precomputed test set $T_D$ and is independent of the fault-free response. It can therefore be obtained without any logic simulation. We list these bounds for several ISCAS'89 circuits in Section V.

IV. INTERLEAVING DECOMPRESSION ARCHITECTURE

We now present a novel interleaving decompression architecture, which enables testing of multiple cores or the loading of multiple balanced scan chains in parallel. The same decoder can be used to drive equal-length scan chains in one or more cores in parallel. An important constraint here is that the same value of $m$ must be used to encode test sequences for all the scan chains. The proposed decompression architecture not only reduces the testing time and the size of the test data to be stored in the ATE memory, but also allows testing of multiple cores using a single ATE I/O channel, thereby increasing the ATE I/O channel capacity.

As discussed in Section II, when Golomb coding is applied to a block of data containing a run of 0s followed by a single 1, the code word contains two parts: a prefix and a tail. For a given code parameter $m$ (group size), the length of the tail ($\log_2 m$) is independent of the run length. Note further that every one in the prefix corresponds to $m$ zeros in the decoded difference vector. Thus the prefix consists of a string of ones followed by a zero, and the zero can be used to identify the beginning of the tail.

As shown in [9], the FSM in the decoder runs the counter for $m$ decode cycles whenever a one is received and starts decoding the tail as soon as a zero is received. The tail decoding takes at most $m$ cycles. During prefix decoding, the FSM has to wait for $m$ cycles before the next bit of the prefix can be decoded. Therefore, we can use interleaving to test $m$ cores together, such that the decoder corresponding to each core is fed with encoded prefix data after every $m$ cycles. (This can also be used to feed multiple scan chains in parallel as long as the capture cycles of the scan chains are synchronized.) Whenever the tail is to be decoded (identified by a zero in the encoded bit stream), the respective decoder is fed with the entire tail of $\log_2 m$ bits in a single burst of $\log_2 m$ cycles.
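The decoding behavior described above can be mirrored in software. The following Python sketch is illustrative only: it assumes $m$ is a power of two, models a single decoder rather than the interleaved hardware, and charges each code word its worst-case cost ($m$ cycles per prefix one, one cycle per separator zero, $m$ cycles per tail), following the cycle counts quoted from [9]. The function name `golomb_decode` is hypothetical.

```python
def golomb_decode(code: str, m: int):
    """Return (decoded bits, worst-case decode cycles) for a Golomb-coded string."""
    if m < 2 or m & (m - 1):
        raise ValueError("this sketch assumes m is a power of two, m >= 2")
    tail_bits = m.bit_length() - 1      # log2(m) for a power of two
    out = []
    cycles = 0
    i = 0
    while i < len(code):
        if code[i] == "1":
            # A prefix one stands for m zeros and occupies the counter for m cycles.
            out.append("0" * m)
            cycles += m
            i += 1
        else:
            # The separator zero (one cycle) marks the start of the tail.
            tail = code[i + 1 : i + 1 + tail_bits]
            if len(tail) != tail_bits:
                raise ValueError("truncated tail")
            # The tail gives the remaining zeros, followed by the terminating 1.
            out.append("0" * int(tail, 2) + "1")
            cycles += 1 + m             # worst-case tail cost of m cycles
            i += 1 + tail_bits
    return "".join(out), cycles
```

For the code word `11001` with $m = 4$, the sketch returns nine zeros followed by a one and a count of 13 cycles, which agrees with $m n_c - r(m \log_2 m - 1) = 4 \cdot 5 - 1 \cdot 7$.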
The SOC channel selector, consisting of a demultiplexer, a $\log_2 m$-bit counter, and an FSM, is used for interleaving; see Fig. 4. This interleaving scheme works as follows.

Fig. 4. SOC channel selector for application to multiple cores and multiple scan chains.

Fig. 5. Composite encoded test data for two cores with group size $m = 2$.

First, the encoded test data for $m$ cores are combined to generate a composite bit stream $T_C$ that is stored in the ATE. Next, $T_C$ is fed to the FSM, which is used to detect the beginning of each tail and to feed the demultiplexer. An $i$-bit counter ($i = \log_2 m$) is used to select the outputs to the decoders of the various cores. $T_C$ is obtained by interleaving the prefix parts of the compressed test sets of each core, but the tails are included unchanged in $T_C$. An example is shown in Fig. 5, where compressed data for two cores (generated using group size $m = 2$) have been interleaved to obtain the final encoded test set to be applied through the decompression scheme for multiple cores.

We now describe the SOC channel selector in more detail. The FSM, the $i$-bit counter, and the demultiplexer together constitute the SOC channel selector. The FSM is used to detect the beginning of the tail and generates the clk_stop signal to stop the $i$-bit counter. The signal data_in is the input to the FSM, data_out is the output, and signals v_in and v_out are used to indicate that the input and output data are valid. The $i$-bit counter is connected to the select lines of the demultiplexer and the demultiplexer outputs are connected to the decoders of the different scan chains. Every scan chain has a dedicated decoder. This decoder receives either a one or the tail of the compressed data corresponding to the various cores connected to the scan chain. If the FSM detects that a portion of the tail has arrived, the zero that is used to identify the tail is passed to the decoder and clk_stop goes high for the next $m$ cycles. The output of the demultiplexer does not change for this period and the entire tail of length $\log_2 m$ bits is passed on continuously to the appropriate core.

The state diagram of the FSM for $m = 4$ and the corresponding timing diagram are shown in Figs. 6 and 7, respectively. The FSM is fed with $T_C$ corresponding to four different cores. It remains in state S0 as long as it receives the ones corresponding to the prefixes. As soon as a zero is received, it outputs the entire tail unchanged and makes clk_stop high. This stops the $i$-bit counter and prevents any change at the demultiplexer output. It is shown in the timing diagram (Fig. 7) that whenever a zero is received, the SOC channel selection remains unchanged for the next $(1 + m)$ cycles.

Fig. 6. State diagram for the SOC channel selector FSM ($m = 4$).

As discussed in Section III, the difference between $T_{\max}$ and $T_{\min}$ is given by $\delta T = r(m - \log_2 m - 1)$. Therefore, the difference between maximum and minimum testing times for a single tail is $\delta t = (m - \log_2 m - 1)$. If we restrict $m$ to be small, $m \le 8$, then $\delta t \le 4$. In this case, the decode FSM can be easily modified by introducing additional states to the Golomb decoder FSM of [9] such that the tail decoding always takes $m$ cycles and $\delta t = 0$. To make tail and prefix decoding equal for $m = 4$, three additional states are required as shown in Fig. 8. The additional states do not significantly affect the testing time or the hardware overhead.

There are $m$ cores in parallel and each separator zero and tail takes $(1 + m)$ cycles to decode. Therefore, for $m$ cores, the decoding time $t_{\mathrm{tail}}$ for the separator and the tail is given by

$$t_{\mathrm{tail}} = \sum_{j=1}^{m} (r_j + m r_j) = (1 + m) \sum_{j=1}^{m} r_j = (1 + m)R$$

where $R = \sum_{j=1}^{m} r_j$. Since all the prefixes of the cores are decoded in parallel, the number of cycles $t_{\mathrm{prefix}}$ required for decoding all the prefixes in $T_C$ is determined by the number of ones in the prefix of the core with the largest encoded test data. Therefore

$$t_{\mathrm{prefix}} = m \max_i \{ n_{C,i} - r_i(1 + \log_2 m) \} = m\,(n_{C,\max} - r_{\max}(1 + \log_2 m))$$

where $n_{C,i}$ and $r_i$ are the number of encoded bits in $T_E$ and the number of ones in $T_{\mathrm{diff}}$ for the $i$th core, respectively, and $n_{C,\max}$ and $r_{\max}$ are the number of encoded bits in $T_E$ and the number of ones in $T_{\mathrm{diff}}$ for the core with the largest encoded test data.

Fig. 7. Timing diagram for the SOC channel selector FSM ($m = 4$).

Fig. 8. Modified state diagram of the decode FSM to make tail and prefix decode cycles equal.

Therefore, the total testing time $T_I$

for $m$ cores, when tested in parallel using the interleaving architecture, is given by

$$T_I = t_{\mathrm{prefix}} + t_{\mathrm{tail}} = m\,\bigl(n_{C,\max} - r_{\max}(1 + \log_2 m)\bigr) + (1+m)R. \tag{1}$$

Let us now find the testing time $T_{NI}$ (NI denotes noninterleaved) required if all the cores were tested one by one independently using a single ATE I/O channel:

$$\begin{aligned}
T_{NI} &= \sum_{j=1}^{m} m\,\{\, n_{C,j} - r_j(1 + \log_2 m) \,\} + (1+m)R \\
&= m\,|T_C| - m\sum_{j=1}^{m} r_j \log_2 m - m\sum_{j=1}^{m} r_j + (1+m)R \\
&= m\,|T_C| - mR\log_2 m + R \\
&= m\,|T_C| - R\,(m\log_2 m - 1)
\end{aligned} \tag{2}$$

where $|T_C|$ denotes the number of bits in $T_C$. The difference between the noninterleaved and the interleaved testing times is given by

$$\begin{aligned}
T_{NI} - T_I &= m\,|T_C| - mR\log_2 m + R - m\,n_{C,\max} + m\,r_{\max}(1 + \log_2 m) - R - mR \\
&= m\,(|T_C| - n_{C,\max}) - m\,(1 + \log_2 m)(R - r_{\max}) \\
&= m\,\bigl((|T_C| - n_{C,\max}) - (1 + \log_2 m)(R - r_{\max})\bigr) \\
&\approx m\,(|T_C| - n_{C,\max}) \ge 0
\end{aligned}$$

since $n_{C,\max} \gg r_{\max}$ and $|T_C| \gg R$.

Consider a hypothetical example of four cores with encoded test data sizes $n_{C,1} = 40$, $n_{C,2} = 60$, $n_{C,3} = 80$, $n_{C,4} = 100$ and numbers of ones $r_1 = 4$, $r_2 = 6$, $r_3 = 8$, $r_4 = 10$. Then $n_{C,\max} = 100$, $r_{\max} = 10$, $m = 4$, $R = 28$, and $|T_C| = 280$. Therefore, $T_{NI} - T_I = 4\,((280 - 100) - (1 + 2)(28 - 10)) = 504$. It is evident from the above analysis that the interleaving architecture reduces testing time and increases the ATE channel bandwidth.

We developed a Verilog model for the FSM for $m = 4$ and simulated it for several $T_C$ sequences. The gate-level schematic (derived using Synopsys Design Compiler) of the channel selector FSM consists of only four flip-flops and 17 gates. The additional hardware overhead is therefore very small.

V. EXPERIMENTAL RESULTS

In this section, we present experimental results on Golomb coding of the precomputed test sets for the six largest ISCAS 89 benchmark circuits. We used test cubes (with dynamic compaction) obtained using the Mintest ATPG program [5]. The difference between the size of the test sequences here and in [9] can be explained as follows. Since the number of inputs driven by the scan chains is less in every case than the number of outputs that feed the scan chains, additional (dummy) zeros are inserted in the difference vector sequence $T_{\mathrm{diff}}$. This procedure was explained in Section II.

TABLE I. EXPERIMENTAL RESULTS ON TEST DATA COMPRESSION USING GOLOMB CODES (COMPARISON WITH [9])

The results shown in Table I demonstrate that a significant amount of compression is achieved if Golomb coding is applied to difference vectors obtained from the test set and the fault-free responses. In five out of six cases, we achieve better results than ATPG compaction using Mintest. In addition, the proposed method outperforms [9] in five out of the six cases. The upper bound values (derived from Theorem 1) represent the worst-case compression that can be achieved using Golomb codes. The upper bound is an important parameter that can be used to determine the suitability of the proposed method.

TABLE II. COMPARISON BETWEEN THE EXTERNAL CLOCK FREQUENCY f REQUIRED FOR GOLOMB-CODED TEST DATA AND THE EXTERNAL CLOCK FREQUENCY f REQUIRED FOR EXTERNAL TESTING USING ATPG-COMPACTED PATTERNS (FOR THE SAME TESTING TIME)

Table II demonstrates that Golomb coding allows us to use a slower tester without incurring any testing time penalty. As discussed in Section III, Golomb coding provides three important benefits: 1) it significantly reduces the volume of test data; 2) the test patterns can be applied to the core under test at the scan clock frequency $f_{\mathrm{scan}}$ using an external tester that runs at frequency $f_{\mathrm{ext}} = f_{\mathrm{scan}}/m$; and 3) in comparison with external testing using ATPG-compacted patterns, the same testing time is achieved using a much slower tester. The third issue is highlighted in Table II.
We next compare our results with a recent parallel scan design technique aimed at reducing test data volume and testing time [11]. A direct comparison is difficult since the two methods employ different strategies. Nevertheless, a comparison with the published results in [11] shows that the proposed method outperforms [11] for five out of the six largest ISCAS 89 benchmark circuits; see Table III. Moreover, the scan broadcast approach in [11] requires a structural model of a core for test generation and for determining aliased faults, a restriction that does not affect the proposed compression technique.

TABLE III. COMPARISON WITH THE SCAN BROADCAST SCHEME IN [11]

Testing time comparison between the two methods is especially difficult. The testing time in [11] is presented in terms of clock cycles, whereas the proposed method employs two different clock rates: a faster on-chip decode clock and a slower off-chip tester clock for feeding $T_E$. Hence, even though four to six times more clock cycles are required here compared to [11], we claim that the testing time is less since $f_{\mathrm{scan}}$ is much larger than $f_{\mathrm{ext}}$ and the use of 16 scan chains as in [11] can offer significantly more parallelism. (We assumed single scan chains for all the benchmarks in our experiments.)

The compression method presented here is directed at IP cores in SOCs, which are not BIST-ed and whose structural models are not available. Only the precomputed test sets are available to the system integrator. Greater compression can be achieved with LFSR-based reseeding in a BIST environment [10]. This, however, requires that the cores be BIST-ed and that structural models be made available for fault simulation to identify easy faults and for test generation to determine test cubes for hard faults.

VI. CONCLUSION

We have presented a new test data compression and decompression method for testing embedded cores in an SOC. We have shown that the proposed scheme makes efficient use of Golomb codes and the internal scan chain(s) of the core under test to achieve high test data compression for SOCs and to save ATE memory and testing time. We have also presented a novel interleaving decompression architecture that allows testing of multiple cores in parallel using a single ATE I/O channel. This reduces the testing time of an SOC further and increases the ATE I/O channel capacity. The additional logic for the SOC channel selector is small and easy to implement. In addition, it is independent of the multiple cores under test and their corresponding precomputed test sets. We have also shown that, apart from reducing the volume of test data, test data compression allows a slower tester to be used without any testing time penalty. Experimental results for the ISCAS benchmarks show that the proposed scheme is very efficient for compressing test data. The results also show that ATPG compaction may not always be necessary for saving ATE memory and reducing testing time.

ACKNOWLEDGMENT

The authors acknowledge Prof. H.-J. Wunderlich of the University of Stuttgart, Germany, for discussions on the use of the internal scan chain for pattern application. The authors would also like to thank S. Swaminathan for help in carrying out the experiments.

REFERENCES

[1] Y. Zorian, E. J. Marinissen, and S. Dey, "Testing embedded-core based system chips," in Proc. Int. Test Conf., 1998, pp. 130–143.
[2] V. Iyengar, K. Chakrabarty, and B. T. Murray, "Deterministic built-in pattern generation for sequential circuits," J. Electron. Testing: Theory and Applications (JETTA), vol. 15, pp. 97–115, Aug./Oct. 1999.
[3] A. Jas, J. Ghosh-Dastidar, and N. A. Touba, "Scan vector compression/decompression using statistical coding," in Proc. IEEE VLSI Test Symp., 1999, pp. 114–120.
[4] A. Jas and N. A. Touba, "Test vector decompression via cyclical scan chains and its application to testing core-based design," in Proc. Int. Test Conf., 1998, pp. 458–464.
[5] I. Hamzaoglu and J. H. Patel, "Test set compaction algorithms for combinational circuits," in Proc. Int. Test Conf., 1998, pp. 283–289.
[6] S. Kajihara, I. Pomeranz, K. Kinoshita, and S. M. Reddy, "On compacting test sets by addition and removal of vectors," in Proc. VLSI Test Symp., 1994, pp. 202–207.
[7] S. W. Golomb, "Run-length encoding," IEEE Trans. Inform. Theory, vol. IT-12, pp. 399–401, 1966.
[8] H. Kobayashi and L. R. Bahl, "Image data compression by predictive coding, Part I: Prediction algorithm," IBM J. Res. Devel., vol. 18, p. 164, 1974.
[9] A. Chandra and K. Chakrabarty, "System-on-a-chip test data compression and decompression architectures based on Golomb codes," IEEE Trans. Computer-Aided Design, vol. 20, pp. 355–368, Mar. 2001.
[10] S. Hellebrand, H.-G. Liang, and H.-J. Wunderlich, "A mixed-mode BIST scheme based on reseeding of folding counters," in Proc. Int. Test Conf., 2000, pp. 778–784.
[11] I. Hamzaoglu and J. H. Patel, "Reducing test application time for full scan embedded cores," in Proc. Int. Symp. Fault-Tolerant Computing, 1999, pp. 260–267.
[12] H. K. Lee and D. S. Ha, "An efficient forward fault simulation algorithm based on the parallel pattern single fault propagation," in Proc. Int. Test Conf., Oct. 1991, pp. 946–955.
[13] D. Heidel, S. Dhong, P. Hofstee, M. Immediato, K. Nowka, J. Silberman, and K. Stawiasz, "High-speed serializing/de-serializing design-for-test methods for evaluating a 1 GHz microprocessor," in Proc. IEEE VLSI Test Symp., 1998, pp. 234–238.
[14] H. K. Lee and D. S. Ha, "On the generation of test patterns for combinational circuits," Dept. Electrical Eng., Virginia Polytechnic Inst. State Univ., Tech. Rep. 12_93.