# Low-Power Test Planning for Arbitrary At-Speed Delay-Test Clock Schemes

Christian G. Zoellin, Hans-Joachim Wunderlich University of Stuttgart Pfaffenwaldring 47, D-70569 Stuttgart, Germany email: {zoellin, wu}@iti.uni-stuttgart.de

Abstract—High delay-fault coverage requires rather sophisticated clocking schemes in test mode, which usually combine launch-on-shift and launch-on-capture strategies. These complex clocking schemes make low power test planning more difficult as initialization, justification and propagation require multiple clock cycles. This paper describes a unified method to map the sequential test planning problem to a combinational circuit representation. The combinational representation is subject to known algorithms for efficient low power built-in self-test planning. Experimental results for a set of industrial circuits show that even rather complex test clocking schemes lead to an efficient low power test plan.

Keywords-Delay test, power-aware testing, built-in self-test

#### I. INTRODUCTION

Delay testing is a standard technique to ensure product quality, and it is part of nearly all volume test schemes today. Structural testing of circuits with scan design requires special means if delay faults are addressed, since the time between triggering a transition and capturing the circuit responses has to be sufficiently small.

Mainly two techniques are commonly used to launch transitions after shifting in the pattern. The *launch-on-shift* (LOS) technique launches the transition by shifting the initialization pattern one more bit [1]. The *launch-on-capture* (LOC) technique does not apply a shift clock but a system clock. In this way, the second pattern is just the functional response of the circuit to the first pattern [2].

Neither of the two techniques allows the generation of an arbitrary transition pattern from an initialization pattern, and both lead to incomplete delay fault coverage in general. Yet, these techniques do not detect the same faults, and higher fault coverage is obtained by combining both [3], [4], [5]. However, there are circuits where a sequence of shift or capture clocks has to be applied in order to detect a certain delay fault (examples are given in [6]), and repeated and combined fault mechanisms have to be applied [7], [8], [9]. This results in a rather complex multicycle clocking scheme, which will make low power test strategies ineffective.

The relevance of low power testing is well known [10], [11], [12], and delay testing is especially sensitive to excessive power consumption as peak power affects timing directly. More complex clock schemes require additional at-speed clock cycles, increasing the likelihood of IR-drop. Moreover, pattern sequences may be applied that exercise circuit states and transitions which are functionally unreachable. Hence, there is the concern of over-testing [13], [14], especially due to excessive power consumption of such tests [15], [16].

A plethora of methods have been presented that reduce power during built-in self-test (BIST) [11], [12]. Usually, a combination of those techniques has to be applied, and very effective test planning methods, which switch off complete scan chains for some time, have been proposed in [17], [18], [19]. These scan enabling techniques assume a combinational circuit and fault model, and they are not directly applicable to multicycle delay testing. The paper at hand presents for the first time a formalized way to deal with multicycle clock schemes for low power test planning.

A procedure is presented to derive graph-based circuit representations that reflect a specified clock sequence. Using this technique, test plan generation is adapted to arbitrary transition test clock sequences.

The rest of the paper is structured as follows: The next section gives a brief overview of the relevant methods for delay testing and power-aware test. The third section introduces the formalized method to deal with graph representations of circuits that are subject to a given test clock sequence. The fourth section demonstrates how the approach is applied to test plan generation and section five shows the experimental evaluation. It is shown that test plan generation is effective, even with complex test clock schemes. For a set of industrial benchmark circuits, a significant number of flip-flops can be deactivated during both shift and capture phases without impacting fault coverage.

# II. STATE OF THE ART: LOW POWER TESTING, DELAY TESTING AND CIRCUIT MODELING FOR TEST

Usually, low power test and delay test are dealt with separately, and a special circuit modeling technique is not applied. This section introduces the state of the art of these three subjects only as far as needed for the subsequent section.

*a) Power aware testing:* The elevated switching activity during test may result in average and instantaneous (peak) power consumption beyond the functional specification [11]. This can result in yield loss or even the degradation of product quality and reliability. Peak power is split into two categories: Peak power during shift and peak power during launch and capture of the test pattern.

Power-aware DFT techniques include special flip-flops, which suppress output toggling during shift [20]. Methods for power-aware test generation [21], [22] and don't-care fill [23] significantly reduce the peak power consumption of the combinational logic only. The aforementioned techniques do not avoid the power consumed in the clock distribution. To address this, the peak power during shift can also be reduced by staggered clocking of the scan chains [24], [25] or by modifying the clock duty cycle [26]. However, if the at-speed launch and capture clocks are skewed or executed in a staggered fashion, additional patterns may be necessary to compensate for lost fault coverage.

In general, clock gating is an effective technique to reduce both the power of the clock distribution and the combinational logic. The STUMPS architecture (self-test using MISR and parallel shift register sequence generator [27]) for BIST may be extended by clock gating (Figure 1). For example, in current designs each scan chain can be disabled individually and the clocks for these chains are disabled completely, both during scanning and during launch and capture [28]. The power grid of parts of the circuit with disabled clocks contributes considerable capacitance to the active parts and this significantly reduces the likelihood of errors due to IR-drop [16].



Fig. 1. STUMPS architecture with clock gating

To take advantage of such an architecture, the scan operation may be done sequentially for each chain separately [29], but test time would be increased.

A test planning method has been published in [18], [19], [30] which activates only a small number of scan chains at a time. The resulting test plan still detects all faults without increasing test time, while significantly reducing average and peak power for scan and launch-capture cycles. Heuristics allow the underlying set covering problem to be solved with acceptable run times even for industrial circuits.

*b) Multicycle delay test:* As already explained, it is beneficial to apply both LOC and LOS tests. Fault coverage of LOC can be further increased by using additional functional clock cycles [7], [8]. And finally, both schemes may be combined to form Launch-on-Capture-Shift (LOCS) and Launch-on-Shift-Capture (LOSC) [9]. However, the test planning has to take into account the specific clock sequence regarding the faults that may be detected and the scan elements that may have to be controlled and observed.

All of these techniques transform a combinational test problem into a sequential one. Now, clock sequences are required, different flip-flops and different scan chains have to be activated at different clock cycles, and, even worse than the classical test problem, the combinations of scan and capture clocks cause both a multiclock and multicycle problem.

*c) Circuit models:* For many design automation problems, circuits are modeled as directed graphs. In fault simulation and ATPG, the circuit is represented by a directed graph, where the vertices correspond to primary inputs, primary outputs and the outputs of the gates. The graph types employed in the test plan generation are circuit graphs for fault simulation [31] and S-graphs of the flip-flops [32].

A circuit graph G = (V, E) consists of primary inputs I, primary outputs O, combinational nodes  $V_{com}$  corresponding to gates, and sequential nodes  $V_{seq}$  corresponding to flip-flops:  $V = I \cup O \cup V_{com} \cup V_{seq}$ . There is an edge between two nodes  $a, b \in V$ ,  $(a, b) \in E$ , if there is a component where a is input pin and b output pin. The circuit graph is a refinement of the S-graph  $G_S = (V_S, E_S)$ , where  $V_S = I \cup O \cup V_{seq}$ and  $(a, b) \in E_S$ , if there is a path  $a, a_1, \ldots, a_n, b$  in G with  $a_i \in V_{com}$ .
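The relation between the circuit graph and the S-graph can be illustrated with a short sketch. It assumes that the circuit graph is given as a plain list of edges and that the set of combinational nodes is known; the function name `s_graph` and the node names in the test data are hypothetical and only serve the illustration. Direct edges between two non-combinational nodes are treated as paths with zero intermediate nodes.

```python
from collections import defaultdict

def s_graph(edges, v_com):
    """Derive the S-graph edge set E_S from the circuit-graph edges E.

    edges: iterable of (a, b) pairs of the circuit graph G.
    v_com: set of combinational nodes; all other nodes belong to V_S.
    """
    succ = defaultdict(set)
    nodes = set()
    for a, b in edges:
        succ[a].add(b)
        nodes.update((a, b))

    s_edges = set()
    for a in nodes - set(v_com):
        # Walk forward from a, passing through combinational nodes only.
        stack, seen = list(succ[a]), set()
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            if n in v_com:
                stack.extend(succ[n])   # keep following the combinational path
            else:
                s_edges.add((a, n))     # path ends at an input, output or flip-flop
    return s_edges
```

Every S-graph edge collapses one or more combinational paths of G, which is why the circuit graph is a refinement of the S-graph.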

If the S-graph does not contain any cycles, sequential test generation can be mapped to combinational test generation with quadratic worst-case complexity [32]. An S-graph is equidistant or balanced if all the paths between two nodes have identical length [32], [33], [34], [35]. Test generation for circuits with an equidistant S-graph is mapped directly to combinational test generation.
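Both properties can be checked directly on a small S-graph. The following is a minimal sketch, assuming the graph is given as a node list and an edge list; the function names are hypothetical, and the balance check additionally assumes that a balanced S-graph is required to be acyclic. It is not the algorithm of the cited works.

```python
def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed graph contains no cycle."""
    indeg = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    visited = 0
    while ready:
        n = ready.pop()
        visited += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    return visited == len(nodes)

def is_balanced(nodes, edges):
    """True if the graph is acyclic and all paths between any
    two nodes have the same length."""
    if not is_acyclic(nodes, edges):
        return False
    succ = {n: [] for n in nodes}
    for a, b in edges:
        succ[a].append(b)
    for src in nodes:
        dist = {src: 0}          # path length from src, fixed on first visit
        stack = [src]
        while stack:
            n = stack.pop()
            for m in succ[n]:
                d = dist[n] + 1
                if m in dist:
                    if dist[m] != d:
                        return False   # two paths of different length reach m
                else:
                    dist[m] = d
                    stack.append(m)
    return True
```

For example, a diamond-shaped graph with two arms of equal length is balanced, whereas adding a shortcut edge that bypasses one node is enough to break the property.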

An S-graph can be made acyclic or equidistant by removing nodes from  $V_{seq}$ , which models putting the corresponding flip-flops in a (partial) scan path. Efficient algorithms are found in [33], [34], [36], [37].

The basic idea of ATPG for acyclic circuits exploits the fact that unrolling a circuit [38] can be reduced to copying only those parts of a circuit which are necessary for fault detection. All the algorithms for generating combinational representations rely on a single clock scheme, and modifications are required for a scan-based delay test.

# III. CIRCUIT GRAPH GENERATION FOR DELAY TEST SCHEMES

The central idea is to generate a combinational representation of a circuit based on a multicycle, multiclock scheme, and to apply test planning on this representation. We formalize the implications of shift and capture cycles in arbitrary clock sequences. For each of capture and shift clock, we show how to compute a set of edges that can connect two isomorphic copies of the circuit graph. The final graph is then created by concatenating several copies of the original graph.

For the sake of simplicity we consider only full-scan circuits, but the presented formalism is easily extended to partial scan circuits. The information about the structure of the scan chains and the input/output relation of scan flip-flops is not included in the circuit graph G. Scan flip-flops  $FF \subset O \times I$  are edges between pseudo-primary outputs and pseudo-primary inputs. The scan chain organization  $SC \subset \mathcal{P}(I)$  is a partitioning of the pseudo-primary inputs in the circuit. For each scan chain  $SC_i \subset I$  in SC the scan chain order  $sc_i \in SC_i^*$  is given as a unique sequence  $sc_i = (ppi_1, ppi_2, \ldots)$ . Figure 2 shows a circuit graph for a small example.



Fig. 2. An example circuit graph. The sets of inputs and outputs to this circuit are  $I = \{pi_1, ppi_1, ppi_2, ppi_3\}$  and  $O = \{ppo_1, ppo_2, ppo_3, po_1\}$ . The edges that represent the (scan) flip-flops and the scan in *si* and scan out *so* are not depicted here. They are  $FF = \{(ppo_1, ppi_1), (ppo_2, ppi_2), (ppo_3, ppi_3)\}$ . The single scan chain of the circuit is  $sc_1 = (ppi_1, ppi_2, ppi_3)$ .

Let  $G_t(V_t, E_t)$  be a copy of G and let  $I_t$ ,  $O_t$  be the sets of inputs and outputs in  $G_t$ . We say two vertices  $v_{t_1} \in V_{t_1}$  and  $v_{t_2} \in V_{t_2}$  with  $t_1 \neq t_2$  are structurally equivalent (i.e. map to the same circuit node) if they are derived from the same node in G.

The circuit state and output after a clock can now be described as the concatenation of two copies  $G_t$  and  $G_{t+1}$  of G.

#### A. Graph Concatenation for Capture Clock

A capture clock causes the data at the pseudo-primary outputs of  $G_t$  to appear at the pseudo-primary inputs of  $G_{t+1}$ . The data flow for this case is described by the edges represented in the set of scan flip-flops FF.

Hence, two graphs  $G_t$  and  $G_{t+1}$  may be concatenated using the following set of edges:

$$Cap_{t,t+1} \subset O_t \times I_{t+1}$$

$$Cap_{t,t+1} = \{(o_t, i_{t+1}) \in O_t \times I_{t+1} \mid \exists (o_f, i_f) \in FF: o_t, o_f \text{ and } i_{t+1}, i_f \text{ are structurally equivalent} \}$$

Figure 3 shows the set of edges  $Cap_{t,t+1}$  for the example circuit above.
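As a small illustration of the capture edges, the sketch below computes $Cap_{t,t+1}$ for the flip-flops of the example circuit. It assumes that a vertex of copy $G_t$ is encoded as a pair `(name, t)`, so that two vertices are structurally equivalent exactly when their names agree; this encoding and the function name `cap_edges` are assumptions made for the illustration.

```python
def cap_edges(ff, t):
    """Cap_{t,t+1}: connect every pseudo-primary output in copy t to the
    pseudo-primary input of the same flip-flop in copy t+1.

    ff: set of (ppo, ppi) pairs, one per scan flip-flop.
    """
    return {((ppo, t), (ppi, t + 1)) for ppo, ppi in ff}

# Flip-flops of the example circuit in Fig. 2.
FF = {("ppo1", "ppi1"), ("ppo2", "ppi2"), ("ppo3", "ppi3")}
CAP_0_1 = cap_edges(FF, 0)
```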

Fig. 3. The graph concatenation for a capture clock.

#### B. Graph Concatenation for Shift Clock

If a shift clock is applied instead of a capture clock, the inputs of one circuit graph are mapped to inputs of the other circuit graph. The concatenation of the graphs  $G_t$  and  $G_{t+1}$  is derived from the scan chains of the circuit:

$$Shf_{t,t+1} \subset I_t \times I_{t+1}$$

$$Shf_{t,t+1} = \{(i_1, i_2) \in I_t \times I_{t+1} \ | \ i_1 \in ff_k \land i_2 \in ff_{k+1}\}$$

where  $ff_k$ ,  $ff_{k+1}$  are successive flip-flops in a scan chain  $SC_j \in SC$  of the scan chain organization of the circuit.

Figure 4 shows  $Shf_{t,t+1}$  for the example. Depending on the purpose of the graph, scan-in and scan-out nodes can be added for the shift clock cycle.
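The shift edges can be sketched in the same way. The example assumes that vertices of copy $G_t$ are encoded as pairs `(name, t)` and that data moves along the scan chain order, i.e. from $ppi_k$ to $ppi_{k+1}$; both the encoding and the function name `shf_edges` are assumptions for illustration. Scan-in and scan-out nodes are omitted.

```python
def shf_edges(scan_chains, t):
    """Shf_{t,t+1}: connect each pseudo-primary input in copy t to its
    successor in the scan chain in copy t+1.

    scan_chains: iterable of sequences, each giving one scan chain order.
    """
    edges = set()
    for chain in scan_chains:
        # zip pairs every flip-flop with its direct successor in the chain.
        for a, b in zip(chain, chain[1:]):
            edges.add(((a, t), (b, t + 1)))
    return edges

# The single scan chain sc_1 of the example circuit in Fig. 2.
SHF_0_1 = shf_edges([("ppi1", "ppi2", "ppi3")], 0)
```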



Fig. 4. The graph concatenation for a shift clock.

#### C. Graph Generation from a Clock Sequence

A clock sequence is described by any launch clock sequence  $l \in L^*$  over the alphabet  $L = \{CAP, SHIFT\}$ .

For the final graph  $G_l = (V_l, E_l)$ ,  $|l|$  copies  $G_1, \ldots, G_{|l|}$  of the graph  $G_0 = G$  are created, one for each clock in the sequence. The vertices of the final graph are then:

$$V_l = \bigcup_{t=0}^{|l|} V_t$$

The edges of the final graph are the edges of each copy plus the edges for the concatenation of the graphs:

$$E_l = E_0 \cup \bigcup_{t=1}^{|l|} \begin{cases} E_t \cup Cap_{t-1,t} & \text{if } l_t = CAP \\ E_t \cup Shf_{t-1,t} & \text{if } l_t = SHIFT \end{cases}$$

The presented formalization can be easily implemented using almost any graph representation. With simple coding techniques, the graphs can share the same algorithm to deal with arbitrary clock sequences.
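To make the construction concrete, the following sketch builds $V_l$ and $E_l$ for a given clock sequence. It is an illustration under stated assumptions rather than the implementation used for the experiments: vertices of copy $G_t$ are encoded as pairs `(name, t)`, the clock alphabet is given as the strings `"CAP"` and `"SHIFT"`, and the function name `unroll` is hypothetical.

```python
def unroll(nodes, edges, ff, scan_chains, clocks):
    """Build (V_l, E_l) for a clock sequence such as ["SHIFT", "CAP"].

    nodes, edges: circuit graph G.
    ff:           set of (ppo, ppi) pairs, one per scan flip-flop.
    scan_chains:  iterable of scan chain orders (sequences of ppi names).
    clocks:       sequence over the alphabet {"CAP", "SHIFT"}.
    """
    v_l = {(n, 0) for n in nodes}                  # V_0
    e_l = {((a, 0), (b, 0)) for a, b in edges}     # E_0
    for t, clk in enumerate(clocks, start=1):
        v_l |= {(n, t) for n in nodes}             # V_t
        e_l |= {((a, t), (b, t)) for a, b in edges}  # E_t
        if clk == "CAP":
            # Cap_{t-1,t}: outputs of copy t-1 feed inputs of copy t.
            e_l |= {((o, t - 1), (i, t)) for o, i in ff}
        elif clk == "SHIFT":
            # Shf_{t-1,t}: each scan cell feeds its successor in the chain.
            for chain in scan_chains:
                e_l |= {((a, t - 1), (b, t)) for a, b in zip(chain, chain[1:])}
        else:
            raise ValueError(f"unknown clock type: {clk!r}")
    return v_l, e_l
```

With this encoding, LOC corresponds to `["CAP"]`, LOS to `["SHIFT"]`, LOCS to `["CAP", "SHIFT"]` and LOSC to `["SHIFT", "CAP"]`, so the same routine serves all clock schemes.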

# IV. TEST PLANNING FOR DELAY FAULTS

Now, we outline the general method of test planning for BIST power reduction and show how it uses the information generated in Section III. For every seed of the pattern generator in Fig. 1, a configuration of the scan chains is computed such that fault coverage is not impaired. The degrees of freedom are encoded into constraints for a set covering problem, which is solved using branch & bound and a divide-and-conquer heuristic.

PPSFP fault simulation on a circuit graph is used to classify faults and to determine the detecting flip-flops. Each detecting flip-flop determines a set of required scan chains, which is computed using the S-graph. The S-graph generated by the method in Section III reflects the clocking scheme, and no other measures have to be taken to support delay tests. The circuit graph for the PPSFP fault simulation is also concatenated using the method of Section III. The only special consideration is that the fault simulator is aware of the time frames and injects the transition faults in every time frame of the clock sequence.

A test block is a tuple  $(s, SC_b)$  consisting of a seed  $s \in S$ and a set of activated scan chains  $SC_b \subset SC$ . The test set generated from s has constant size N and the set S of possible seeds is given. A block  $b = (s, SC_b)$  has an associated set of faults  $F_b$  detected by b. The goal of test planning is to compute a set of blocks B such that a given set of faults F is detected and the set is optimized w.r.t. the estimated power consumption. A given fault may be covered by several different blocks, and these constraints are input to a set covering problem.

The set covering is evaluated with a cost function, which is an estimate of the power consumption. To allow efficient evaluation during the branch & bound optimization, we use the number of activated scan chains, summed over all the seeds occurring in B. Let  $S_B \subset S$  be the set of seeds used by blocks in B, and let  $B_s \subset B$  be the set of blocks with seed s. The cost function is now:  $Cost(B) = \sum_{s \in S_B} \left| \bigcup_{(s,sc) \in B_s} sc \right|$ .
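The cost function above translates into a few lines of code. The sketch assumes that a block is represented as a pair of a seed and a set of scan chain identifiers; the representation and the function name `cost` are chosen for illustration only.

```python
from collections import defaultdict

def cost(blocks):
    """Cost(B): for each seed, count the union of all scan chains that
    any block with this seed activates, then sum over the seeds.

    blocks: iterable of (seed, chains) pairs, chains being a set.
    """
    active = defaultdict(set)
    for seed, chains in blocks:
        active[seed] |= set(chains)
    return sum(len(chains) for chains in active.values())

# Two blocks share seed 0 and overlap in chain 2; seed 1 needs one chain.
example = [(0, {1, 2}), (0, {2, 3}), (1, {4})]
example_cost = cost(example)   # |{1, 2, 3}| + |{4}| = 4
```

Because chains shared by several blocks of the same seed are counted once, adding a block whose chains are already active for its seed does not increase the cost.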

Input to the set covering is a set of constraints. To deal with the computational complexity of the set covering problem considered here, the divide-and-conquer heuristic is employed. The set of faults is divided according to the testability of the faults, which is determined by fault simulation. Let  $F_i \subset F$  be the set of faults to be considered in one step of the divide-and-conquer heuristic.

For each fault  $f \in F_i$ , fault simulation is used to determine the set of flip-flops  $FF_{f,s} \subset FF$  that observe the fault effect when applying a seed s. For a flip-flop  $ff \in FF_{f,s}$ , the fault is known to be detected if all of the flip-flops in its input cone are activated during application of s. The flip-flops in the input cone are derived from the transitive inputs pred(ff) of ff in the S-graph of the circuit.

From  $\{ff\} \cup pred(ff)$  we can determine the scan chains c(ff) to be activated to detect fault f in flip-flop ff. For each seed s and each fault f, we can now determine a set of blocks that detect f:

$$B_{f,s} = \bigcup_{ff \in FF_{f,s}} \{(s, c(ff))\}$$

Now, the set

$$\bigcup_{f \in F_i} \bigcup_{s \in S} B_{f,s}$$

is the set of constraints for the set covering problem with respect to  $F_i$ . The result of the set covering is a set of blocks  $B_i$  that detect  $F_i$ . The problem is solved using a branch-and-bound algorithm such that all faults are detected and  $Cost(\bigcup_{j=1..i} B_j)$  is minimal.
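The derivation of the blocks $B_{f,s}$ from the observing flip-flops can be sketched as follows. The example assumes that the S-graph is available as an edge list and that a dictionary `chain_of` maps every scan flip-flop to its scan chain; nodes without an entry, such as primary inputs, are assumed not to require any chain. All names are hypothetical.

```python
from collections import defaultdict

def transitive_preds(s_edges, ff):
    """pred(ff): all nodes from which ff is reachable in the S-graph."""
    pred = defaultdict(set)
    for a, b in s_edges:
        pred[b].add(a)
    seen, stack = set(), [ff]
    while stack:
        n = stack.pop()
        for p in pred.get(n, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def blocks_for(seed, observing_ffs, s_edges, chain_of):
    """B_{f,s}: one block (seed, c(ff)) per observing flip-flop ff."""
    blocks = set()
    for ff in observing_ffs:
        cone = {ff} | transitive_preds(s_edges, ff)
        # c(ff): scan chains holding ff and the flip-flops in its input cone.
        chains = frozenset(chain_of[n] for n in cone if n in chain_of)
        blocks.add((seed, chains))
    return blocks
```

Since the S-graph is derived from the concatenated circuit graph, the input cones and thus the required chains grow with the number of capture cycles in the clock sequence.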

For large industrial circuits, the constraints contain a high degree of freedom since most faults can be detected numerous times. Consequently, searching for the optimal solution of the set covering problem is not feasible. However, a very good solution can be efficiently found if the problem is divided into several sub-problems by a divide-and-conquer heuristic [19].
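To illustrate how the constraints and the cost function interact, the sketch below solves a small covering instance with a simple greedy heuristic. This is explicitly not the branch-and-bound and divide-and-conquer procedure of [19]; it is a swapped-in, much simpler technique, and the data layout (a dictionary from candidate blocks to the faults they detect) as well as the name `greedy_plan` are assumptions made for the example.

```python
def greedy_plan(candidates, faults):
    """Greedy set covering: repeatedly pick the block with the fewest
    newly activated scan chains per newly detected fault.

    candidates: dict mapping (seed, frozenset_of_chains) -> set of faults.
    faults:     faults that must be detected.
    """
    uncovered = set(faults)
    active = {}          # seed -> scan chains already switched on
    plan = []
    while uncovered:
        best, best_key = None, None
        for block, detected in candidates.items():
            gain = len(detected & uncovered)
            if gain == 0:
                continue
            seed, chains = block
            added = len(chains - active.get(seed, set()))
            key = (added / gain, -gain)   # cheap first, then high gain
            if best_key is None or key < best_key:
                best, best_key = block, key
        if best is None:
            raise ValueError("remaining faults are not covered by any block")
        seed, chains = best
        active.setdefault(seed, set()).update(chains)
        uncovered -= candidates[best]
        plan.append(best)
    return plan
```

Blocks whose chains are already active for their seed incur no additional cost and are therefore preferred, which mirrors the way the cost function counts each activated chain only once per seed.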

#### V. EVALUATION

While the approach presented in the previous sections works with arbitrary clock sequences, we concentrate here on at-speed delay tests with the most important clock schemes:

- A single capture cycle (LOC)
- A single shift cycle (LOS)
- A capture cycle followed by a shift cycle (LOCS)
- A shift cycle followed by a capture cycle (LOSC)

The experiments were conducted for a number of large circuits. The scan chains for all the circuits are clustered according to the method presented in [30], which does not target a fault model or test set. Only the largest designs from the well-known ISCAS and ITC benchmarks have been selected. The designs from ISCAS89 are denoted by s\* and the designs from ITC99 by b\*. The industrial circuits have been provided by NXP (denoted by p\*). These circuits exhibit the typical properties of industrial circuits, such as shorter paths and smaller input cones necessitated by the optimization for high frequency, low area and low power.

Table I shows the characteristics of the circuits. For each circuit it gives the number of gates, chains, flip-flops and transition faults. If timing information of the circuit is available, the approach can be adapted to small gate delay and path delay faults and the general remarks below are still valid.

All the test plans are generated for a BIST with 200 seeds, and 1024 patterns are generated from each seed. Transition faults are tested by multi-pattern tests, so they have lower detectability than stuck-at faults. Hence, it may be acceptable or desirable to apply test sequences even longer than 200k patterns. It has been shown that even better results are obtained for longer tests, since the test planning is able to take advantage of the added degrees of freedom [19].

| Circuit | # Gates | # Chains | # FFs  | # Faults |
|---------|---------|----------|--------|----------|
| s38417  | 24079   | 32       | 1770   | 65364    |
| s38584  | 22092   | 32       | 1742   | 52018    |
| b17     | 37446   | 32       | 1549   | 143346   |
| b18     | 130949  | 32       | 3378   | 487136   |
| b19     | 263547  | 32       | 6693   | 981866   |
| p286k   | 332726  | 55       | 17713  | 1117520  |
| p330k   | 312666  | 64       | 17226  | 947798   |
| p388k   | 433331  | 50       | 24065  | 1476348  |
| p418k   | 382633  | 64       | 29205  | 1173036  |
| p951k   | 816072  | 82       | 104624 | 2634564  |

TABLE I CIRCUIT CHARACTERISTICS

Table II reports the results for each of the four clock sequences. |F| is the number of faults detectable by the seeds of the tests and targeted by the test plan.  $|F_{ess}|$  is the number of essential faults, i.e. faults detected by just a single seed from the overall set of seeds. P is the estimated power in percent of the power of the regular execution of the test without turning off any scan chains. As a precise power estimation would require a circuit simulation for each shift cycle, the power is estimated by computing the switching activity of the flip-flops for the sake of computation time. Flip-flops that are deactivated are not clocked during shift, launch and capture, and consequently both average and peak power are reduced. The runtimes of the approach are dominated by the fault simulation of the pseudorandom patterns.

As expected, LOS detects significantly more faults than LOC. With LOC, the random patterns are launched through the logic network and this introduces significant correlation between the two patterns. With LOS, the shift cycle causes correlation between consecutive flip-flops in the scan chains, but this is less severe compared to LOC. LOCS uses a capture clock cycle followed by a shift clock cycle. The shift cycle is able to randomize much of the correlation caused by the combinational logic. Consequently, LOCS is very close to LOS in terms of fault coverage. Finally, LOSC has some interesting properties: First, the leading shift cycle activates a large number of transition faults as expected from LOS. Second, the capture cycle effectively propagates the circuit responses and at the same time it activates additional transition faults. Hence, LOSC has the highest fault coverage for all the circuits except p286k. In contrast, the responses of the leading capture cycle in the LOCS scheme can only be used for justification since erroneous responses from transition faults are not propagated by the subsequent shift cycle.

In most cases, the highest reduction of the test power is achieved when using the LOS clock scheme. For LOC, the set of flip-flops that has to be activated to detect a target fault is relatively large. Besides the observing flip-flops, it includes all of the flip-flops in the input cone and in turn all the flip-flops in the input cones of these flip-flops. These flip-flops span many more scan chains than the small set that is sufficient for LOS. LOCS and LOSC also suffer from the rather large input cones due to the capture cycle. But they exhibit some rather interesting properties: Many faults are detected by many more seeds compared to LOC and, as an indication of this, the number of essential faults is significantly reduced for LOCS and LOSC. This effect is even more pronounced for LOSC, since faults are activated in both cycles of the clock scheme. The additional degree of freedom is effectively used by the test planning, and the power reduction achieved with LOSC is comparable to that of LOS and even exceeds LOS for s38584, s38417 and p951k, despite the higher fault coverage.

If the best clock scheme is selected for each of the circuits, the power reduction obtained here is in the same range as the power reduction obtained for stuck-at-faults in [19]. Furthermore, it should be emphasized that the test planning used here keeps fault coverage and test length under all circumstances.

| Circuit | LOC $\lvert F \rvert$ | LOC $\lvert F_{ess} \rvert$ | LOC P [%] | LOS $\lvert F \rvert$ | LOS $\lvert F_{ess} \rvert$ | LOS P [%] | LOCS $\lvert F \rvert$ | LOCS $\lvert F_{ess} \rvert$ | LOCS P [%] | LOSC $\lvert F \rvert$ | LOSC $\lvert F_{ess} \rvert$ | LOSC P [%] |
|---------|---------|-------|-------|---------|-------|-------|---------|-------|-------|---------|-------|-------|
| s38584  | 47527   | 502   | 15.33 | 58645   | 182   | 9.06  | 57353   | 314   | 11.60 | 61174   | 288   | 8.92  |
| s38417  | 47869   | 1283  | 16.11 | 49209   | 1179  | 15.93 | 48128   | 1089  | 13.47 | 50511   | 947   | 11.47 |
| b17     | 89814   | 4099  | 56.11 | 113476  | 7017  | 56.88 | 110467  | 7874  | 53.19 | 117766  | 4732  | 58.36 |
| b18     | 259294  | 23026 | 69.10 | 383652  | 19414 | 77.30 | 374471  | 20725 | 74.40 | 389137  | 14725 | 77.59 |
| b19     | 518771  | 40038 | 77.85 | 768408  | 40908 | 84.65 | 755697  | 42435 | 82.59 | 778310  | 30475 | 84.68 |
| p286k   | 802947  | 41401 | 79.04 | 1020417 | 16170 | 70.41 | 980883  | 19999 | 77.34 | 1010510 | 18945 | 73.05 |
| p330k   | 753738  | 16568 | 56.53 | 823477  | 8580  | 38.57 | 786042  | 19875 | 51.84 | 830790  | 8725  | 39.01 |
| p388k   | 1256203 | 17920 | 58.81 | 1416672 | 7153  | 38.48 | 1399901 | 14714 | 53.34 | 1416907 | 9627  | 44.71 |
| p418k   | 866561  | 26480 | 60.36 | 1035798 | 20019 | 56.10 | 944916  | 17666 | 55.15 | 1045834 | 21316 | 56.27 |
| p951k   | 2280840 | 19812 | 37.42 | 2418250 | 16207 | 36.45 | 2409177 | 18034 | 36.57 | 2449348 | 15186 | 33.67 |

TABLE II SIMULATION RESULTS FOR DIFFERENT CLOCK SEQUENCES

### VI. CONCLUSIONS

To achieve high fault coverage and short test time, at-speed delay tests are tailored using arbitrary test clock sequences. We have presented a consistent, formalized scheme to generate the circuit graphs that reflect the sequential behavior caused by a given clock sequence. This scheme was employed to generate the graphs used during low-power test planning. This way, test plans can be computed for any clock scheme and clock schemes are easily compared.

The most common clock schemes have been evaluated for a set of industrial benchmarks. A significant power reduction is obtained for all the combinations of circuits and clock schemes. Of the clock schemes evaluated here, launch-on-shift and launch-on-shift-capture provide the best trade-off between fault coverage and power consumption.

#### REFERENCES

- [1] J. Savir, "Skewed-load transition test: Part 1, calculus," in *Proceedings* IEEE International Test Conference 1992, Discover the New World of Test and Design, Baltimore, Maryland, USA, September 20-24, 1992, 1992, pp. 705–713.
- [2] J. Savir and S. Patil, "Broad-side delay test," *IEEE Trans. on CAD of Integrated Circuits and Systems*, vol. 13, no. 8, pp. 1057–1064, 1994.
- [3] R. Madge, B. Benware, and W. R. Daasch, "Obtaining high defect coverage for frequency-dependent defects in complex ASICs," *IEEE Design & Test of Computers*, vol. 20, no. 5, pp. 46–53, 2003.
- [4] S. Wang, X. Liu, and S. T. Chakradhar, "Hybrid delay scan: A low hardware overhead scan-based delay test technique for high fault coverage and compact test sets," in 2004 Design, Automation and Test in Europe Conference and Exposition (DATE 2004), 16-20 February 2004, Paris, France, 2004, pp. 1296–1301.
- [5] N. Ahmed and M. Tehranipoor, "Improving transition delay fault coverage using hybrid scan-based technique," in 20th IEEE International Symposium on Defect and Fault-Tolerance in VLSI Systems (DFT 2005), 3-5 October 2005, Monterey, CA, USA, 2005, pp. 187–198.
- [6] S. M. Reddy, "Models for delay faults," in *Models in Hardware Testing* - Lecture Notes of the Forum in Honor of Christian Landrault, H.-J. Wunderlich, Ed. Springer-Verlag Berlin Heidelberg, 2009.
- [7] J. Abraham, U. Goel, and A. Kumar, "Multi-cycle sensitizable transition delay faults," in 24th IEEE VLSI Test Symposium (VTS 2006), 30 April - 4 May 2006, Berkeley, California, USA, 2006, pp. 306–313.
- [8] Z. Zhang, S. M. Reddy, I. Pomeranz, X. Lin, and J. Rajski, "Scan tests with multiple fault activation cycles for delay faults," in 24th IEEE VLSI Test Symposium (VTS 2006), 30 April - 4 May 2006, Berkeley, California, USA, 2006, pp. 343–348.
- [9] I. Park and E. J. McCluskey, "Launch-on-shift-capture transition tests," in *IEEE International Test Conference (ITC08), October 26 - 31, 2008, Santa Clara, CA*, 2008, p. 35.3.
- [10] Y. Zorian, "A distributed BIST control scheme for complex VLSI devices," in *Proceedings of the 11th IEEE VLSI Test Symposium (VTS '93)*, 1993, pp. 4–9.
- [11] P. Girard, "Survey of low-power testing of VLSI circuits," *IEEE Design & Test of Computers*, vol. 19, no. 3, pp. 82–92, 2002.
- [12] P. Girard, N. Nicolici, and X. Wen, Eds., *Power-Aware Testing and Test Strategies for Low Power Devices*. Springer-Verlag Berlin Heidelberg, 2009.
- [13] J. Rearick, "Too much delay fault coverage is a bad thing," in *Proceedings IEEE International Test Conference 2001, Baltimore, MD, USA, 30 October - 1 November 2001*, 2001, pp. 624–633.
- [14] Z. Zhang, S. M. Reddy, and I. Pomeranz, "Warning: Launch off shift tests for delay faults may contribute to test escapes," in *Proceedings of* the 12th Conference on Asia South Pacific Design Automation, ASP-DAC 2007, Yokohama, Japan, January 23-26, 2007, 2007, pp. 817–822.
- [15] J. Saxena, K. M. Butler, V. B. Jayaram, S. Kundu, N. V. Arvind, P. Sreeprakash, and M. Hachinger, "A case study of IR-drop in structured at-speed testing," in *Proceedings 2003 International Test Conference* (*ITC 2003*), 28 September - 3 October 2003, Charlotte, NC, USA, 2003, pp. 1098–1104.
- [16] R. Franch, P. Restle, N. James, W. Huott, J. Friedrich, R. Dixon, S. Weitzel, K. Van Goor, and G. Salem, "On-chip timing uncertainty measurements on IBM microprocessors," in *IEEE International Test Conference*, 2007. ITC 2007., 2007, pp. 1–7.
- [17] R. Sankaralingam and N. A. Touba, "Controlling peak power during scan testing," in 20th IEEE VLSI Test Symposium (VTS 2002), 28 April - 2 May 2002, Monterey, CA, USA, 2002, pp. 153–159.

- [18] C. Zoellin, H.-J. Wunderlich, N. Maeding, and J. Leenstra, "BIST power reduction using scan-chain disable in the Cell processor," in *IEEE International Test Conference (ITC '06), Santa Clara, CA, USA, Oct.* 24 - 26, 2006, p. 32.3.
- [19] M. E. Imhof, C. G. Zoellin, H.-J. Wunderlich, N. Mäding, and J. Leenstra, "Scan test planning for power reduction," in *Proceedings of the* 44th Design Automation Conference, DAC 2007, San Diego, CA, USA, June 4-8, 2007, 2007, pp. 521–526.
- [20] S. Gerstendoerfer and H.-J. Wunderlich, "Minimized power consumption for scan-based BIST," in *IEEE International Test Conference (ITC '99)*, *NJ, USA, 27-30 Sept.*, 1999, pp. 77–84.
- [21] F. Corno, P. Prinetto, M. Rebaudengo, and M. S. Reorda, "A test pattern generation methodology for low-power consumption," in 16th IEEE VLSI Test Symposium (VTS '98), 28 April - 1 May 1998, Princeton, NJ, USA, 1998, pp. 453–459.
- [22] X. Wen, S. Kajihara, K. Miyase, T. Suzuki, K. K. Saluja, L.-T. Wang, K. S. Abdel-Hafez, and K. Kinoshita, "A new ATPG method for efficient capture power reduction during scan testing," in 24th IEEE VLSI Test Symposium (VTS 2006), 30 April - 4 May 2006, Berkeley, California, USA, 2006, pp. 58–65.
- [23] S. Remersaro, X. Lin, Z. Zhang, S. Reddy, I. Pomeranz, and J. Rajski, "Preferred fill: A scalable method to reduce capture power for scan based designs," in *IEEE International Test Conference (ITC)*, 2006, pp. 1–10.
- [24] P. Girard, L. Guiller, C. Landrault, S. Pravossoudovitch, and H.-J. Wunderlich, "A modified clock scheme for a low power BIST test pattern generator," in 19th IEEE VLSI Test Symposium (VTS 2001), 29 April -3 May 2001, Marina Del Rey, CA, USA, 2001, pp. 306–311.
- [25] P. M. Rosinger, B. M. Al-Hashimi, and N. Nicolici, "Scan architecture with mutually exclusive scan segment activation for shift- and capturepower reduction," *IEEE Trans. on CAD of Integrated Circuits and Systems*, vol. 23, no. 7, pp. 1142–1153, 2004.
- [26] T. Yoshida and M. Watari, "MD-SCAN method for low power scan testing," in *Proceedings 11th Asian Test Symposium (ATS 2002), 18-20 November 2002, Guam, USA*, 2002, pp. 80–85.
- [27] P. Bardell and W. McAnney, "Self-testing of multichip logic modules." in Proc. IEEE International Test Conference, 1982, pp. 200–204.
- [28] M. Riley, L. Bushard, N. Chelstrom, N. Kiryu, and S. Ferguson, "Testability features of the first-generation Cell processor," in *Proceedings of the IEEE International Test Conference (ITC '05)*, 8-10 Nov, Austin TX, 2005, p. 6.1.
- [29] L. Whetsel, "Adapting scan architectures for low power operation," in Proceedings IEEE International Test Conference 2000, Atlantic City, NJ, USA, October 2000, 2000, pp. 863–872.
- [30] M. Elm, H.-J. Wunderlich, M. E. Imhof, C. G. Zoellin, J. Leenstra, and N. Mäding, "Scan chain clustering for test power reduction," in *Proceedings of the 45th Design Automation Conference, DAC 2008, Anaheim, CA, USA, June 8-13, 2008*, 2008, pp. 828–833.
- [31] K. Antreich and M. H. Schulz, "Accelerated fault simulation and fault grading in combinational circuits," *IEEE Trans. on CAD of Integrated Circuits and Systems*, vol. 6, no. 5, pp. 704–712, 1987.
- [32] H. Wunderlich, "The design of random-testable sequential circuits," in International Symposium on Fault-Tolerant Computing (FTCS), 1989, pp. 110–117.
- [33] R. Gupta and M. Breuer, "BALLAST: a methodology for partial scan design," in *International Symposium on Fault-Tolerant Computing* (*FTCS*), 1989, pp. 118–125.
- [34] A. Kunzmann and H. Wunderlich, "An analytical approach to the partial scan problem," *Journal of Electronic Testing*, vol. 1, no. 2, pp. 163–174, 1990.
- [35] S. Narayanan, C. Njinda, R. Gupta, and M. Breuer, "SIESTA: a multifacet scan design system," in *Proceedings of the conference on European design automation*. IEEE Computer Society Press Los Alamitos, CA, USA, 1992, pp. 246–251.
- [36] S. T. Chakradhar, A. Balakrishnan, and V. D. Agrawal, "An exact algorithm for selecting partial scan flip-flops," *J. Electronic Testing*, vol. 7, no. 1-2, pp. 83–93, 1995.
- [37] A. P. Stroele and H.-J. Wunderlich, "Hardware-optimal test register insertion," *IEEE Trans. on CAD of Integrated Circuits and Systems*, vol. 17, no. 6, pp. 531–539, 1998.
- [38] E. McCluskey, "Iterative combinational switching networks–general design considerations," *IRE Transactions on Electronic Computers*, vol. 7, pp. 285–291, 1958.