# MINIMIZED POWER CONSUMPTION FOR SCAN-BASED BIST

Stefan Gerstendörfer, Hans-Joachim Wunderlich

Computer Architecture Lab, University of Stuttgart, Breitwiesenstr. 20-22, 70565 Stuttgart, Germany

sgersten@informatik.uni-stuttgart.de, wu@informatik.uni-stuttgart.de

#### **Abstract**

Power consumption of digital systems may increase significantly during testing. In this paper, systems equipped with a scan-based built-in self-test like the STUMPS architecture are analyzed, the modules and modes with the highest power consumption are identified, and design modifications to reduce power consumption are proposed. The design modifications include some gating logic for masking the scan path activity during shifting, and the synthesis of additional logic for suppressing random patterns which do not contribute to increasing the fault coverage. These design changes reduce power consumption during BIST by several orders of magnitude, at very low cost in terms of area and performance.

Keywords: BIST, Low Power, Power consumption

## 1. Introduction

### 1.1 Motivation

Power and energy consumption of digital systems are considerably higher in test mode than in system mode. Especially during self-test, power dissipation increases since random patterns cause as many nodes as possible to switch, while a power saving system mode activates only a few modules at the same time. Power supply and packaging of a circuit are cost-intensive parts which must be sized according to the peak power consumption, and power dissipation during BIST limits the optimization potential of so-called "low power" designs. In addition, BIST may be reused throughout the system lifetime, and for remote applications a high energy consumption may shorten the lifetime of batteries.

Scan based BIST architectures are popular because of their low impact on area and performance. However, scan based architectures are expensive in power consumption as each test pattern requires a large number of shift operations with a high circuit activity. In this paper, a new scan based BIST technique is described which drastically reduces the total energy consumption as well as the average power consumption, and which has no major impact on either test speed or system speed.

### 1.2 Previous work

Today, low power design is a mature research area, and power consumption is considered at all levels of system abstraction. The state of the art is comprehensively described in textbooks and survey papers [BeEl95, ChBr95, DDS95, Mein95, RaPe96, SING95, NeMe97].

Independently, self-test techniques have been developed which can be classified into "test-per-clock" schemes and "test-per-scan" schemes. This paper concentrates on "test-per-scan" schemes as shown in Figure 1, which may contain a single scan path [EiLi83] or multiple scan paths (STUMPS, [BaMc84]).

Figure 1: Test-per-scan scheme.

A pattern generator, in this case a linear feedback shift register (LFSR), generates a bit sequence which is fed into the scan path. The content of the scan path serves as a test pattern for the module under test (MUT). The responses of the MUT are captured by the scan path in parallel and serially fed into a signature analysis register (SA). The test control unit usually contains a bit counter and a pattern counter. The bit counter indicates that a pattern is loaded into the scan path and a system clock has to be applied. The pattern counter is used to indicate BIST completion.

In order to increase fault coverage, the pattern generator may be modified so that weighted random patterns or deterministic test patterns are generated [Wu88, LBDG87]. Known techniques are reseeding, bit-fixing, or bit-flipping [HELL95, ToMc96, WuKi96]. The methods for power reduction presented in this paper will work for any kind of serial test pattern generator.
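
The interplay of pattern generator, bit counter and pattern counter in a test-per-scan scheme can be sketched in a few lines of Python. This is only a behavioural sketch under assumptions that are not taken from the paper: a 4-bit LFSR with feedback polynomial x^4 + x + 1, a toy combinational function standing in for the MUT, and no response compaction (the shifted-out bits are simply collected where a real design would feed them into the SA). All function names are illustrative.

```python
from itertools import islice


def lfsr_bits(seed=0b0001, taps=(0, 1), width=4):
    """Endless bit stream from a Fibonacci LFSR (assumed polynomial x^4 + x + 1)."""
    state = seed
    while True:
        yield state & 1                      # serial output feeds the scan path
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1     # XOR of the tapped state bits
        state = (state >> 1) | (feedback << (width - 1))


def toy_mut(inputs):
    """Hypothetical stand-in for the MUT: XOR of neighbouring scan cells."""
    return [a ^ b for a, b in zip(inputs, inputs[1:] + inputs[:1])]


def run_test_per_scan(scan_len, num_patterns, mut=toy_mut):
    """Run the shift/capture loop and return the serially shifted-out bits."""
    bits = lfsr_bits()
    scan = [0] * scan_len
    shifted_out = []
    for _pattern in range(num_patterns):     # pattern counter
        for _bit in range(scan_len):         # bit counter: one shift clock each
            shifted_out.append(scan[-1])     # last cell leaves towards the SA
            scan = [next(bits)] + scan[:-1]  # new bit enters at the front
        scan = mut(scan)                     # system clock: capture the response
    return shifted_out


if __name__ == "__main__":
    print(list(islice(lfsr_bits(), 15)))
    print(run_test_per_scan(scan_len=4, num_patterns=3))
```

Every pattern costs `scan_len` shift clocks plus one system clock, which is why the shift mode dominates the activity analyzed in the following sections.
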

While academic research on "low power design" and on BIST has been performed nearly independently, industrial practice required ad hoc solutions for considering power consumption during BIST very early [VTS97]. Proposed solutions include:

- Oversizing power supply, package and cooling to withstand the increased current during testing. Breaks are inserted into the test process to avoid hot spots.
- Testing with reduced operating frequency.
- Partitioning of the system under test and appropriate test planning.

The first solution increases both hardware costs and test application time. The second proposal uses less hardware, but the reduced system frequency prolongs test application time and may lead to a loss of defect coverage as dynamic faults may be masked. Test partitioning and planning allows detecting dynamic faults but increases hardware overhead, test time and total energy during BIST.

The practical needs initiated academic research. Test scheduling for parallel BIST under power and time constraints was investigated in [ChSA97]. A solution for system partitioning under the same constraints was published in [Zori93]. Methods for reducing power dissipation for scan-based BIST are based on the previous work on external testing. The ordering of both the scan flipflops and the test patterns influences power and energy. Optimization techniques are published in [ChDa94, WaGu97].

### 1.3 Organization of this paper

In this paper, two techniques for reducing power dissipation of scan-based BIST architectures are proposed. They mask scan path activities in the shift mode, and they skip useless random patterns. In the next section, power consumption of a scan based architecture is analyzed. In section 3, a method for masking the shifting activity is proposed. At very low penalties in terms of area and performance, power savings up to 90 % are obtainable. In section 4, a method is presented for gating the shift clock so that useless patterns are not applied. This technique may reach savings up to another 90 %. In section 5, both methods are combined, and it is shown that the effects of both approaches add up to savings of approximately 98 %.

## 2. Power analysis

### 2.1 Estimation metric

Power consumption in standard CMOS technology originates from three different sources: The *switching energy* is caused by charging and discharging capacitors during signal computation, *short circuits* occur in the dynamic phase where both the nMOS and pMOS transistors are conducting, and the *leakage current* is the minimal current of CMOS gates in the static phase. For recent technologies short circuit current and leakage current may be neglected, but this may change with future developments in highly scaled integration [TAUR95].

We use the switching activity of the circuit nodes as a metric for the energy consumption since in static CMOS circuits the switching energy accounts for over 90 % of the total energy consumption [DM95]. The weighted switching activity (WSA) of a node is defined as the number of toggles of the node multiplied by its capacitance. Neglecting short circuit and leakage currents, the average energy consumption in static CMOS logic is proportional to the sum of the WSA of all nodes.

Three parameters are important for evaluating the power properties of a BIST architecture:

- The consumed energy directly corresponds to the WSA and has impact on battery lifetime during remote testing.
- The average power consumption is the WSA divided by the test time. This parameter is even more important than the energy as hot spots and reliability problems may be caused by constantly high power consumption.
- The peak power consumption corresponds to the highest WSA during one clock cycle. If the peak power exceeds certain limits, the correct functioning of the entire circuit is no longer guaranteed.
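
As a small illustration of how the three parameters relate to the WSA, the following sketch computes energy, average power and peak power from per-cycle toggle counts. The data layout (one dictionary of toggle counts per clock cycle, one dictionary of normalized node capacitances) and the function name are assumptions made for this example, and test time is measured in clock cycles.

```python
def wsa_metrics(toggles_per_cycle, capacitance):
    """Return (energy, average power, peak power) in units of weighted toggles.

    toggles_per_cycle: list with one dict {node: number of toggles} per clock cycle
    capacitance:       dict {node: normalized capacitance}
    """
    # WSA of each clock cycle: toggles weighted by the node capacitance
    wsa_per_cycle = [
        sum(capacitance[node] * count for node, count in cycle.items())
        for cycle in toggles_per_cycle
    ]
    energy = sum(wsa_per_cycle)                    # total WSA over the test
    average_power = energy / len(wsa_per_cycle)    # WSA per clock cycle
    peak_power = max(wsa_per_cycle)                # highest WSA in one cycle
    return energy, average_power, peak_power


if __name__ == "__main__":
    caps = {"a": 1, "b": 2, "c": 3}
    cycles = [{"a": 1, "b": 1}, {"c": 2}, {"a": 1, "b": 0, "c": 1}]
    print(wsa_metrics(cycles, caps))
```

In the toy data the three cycles have a WSA of 3, 6 and 4, so a technique that removes toggles from the second cycle lowers both energy and peak power, whereas removing them from the first cycle lowers only energy and average power.
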

All three metrics are based on the WSA, and in the next subsection we analyze the WSA of the different modules in the self-testable architecture of Figure 1.

### 2.2 WSA in a standard scan design

In this section, the WSA during BIST is computed for all ISCAS85 and ISCAS89 circuits with more than 1,000 gates, and its distribution to LFSR, SA, MUT and scan path is analyzed [Br85, Br89]. The energy is given in million weighted toggles, i.e. each toggle of a node is weighted by the fanout of that node. So we have $E = \sum_i c_i n_i$, with $c_i$ the normalized capacitance of node $i$ and $n_i$ the number of toggles at node $i$.

For every experiment, the energy was calculated using a variable delay model with fanout delays and short pulse filtering. Two 32 bit LFSRs were used to generate 10,000 pseudo random patterns and to perform the signature analysis. The LFSR polynomials were chosen to provide the highest fault coverage [HWH96]. The sequence was cut off after the last pattern which detected a new fault, thus minimizing the number of used patterns without reducing fault coverage. All experiments are based on random pattern testing, and complete fault coverage cannot be expected. Table 1 shows the pattern number after which BIST is stopped and the fault coverage obtained so far. None of the design modifications presented in this paper will reduce the fault coverage shown in Table 1.

The test control unit is not considered here, since its power consumption and area are nearly independent of the circuit size and they are not affected by the design techniques presented in this paper.

Table 1: Test length and fault coverage of the analyzed test design. The columns show the number of patterns applied and the fault coverage reached.

| Benchmark | Test length | Fault coverage (%) |
|-----------|-------------|--------------------|
| c2670     | 10,000      | 84.91              |
| c3540     | 4,947       | 96.05              |
| c5315     | 1,705       | 98.90              |
| c6288     | 89          | 99.56              |
| c7552     | 8,306       | 93.92              |
| s5378     | 9,681       | 98.22              |
| s9234     | 9,950       | 83.90              |
| s9234.1   | 9,994       | 83.44              |
| s13207    | 10,000      | 91.58              |
| s13207.1  | 9,983       | 90.92              |
| s15850    | 9,811       | 91.22              |
| s15850.1  | 9,760       | 90.46              |
| s35932    | 248         | 89.64              |
| s38417    | 9,966       | 91.74              |
| s38584    | 9,936       | 94.37              |
| s38584.1  | 9,973       | 94.39              |

Table 2 shows the WSA of the complete design, without the test controller, in column 2, and the corresponding share of each module in the remaining columns. The module under test (MUT) is activated at every single clock in the shift mode and in system mode. For this reason, most of the WSA is spent here, and the MUT column shows values from 62.84 % up to 98.99 %.

The next large portion of the WSA is due to the clock tree. We assume an edge triggered, multiplexed scan design. If more complex clocking schemes are applied, some straightforward modifications are required. The clock tree results are based on a simple clock tree design with one buffer feeding 32 buffers in the next level, so that the number of toggles per clock transition will be slightly larger than the number of flipflops in the scan path. The low number for circuit c6288 is due to the fact that the number of patterns for complete fault coverage is rather low and the scan path is rather short.
The scan path needs less energy than the clock tree and the MUT. The last two columns show that the power consumption of the test pattern generator and the signature analysis register can be neglected. Because of these results we concentrate on reducing the power consumed in the MUT and in the clock tree as described in the next sections.

Table 2: Energy consumption in a standard scan path design. The columns show the energy fraction consumed by the various parts of the circuit. The weighted switching activity (WSA) is given in million weighted toggles as described above.

| Benchmark | WSA (in million) | energy ratio from clock (%) | energy ratio from MUT (%) | energy ratio from TPG (%) | energy ratio from scan path (%) | energy ratio from SA (%) |
|-----------|------------------|-----------------------------|---------------------------|---------------------------|---------------------------------|--------------------------|
| c2670     | 64.17            | 28.08                       | 62.84                     | 1.28                      | 6.53                            | 1.27                     |
| c3540     | 11.60            | 6.60                        | 89.04                     | 1.46                      | 1.43                            | 1.47                     |
| c5315     | 73.52            | 15.14                       | 79.72                     | 0.85                      | 3.43                            | 0.85                     |
| c6288     | 74.06            | 0.59                        | 98.99                     | 0.15                      | 0.12                            | 0.54                     |
| c7552     | 130.70           | 10.34                       | 86.19                     | 0.54                      | 2.39                            | 0.54                     |
| s5378     | 56.64            | 20.65                       | 71.84                     | 1.33                      | 4.86                            | 1.32                     |
| s9234     | 106.96           | 12.89                       | 82.45                     | 0.79                      | 3.09                            | 0.78                     |
| s9234.1   | 109.03           | 13.42                       | 81.90                     | 0.76                      | 3.14                            | 0.77                     |
| s13207    | 513.70           | 23.12                       | 70.52                     | 0.46                      | 5.44                            | 0.46                     |
| s13207.1  | 522.70           | 23.58                       | 69.96                     | 0.45                      | 5.56                            | 0.46                     |
| s15850    | 526.36           | 16.74                       | 78.46                     | 0.40                      | 4.01                            | 0.40                     |
| s15850.1  | 546.43           | 17.58                       | 77.45                     | 0.39                      | 4.18                            | 0.39                     |
| s35932    | 5,373.74         | 14.11                       | 82.34                     | 0.12                      | 3.32                            | 0.11                     |
| s38417    | 3,873.33         | 15.70                       | 80.28                     | 0.15                      | 3.73                            | 0.15                     |
| s38584    | 2,936.74         | 17.93                       | 77.50                     | 0.17                      | 4.24                            | 0.17                     |
| s38584.1  | 2,957.41         | 18.07                       | 77.33                     | 0.17                      | 4.26                            | 0.17                     |

## 3. Toggle suppression

During shifting the output of each scan element is highly active, whereas capturing the test response contributes only little to the total power consumption. The power dissipated in the MUT can be dramatically reduced by using a modified shift register which suppresses the activity at output q during shift operation, as proposed in [ZHW98]. Figure 2 shows a scan element with the desired properties.

Figure 2: Scan path element with reduced output activity

The scan path element receives its input either from the MUT via d or from the previous scan element via sdi. It is controlled by the test enable signal t, and the NOR gate is transparent only if the test is disabled (t=0). If the scan path contains n flipflops, then n shift clocks and one system clock have to be applied. For these n+1 clocks the NOR gate allows at most 2 signal changes, and on average we may expect only one. Without the NOR gate, n signal changes may occur, and on average we have to expect (n+1)/2 signal changes.

To quantify the impact on delay, the modified flipflop was simulated at the transistor level and compared to the standard scan element.
The results obtained with HSpice for a 0.7 µm CMOS process show that the delay is increased by less than 10 % compared to the original scan element [HSP96, ZHW98].

The energy savings due to these design modifications are significant. Table 3 describes the energy savings obtainable by using a NOR gate and a NAND gate. The NOR gate holds all inputs of the MUT at constant 0, the NAND gate uses constant 1.

Table 3: Energy consumption during scan based test of several benchmark circuits. Column two shows the energy values with the scan path directly connected to the MUT. Columns three and four show the energy values using a NAND and NOR gate at the output of the scan path. The remaining columns show the ratios with respect to the original consumption. The results are given in million weighted toggles as described above.

| Benchmark | energy unmasked | energy with NAND masking | energy with NOR masking | ratio of energy with NAND masking (%) | ratio of energy with NOR masking (%) |
|-----------|-----------------|--------------------------|-------------------------|---------------------------------------|--------------------------------------|
| c2670     | 64.17           | 24.15                    | 24.12                   | 37.64                                 | 37.59                                |
| c3540     | 11.60           | 1.72                     | 1.64                    | 14.82                                 | 14.13                                |
| c5315     | 73.52           | 15.62                    | 15.55                   | 21.24                                 | 21.15                                |
| c6288     | 74.06           | 3.99                     | 4.22                    | 5.39                                  | 5.70                                 |
| c7552     | 130.70          | 19.14                    | 19.04                   | 14.64                                 | 14.57                                |
| s5378     | 56.64           | 16.36                    | 16.32                   | 28.88                                 | 28.82                                |
| s9234     | 106.96          | 19.69                    | 19.60                   | 18.40                                 | 18.32                                |
| s9234.1   | 109.03          | 20.64                    | 20.56                   | 18.93                                 | 18.85                                |
| s13207    | 513.70          | 152.65                   | 152.49                  | 29.72                                 | 29.68                                |
| s13207.1  | 522.70          | 158.23                   | 158.09                  | 30.27                                 | 30.24                                |
| s15850    | 526.36          | 114.89                   | 114.76                  | 21.83                                 | 21.80                                |
| s15850.1  | 546.43          | 124.72                   | 124.62                  | 22.82                                 | 22.81                                |
| s35932    | 5,373.74        | 954.37                   | 953.47                  | 17.76                                 | 17.74                                |
| s38417    | 3,873.33        | 767.82                   | 767.90                  | 19.82                                 | 19.83                                |
| s38584    | 2,936.74        | 664.57                   | 663.98                  | 22.63                                 | 22.61                                |
| s38584.1  | 2,957.41        | 674.01                   | 673.41                  | 22.79                                 | 22.77                                |

Blocking during pattern shifting by NOR or by NAND gates saves 78 % of the total test energy on average, and there is nearly no difference between the use of NAND or NOR gates. Table 4 shows that there is no significant change of the peak power caused by the blocking gate, but the average power is reduced as expected.

Table 4: Peak and average power during BIST with scan path elements gated with NOR gates compared to scan elements without NOR gates.

| Benchmark | peak power w/o NOR masking | peak power with NOR masking | average power w/o NOR masking | average power with NOR masking | ratio average power with NOR vs. without NOR (%) |
|-----------|----------------------------|-----------------------------|-------------------------------|--------------------------------|--------------------------------------------------|
| c2670     | 4,619                      | 4,145                       | 1,720.3                       | 646.6                          | 37.6                                             |
| c3540     | 4,944                      | 4,936                       | 1,610.6                       | 229.5                          | 14.1                                             |
| c5315     | 6,086                      | 7,043                       | 2,442.5                       | 516.5                          | 21.2                                             |
| c6288     | 35,558                     | 35,109                      | 11,571.1                      | 661.2                          | 5.7                                              |
| c7552     | 10,279                     | 10,121                      | 4,149.2                       | 607.3                          | 14.6                                             |
| s5378     | 3,606                      | 3,552                       | 2,153.6                       | 621.4                          | 28.8                                             |
| s9234     | 6,272                      | 6,854                       | 3,976.2                       | 728.0                          | 18.3                                             |
| s9234.1   | 6,417                      | 6,756                       | 3,812.4                       | 721.2                          | 18.9                                             |
| s13207    | 9,366                      | 9,247                       | 6,257.0                       | 1,860.5                        | 29.7                                             |
| s13207.1  | 9,939                      | 9,278                       | 6,134.9                       | 1,858.4                        | 30.2                                             |
| s15850    | 12,382                     | 12,600                      | 7,540.9                       | 1,645.0                        | 21.8                                             |
| s15850.1  | 13,202                     | 12,473                      | 7,180.4                       | 1,638.7                        | 22.8                                             |
| s35932    | 46,206                     | 45,103                      | 25,798.1                      | 4,580.7                        | 17.7                                             |
| s38417    | 31,768                     | 30,977                      | 21,883.2                      | 4,342.7                        | 19.8                                             |
| s38584    | 26,915                     | 26,857                      | 16,858.4                      | 3,818.0                        | 22.6                                             |
| s38584.1  | 27,182                     | 27,133                      | 16,727.4                      | 3,817.2                        | 22.8                                             |

It should be noted that power savings are obtained even if, for timing reasons, not all of the flipflops are gated.

To identify further possible savings, the ratio of energy consumed in the different modules was analyzed. Table 5 shows the results. Most of the energy is now consumed in the clock tree and the scan path.
A method to reduce the power consumption in these modules is presented in the next section.

Table 5: Energy consumption during scan based test of several benchmark circuits. The columns show the energy ratios (in %) of a NOR gated scan path design originating from the clock tree, the MUT, the TPG, the scan path and the SA, respectively.

| Benchmark | energy for clock tree (%) | energy for MUT (%) | energy for TPG (%) | energy for scan path (%) | energy for SA (%) |
|-----------|---------------------------|--------------------|--------------------|--------------------------|-------------------|
| c2670     | 74.72                     | 1.14               | 3.38               | 17.37                    | 3.38              |
| c3540     | 46.54                     | 22.51              | 10.38              | 10.22                    | 10.36             |
| c5315     | 71.65                     | 4.05               | 4.01               | 16.28                    | 4.01              |
| c6288     | 10.36                     | 82.27              | 2.64               | 2.11                     | 2.62              |
| c7552     | 70.70                     | 5.39               | 3.68               | 16.55                    | 3.68              |
| s5378     | 71.62                     | 2.30               | 4.59               | 16.91                    | 4.59              |
| s9234     | 70.47                     | 4.17               | 4.29               | 16.78                    | 4.29              |
| s9234.1   | 70.99                     | 3.95               | 4.07               | 16.92                    | 4.07              |
| s13207    | 77.75                     | 0.69               | 1.56               | 18.45                    | 1.56              |
| s13207.1  | 77.84                     | 0.66               | 1.50               | 18.49                    | 1.50              |
| s15850    | 76.76                     | 1.21               | 1.81               | 18.41                    | 1.81              |
| s15850.1  | 77.06                     | 1.11               | 1.71               | 18.40                    | 1.72              |
| s35932    | 79.46                     | 0.48               | 0.65               | 18.77                    | 0.65              |
| s38417    | 79.11                     | 0.52               | 0.74               | 18.90                    | 0.74              |
| s38584    | 79.17                     | 0.46               | 0.75               | 18.88                    | 0.75              |
| s38584.1  | 79.19                     | 0.45               | 0.74               | 18.88                    | 0.74              |

## 4. Pattern suppression

### 4.1 Target structure

The energy consumption caused by switching the clock tree and by shifting patterns into the scan path can be minimized if the shift clock is enabled during essential patterns only. Essential patterns are those patterns that contribute to increasing the fault coverage. For all other patterns the clock tree is disabled so that the patterns generated by the TPG are not shifted into the scan path. Thus, the fault coverage is not affected, but the energy consumption can be reduced significantly. A gating signal is derived from a decoding logic fed by the output of the pattern counter, which is usually present in a test control unit.
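
The selection of essential patterns can be illustrated with a short sketch. It assumes that fault simulation has already produced, for each pattern index, the number of newly detected faults; the counts used here are those of the small example in Section 4.2 (Table 6). The function name and the returned tuple are illustrative, and the sketch assumes that at least one pattern detects a fault. The extra enabled pattern after the last essential one reflects the need to shift out the final response, as described in that example.

```python
def classify_patterns(new_faults):
    """Split pattern indices into on-set, off-set and don't-care set.

    new_faults[i] is the number of faults first detected by pattern i
    (result of fault simulation with fault dropping).
    """
    essential = [i for i, n in enumerate(new_faults) if n > 0]
    last = essential[-1]
    # One more pattern slot is enabled so that the last response can be
    # shifted out (clamped to the range covered by the pattern counter).
    stop = min(last + 1, len(new_faults) - 1)
    on_set = sorted(set(essential) | {stop})            # shift clock enabled
    off_set = [i for i in range(stop) if i not in on_set]  # shift clock gated
    dont_care = list(range(stop + 1, len(new_faults)))  # never reached
    return on_set, off_set, dont_care


if __name__ == "__main__":
    # Newly detected faults per pattern, taken from the example in Table 6
    table6 = [17, 9, 4, 0, 5, 2, 3, 0, 2, 0, 0, 1, 0, 0, 0, 0]
    print(classify_patterns(table6))
```

For the example data the shift clock is enabled in 9 of the 13 pattern slots that are actually reached. The three sets are what a two-level minimizer would take as on-set, off-set and don't-care set of the decoder.
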
The resulting test design is shown in Figure 3.

Figure 3: Scan path design with decoder for shift and system clock gating.

### 4.2 Decoder synthesis

To determine the useful patterns, fault simulation with fault dropping is used. For all patterns detecting additional faults, the decoder enables the scan clock; for all other patterns the scan clock is disabled. The use of fault dropping leads to a non-uniform distribution of selected patterns, with more patterns being accepted at the beginning of the test sequence. This helps to synthesize the decoding logic. At the expense of energy spent in the clock tree and scan path, the decoder may be simplified by including some irrelevant patterns which do not increase the fault coverage. This trade-off between minimizing the energy consumption and reducing the size of the pattern decoder is not considered in this paper.

The gating decoder uses the output of the pattern counter. All pattern numbers needed for the fault detection are put into the on-set of the decoder, all numbers of patterns not needed are put into the off-set, and all indices of patterns not reached due to the test length (see Table 1) are put into the don't care set. On these three sets, two level minimization is applied [BHM86]. The technology mapping from this PLA description to multi level logic was performed by SIS using a simple combinatorial library without process specific AND-OR-Invert logic trees, which could reduce the needed gate count further [SIS92]. Redundancies have been removed prior to the actual technology mapping step.

For clarification we describe the applied method with the help of a small example: A set of 16 test patterns is generated from a suitable TPG. From these patterns an incremental fault list is generated that contains the number of faults that each pattern will detect. The result is shown in Table 6.

Table 6: Example test set.

| index | binary | # faults |
|-------|--------|----------|
| 0     | 0000   | 17       |
| 1     | 0001   | 9        |
| 2     | 0010   | 4        |
| 3     | 0011   | 0        |
| 4     | 0100   | 5        |
| 5     | 0101   | 2        |
| 6     | 0110   | 3        |
| 7     | 0111   | 0        |
| 8     | 1000   | 2        |
| 9     | 1001   | 0        |
| 10    | 1010   | 0        |
| 11    | 1011   | 1        |
| 12    | 1100   | 0        |
| 13    | 1101   | 0        |
| 14    | 1110   | 0        |
| 15    | 1111   | 0        |

The first three patterns detect new faults and the pattern with index 3 does not. We obtain two lists of pattern indices, one detecting faults and one not detecting faults. We have to determine the highest pattern index that detects a new fault (11 in our example), so we may stop after pattern 12 because the result of applying pattern 11 has to be shifted out. We stop shifting during patterns 3, 7, 9 and 10, and we enable the shift clock during patterns 0, 1, 2, 4, 5, 6, 8, 11 and 12. After pattern 12, we assume the test to be stopped, so we may put patterns 13, 14 and 15 into the don't care set of the decoder. From these sets we obtain a PLA description, which is minimized and synthesized using SIS. The result is shown in Figure 4.

Figure 4: Pattern decoding logic.

The output 'clk gate' of this circuit is used to gate the clock fed into the clock tree of the scan path elements. The remaining part of the test controller remains unaltered.

### 4.3 Results on area and power

The energy consumption for both the MUT and the decoding logic has been determined in the same way as described above. The resulting savings in average power are shown in Table 7.

Table 7: Average power without and with clock gating of scan based designs without NOR masking of the scan path elements.

| Benchmark | average power w/o clock gating | average power with clock gating | ratio (%) |
|-----------|--------------------------------|---------------------------------|-----------|
| c2670     | 1,720.4                        | 42.7                            | 2.48      |
| c3540     | 1,611.8                        | 107.3                           | 6.66      |
| c5315     | 2,442.7                        | 322.0                           | 13.18     |
| c6288     | 11,571.5                       | 6,119.3                         | 52.88     |
| c7552     | 4,149.5                        | 154.3                           | 3.72      |
| s5378     | 2,154.0                        | 117.2                           | 5.44      |
| s9234     | 3,976.7                        | 215.4                           | 5.42      |
| s9234.1   | 3,812.8                        | 203.2                           | 5.33      |
| s13207    | 6,257.2                        | 369.8                           | 5.91      |
| s13207.1  | 6,135.1                        | 363.9                           | 5.93      |
| s15850    | 7,541.1                        | 412.4                           | 5.47      |
| s15850.1  | 7,180.5                        | 377.4                           | 5.26      |
| s35932    | 25,798.1                       | 9,278.0                         | 35.96     |
| s38417    | 21,883.3                       | 1,976.8                         | 9.03      |
| s38584    | 16,858.6                       | 1,842.9                         | 10.93     |
| s38584.1  | 16,727.6                       | 1,846.8                         | 11.04     |

It should be noted that fault coverage (with respect to the stuck-at fault model) is not affected, but both average power and energy go down significantly. Gating the shift clock imposes no timing penalty on the MUT as the critical paths are not touched. In addition, this approach is able to mask certain patterns in order to reduce peak power.

Table 7 also includes the additional power required by the decoder logic. The energy consumption of the decoder logic is very small since the inputs from the pattern counter change only once after n shift clocks, and the additional area of the decoder logic is moderate. Both energy and power of the decoder logic can be minimized further if the pattern counter uses Gray code.

Table 8 shows the number of gates of the decoder and of the benchmark circuit. It also contains estimates of the area overhead caused by changing the circuit design into a scan design and by introducing the decoding logic which is needed for the proposed low power design. A boundary scan design is assumed with one flipflop at each IO port of the circuit. A cell size of 24 is used for a simple D flipflop, 30 for a D flipflop with scan and 8 for a logic element. It turns out that the area needed to include the shift clock gating is about 7 % or less for all larger benchmarks, and is generally less than the cost of a scan path. Routing costs are considered neither for the scan path nor for the decoding logic.

Table 8: Number of gates contained in the benchmark circuits (over 1000 gates) and in the decoding logic. The area overhead is estimated for a 0.18 µm standard cell library.

| Benchmark | number of gates in benchmark circuit | number of gates in decoder | approx. area overhead for scan (%) | approx. area overhead for decoder (%) |
|-----------|--------------------------------------|----------------------------|------------------------------------|---------------------------------------|
| c2670     | 1,193                                | 184                        | 16.13                              | 7.96                                  |
| c3540     | 1,669                                | 357                        | 3.82                               | 18.94                                 |
| c5315     | 2,307                                | 246                        | 9.38                               | 7.66                                  |
| c6288     | 2,416                                | 36                         | 2.45                               | 1.38                                  |
| c7552     | 3,512                                | 318                        | 7.07                               | 7.13                                  |
| s5378     | 2,958                                | 627                        | 7.02                               | 16.73                                 |
| s9234     | 5,825                                | 728                        | 4.06                               | 10.98                                 |
| s9234.1   | 5,808                                | 687                        | 4.29                               | 10.31                                 |
| s13207    | 8,620                                | 768                        | 7.41                               | 6.93                                  |
| s13207.1  | 8,589                                | 782                        | 7.64                               | 7.02                                  |
| s15850    | 10,369                               | 726                        | 5.60                               | 5.83                                  |
| s15850.1  | 10,306                               | 670                        | 6.04                               | 5.32                                  |
| s35932    | 17,793                               | 94                         | 8.66                               | 0.39                                  |
| s38417    | 23,815                               | 1,014                      | 6.08                               | 3.48                                  |
| s38584    | 20,705                               | 1,182                      | 6.72                               | 4.56                                  |
| s38584.1  | 20,679                               | 1,336                      | 6.80                               | 5.14                                  |

A detailed analysis of the energy distribution in the different modules of the design with clock gating gives almost the same results as shown in Table 2 for the original scan path design. The energy spent in the decoding logic is almost negligible. This observation suggests combining the toggle suppression and the pattern suppression schemes.

## 5. Combining pattern and toggle suppression

The best results are obtained if both techniques presented in the previous sections are combined. Table 9 shows that only 1.00 % to 6.44 % of the original energy is required, and that the highest power consumption is now found in the clock tree.
The approach works successfully on the circuit c6288, too, but here most power is still spent in the combinational network as only a few patterns are required for complete fault coverage.

Combining both approaches is especially attractive if, for timing reasons, not all flipflops may be masked by NOR gates. In this case, gating the shift clock still provides reductions without penalties in performance and fault coverage. For the large circuits, the largest fraction of the energy is consumed in the clock tree needed to feed the scan path. In order to reduce the power consumption further, a more sophisticated pattern selection scheme would have to be used, and only a small additional amount of energy could be saved.

Table 9: Energy consumption by combining pattern suppression and toggle suppression. The first data column shows the energy ratios of a NOR gated scan path design with decoder to gate the system clock and shift clock applied to the MUT vs. the same design without gating. The remaining columns show the individual contributions of selected parts to the total energy consumption of the design with gated shift clocks.

| Benchmark | percentage of energy remaining | energy for clock tree (%) | energy for MUT+DEC (%) | energy for TPG+SA+SP (%) |
|-----------|--------------------------------|---------------------------|------------------------|--------------------------|
| c2670     | 1.72                           | 19.76                     | 0.78                   | 79.46                    |
| c3540     | 2.21                           | 15.51                     | 11.03                  | 73.46                    |
| c5315     | 3.38                           | 55.74                     | 3.38                   | 40.88                    |
| c6288     | 3.09                           | 10.05                     | 80.55                  | 9.40                     |
| c7552     | 1.00                           | 33.18                     | 3.19                   | 63.63                    |
| s5378     | 2.49                           | 34.51                     | 1.90                   | 63.59                    |
| s9234     | 1.61                           | 37.20                     | 3.04                   | 59.76                    |
| s9234.1   | 1.61                           | 38.18                     | 2.87                   | 58.95                    |
| s13207    | 2.07                           | 61.19                     | 0.69                   | 38.12                    |
| s13207.1  | 2.10                           | 61.79                     | 0.64                   | 37.57                    |
| s15850    | 1.49                           | 57.30                     | 1.18                   | 41.62                    |
| s15850.1  | 1.49                           | 57.66                     | 0.97                   | 41.37                    |
| s35932    | 6.44                           | 78.65                     | 0.47                   | 20.88                    |
| s38417    | 1.90                           | 73.57                     | 0.51                   | 25.92                    |
| s38584    | 2.59                           | 74.55                     | 0.46                   | 24.99                    |
| s38584.1  | 2.64                           | 74.67                     | 0.47                   | 24.86                    |

## 6. Conclusion

Two techniques for reducing the average power dissipation and the energy in scan based BIST were presented. Masking the flipflop outputs during shifting has negligible hardware costs, but may introduce a small additional delay. Gating the shift and system clock to exclude useless patterns has no timing penalty but requires some additional hardware. On average, approximately 2 % of the original average power is required if both methods are combined.

## 7. Acknowledgement

For all the experiments, power estimations were performed by a variable delay model simulator implemented by Eugeni Isern Ruitort from the University of the Balearic Islands. The fault simulator with fault dropping used for the decoder synthesis was provided by Gundolf Kiefer, University of Stuttgart.

## A. References

[BaMc84] P. H. Bardell, W. H. McAnney: Parallel Pseudorandom Sequences for Built-In Test; Proc. Int. Test Conf., 1984, pp. 302-308

[BHM86] R. K. Brayton, G. D. Hachtel, C. T. McMullen, A. L. Sangiovanni-Vincentelli: Logic Minimization Algorithms for VLSI Synthesis; Kluwer Academic Press, Boston, La Hague, Dordrecht, Lancaster 1986

[BeEl95] A. Bellaouar, M. I. Elmasry: Low-Power VLSI Design: Circuits and Systems; Kluwer Academic Publishers, Boston 1995

[Br85] David Bryan: The ISCAS '85 benchmark circuits and netlist format; North Carolina State University, 1985

[Br89] Franc Brglez, David Bryan, Krzysztof Kozminski: Notes on the ISCAS '89 Benchmark Circuits; North Carolina State University, 1989

[ChBr95] A. P. Chandrakasan, R. W. Brodersen: Low Power Digital CMOS Design; Kluwer Academic Publishers, Boston, 1995

[ChDa94] S. Chakravarty, V. P. Dabholkar: Two Techniques for Minimizing Power Dissipation in Scan Circuits During Test Application; Proc. IEEE Asian Test Conference, 1994, pp. 324-329

[ChSA97] R. M. Chou, K. K. Saluja, V. D. Agrawal: Scheduling Tests for VLSI Systems Under Power Constraints; IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 5, No. 2, June 1997, pp. 175-185

[DDS95] B. Davari, R. H. Dennard, G. G. Shahidi: CMOS Scaling for High Performance and Low Power - The Next Ten Years; Proceedings of the IEEE, Vol. 83, No. 4, April 1995, pp. 595-606

[DM95] S. Devadas, S. Malik: A Survey of Optimization Techniques Targeting Low Power VLSI Circuits; 32nd Design Automation Conference, San Francisco, USA 1995, pp. 242-247

[EiLi83] E. B. Eichelberger, E. Lindbloom: Random-Pattern Coverage Enhancement and Diagnosis for LSSD Logic Self-Test; IBM Journal of Research and Development, Vol. 27, No. 3, May 1983, pp. 265-272

[HELL95] S. Hellebrand, J. Rajski, S. Tarnick, S. Venkataraman, B. Courtois: Built-In Test for Circuits with Scan Based on Reseeding of Multiple-Polynomial Linear Feedback Shift Registers; IEEE Transactions on Computers, Vol. 44, No. 2, February 1995, pp. 223-233

[HSP96] Meta Software, Inc.: HSPICE User's Manual, Vol. I-III, 1996

[HWH96] S. Hellebrand, H.-J. Wunderlich, A. Hertwig: Mixed-Mode BIST using embedded processors; Proc. International Test Conference 1996, pp. 195-204

[HWZ98] A. Hertwig, H.-J. Wunderlich, M. Zelleröhr: Low Power Serial Built-In Self-Test; IEEE European Test Workshop 1998, Sitges, Barcelona, Spain

[LBDG87] R. Lisanke et al.: Testability-Driven Random Test-Pattern Generation; IEEE Trans. on Computer-Aided Design, Vol. CAD-6, No. 6, Nov. 1987, pp. 1082-1087

[Mein95] J. D. Meindl: Low Power Microelectronics: Retrospect and Prospect; Proceedings of the IEEE, Vol. 83, No. 4, April 1995, pp. 619-635

[NeMe97] W. Nebel, J. Mermet: Low Power Design in Deep Submicron Electronics; Kluwer Ed. 1997

[RaPe96] J. M. Rabaey, M. Pedram: Low Power Design Methodologies; Kluwer Academic Publishers, Boston, 1996

[SING95] D. Singh et al.: Power Conscious CAD Tools and Methodologies: A Perspective; Proceedings of the IEEE, Vol. 83, No. 4, April 1995, pp. 570-594

[SIS92] E. M. Sentovich, K. J. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P. R. Stephan, R. K. Brayton, A. Sangiovanni-Vincentelli: SIS: A System for Sequential Circuit Synthesis; Department of Electrical Engineering and Computer Science, University of California, Berkeley, 1992

[TAUR95] Y. Taur, Y. J. Mii, D. J. Frank, H. S. Wong, D. A. Buchanan, S. J. Wind, S. A. Rishton, G. A. Sai-Halasz, E. J. Nowak: CMOS scaling into the 21st century: 0.1 µm and beyond; IBM Journal of Research and Development, Vol. 39, No. 1/2, Jan/Mar 1995, pp. 245-260

[ToMc96] N. A. Touba, E. J. McCluskey: Altering a Pseudo-Random Bit Sequence for Scan-Based BIST; Proceedings IEEE International Test Conference, Washington D.C., 1996, pp. 167-175

[VTS97] J. Monzel, S. Chakravarty, V. D. Agrawal, R. Aitken, J. Braden, J. Figueras, S. Kumar, H.-J. Wunderlich, Y. Zorian: Power Dissipation During Testing: Should We Worry About it?; Panel Session, IEEE VLSI Test Symposium, Monterey 1997

[WaGu97] S. Wang, S. K. Gupta: DS-LFSR: A New BIST TPG for Low Heat Dissipation; Proc. IEEE International Test Conference, 1997, pp. 848-857

[Wu88] H.-J. Wunderlich: Multiple Distributions for Biased Random Test Patterns; Proc. IEEE International Test Conference, Washington D.C., USA, 1988, pp. 236-244

[WuKi96] H.-J. Wunderlich, G. Kiefer: Bit-Flipping BIST; Proceedings IEEE/ACM International Conference on CAD-96, San Jose, CA, November 1996, pp. 337-343

[ZHW98] M. Zelleroehr, A. Hertwig, H.-J. Wunderlich: Scan-Path Design for Low-Power Serial Built-In Self-Test; GI and IEEE Workshop Testmethoden und Zuverlässigkeit von Schaltungen und Systemen, Herrenberg, March 1998

[Zori93] Y. Zorian: A distributed BIST control scheme for complex VLSI devices; Proc. 11th IEEE VLSI Test Symposium, April 1993, pp. 4-9