# Stress-Aware Periodic Test of Interconnects

Somayeh, Sadeghi-Kohan; Hellebrand, Sybille; Wunderlich, Hans Joachim

Journal of Electronic Testing: Theory and Applications (JETTA), 2022

doi: https://doi.org/10.1007/s10836-021-05979-5

**Abstract:** Safety-critical systems have to follow extremely high dependability requirements as specified in the standards for automotive, air, and space applications. The required high fault coverage at runtime is usually obtained by a combination of concurrent error detection or correction and periodic tests within rather short time intervals. The concurrent scheme ensures the integrity of computed results while the periodic test has to identify potential aging problems and to prevent any fault accumulation which may invalidate the concurrent error detection mechanism. Such periodic built-in self-test (BIST) schemes are already commercialized for memories and for random logic. The paper at hand extends this approach to interconnect structures. A BIST scheme is presented which targets interconnect defects before they will actually affect the system functionality at nominal speed. A BIST schedule is developed which significantly reduces aging caused by electromigration during the lifetime application of the periodic test.

## Preprint

#### **General Copyright Notice**

This article may be used for research, teaching and private study purposes. Any substantial or systematic reproduction, re-distribution, re-selling, loan or sub-licensing, systematic supply or distribution in any form to anyone is expressly forbidden.

This is the author's "personal copy" of the final, accepted version of the paper published by Springer Science+Business Media.<sup>1</sup>

<sup>1</sup> © Springer Science+Business Media, LLC, part of Springer Nature

# Stress-Aware Periodic Test of Interconnects

Somayeh Sadeghi-Kohan<sup>1</sup>, Sybille Hellebrand<sup>1</sup>, Hans-Joachim Wunderlich<sup>2</sup>

<sup>1</sup>Computer Engineering Group EIM/E, University of Paderborn, Germany

<sup>2</sup> Institute of Computer Architecture and Computer Engineering, University of Stuttgart, Germany {somayeh.sadeghi, sybille.hellebrand}@uni-paderborn.de, wu@informatik.uni-stuttgart.de


Keywords: Periodic Test, Functional Safety, Hidden Interconnect Defect, Electro-Migration, Multi-frequency Test.

### I. INTRODUCTION

Functional safety is a key concern in autonomous systems. In the automotive domain, for example, the ISO 26262 standard defines clear targets for test and reliability that drive research and development in the industry [Eychenne16] [Nardi19] [Pateras17]. First of all, the manufacturing test must ensure a high product quality by reducing test escapes to a minimum ("zero defect strategy"). During operation, safety critical systems are typically protected by error correcting codes and other techniques for concurrent testing. However, safety-critical systems, such as anti-blocking brakes, may also have longer idle time, where faults cannot be detected by concurrent testing. Similarly, stand-by spare parts are not used during normal operation, but their health status must be maintained. To avoid fault accumulation during idle times, built-in tests are needed which can be triggered periodically. For logic circuits, quite a few approaches are already available addressing these specific requirements. They range from dedicated BIST and observation schemes [Liu18] [Mukherjee19] to applications of software-based self-test [Reimann14] [Bernardi16]. Similarly, schemes for embedded memories rely on scrubbing [Mariani05] and periodic consistency checking [Hellebrand02] in addition to the protection by the error detecting and correcting codes.

However, in today's complex systems, the reliability of the long interconnects between the components has also become a major concern. The severe impact of technology scaling on the signal integrity in bus structures or network on chip (NoC) links [Caignet01] has triggered research on advanced interconnect testing. Here specific defect and aging mechanisms such as crosstalk or electromigration (EM) must be addressed, which can manifest themselves for example as delay faults or glitches at the gate level. In this context, the BIST and monitoring schemes proposed in [Bai00] [Nourani01] [Pendurkar01] [Chen02] [Sekar02] [Tehranipoor03] [Tehranipoor04] [Grecu06] [Nourmandi-Pour10] [Sadeghi-Kohan12] [Mohammadi14] focus on manufacturing test, but they do not address the requirements and challenges of health monitoring during the lifetime. Nevertheless, because of the complex interplay between EM and crosstalk, periodic testing of interconnects is mandatory and at the same time extremely challenging. On the one hand, EM may change the interconnect geometry and lead to increased crosstalk effects. On the other hand, the crosstalk-induced currents can in turn aggravate EM [Livshits12], and even small crosstalk effects can constitute reliability risks that must be considered [Sadeghi-Kohan20].

The available interconnect BIST schemes for manufacturing testing cannot be directly applied, because they mainly target "large" crosstalk effects which change the system data. Furthermore, each test execution itself adds stress to the interconnects. In periodic testing, this stress can accumulate and lead to accelerated aging. Consequently, in a safety critical system, where a reliability above a given threshold has to be guaranteed, the test must be carefully designed to minimize its negative impact on the mission time. In particular, in the case of stand-by spare parts, a sufficient mission time after a reconfiguration must be ensured.

Degradation caused by EM has been studied extensively in the context of chip design [Black69a] [Black69b] [D'Heurle71] [Doyen08] [Mishra15], and EM-aware design techniques exploit self-healing mechanisms triggered by reversed current [Lienig18]. In particular, work on EM-aware routing in NoCs addresses the problem of stress accumulation by packet transmission over the network links [Hosseini11]. To exploit self-healing by a reversed current, a dynamic routing strategy balances the number of packets that are sent and received over a link. Such an EM-aware routing scheme can easily be combined with test and diagnosis schemes for NoCs reusing the NoC infrastructure [Grecu06] [Schley17].

In this paper, a new approach for periodic EM-aware test will be presented, which is applicable to general bi-directional interconnect structures at the system level. It can identify and classify reliability risks before they actually cause a failure. At the same time, the proposed EM-aware strategy maximizes the mission time of the system. Similar to the dynamic routing strategy in [Hosseini11], it tries to properly balance senders and receivers during test. The scheme is based on a multi-frequency test, which not only detects failures, but also provides a reliability profile of the interconnect structures. The periodic update of this reliability profile supports a dynamic test scheduling, where the direction of the test is changed whenever the accumulated stress gets too high.

Before the proposed strategy is explained in more detail, the necessary background is provided in Section II. Subsequently, Section III analyzes the impact of the periodic test on electromigration. Section IV introduces the basic BIST architecture. Finally, Section V deals with the proper tuning of this architecture and explains the developed concepts for test scheduling. The experimental results in Section VI will show that the developed stress-aware test improves the mission times by orders of magnitude compared to a straightforward approach.

### II. BACKGROUND

This section briefly summarizes the necessary background
on interconnect modeling and test, the relation between coupling and electromigration, as well as the multi-frequency
test scheme introduced in [Sadeghi-Kohan20].

#### A. Interconnect and Fault Modeling

Interconnect lines will be modeled as a sequence of RLC circuits [Chen98] [Cuviello99] [Roy09]. As an example, Figure 1 shows one segment of a three-line interconnect. Each wire i is characterized by its capacitance to other layers $C_i$, inductance $L_i$, and resistance $R_i$. Between every two wires i and j, there are also coupling capacitances $C_{ij}$, inductances $L_{ij}$, and resistances $R_{ij}$ depending on the space between the wires.



Coupling between lines leads to crosstalk effects such as glitches, delay and speedy faults, and also overshoots and undershoots. The amplitudes of glitch distortions and of the overshoots and undershoots as well as the delay sizes depend on the strength of the coupling elements.

Crosstalk effects are usually described as a signal distortion on a *victim line* caused by a transition on one or more *aggressor lines*. Several fault models have been proposed to support crosstalk analysis and test at higher levels of abstraction. The maximum aggressor (MA) model assumes that the worst-case effect on a single victim line is provoked when all other lines act as aggressors in the same way [Cuviello99]. However, this model does not consider the impact of inductances and does not always correctly reflect the worst case. To overcome the disadvantages of the MA model, some authors suggest using pseudo-random patterns for signal integrity test [Nourani01]. Nevertheless, shorter test times can be ensured with advanced deterministic approaches. The maximum transition (MT) fault model combines a transition or a stable signal on a victim line with multiple transitions on a limited number of aggressor lines [Tehranipoor03]. Based on the analysis of the combined effect of capacitances and inductances, the maximal dominant signal integrity (MDSI) fault model also works with a limited number of aggressors but derives conditions for the remaining lines in addition to that [Chun07].

The MDSI model allows for a very simple deterministic test with only a few pattern pairs. Table 1 summarizes the necessary pattern pairs for a complete crosstalk test of one victim line. To test the victim for delays, glitches, and speedy faults, 6 pattern pairs are sufficient. As the MDSI fault model assumes that only one victim at a time is addressed, in total $6 \cdot N$ (possibly overlapping) pattern pairs are needed for an N-bit interconnect.

Table 1. Complete MDSI test of one victim line

| Crosstalk fault    | Victim line       | Aggressor lines   | Other lines       |
|--------------------|-------------------|-------------------|-------------------|
| Delayed transition | $0 \rightarrow 1$ | $1 \rightarrow 0$ | $0 \rightarrow 1$ |
| Delayed transition | $1 \rightarrow 0$ | $0 \rightarrow 1$ | $1 \rightarrow 0$ |
| Glitch             | 0                 | $0 \rightarrow 1$ | $1 \rightarrow 0$ |
| Glitch             | 1                 | $1 \rightarrow 0$ | $0 \rightarrow 1$ |
| Speedy transition  | $0 \rightarrow 1$ | $0 \rightarrow 1$ | $0 \rightarrow 1$ |
| Speedy transition  | $1 \rightarrow 0$ | $1 \rightarrow 0$ | $1 \rightarrow 0$ |
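As an illustration of Table 1, the following Python sketch enumerates the six pattern pairs for one victim line. The function name `mdsi_pattern_pairs`, the parameter `k` (number of aggressor lines on each side of the victim), and the choice of the nearest neighbors as aggressors are assumptions made for this example; they are not taken from [Chun07].

```python
# Transitions (first value, second value) per Table 1:
# (fault type, victim, aggressor lines, other lines)
TABLE_1 = [
    ("delay",  (0, 1), (1, 0), (0, 1)),
    ("delay",  (1, 0), (0, 1), (1, 0)),
    ("glitch", (0, 0), (0, 1), (1, 0)),
    ("glitch", (1, 1), (1, 0), (0, 1)),
    ("speedy", (0, 1), (0, 1), (0, 1)),
    ("speedy", (1, 0), (1, 0), (1, 0)),
]


def mdsi_pattern_pairs(n, victim, k=1):
    """Return the six (fault, first, second) pattern pairs for one victim.

    Assumption: the k nearest lines on each side of the victim act as
    aggressors; all remaining lines are treated as "other" lines.
    """
    pairs = []
    for fault, vic, agg, other in TABLE_1:
        first, second = [], []
        for line in range(n):
            if line == victim:
                role = vic
            elif abs(line - victim) <= k:
                role = agg
            else:
                role = other
            first.append(role[0])
            second.append(role[1])
        pairs.append((fault, first, second))
    return pairs


pairs = mdsi_pattern_pairs(n=5, victim=2, k=1)
for fault, first, second in pairs:
    print(fault, first, "->", second)
```

Calling the function once per victim line yields the $6 \cdot N$ pattern pairs mentioned above, before any overlapping pairs are merged.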


A more efficient test scheme based on multiple victim testing (MVT) is proposed in [Nourmandi-Pour10] and used in this paper. The conditions for the signals on the victim and aggressor lines are the same as in the MDSI model, and working with several victim lines has a similar effect as the conditions for the remaining lines in the MDSI model. An example is shown in Figure 2, where the two victim lines v1 and v2 (blue lines) are tested in parallel for crosstalk delays by activating the inverse transitions on the neighboring aggressor lines (red lines).



#### B. Interconnect BIST

While many existing schemes for interconnect BIST rely on a serial transmission of test data within a boundary scan environment, the presented work deals with the parallel test application in SoCs. The short overview in this section therefore focuses on the main ideas and skips details on boundary-scan integration.

Early approaches on interconnect BIST mainly address manufacturing defects modeled as shorts, stuck-opens, and stuck-at faults [Hassan88]. Counter sequences, walking ones, or LFSR-based pseudo-random sequences are generated serially by respective test pattern generators.

With the increasing progress in technology, interconnect performance and signal integrity have become predominant. Bai et al. describe an approach for generating deterministic patterns based on the MA model [Bai00]. A small finite state machine produces the proper transitions for the victim line and the aggressor lines, which are then distributed to the interconnect via multiplexers. Sekar and Dey also base their analysis on the MA model but suggest re-using the LFSR typically available for logic BIST [Sekar02]. To guarantee a high fault coverage with an acceptable number of patterns, the LFSR outputs are modified by some extra logic. In [Chen02] a software-based self-test relying on MA patterns is proposed.

To avoid the problems related to the MA model, an LFSR is used as a pseudo-random pattern generator in [Nourani01]. Furthermore, special receiver cells for interconnect BIST based on sense amplifiers are presented. Similarly, Pendurkar et al. build on small pre-characterized LFSRs which are combined to mimic the switching activity of the interconnect in system mode [Pendurkar01]. Other authors promote a pseudo-exhaustive test, where all possible combinations of transitions are applied to groups of lines, or even an exhaustive test, in case the interconnect topology is unknown [Liu07] [Rudnicki09]. Both approaches use LFSRs for pattern generation. Deterministic approaches for advanced fault models integrate tests based on the MT and the MDSI model [Tehranipoor04] [Mohammadi14].

A parallel BIST scheme for testing manufacturing defects is presented in [Jutman04]. It uses a simple circular shift register as the core of a parallel BIST scheme. In Section IV a parallel generator for the periodic crosstalk test will be introduced for multiple victim test.

#### C. Reliability Measures

In this work the specification and evaluation of reliability
properties relies on common fault tolerance concepts and
terminology. As a more in-depth introduction is beyond the
scope of this paper, the reader is referred to respective
textbooks, e.g. [Koren07].

The reliability R(t) is formally defined as the probability that a system survives from time 0 to t. For safety critical systems, it is typically required that R(t) is above a given threshold $R_{th}$, and the mission time $T_M(R_{th})$ is defined as the time span where $R(t) \ge R_{th}$ holds. Changes in the design or test strategy are then evaluated by the mission time improvement factor

$$MTIF = T_M^{new}(R_{th})/T_M^{old}(R_{th}). \qquad (1)$$

If the mission times cannot be determined directly, they can be computed with the help of the median time to failure $t_{50}$ or the more common mean time to failure MTTF as shown in the following.

If a constant failure rate $\lambda$ is assumed, then

$$R(t) = e^{-\lambda t}, \quad \text{and} \quad MTTF = \frac{1}{\lambda} \qquad (2)$$

hold [Koren07]. The median time to failure is the time when 50% of the interconnects fail, i.e. the reliability is  $R(t_{50}) =$ 1/2. Using Equation (2), it can be shown that

$$t_{50} = MTTF \cdot \ln(2), \qquad (3)$$

and similarly

$$T_M = -MTTF \cdot \ln(R_{th}) = -\frac{t_{50}}{\ln(2)} \cdot \ln(R_{th}). \qquad (4)$$

Therefore, the mission time improvement can also be estimated as

$$MTIF = \frac{MTTF^{new}}{MTTF^{old}} = \frac{t_{50}^{new}}{t_{50}^{old}}. \qquad (5)$$
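The relations in Eqs. (2) to (5) can be checked numerically with a small Python sketch. The function names and all numeric values are hypothetical and chosen only for illustration; a constant failure rate is assumed throughout, as in Eq. (2).

```python
import math


def reliability(t, mttf):
    """R(t) = exp(-t / MTTF) for a constant failure rate, Eq. (2)."""
    return math.exp(-t / mttf)


def median_time_to_failure(mttf):
    """t50 = MTTF * ln(2), Eq. (3)."""
    return mttf * math.log(2)


def mission_time(mttf, r_th):
    """T_M = -MTTF * ln(R_th), Eq. (4)."""
    return -mttf * math.log(r_th)


# Hypothetical values: MTTF in hours, reliability threshold R_th.
mttf_old, mttf_new, r_th = 1.0e5, 4.0e5, 0.99

# Eq. (1): ratio of mission times; by Eq. (5) it equals MTTF_new / MTTF_old.
mtif = mission_time(mttf_new, r_th) / mission_time(mttf_old, r_th)
print(mtif)
```

Because the threshold $R_{th}$ cancels in the ratio, the improvement factor in this model does not depend on the chosen threshold.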

#### D. Coupling and Electromigration

As shown in [Livshits12], crosstalk can aggravate EM and thus reduce the reliability of the system. This is not only true for crosstalk effects actually changing the functionality of the system [Sadeghi-Kohan20]. Even if the crosstalk noise only leads to small delays within the design margin, it can trigger EM. Such small crosstalk faults remain undetected by tests at the nominal frequency and are therefore called hidden interconnect defects.

The relation between coupling and EM, in particular in the presence of variations in the line spacing, will be summarized in the following. EM refers to the transportation of metal ions caused by an electrical field. For a detailed introduction into various aspects of EM in integrated circuit design, the reader is referred to the textbook of Lienig and Thiele [Lienig18]. The metal ion transport is reshaping interconnect lines over time. This, in turn, changes the resistance of interconnects and can lead to increased interconnect delays [Mishra15]. Furthermore, several studies have shown that EM can cause serious failures by creating hillocks and voids in the interconnect lines [Doyen08]. In the worst case, a hillock can become a bridging fault between adjacent wires, and a void can result in a broken line.

The impact of EM on the system is typically characterized by the median time to failure $t_{50}$. According to Black's formula, $t_{50}$ measured in hours can be estimated based on the physical parameters of the system as

$$t_{50} = \frac{A}{j^n} e^{E_a/(k_B \cdot T)}, \qquad (6)$$

where A is a constant depending on the cross-section of the wire, j is the current density in amperes per square centimeter, n is a constant related to the material, $E_a$ is the activation energy in electron volts, $k_B$ is the Boltzmann constant and T is the temperature in kelvin [Black69a] [Black69b] [Sapatnekar19] [Tu19]. The material constant *n* is typically between 1 and 2, e.g. 1.1 to 1.3 for copper and 2 for aluminum [Lienig18].

When the parameters j and T change, the mission time improvement is obtained as

$$MTIF = \left(\frac{j^{old}}{j^{new}}\right)^n \exp\left(\frac{E_a}{k_B}\left(\frac{1}{T^{new}} - \frac{1}{T^{old}}\right)\right) \qquad (7)$$

by inserting Eq. (6) into Eq. (5).
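The following Python sketch evaluates Black's formula and the resulting mission time improvement factor, and checks that Eq. (7) agrees with the ratio of two evaluations of Eq. (6). The constant `a`, the activation energy, the current densities, and the temperatures are hypothetical values chosen only for illustration.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def black_t50(a, j, n, e_a, temp):
    """Median time to failure according to Black's formula, Eq. (6)."""
    return a / j**n * math.exp(e_a / (K_B * temp))


def mtif(j_old, j_new, t_old, t_new, n, e_a):
    """Mission time improvement factor, Eq. (7)."""
    return (j_old / j_new) ** n * math.exp(e_a / K_B * (1 / t_new - 1 / t_old))


# Hypothetical parameter values for illustration only.
a, n, e_a = 1.0e3, 1.2, 0.9      # constant A, material constant, E_a in eV
j_old, j_new = 1.0e6, 0.5e6      # current densities in A/cm^2
t_old, t_new = 358.0, 348.0      # temperatures in K

# Ratio of two evaluations of Eq. (6); the constant A cancels.
ratio = black_t50(a, j_new, n, e_a, t_new) / black_t50(a, j_old, n, e_a, t_old)
print(mtif(j_old, j_new, t_old, t_new, n, e_a), ratio)
```

Since the constant A cancels in the ratio, only the relative changes of j and T matter for the improvement factor.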

As equations (6) and (7) show, in addition to the temperature, the current density j has a major impact on EM effects and on the resulting changes in mission time.

For a given cross-section, the current density is defined as the amount of charge per unit time that flows through a unit area. It can be estimated by

$$j = \frac{I_{avg}}{W \cdot H}. \qquad (8)$$

The parameters W and H denote the width and the height of the wire, and $I_{avg}$ is the average current. The average current $I_{avg}$ can be determined by simulations or analytically, e.g. using the techniques in [Agarwal07] [Blaauw03]. In CMOS technology, the dynamic power is dominant, therefore the average current can be estimated as $I_{avg} = C \cdot V_{dd} \cdot f \cdot p$, where C is the capacitance of the wire, $V_{dd}$ is the supply voltage, f is the clock frequency, and p is the switching probability [Abella10]. The current density is then obtained as

$$j = \frac{C \cdot V_{dd} \cdot f \cdot p}{W \cdot H}. \qquad (9)$$
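As a worked example of Eqs. (8) and (9), the sketch below computes the current density of a single wire in Python. All wire parameters are hypothetical and only serve to show the order of the computation and the units involved.

```python
def average_current(c, v_dd, f, p):
    """I_avg = C * V_dd * f * p, assuming dynamic power dominates."""
    return c * v_dd * f * p


def current_density(c, v_dd, f, p, w, h):
    """j = I_avg / (W * H), Eqs. (8) and (9)."""
    return average_current(c, v_dd, f, p) / (w * h)


# Hypothetical wire parameters for illustration only.
c = 200e-15             # wire capacitance in F
v_dd = 0.9              # supply voltage in V
f = 1e9                 # clock frequency in Hz
p = 0.5                 # switching probability
w, h = 0.1e-4, 0.2e-4   # width and height in cm (0.1 um x 0.2 um)

j_nominal = current_density(c, v_dd, f, p, w, h)  # in A/cm^2
print(j_nominal)
```

With the width and height given in centimeters, the result is in amperes per square centimeter, which is the unit expected by Black's formula in Eq. (6).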

As coupling effects strongly depend on the line spacing, a realistic analysis of the induced average current must take into account variations of the interconnect layout. For this, in [Sadeghi-Kohan20] variations in the line spacing have been analyzed ranging from 100 % down to 80 % of the nominal value. Simulation results for 80 % of the nominal line spacing in 32 nm technology have shown more than a 20 % increase of the coupling capacitance and more than a 7 % increase of the coupling inductance for typical crosstalk patterns. A line spacing below 80 % of the nominal value has not been considered, because this would result in large crosstalk faults which change the functionality of the system and could be easily detected.

Figure 3 summarizes the impact of variations in the wire spacing on the average current for the 32 and 45 nm technologies and a glitch pattern $000 \rightarrow 101$ applied to a 3-line interconnect (source voltage is 0.9 volt). The horizontal axis shows variations in the line spacing ranging from 100 % down to 80 % of the nominal value, and the vertical axis shows the increase in current for the glitch pattern relative to the situation with nominal line spacing.

It can be observed that the changes in current evolve almost linearly with the increasing coupling capacitances and inductances caused by a reduced line spacing. In particular, the curve for the 32 nm technology shows that already small variations in the line spacing can increase the current by almost 10 %, and their contribution to EM cannot be neglected.





Figure 3. The average current increment for varying line spacing.

#### E. Dynamic Multi-frequency Test

As shown in Section II.D, already small variations in the line spacing have a non-negligible impact on EM. As the resulting crosstalk faults may be hidden at the nominal frequency, they must be tested at higher frequencies. The approach in [Sadeghi-Kohan20] uses several frequencies to characterize the risk of EM-degradation by crosstalk-induced delays. The main ideas are briefly summarized with the help of the pseudo-code in Figure 4.

    Multi-frequency Test (L, F)
    // Lines L
    // Frequencies F = {f_0, ..., f_m}
    i = 0;
    while (L != {}) and (i <= m)
        f = f_i;
        L_i = test(L, f);
        L = L \ L_i;
        i++;
    return L_0, ..., L_m;

Figure 4. Multi-frequency test.

The multi-frequency test starts with a delay test for all lines *L* at the nominal frequency  $f_0 = f_{nom}$ . In each iteration *i*, the frequency is increased to the next frequency  $f_i$ , and a delay test is applied to the remaining lines in *L*. The lines failing at  $f_i$  are collected in the set  $L_i$  and removed from the set of target lines *L*. These steps are repeated until the maximum frequency is reached or the list of target lines is empty.
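The procedure described above can be sketched as runnable Python. The callback `test(lines, f)` stands for the delay test at frequency `f` and is a hypothetical interface; the toy model `fail_at`, which assigns each line a known lowest failing frequency, is likewise an assumption made only to exercise the loop.

```python
def multi_frequency_test(lines, frequencies, test):
    """Map each applied frequency to the set of lines failing at it.

    `test(lines, f)` is a hypothetical callback that applies the delay test
    at frequency f and returns the subset of `lines` that fail.
    """
    remaining = set(lines)
    failing = {}
    for f in frequencies:  # ordered from the nominal to the maximum frequency
        if not remaining:
            break          # all lines are classified, stop early
        failed = test(remaining, f) & remaining
        failing[f] = failed
        remaining -= failed
    return failing


# Toy model (assumption): lowest failing frequency of each line in GHz.
fail_at = {"L1": 3.0, "L2": 2.0, "L3": 2.0, "L4": 1.0}


def toy_test(lines, f):
    """Return the lines whose lowest failing frequency is reached at f."""
    return {line for line in lines if fail_at[line] <= f}


profile = multi_frequency_test(fail_at, [1.0, 2.0, 3.0], toy_test)
print(profile)
```

Lines that pass at every frequency simply do not appear in the returned profile, which matches the case where the maximum frequency is reached before the list of target lines is empty.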

The test time depends on the variations in the interconnect layout. If the line spacing is very narrow, then crosstalk delays will be observed on all lines already at the nominal frequency, and the test will stop after the first iteration. If the line spacing is close to the nominal value for some lines, the test will go through all iterations until the hidden delays on these lines are detected by the highest frequency. In general, multi-frequency testing comes with severe challenges. Robust Adaptive Voltage & Frequency Systems (AVFS) are able to overcome them for the critical systems targeted in this paper. For interconnect lines the problem is simplified, as any distortion of the received signal is considered as a detected error.

After the test, each line is associated with the failing frequency as a measure of the severity of the fault. The lowest frequency detecting a delay can be used as a reliability indicator for the complete interconnect structure. This way, the test also monitors the health status of the system interconnects.

### III. AGING AND HEALING

To predict reliability risks before an actual failure occurs, the reliability profile obtained by the multi-frequency test of Section II.E should be continuously updated by periodic tests, where the interval between tests is in the range of milliseconds. These tests in turn add stress to the system. Although the stress induced by a single test may be negligible, the accumulated EM-degradation over the lifetime of a system is a serious issue in periodic testing. For EM-aware testing, possible self-healing effects must, therefore, be properly exploited, as is already done in EM-aware design [Lienig18].

Self-healing occurs when the current is reversed, because then also the direction of the ion transport is changed [Tao93]. This effect occurs when the direction of communication is changed or when inverse transitions lead to alternating current (AC) on the line. However, two complementary transitions will not lead to perfect healing, since the healing effects also depend on the severity of already caused damages and thus on the time between changes [Tao94] [Tao95]. The resulting difference between the opposite current densities is referred to as the *effective current density* in the sequel. For a more precise analysis of healing in the case of bidirectional communication or alternating current, in [Tao95] a healing parameter $\gamma$ has been introduced. The effective current density for EM is given by

$$j_{ac} = j^+_{dc} - \gamma \cdot j^-_{dc}, \qquad (10)$$

where $j_{dc}^+$ and $j_{dc}^-$ are the average absolute values of the current densities in the forward and backward transition, or in the positive and negative half-cycle, respectively. The parameter $\gamma$ depends on the frequency $f$:

$$\gamma = 1 - 2\left(\frac{f_0}{f}\right)^{1/n}, \quad f_0 = \frac{1}{2 \cdot t_{50}(DC)}. \qquad (11)$$

Here, $t_{50}(DC)$ denotes the median time to failure for direct current (DC) in Eq. (6), and *n* is again the material constant in Black's formula. Furthermore, $f_0$ is the frequency where interconnects fail before the current is reversed. As self-healing is not possible in such a case, the formula is only valid for $f > f_0$.

Consequently, the median time to failure under AC stress is given by

$$t_{50}(AC) = \frac{A}{(j_{dc}^+ - \gamma j_{dc}^-)^n} \cdot e^{E_a/(k_B \cdot T)}. \qquad (12)$$
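A small Python sketch can make the effect of the healing parameter in Eqs. (10) to (12) concrete. All numeric values are hypothetical. In particular, the sketch assumes that the frequency and $t_{50}(DC)$ are expressed in consistent units (Hz and seconds), which is an assumption of this example and not a statement from [Tao95].

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def healing_parameter(f, t50_dc, n):
    """Healing parameter gamma of Eq. (11); f and 1/t50_dc share one unit."""
    f0 = 1.0 / (2.0 * t50_dc)
    if f <= f0:
        raise ValueError("Eq. (11) is only valid for f > f0")
    return 1.0 - 2.0 * (f0 / f) ** (1.0 / n)


def effective_current_density(j_fwd, j_bwd, gamma):
    """Effective current density j_ac of Eq. (10)."""
    return j_fwd - gamma * j_bwd


def t50_ac(a, j_fwd, j_bwd, gamma, n, e_a, temp):
    """Median time to failure under AC stress, Eq. (12)."""
    j_ac = effective_current_density(j_fwd, j_bwd, gamma)
    if j_ac <= 0:
        raise ValueError("effective current density must be positive")
    return a / j_ac**n * math.exp(e_a / (K_B * temp))


# Hypothetical values for illustration only.
a, n, e_a, temp = 1.0e3, 1.2, 0.9, 358.0
t50_dc = 1.0e8   # assumed DC median time to failure in s
f = 1.0e-3       # assumed rate of current reversal in Hz
gamma = healing_parameter(f, t50_dc, n)

one_way = t50_ac(a, 1.0e6, 0.0, gamma, n, e_a, temp)      # no reversed current
balanced = t50_ac(a, 1.0e6, 1.0e6, gamma, n, e_a, temp)   # equal reversed current
print(gamma, balanced / one_way)
```

With no backward current, Eq. (12) reduces to the DC case of Eq. (6); with equal forward and backward densities only the small residual $(1 - \gamma) \cdot j_{dc}^+$ remains as effective stress.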

Based on Eqs. (10) to (12) the EM-degradation during test can be minimized following a similar strategy as it is described in [Hosseini11] for the communication in NoCs. In this work, it is assumed that $j_{dc}$ is the nominal current density associated with the transfer of one data packet. Furthermore, let $m^+$ denote the number of received packets, $m^-$ the number of sent packets, and let $m \ge m^+ + m^-$ denote the total number of packets that can be sent over a link in case of 100 % utilization, then according to [Hosseini11] the average values $j_{dc}^+$ and $j_{dc}^-$ in Eq. (10) can be estimated as

$$j_{dc}^{+} = \frac{m^{+}}{m} \quad \text{and} \quad j_{dc}^{-} = \frac{m^{-}}{m}. \qquad (13)$$

In case $m > m^+ + m^-$, the frequency *f*, which determines $\gamma$, must be adjusted by

$$f = f_{sys} \cdot \frac{m^+ + m^-}{m}, \qquad (14)$$

where $f_{sys}$ denotes the frequency of the system clock. To maximize $t_{50}$ for the communication links in the NoC, the authors suggest a dynamic routing scheme balancing sent and received packets on each link.
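The benefit of balancing sent and received packets can be illustrated by combining Eqs. (10), (11), (13), and (14) in a short Python sketch. The function name, the packet counts, and the values for the system clock, $t_{50}(DC)$, and n are hypothetical; the result is normalized to the nominal current density of one packet transfer.

```python
def link_effective_density(m_recv, m_sent, m_total, f_sys, t50_dc, n):
    """Normalized effective current density of one link.

    Combines Eqs. (10), (11), (13) and (14); the result is relative to the
    nominal current density j_dc of a single packet transfer.
    """
    if m_recv + m_sent == 0:
        return 0.0  # idle link, no stress
    j_fwd = m_recv / m_total                     # Eq. (13)
    j_bwd = m_sent / m_total                     # Eq. (13)
    f = f_sys * (m_recv + m_sent) / m_total      # Eq. (14)
    f0 = 1.0 / (2.0 * t50_dc)
    gamma = 1.0 - 2.0 * (f0 / f) ** (1.0 / n)    # Eq. (11), valid for f > f0
    return j_fwd - gamma * j_bwd                 # Eq. (10)


# Hypothetical link: 100 packet slots, 1 GHz system clock, n = 1.2.
params = dict(m_total=100, f_sys=1.0e9, t50_dc=1.0e8, n=1.2)
one_sided = link_effective_density(m_recv=60, m_sent=0, **params)
balanced = link_effective_density(m_recv=30, m_sent=30, **params)
print(one_sided, balanced)
```

For the same total traffic, the balanced link sees a much smaller effective current density than the one-sided link, which is the motivation for balancing both directions.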

In the context of the periodic test for bidirectional interconnect structures, both alternating transitions during a single test and changing the direction of the test application contribute to self-healing. As expected, preliminary simulation results have shown that alternating transitions in a single test do not fully compensate each other, because the induced currents are not symmetric. This effect is even more pronounced in the presence of layout variations.

Because of the unpredictable impact of layout variations, it is not possible to exactly determine the current density of a single test upfront. Nevertheless, for minimizing the EM degradation by the periodic test over the lifetime of the system, a rough guideline can be established as in [Hosseini11]. The "forward" and "backward" test applications should be balanced for each interconnect section. The estimations in formula (13) will be even more precise in this case, since the test packets sent in both directions are identical. In addition to that, the test should dynamically adjust to the currently observed reliability profile and change the direction whenever needed.

### IV. PATTERN GENERATION AND EVALUATION

This section introduces the basic BIST scheme for the proposed interconnect test. To simplify explanations, stress and recovery conditions are not considered yet. They will be in the focus of Section V. As pointed out in Section II.A, this work is based on multiple victim testing [Nourmandi-Pour10], where several victim lines are tested simultaneously and the transitions on victim and aggressor lines are generated as described in Table 1. Furthermore, a high-speed interconnect test at multiple frequencies is supported by parallel generation and application of test patterns.

Using the pattern pairs in Table 1 leads to a very regular structure of the test. As the multi-frequency test of Section II.E only targets crosstalk delays, one victim can be tested by three test patterns with transitions $1 \rightarrow 0 \rightarrow 1$ on the victim line and the opposite transitions $0 \rightarrow 1 \rightarrow 0$ on the aggressor lines. As illustrated in Figure 5, a complete test for several victims at a time can for example start with a '1' at all victim lines and a '0' at all aggressor lines. The next two patterns are obtained by bitwise inversion, such that the third pattern is equal to the initial pattern.

| Pattern   | Bit sequence |
|-----------|--------------|
| Pattern 1 | 0010010100   |
| Pattern 2 | 1101101011   |
| Pattern 3 | 0010010100   |
| Pattern 4 | 1001001010   |
| Pattern 5 | 0110110101   |
| Pattern 6 | 1001001010   |



To change the positions of victims, this pattern must be shifted before again bitwise inversions are applied. In the example, a complete test for crosstalk delay can be done with 9 patterns. In general, if $2 \cdot k$ aggressors are assumed per victim (k on each side), then $(k+1) \cdot 3$ test patterns are sufficient. Similarly, crosstalk glitches and speedy faults can be tested by properly selecting the seeds and the positions for bitwise inversions.
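The shift-and-invert principle can be sketched in Python as follows. The sketch assumes a perfectly regular victim placement, with one victim every $k+1$ lines, and is a simplified software model for illustration; it does not reproduce the bit sequences of Figure 5 or the hardware generator bit for bit.

```python
def mvt_delay_patterns(n, k):
    """Generate (k + 1) * 3 patterns for a crosstalk delay test of n lines.

    Assumption: victims are placed every k + 1 lines, so that each victim
    has k aggressor lines on each side (shared with neighboring victims).
    """
    patterns = []
    for offset in range(k + 1):  # one group of victims per shift position
        seed = [1 if line % (k + 1) == offset else 0 for line in range(n)]
        inverted = [1 - bit for bit in seed]
        # Victims see 1 -> 0 -> 1, aggressors see 0 -> 1 -> 0.
        patterns += [seed, inverted, seed]
    return patterns


patterns = mvt_delay_patterns(n=10, k=2)
for pattern in patterns:
    print("".join(map(str, pattern)))
```

For `k=2` the sketch produces 9 patterns, and every line acts as a victim in exactly one of the three shift positions.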

This can be implemented using the hardware structure shown in Figure 6. The test starts with loading the appropriate seeds into the pattern register and the inversion register. Then, transitions are generated until all necessary transitions have been applied to the addressed victims. The set of victims can be updated by simply shifting the registers and reseeding proper seed bits to bit position 1.




Figure 6. Generator for parallel multiple victim testing.

This simple structure is sufficient to implement the proposed periodic multi-frequency test. If a more comprehensive test is needed for the manufacturing test, it can also generate patterns for glitch or speedy faults by properly adjusting the seed and inversion bits. At only a little extra cost, this generator can be extended, such that the pattern register receives the first bit from a circular or a linear feedback (cf. Figure 7). This way, the hardware can also be used for testing static defects as in [Jutman04] or for an extended LFSR-based signal integrity test as described in Section II.B.

As already suggested in [Bai00], test response evaluation will be based on pattern generation in the receiver. An identical generator will produce exactly the same set of test patterns in the receiver, and the received patterns will be compared to the expected ones.



Figure 7. Extended generator for manufacturing and periodic test.

#### V. TEST TUNING AND SCHEDULING

In this section, it is shown how the basic BIST scheme of Section IV can be used within the framework of a stress-aware periodic test. For this, various implementation details are discussed. In particular, a strategy for scheduling the tests is presented, such that self-healing is supported.

#### A. Use of Multi-frequency Test

As pointed out in Section III, the EM-aware test must be dynamically adapted to the current reliability profile of the interconnect. For this, the multi-frequency test summarized in Section II.E provides an effective solution. A frequency sweep from the lowest to the highest frequency not only reveals all crosstalk faults but also characterizes each line with the failing frequency. Consider for example the interconnect layout with variations of Figure 8, where the percentages between the lines show the line spacing relative to the nominal line spacing. If the manufacturing test is run with ten different frequencies $F_0$ to $F_9$, then crosstalk delays due to narrow line spacing (80 %) are already detected with the lowest frequency $F_0$, whereas the highest frequency $F_9$ is needed for close to nominal line spacing (98 %), and for the remaining lines, the intermediate frequency $F_5$ is sufficient. Overall, the profile of the interconnect is described by the line sets $L(F_0) = \{L_7, L_8, L_9, L_{10}\}$, $L(F_5) = \{L_6, L_5, L_4\}$, and $L(F_9) = \{L_1, L_2, L_3\}$.



Figure 8. A sample layout of a 10-line interconnect

This profile is stored on-chip and can be compared to the new profiles obtained during periodic testing. This way, aging effects can be monitored and related to specific wires. This information is then used to control mitigation schemes [Abella08] or to adapt the test schedule, such that the stress for critical wires is reduced.
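As a rough illustration of such a profile comparison, the following Python sketch stores a profile as a mapping from the frequency index to the set of lines failing at that frequency and reports lines whose failing frequency has dropped. The data layout, the function names, and the later profile (line $L_4$ failing already at $F_3$) are hypothetical and only serve the example. The reference profile is the one given for Figure 8.

```python
def failing_frequency(profile):
    """Invert {frequency index: set of lines} into {line: frequency index}."""
    return {line: freq for freq, lines in profile.items() for line in lines}


def degraded_lines(reference, current):
    """Lines that now fail at a lower frequency than in the reference profile.

    Returns {line: (reference index, current index)}. Lines that are absent
    from the reference profile are ignored in this simple sketch.
    """
    ref = failing_frequency(reference)
    cur = failing_frequency(current)
    return {
        line: (ref[line], cur[line])
        for line in sorted(cur)
        if line in ref and cur[line] < ref[line]
    }


# Reference profile of the Figure 8 example: L(F0), L(F5), L(F9).
reference_profile = {0: {7, 8, 9, 10}, 5: {4, 5, 6}, 9: {1, 2, 3}}
# Hypothetical later profile: line 4 now fails already at F3.
later_profile = {0: {7, 8, 9, 10}, 3: {4}, 5: {5, 6}, 9: {1, 2, 3}}

print(degraded_lines(reference_profile, later_profile))  # {4: (5, 3)}
```

A lower failing frequency is interpreted here as a larger crosstalk delay, in line with the observation that the narrowest spacing is already detected at $F_0$.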

Clock generation for the multi-frequency test can either reuse the existing infrastructure in circuits with dynamic voltage and frequency scaling (DVFS) or rely on existing schemes for on-chip clock generation in faster-than-at-speed test [Tayade08] [Pei10] [Pei15] or on programmable delay elements [Sadeghi-Kohan17] [Liu18a] [Liu20].

#### B. Test Scheduling

Because of its regular structure, the basic BIST described in Section IV can be easily split into small chunks that fit into the slots provided for periodic testing. Small extensions of the test control are sufficient to ensure that the test can be stopped and resumed whenever needed, so that no special considerations for test scheduling are necessary in this respect. However, as explained in the following, proper test scheduling is crucial for minimizing the stress during the test.

According to Section III, properly balancing forward and backward test applications for each interconnect link is the main measure to support the self-healing of EM degradations. The test conditions in Table 1 naturally lead to a balanced distribution of rising and falling transitions. As explained in Section II.C, the self-healing effects depend on the frequency of changes and on the average positive and negative current densities. In a simple bidirectional interconnect between two cores, the main challenge is to find the best trade-off between a high frequency of changes and other test considerations.

In the more general scenario of Figure 9, the test patterns launched by one sender will reach multiple receivers, and it is not possible to simply reverse this communication. But changing the sender with every test execution will also provide some healing effects.






Figure 9. System interconnect with 3 cores.

This idea is analyzed for a simple rotating scheme in Table 2, where the communication on the interconnect sections (A, F), (B, F), (C, F) between the cores A, B, C, and the fanout F is shown. The first column counts the number of test executions, and the second column identifies the sender among the three cores A, B, C. The remaining columns symbolically show the direction of the current in the three interconnect segments between the bidirectional fanout F and the three cores A, B, C. It can be observed that changing the sender will compensate the stress on two interconnect segments but add stress to the remaining third segment.

Although the simple rotating scheme of Table 2 cannot fully avoid the accumulation of stress effects, its analysis also shows that the stress-recovery balance of a specific interconnect segment can always be improved by selecting a proper sender. This observation is exploited for dynamic test scheduling as follows. In regular intervals, the reliability profile is checked, and the interconnect segment from F to the receiver X observing the largest faults is considered critical. The next sender is then determined based on the recorded sender/receiver information for X. If X has been used as a receiver in the majority of cases, then it is now used as a sender. If it has been mostly used as a sender, X now becomes a receiver.
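The selection rule can be sketched as follows. This is a minimal model under several assumptions that are not part of the text: the severity score per core, the representation of the recorded roles as lists, the handling of ties (X then becomes a receiver), and the choice of the least-used sender among the other cores when X has to receive are all hypothetical.

```python
def next_sender(severity, roles, cores=("A", "B", "C")):
    """Pick the sender for the next test execution.

    severity: hypothetical fault score per core, larger means more critical.
    roles:    recorded history per core, a list of 'sender' / 'receiver'.
    """
    # The segment towards the core with the largest observed faults is critical.
    critical = max(cores, key=lambda core: severity[core])
    as_receiver = roles[critical].count("receiver")
    as_sender = roles[critical].count("sender")
    if as_receiver > as_sender:
        # X was mostly a receiver, so it is now used as the sender.
        return critical
    # X was mostly a sender (or tied, an assumption): X becomes a receiver.
    # Assumption: choose the other core that has been sender least often.
    others = [core for core in cores if core != critical]
    return min(others, key=lambda core: roles[core].count("sender"))


# Example: A has always been the sender, and the segment to B looks most critical.
roles = {"A": ["sender"] * 3, "B": ["receiver"] * 3, "C": ["receiver"] * 3}
severity = {"A": 0, "B": 2, "C": 1}
print(next_sender(severity, roles))  # B
```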



Table 2. Example for rotating test schedule with 3 cores A, B, C and bidirectional fanout F as in Figure 9

| Test | Sender | Current on section (A, F) | Current on section (B, F) | Current on section (C, F) |
|------|--------|---------------------------|---------------------------|---------------------------|
| 1    | Core A | $\rightarrow$             | $\leftarrow$              | $\leftarrow$              |
| 2    | Core B | $\leftarrow$              | $\rightarrow$             | $\leftarrow$              |
| 3    | Core C | $\leftarrow$              | $\leftarrow$              | $\rightarrow$             |
| 4    | Core A | $\rightarrow$             | $\leftarrow$              | $\leftarrow$              |
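The effect of the rotating schedule in Table 2 can be made concrete with a small bookkeeping sketch. It only counts test executions per direction (+1 when the core drives its segment towards F, -1 when the segment carries current from F to the core). It does not model current densities or healing physics, and the encoding and function names are assumptions for illustration.

```python
CORES = ("A", "B", "C")


def segment_directions(sender):
    """Direction on each segment (core, F): +1 for the sender, -1 for receivers."""
    return {core: (1 if core == sender else -1) for core in CORES}


def net_balance(senders):
    """Sum of directions per segment over a sequence of test executions."""
    balance = {core: 0 for core in CORES}
    for sender in senders:
        for core, direction in segment_directions(sender).items():
            balance[core] += direction
    return balance


print(net_balance(["A"]))            # {'A': 1, 'B': -1, 'C': -1}
print(net_balance(["A", "B"]))       # {'A': 0, 'B': 0, 'C': -2}
print(net_balance(["A", "B", "C"]))  # {'A': -1, 'B': -1, 'C': -1}
```

After the second test, the segments (A, F) and (B, F) are balanced while (C, F) has accumulated two executions in the same direction, which mirrors the observation that a change of sender compensates two segments and adds stress to the third.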


#### VI. EXPERIMENTAL RESULTS

To validate the presented technique, a simulation using HSPICE has been conducted. As the work addresses safety-critical systems, a scenario has been assumed which is typical for automotive applications. Here, high reliability thresholds have to be guaranteed even at extremely high temperatures. This is for example documented in the AEC Q100 Standard for accelerated aging tests [AEC21]. The highest quality "grade" defines a temperature spectrum from -40 °C to 150 °C. While the temperatures in the AEC Q100 standard refer to ambient temperatures, the temperature parameter $T$ in Black's formula denotes the junction temperature. The junction temperature $T$ is higher than the ambient temperature and can be derived by

$$T = T_a + P_{chip} \cdot R_{ja} \tag{15}$$

where $T_a$ denotes the ambient temperature, $P_{chip}$ is the total power dissipation of the chip and $R_{ja}$ is the junction-to-ambient thermal resistance [Vassighi06]. Since the exact values for $P_{chip}$ and $R_{ja}$ were not available for the study, the range for $T$ is assumed between 40 °C and 175 °C in the simulation study described in the following.

Furthermore, all experiments are based on a 32 nm technology with the interconnect parameters listed in Table 3. The interconnect structures are 32 bits wide in all experiments, and random layout variations are applied as illustrated in Figure 8. For the periodic test, a fixed timeline is assumed as sketched in Figure 10. The time intervals should be selected such that self-healing is still possible for EM degradations. Furthermore, as degradations in the chip do not always evolve gradually at a slow pace, the time intervals for safety-critical applications must be relatively short.

Table 3. Interconnect parameters in 32 nm technology

| Parameter                                       | Value or Range                               |
|-------------------------------------------------|----------------------------------------------|
| Height (space between wire and reference plane) | 0.09 μm                                      |
| Width                                           | 84.4 nm                                      |
| Height                                          | 151 nm                                       |
| Space (nominal)                                 | 84.4 nm                                      |
| Space variation factor                          | 0.8 to 0.98                                  |
| Length                                          | 2000 μm (100 μm distance<br>between buffers) |
| Conductor resistivity for copper                | 1.7e-8 Ωm                                    |
| Dielectric constant                             | 1.36                                         |
| Supply voltage                                  | 0.9 V                                        |
| n                                               | 1.1 - 1.3                                    |
| Ea                                              | 0.9                                          |


In our experiments, the time interval between two tests has been set to 0.25 ms, and during the test phase, a complete multi-frequency test with 10 frequencies is performed.





For the proposed BIST scheme, the number of patterns for a single frequency is $(k+1) \cdot 3$, where $k$ is the number of aggressors on each side of the victim. Consequently, the overall number of test patterns is between $(k+1) \cdot 3$ and $10 \cdot (k+1) \cdot 3$ (cf. Section IV). In our experiments, the parameter $k$ has been set to $k = 2$ and $k = 4$. As we assume that all 10 frequencies are used in each test, 90 patterns must be applied for $k = 2$, and 150 for $k = 4$.
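Written out, the pattern counts follow directly from the formula. The symbols $N_{\text{freq}}$ and $N_{\text{total}}$ are introduced here only as shorthand for the number of patterns per frequency and per complete test with 10 frequencies:

$$N_{\text{freq}}(k) = 3\,(k+1), \qquad N_{\text{total}}(k) = 10 \cdot 3\,(k+1)$$

$$k = 2:\; N_{\text{freq}} = 9,\; N_{\text{total}} = 90 \qquad\qquad k = 4:\; N_{\text{freq}} = 15,\; N_{\text{total}} = 150$$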

The simulation study covers both a simple interconnect between two cores A and B and the interconnect structure of Figure 9. Overall, the experiments analyze the test strategies listed in Table 4.

Since the main motivation for the periodic BIST is to avoid fault accumulation during longer idle times, the presented analysis focuses on stress and self-healing during the test and does not take into account self-healing effects during normal system operation.

#### A. Simple interconnects

In this subsection, only simple interconnects are considered, and the strategies *One\_directional* and *Bi\_directional* are compared to each other for $k = 2$. As explained above, potential healing effects by data transfers between the tests are neglected. In the first step, the current densities, the median times to failure $t_{50}$, and the mission times $T_M$ for a reliability threshold $R_{th} = 0.999999$ have been determined for the one-directional test.


Table 4. Investigated test strategies

| Strategy        |       | Details                                           |
|-----------------|-------|---------------------------------------------------|
| One_directional | A B   | Tests are applied always in the same direction    |
| Bi_directional  | A — B | Alternating test directions                       |
| Just_A          |       | A is always the sender.<br>B and C are receivers. |
| Rotation        |       | Rotating scheme of Table 2.                       |

To obtain the respective values for the bi-directional test, the effective current densities introduced in Formula (10) have been determined based on the self-healing parameter $\gamma$. According to Formula (11), $\gamma$ depends on the frequencies $f_0 = 1/t_{50}(DC)$ and $f$, and as the time between two tests is 0.25 ms in our experiments, the frequency $f$ is set to 2 kHz.
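The value of 2 kHz is consistent with counting one full stress/recovery cycle as two consecutive tests in opposite directions. This reading is an assumption and is not stated explicitly in the text:

$$f = \frac{1}{2 \cdot 0.25\,\text{ms}} = \frac{1}{0.5\,\text{ms}} = 2\,\text{kHz}$$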

The observed current densities are independent of the temperature in the one-directional case and reach 3139 A/cm<sup>2</sup>. In the bi-directional test, the current densities are reduced by a factor of about 50, i.e., by almost two orders of magnitude. Although they are temperature dependent because of the self-healing parameter $\gamma$, the changes are extremely small and the values range between 62 and 63 A/cm<sup>2</sup>.

The results for the median times to failure $t_{50}$ and the resulting mission times $T_M$ in years are summarized in Table 5. Columns 2 and 3 show $t_{50}$ for both test strategies, and the mission times are reported in columns 4 and 5. Finally, the mission time improvement factor MTIF is listed in column 6. According to Formula (7), the mission time improvement only depends on the current densities and the parameter $n$ for a fixed temperature, which explains why this factor does not change over the temperature range.
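The reported improvement factors can be cross-checked numerically. The sketch below assumes that Formula (7), which is not repeated in this section, reduces to the ratio of the two current densities raised to the power $n$, as the usual form of Black's equation would suggest. The value of 62.5 A/cm<sup>2</sup> is simply the midpoint of the reported range of 62 to 63 A/cm<sup>2</sup>.

```python
def mtif(j_one_dir: float, j_bi_dir: float, n: float) -> float:
    """Improvement factor under the assumption MTIF = (J_one / J_bi) ** n."""
    return (j_one_dir / j_bi_dir) ** n


J_ONE_DIR = 3139.0  # A/cm^2, reported for the one-directional test
J_BI_DIR = 62.5     # A/cm^2, assumed midpoint of the reported 62 to 63 range

for n in (1.1, 1.3):
    print(f"n = {n}: MTIF is roughly {mtif(J_ONE_DIR, J_BI_DIR, n):.1f}")
```

Under these assumptions, the result is close to the factors of 74.3 (n = 1.1, Table 5) and 162.7 (n = 1.3, Table 6). The small remaining differences depend on the exact bi-directional current density, which is only given as a range.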

Although the median times to failure in columns 2 and 3 are extremely high, the mission times in columns 4 and 5 quickly decrease over the temperature range because of the high reliability threshold for safety-critical systems. For example, for the one-directional test the mission time at 125 °C is below 1.5 years, and for 150 °C it is already below 4 months (0.31 years). But the self-healing effects triggered by the bi-directional test application can ensure considerably higher mission times (MTIF $\approx$ 74). For example, the mission time at 150 °C is improved from approximately 4 months to more than 23 years.
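The steep decrease over temperature can be reproduced with the exponential temperature term of Black's equation. The sketch below assumes the usual form $t_{50} \propto \exp(E_a / (k_B T))$ with $T$ in kelvin, and it assumes that the activation energy $E_a = 0.9$ from Table 3 is given in eV. Both assumptions are not spelled out in this section.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def t50_ratio(temp_low_c: float, temp_high_c: float, ea_ev: float = 0.9) -> float:
    """Ratio t50(temp_low) / t50(temp_high) for a fixed current density."""
    t_low = temp_low_c + 273.15
    t_high = temp_high_c + 273.15
    return math.exp(ea_ev / K_B * (1.0 / t_low - 1.0 / t_high))


print(f"40 °C vs 55 °C:   {t50_ratio(40, 55):.2f}")
print(f"125 °C vs 150 °C: {t50_ratio(125, 150):.2f}")
```

For comparison, the one-directional $t_{50}$ values in Table 5 give 1,256,322,005 / 273,659,613 ≈ 4.59 for the first pair and 1,018,155 / 216,254 ≈ 4.71 for the second, so the tabulated trend matches this exponential dependence.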

As shown in Table 6, this effect is even more pronounced if the worst-case value 1.3 is assumed for the parameter $n$. Here the mission time for the one-directional test at 150 °C is already below one month (0.06 years).

| (1) Temperature | (2) t<sub>50</sub> [years], One_dir | (3) t<sub>50</sub> [years], Bi_dir | (4) Mission Time T<sub>M</sub> (0.999999) [years], One_dir | (5) Mission Time T<sub>M</sub> (0.999999) [years], Bi_dir | (6) MTIF |
|-----------------|-------------------------|--------------------|-------------------------------|------------|------|
| 40 °C           | 1,256,322,005           | 67,231,850,134,437 | 1,812.49                      | 134,715.36 | 74.3 |
| 55 °C           | 273,659,613             | 14,644,845,786,190 | 394.81                        | 29,344.51  | 74.3 |
| 65 °C           | 106,800,588             | 5,715,414,563,517  | 154.08                        | 11,452.22  | 74.3 |
| 75 °C           | 43,995,813              | 2,354,428,135,529  | 63.47                         | 4,717.67   | 74.3 |
| 85 °C           | 19,043,970              | 1,019,134,655,764  | 27.47                         | 2,042.08   | 74.3 |
| 95 °C           | 8,626,992               | 461,671,892,875    | 12.45                         | 925.07     | 74.3 |
| 105 °C          | 4,075,207               | 218,083,963,268    | 5.88                          | 436.98     | 74.3 |
| 115 °C          | 2,000,888               | 107,077,115,625    | 2.89                          | 214.56     | 74.3 |
| 125 °C          | 1,018,155               | 54,486,343,169     | 1.47                          | 109.18     | 74.3 |
| 150 °C          | 216,254                 | 11,572,745,964     | 0.31                          | 23.19      | 74.3 |
| 175 °C          | 54,599                  | 2,921,816,617      | 0.08                          | 5.85       | 74.3 |

Table 5. Mission Times for One-directional and Bi-directional Test, n = 1.1, and k = 2 (Current densities are 3139 A/cm<sup>2</sup> for One\_dir and 62 to 63 A/cm<sup>2</sup> for Bi\_dir).

Table 6. Mission Times for One-directional and Bi-directional Test, n = 1.3, and k = 2 (Current densities are 3139 A/cm<sup>2</sup> for One\_dir and 62 to 63 A/cm<sup>2</sup> for Bi\_dir).

| (1) Temperature | (2) t<sub>50</sub> [years], One_dir | (3) t<sub>50</sub> [years], Bi_dir | (4) Mission Time T<sub>M</sub> (0.999999) [years], One_dir | (5) Mission Time T<sub>M</sub> (0.999999) [years], Bi_dir | (6) MTIF |
|-----------------|-------------------------|--------------------|-------------------------------|------------|-------|
| 40 °C           | 251,045,731             | 29,405,876,606,661 | 362.18                        | 58,921.82  | 162.7 |
| 55 °C           | 54,684,288              | 6,405,352,458,728  | 78.89                         | 12,834.68  | 162.7 |
| 65 °C           | 21,341,526              | 2,499,796,619,469  | 30.79                         | 5,008.95   | 162.7 |
| 75 °C           | 8,791,503               | 1,029,769,273,309  | 12.68                         | 2,063.39   | 162.7 |
| 85 °C           | 3,805,478               | 445,739,972,966    | 5.49                          | 893.15     | 162.7 |
| 95 °C           | 1,723,895               | 201,918,234,454    | 2.49                          | 404.59     | 162.7 |
| 105 °C          | 814,331                 | 95,378,900,383     | 1.17                          | 191.11     | 162.7 |
| 115 °C          | 399,828                 | 46,827,694,797     | 0.58                          | 93.83      | 162.7 |
| 125 °C          | 203,453                 | 23,826,346,903     | 0.29                          | 47.74      | 162.7 |
| 150 °C          | 43,213                  | 5,058,257,698      | 0.06                          | 10.14      | 162.6 |
| 175 °C          | 10,910                  | 1,275,449,638      | 0.02                          | 2.56       | 162.4 |


Since the mission time improvement grows with the parameter $n$ according to Formula (7), the bi-directional test can still ensure a reasonable mission time of more than 10 years at 150 °C.

The highlighted trends are illustrated in Figure 11, where the mission times for $n = 1.3$ are shown as a function of temperature on a logarithmic scale. The blue line corresponds to the one-directional case without self-healing, and the orange line to the bi-directional case with self-healing. It can be seen that the curves have more or less the same shape, which is in line with the almost constant mission time improvement factor.

Figure 11. Mission times for one- and bi-directional test (k = 2 and n = 1.3; mission time in years on a logarithmic axis over temperatures from 40 °C to 175 °C).

|                 | T <sub>M</sub> (0.999999) one_directional [years] |             | T <sub>M</sub> (0.999999) bi_directional [years] |           | MTIF        |           |
|-----------------|---------------------------------------------------|-------------|--------------------------------------------------|-----------|-------------|-----------|
| (1) Temperature | (2) $k = 2$                                       | (3) $k = 4$ | (4) k = 2                                        | (5) k = 4 | (6) $k = 2$ | (7) k = 4 |
| 40 °C           | 362.18                                            | 270.98      | 58,921.82                                        | 43,094.73 | 162.7       | 159.0     |
| 55 °C           | 78.89                                             | 59.03       | 12,834.68                                        | 9,387.14  | 162.7       | 159.0     |
| 65 °C           | 30.79                                             | 23.04       | 5,008.95                                         | 3,663.49  | 162.7       | 159.0     |
| 75 °C           | 12.68                                             | 9.49        | 2,063.39                                         | 1,509.14  | 162.7       | 159.0     |
| 85 °C           | 5.49                                              | 4.11        | 893.15                                           | 653.24    | 162.7       | 159.0     |
| 95 °C           | 2.49                                              | 1.86        | 404.59                                           | 295.91    | 162.7       | 159.0     |
| 105 °C          | 1.17                                              | 0.88        | 191.11                                           | 139.78    | 162.7       | 159.0     |
| 115 °C          | 0.58                                              | 0.43        | 93.83                                            | 68.63     | 162.7       | 159.0     |
| 125 °C          | 0.29                                              | 0.22        | 47.74                                            | 34.92     | 162.7       | 159.0     |
| 150 °C          | 0.06                                              | 0.05        | 10.14                                            | 7.41      | 162.6       | 158.9     |
| 175 °C          | 0.02                                              | 0.01        | 2.56                                             | 1.87      | 162.4       | 158.7     |

Table 7. Comparing Mission Times for k = 2 and k = 4, n = 1.3.

Table 8. Mission Times for the Strategies Just\_A and Rotation, n = 1.3 and k = 4 (Current densities are 5100 A/cm<sup>2</sup> for Just\_A and 93 A/cm<sup>2</sup> for Rotation).

| (1) Temperature | (2) t<sub>50</sub> [years], Just_A | (3) t<sub>50</sub> [years], Rotation | (4) Mission Time T<sub>M</sub> (0.999999) [years], Just_A | (5) Mission Time T<sub>M</sub> (0.999999) [years], Rotation | (6) MTIF |
|-----------------|-------------------------|--------------------|-----------------|------------------------------------------------|-------|
| 40 °C           | 96,155,905,716          | 17,601,569,003,957 | 192.67          | 35,269.02                                      | 183.1 |
| 55 °C           | 20,945,256,894          | 3,834,073,234,943  | 41.97           | 7,682.50                                       | 183.1 |
| 65 °C           | 8,174,263,024           | 1,496,312,255,688  | 16.38           | 2,998.23                                       | 183.1 |
| 75 °C           | 3,367,334,630           | 616,393,196,239    | 6.75            | 1,235.09                                       | 183.1 |
| 85 °C           | 1,457,579,825           | 266,808,788,527    | 2.92            | 534.62                                         | 183.0 |
| 95 °C           | 660,289,074             | 120,863,515,232    | 1.32            | 242.18                                         | 183.0 |
| 105 °C          | 311,906,364             | 57,091,825,547     | 0.62            | 114.40                                         | 183.0 |
| 115 °C          | 153,142,908             | 28,030,291,720     | 0.31            | 56.17                                          | 183.0 |
| 125 °C          | 77,926,909              | 14,262,232,281     | 0.16            | 28.58                                          | 183.0 |
| 150 °C          | 16,551,367              | 3,028,031,835      | 0.03            | 6.07                                           | 182.9 |
| 175 °C          | 4,178,733               | 763,663,990        | 0.01            | 1.53                                           | 182.8 |


The same experiments have been repeated for $k = 4$, where 4 aggressors are assumed on each side of the victim line and a higher number of test patterns is needed. Table 7 compares the mission times and the mission time improvement factors for $n = 1.3$ to the previously discussed case of $k = 2$. As expected, the longer test times for $k = 4$ result in a higher stress and reduced mission times.

But also in this case, the bi-directional test provides a mission time improvement factor of 159 and still ensures a mission time of more than 7 years at 150 °C.

#### B. General interconnect structures

This subsection focuses on more general interconnect structures and presents the results for strategies *Just\_A* and *Rotation*. As the basic trends are the same as discussed in Subsection VI.A, only the worst-case results for $n = 1.3$ and $k = 4$ are reported in Table 8.

Again, the straightforward test application *Just\_A* is associated with an unacceptable reduction of the mission times at higher temperatures. The self-healing effects introduced by the *Rotation* scheme lead to a mission time improvement by orders of magnitude. For example, at 150 °C, the *Rotation* strategy still guarantees a mission time of 6 years. For the more optimistic scenario with $k = 2$ and $n = 1.1$, the mission time would even increase to more than 19 years.

Plotting the mission times as a function of temperature (cf. Figure 12) shows the same general trends as in the case of simple interconnects in Figure 11. Now, the blue line corresponds to using only one sender without self-healing, and the orange line to the rotation scheme with self-healing.



#### VII. CONCLUSIONS

Periodic interconnect testing is mandatory in safety-critical systems to monitor components with longer idle times as well as standby spare units. However, the analysis in this paper has shown that a straightforward test strategy can lead to stress-induced electromigration and drastically reduce the mission time of the system. This effect gets extremely critical at higher temperatures, which occur for example in the automotive domain. The proposed EM-aware strategy exploits self-healing effects triggered by reverse current. A bidirectional test for simple interconnects and a rotating test schedule for more complex interconnect structures improve the available lifetime for the system workload by orders of magnitude.

#### VIII. FUNDING

Parts of this work have been supported by the German Research Foundation (DFG) under grants WU 245/19-1 and HE1686/4-1, FAST.

#### IX. CONFLICT OF INTEREST

The authors declare that they have no competing interests.

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.

#### REFERENCES

- 31 [Abella08] J. Abella, X. Vera, O. S. Unsal, O. Ergin, A. González, and J. 32 33 W. Tschanz, "Refueling: Preventing wire degradation due to electromigration," IEEE Micro, Vol. 28, No. 6, pp. 37-46, 2008.
- 34 [Abella10] J. Abella and X. Vera, "Electromigration for Microarchitects," 35 ACM Computer Surveys (CSUR), Vol. 42, No. 2, Article No. 9, pp. 1-36 18, 2010.
- 37 [AEC21] http://aecouncil.com/AECDocuments.html, accessed on 38 August 5, 2021.
- 39 [Agarwal07] K. Agarwal and F. Liu, "Efficient computation of current flow 40 in signal wires for reliability analysis," Proc. IEEE/ACM International

41 Conference on Computer-Aided Design (ICCAD'07), San Jose, CA, 42 USA, pp. 741-746, 2007.

- 43 [Bai00] X. Bai, S. Dey, and J. Rajski, "Self-Test Methodology for At-44 Speed Test of Crosstalk in Chip Interconnects," Proc. ACM/IEEE 45 Design Automation Conference (DAC '00), Los Angeles, CA, USA, 46 pp. 619-624, 2000.
- 47 [Bernardi16] P. Bernardi, R. Cantoro, S. De Luca, E. Sánchez, and A. 48 Sansonetti, "Development Flow for On-Line Core Self-Test of 49 Automotive Microcontrollers," IEEE Transactions on Computers, Vol. 50 65, No. 3, pp. 744-754, 2016.
- 51 52 53 54 [Blaauw03] D. T. Blaauw, C. Oh, V. Zolotov, and A. Dasgupta "Static electromigration analysis for on-chip signal interconnects," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), Vol. 22, No. 1, pp. 39-48, 2003.
- 55 [Black69a] J. R. Black, "Electromigration - A brief survey and some 56 57 recent results," IEEE Transactions on Electron Devices, Vol. 16, No. 4, pp. 338-347, 1969.
- 58 59 [Black69b] J. R. Black, "Electromigration failure modes in aluminum metallization for semiconductor devices," Proceedings of the IEEE, 60 Vol. 57, No. 9, pp. 1587-1594, 1969.
- [Caignet01] F. Caignet, S. Delmas-Bendhia, and E. Sicard, "The challenge of signal integrity in deep-submicrometer CMOS technology,' 63 Proceedings of the IEEE, Vol. 89, No. 4, pp. 556-573, 2001.

61

62

- 64 L. Chen, X. Bai, and S. Dey, "Testing for Interconnect [Chen02] 65 Crosstalk Defects Using On-Chip Embedded Processor Cores,' 66 Journal of Electronic Testing (JETTA), Vol. 18, No. 4-5, pp. 529-538, 67 2002.
- 68 [Chen98] H. H. Chen and J. and S. Neely, "Interconnect and Circuit 69 70 71 Modeling Techniques for Full-Chip Power Supply Noise Analysis," IEEE Transactions on Components, Packaging, and Manufacturing Technology: Part B, Vol. 21, No. 3, pp. 209-215, 1998.
- 72 73 74 [Chun07] S. Chun, Y. Kim, and S. Kang, "MDSI: Signal integrity interconnect fault modeling and testing for SOCs," Journal of Electronic Testing (JETTA), Vol. 23, No. 4, pp. 357-362, 2007.
- 75 76 77 78 [Cuviello99] M. Cuviello, S. Dey, X. Bai, and Y. Zhao, "Fault modeling and simulation for crosstalk in system-on-chip interconnects," Proc. IEEE/ACM International Conference on Computer-Aided Design (ICCAD'99), San Jose, CA, USA, pp. 297-303, 1999.
- 79 [D'Heurle71] F. M. d'Heurle, "Electromigration and failure in 80 electronics: An introduction," Proceedings of the IEEE, Vol. 59, No. 81 10, pp. 1409-1418, 1971.
- 82 83 [Doyen08] L. Doyen, E. Petitprez, P. Waltz, X. Federspiel, L. Arnaud, and Y. Wouters, "Extensive analysis of resistance evolution due to 84 85 electromigration induced degradation," Journal of Applied Physics, Vol. 104, pp. 123521(1-6), 2008.
- 86 [Eychenne16] C. Eychenne and Y. Zorian, "From manufacturing to 87 functional safety use, how infield built-in self-test architecture must 88 evolve to support real time system constraints," Handouts 1st IEEE 89 International Workshop on Automotive Reliability and Test (ART'16), 90 Forth Worth, TX, USA, 2016.
- C. Grecu, P. Pande, A. Ivanov, and R. Saleh, "BIST for [Grecu06] <u>92</u> network-on-chip interconnect infrastructures," Proc. 24th IEEE VLSI 93 Test Symposium (VTS'06), Berkeley, CA, USA, pp.30-35, 2006.
- 94 [Hassan88] A. Hassan, J. Rajski, and V. K. Agarwal, "Testing and <u>9</u>5 diagnosis of interconnects using boundary scan architecture," Proc. 96 IEEE International Test Conference (ITC'88), Washington, DC, USA, **9**7 pp. 126-137, 1988.
- [Hellebrand02] S. Hellebrand, H.-J. Wunderlich, A. A. Ivaniuk, Y. V. Klimets, and V. N. Yarmolik, "Efficient online and offline testing of embedded DRAMs," IEEE Transactions on Computers, Vol. 51, No. 7, pp. 801-809, 2002.
- [Hosseini11] A. Hosseini and V. Shabro, "Electromigration-aware dynamic routing algorithm for network-on-chip applications," International Journal of High Performance Systems Architecture, Vol. 3, No. 1, pp. 56-63, 2011.
- [Jutman04] A. Jutman, "At-speed on-chip diagnosis of board-level interconnect faults," Proc. IEEE European Test Symposium (ETS'04), Corsica, France, pp. 2-7, 2004.

- [Koren07] I. Koren and C. Mani Krishna, "Fault Tolerant Systems," Morgan Kaufmann Publishers (Elsevier), 2007.
- [Lienig18] J. Lienig and M. Thiele, "Fundamentals of Electromigration-Aware Integrated Circuit Design," Springer International Publishing AG, 2018.
- [Liu07] J. Liu, W. B. Jone and S. R. Das, "Pseudo-Exhaustive Built-in Self-Testing of Signal Integrity for High-Speed SoC Interconnects," Proc. IEEE Instrumentation & Measurement Technology Conference (IMTC'07), Warsaw, Poland, pp. 1-4, 2007.
- [Liu18] Y. Liu, N. Mukherjee, J. Rajski, S. M. Reddy, and J. Tyszer, "Deterministic Stellar BIST for In-System Automotive Test," Proc. IEEE International Test Conference (ITC'18), Phoenix, AZ, USA, pp. 1-9, 2018.
- [Liu18a] C. Liu, E. Schneider, M. Kampmann, S. Hellebrand, and H.-J. Wunderlich, "Extending Aging Monitors for Early Life and Wear-out Failure Prevention," Proc. IEEE Asian Test Symposium (ATS'18), Hefei, Anhui, China, pp. 92-97, 2018.
- [Liu20] C. Liu, E. Schneider, and H.-J. Wunderlich, "Using Programmable Delay Monitors for Wear-Out and Early Life Failure Prediction," Proc. Design Automation and Test in Europe (DATE'20), Grenoble, France, pp. 1-6, 2020.
- [Livshits12] P. Livshits and S. Sofer, "Aggravated electromigration of copper interconnection lines in ULSI devices due to crosstalk noise," IEEE Transactions on Device and Materials Reliability, Vol. 12, No. 2, pp. 341-346, 2012.
- [Mariani05] R. Mariani and G. Boschi, "Scrubbing and partitioning for protection of memory systems," Proc. IEEE International On-Line Testing Symposium (IOLTS'05), Saint Raphael, French Riviera, France, pp. 195-196, 2005.
- [Mishra15] V. Mishra and S. S. Sapatnekar, "Circuit delay variability due to wire resistance evolution under AC electromigration," Proc. IEEE International Reliability Physics Symposium, Monterey, CA, USA, pp. 3D.3.1-3D.3.7, 2015.
- [Mohammadi14] M. Mohammadi, S. Sadeghi-Kohan, N. Masoumi, and Z. Navabi, "An off-line MDSI interconnect BIST incorporated in BS 1149.1," Proc. IEEE European Test Symposium (ETS'14), Paderborn, Germany, pp. 1-2, 2014.
- [Mukherjee19] N. Mukherjee, D. Tille, M. Sapati, Y. Liu, J. Mayer, S. Milewski, E. Moghaddam, J. Rajski, J. Solecki, and J. Tyszer, "Test Time and Area Optimized BIST Scheme for Automotive ICs," Proc. IEEE International Test Conference (ITC'19), Washington, DC, USA, pp. 1-10, 2019.
- [Nadeau-Dostie99] B. Nadeau-Dostie, J.-F. Cote, H. Hulvershorn, and S. Pateras, "An Embedded Technique for At-Speed Interconnect Testing," Proc. IEEE International Test Conference (ITC'99), Atlantic City, NJ, USA, pp. 431-438, 1999.
- [Nardi19] A. Nardi, A. Armato, and F. Lertora, "Automotive Functional Safety Using LBIST and Other Detection Methods," Cadence White Paper, 2019.
- [Nourani01] M. Nourani and A. Attarha, "Built-in self-test for signal integrity," Proc. ACM/IEEE Design Automation Conference (DAC'01), Las Vegas, NV, USA, pp. 792-797, 2001.
- [Nourmandi-Pour10] R. Nourmandi-Pour, A. Khadem-Zadeh, and A. M. Rahmani, "An IEEE 1149.1-based BIST method for at-speed testing of inter-switch links in network on chip," Microelectronics Journal, Vol. 41, No. 7, pp. 417-429, 2010.
- [Pateras17] S. Pateras and T. Tai, "Automotive semiconductor test," Proc. International Symposium on VLSI Design, Automation and Test (VLSI-DAT'17), Hsinchu, Taiwan, pp. 1-4, 2017.
- [Pei10] S. Pei, H. Li, and X. Li, "An On-Chip Clock Generation Scheme for Faster-than-at-Speed Delay Testing," Proc. Design Automation and Test in Europe (DATE'10), Dresden, Germany, pp. 1353-1356, 2010.
- [Pei15] S. Pei, Y. Geng, H. Li, J. Liu, and S. Jin, "Enhanced LCCG: A Novel Test Clock Generation Scheme for Faster-than-at-Speed Delay Testing," Proc. 20th Asia and South Pacific Design Automation Conference (ASP-DAC), Chiba, Japan, pp. 514-519, 2015.
- [Pendurkar01] R. Pendurkar, A. Chatterjee, and Y. Zorian, "Switching activity generation with automated BIST synthesis for performance testing of interconnects," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), Vol. 20, No. 9, pp. 1143-1158, 2001.

- [Reimann14] F. Reimann, M. Glass, J. Teich, A. Cook, L. Gomez, D. Ull, H.-J. Wunderlich, P. Engelke, and U. Abelein, "Advanced diagnosis: SBST and BIST integration in automotive E/E architectures," Proc. ACM/EDAC/IEEE Design Automation Conference (DAC'14), San Francisco, CA, USA, pp. 1-6, 2014.
- [Roy09] S. Roy and A. Dounavis, "Efficient delay and crosstalk modeling of RLC interconnects using delay algebraic equations," IEEE Transactions on Very Large Scale Integration Systems (TVLSI), Vol. 19, No. 2, pp. 342-346, 2009.
- [Rudnicki09] T. Rudnicki, T. Garbolino, K. Gucwa, and A. Hlawiczka, "Effective BIST for Crosstalk Faults in Interconnects," Proc. 12th International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS'09), Liberec, pp. 164-169, 2009.
- [Sadeghi-Kohan12] S. Sadeghi-Kohan, M. Namaki-Shoushtari, F. Javaheri, and Z. Navabi, "BS 1149.1 extensions for an online interconnect fault detection and recovery," Proc. IEEE International Test Conference (ITC'12), Anaheim, CA, USA, pp. 1-9, 2012.
- [Sadeghi-Kohan17] S. Sadeghi-Kohan, M. Kamal, and Z. Navabi, "Self-Adjusting Monitor for Measuring Aging Rate and Advancement," IEEE Transactions on Emerging Topics in Computing, Vol. 8, No. 3, pp. 627-641, 2020.
- [Sadeghi-Kohan20] S. Sadeghi-Kohan and S. Hellebrand, "Dynamic Multi-Frequency Test Method for Hidden Interconnect Defects," Proc. 38th IEEE VLSI Test Symposium (VTS'20), pp. 1-6, 2020.
- [Sapatnekar19] S. S. Sapatnekar, "Electromigration-Aware Interconnect Design," Proc. ISPD'19, San Francisco, CA, USA, pp. 83-90, 2019.
- [Schley17] G. Schley, A. Dalirsani, M. Eggenberger, N. Hatami, H.-J. Wunderlich, and M. Radetzki, "Multi-Layer Diagnosis for Fault-Tolerant Networks-on-Chip," IEEE Transactions on Computers, Vol. 66, No. 5, pp. 848-861, 2017.
- [Sekar02] K. Sekar and S. Dey, "LI-BIST: A Low-Cost Self-Test Scheme for SoC Logic Cores and Interconnects," Proc. IEEE VLSI
- [Tao93] J. Tao, N. W. Cheung, and C. Hu, "Metal electromigration damage healing under bidirectional current stress," IEEE Electron Device Letters, Vol. 14, No. 12, pp. 554-556, 1993.
- [Tao94] J. Tao, N. W. Cheung, and C. Hu, "An electromigration failure model for interconnects under pulsed and bidirectional current stressing," IEEE Transactions on Electron Devices, Vol. 41, No. 4, pp. 539-545, 1994.
- [Tao95] J. Tao, N. W. Cheung, and C. Hu, "Modeling electromigration lifetime under bidirectional current stress," IEEE Electron Device Letters, Vol. 16, No. 11, pp. 476-478, 1995.
- [Tayade08] R. Tayade and J. A. Abraham, "On-chip Programmable Capture for Accurate Path Delay Test and Characterization," Proc. IEEE International Test Conference (ITC'08), Santa Clara, CA, USA, pp. 1-10, 2008.
- [Tehranipoor03] M. H. Tehranipoor, N. Ahmed, and M. Nourani, "Multiple transition model and enhanced boundary scan architecture to test interconnects for signal integrity," Proc. IEEE International Conference on Computer Design (ICCD'03), San Jose, CA, USA, pp. 554-559, 2003.
- [Tehranipoor04] M. H. Tehranipoor, N. Ahmed, and M. Nourani, "Testing SoC interconnects for signal integrity using extended JTAG architecture," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), Vol. 23, No. 5, pp. 800-811, 2004.
- [Tu19] K. N. Tu and A. M. Gusak, "A unified model of mean-time-to-failure for electromigration, thermomigration, and stress-migration based on entropy production," Journal of Applied Physics, Vol. 126, No. 7, pp. 075109(1-6), 2019.
- [Vassighi06] A. Vassighi and M. Sachdev, "Thermal and Power Management of Integrated Circuits," Springer, 2006.
- [Zhang15] H. Zhang, M. A. Kochte, E. Schneider, L. Bauer, H.-J. Wunderlich, and J. Henkel, "STRAP: Stress-aware placement for aging mitigation in runtime reconfigurable architectures," Proc. IEEE/ACM International Conference on Computer-Aided Design (ICCAD'15), Austin, TX, USA, pp. 38-45, 2015.