Research

Our research efforts pay special attention to topics related to test, reliability and fault tolerance of digital systems as well as innovative architectures for approximative and heterogeneous computing. A part of our work is done in cooperation with various partners coming both from national and international universities and from industry.

CA - Current Research Projects

An important problem in modern technology nodes in nano-electronics are early life failures, which often cause recalls of shipped products and incur high costs. An important root cause of such failures are marginal circuit structures, which pass a conventional manufacturing test, but are not able to cope with the later workload and stress in the field. Such structures can be identified on the basis of non-functional indicators, in particular by testing the timing behavior. For an effective and cost-efficient test of these indicators, the FAST project investigates novel scan designs and built-in self-test strategies for circuits, which can operate at frequencies beyond the functional specification to detect small deviations of the nominal timing behavior and thus potential early life failures.

since 02.2017, DFG-Project: WU 245/19-1

The project in detail:

State-of-the-art nanoscale technologies allow for the integration of billions of transistors with feature sizes of 14 nm or below into a single chip. This enables innovative approaches and solutions in many application domains, but it also comes along with fundamental challenges. Early life failures are particularly critical, as they can cause product recalls associated with a loss of billions of dollars. A major cause of early life failures are "weak" devices that operate correctly during manufacturing test, but cannot stand operational stress in the field. While other failure mechanisms, such as aging or external disturbances, to some extent, may be compensated by a robust design, potential early life failures must be detected by tests, and the respective systems have to be sorted out. This requires specific approaches far beyond today’s state-of-the-art.

As they work properly in the beginning, weak structures must be identified by analyzing the non-functional circuit behavior with the help of appropriate observables. Besides power consumption, the circuit timing is one of the most important reliability indicators. In particular, small delay faults may indicate marginal hardware that can degrade further under stress. However, they can be “hidden” at nominal frequency and only be detected at higher frequencies (“faster-than-at-speed test” / FAST). Therefore, conventional approaches for testing reach their limitations, and new methods must be investigated and developed in the following three domains:

  1. Specific techniques for „design for test“ (DFT) must be developed to deal with the challenges of testing beyond nominal frequency.
  2. Strategies for test scheduling must ensure that a maximum fault coverage is achieved with a minimum number of test frequencies and a short test time.
  3. Appropriate metrics are needed to quantify the coverage of weak devices. Here it is particularly challenging to distinguish the behavior of week devices from variations due to nanoscale integration.

Since FAST imposes extreme requirements on the automatic test equipment (ATE), it is very important to support an efficient implementation as a built-in self-test (BIST).

Within the framework of the project, strategies and solutions will be developed for the problems mentioned above. This way, the enormous cost of a traditional „burn-in“ test can be reduced, thus enabling the introduction of nanoscale technology to new application domains.

since 08.2014, DFG-Project: WU 245/17-1, WU 245/17-2

Please visit our project page for detailed information.

 

Project Description (Phase 2)

RSNs were initially brought up to manage the extensive amount of instrumentation in modern systems-on-chip to facilitate cost-efficient bring-up and debug, test, diagnosis and maintenance. Recently, the reuse of RSNs at system runtime for online fault classification and fault management moved into the center of research activities. Reasons are not only the increased complexity and dependability requirements in new technologies, but also the emerging application paradigms of self-aware and autonomous systems. Especially in safety-critical applications, online test, system monitoring and fault tolerance at low cost become mandatory. For example, the standard ISO 26262 specifies critical faults to be detected within certain test intervals at runtime and allows only a maximum fault reaction time until the system has to be transferred into a safe state. The periodic test is usually structure oriented and targets stuck-at, transition and delay faults.

It is common practice that the required tasks for initializing the periodic testing, for fault detection and for fault reaction are executed by the system functionality transparent in the background. Disadvantages of this approach are manifold: The periodic test and test evaluation constitute some significant additional workload, reduce performance, consume a large amount of additional power, and may take too much time for avoiding dangerous situations. Guaranteeing deadlines and verifying fault tolerance is extremely difficult as the properties have to be proven in the presence of faults.An alternative is the use of the non-functional infrastructure for concurrent fault detection and fault management, and first approaches to employ RSNs in in-system runtime tests have already been proposed. These first attempts still require a dedicated regular structure of RSNs and its permanent background operation which should be avoided in practice.

The results of the first phase of ACCESS provide an excellent basis for further research on the runtime use of RSNs. Since RSNs are integrated into the chip any way, the required cost of the modifications for runtime use are affordable even for a mass market like automotive. The goal of the second phase of ACCESS is a technique for a robust online use of RSNs to support safety, fault tolerance and reliability management.

This comprises:

  • In-system run-time test using RSNs
  • System wide collection of diagnosis information
  • Online diagnosis of RSNs
  • Investigation of robust and fault-tolerant RSNs

This work is supported by the German Research Foundation (DFG) under grant WU 245/17-2 (2019-2021).

Hans-Joachim Wunderlich (i.R.)
Prof. Dr. rer. nat. habil.

Hans-Joachim Wunderlich (i.R.)

Heading the Research Group Computer Architecture

To the top of the page