OTERA

OTERA: Online Test Strategies for Reliable Reconfigurable Architectures

Dynamically reconfigurable architectures enable a major acceleration of diverse applications by changing and optimizing the structure of the system at runtime. Permanent and transient faults threaten the correct operation of such an architecture. This project aims to increase the dependability of runtime reconfigurable systems by a novel system-level strategy for online tests and online adaptation to an impaired state. This will be achieved by (a) scheduling such that tests for reconfigurable resources are executed with minimal performance impact, (b) resource management such that partially faulty resources are used for components which do not require the faulty elements, and (c) online monitoring and error checking. To ensure reliable runtime reconfiguration, each reconfiguration process is thoroughly tested by a novel and efficient combination of online structural and functional tests. Compared to existing fault-tolerance approaches, our proposal avoids the large hardware overhead of structural redundancy schemes, so the saved resources remain available for further application acceleration. Still, the proposed scheme covers faults in the fabric, faults in the reconfigured application logic, and errors in the process of reconfiguration.

10.2010 - 06.2017, DFG-Project: WU 245/10-1, 10-2, 10-3

The project in detail:

In the framework of the SPP 1500 priority program, this project contributes to

  • Dependable Hardware Architectures,
  • Design Methods and
  • Operation, Observation and Adaptation.

Motivation

Dynamically reconfigurable architectures can be adapted and optimized at runtime according to the current system state, load, and application. This enables a major acceleration of diverse applications with low hardware overhead. To achieve the desired execution of these applications, the underlying reconfigurable fabric must provide a high degree of reliability, i.e. error-free operation. In addition to the classical fault models found in static VLSI hardware, new types of faults threaten the operation of such reconfigurable architectures and need to be considered. The reconfigured application module may suffer not only from permanent faults in the underlying reconfigurable fabric (e.g. due to aging) but also from errors during the reconfiguration process and during operation. In particular, an erroneous reconfiguration or a transient fault affecting the configuration memory changes the structure of the reconfigured application module.

Effective methods for the manufacturing test of FPGA fabric exist, i.e. the tests can be performed in a reasonable time while covering the entire configurable fabric. The reconfigurability of the system can also be exploited to conduct efficient and thorough tests of both the fabric and the application logic. In case of faults, mitigation is possible by adaptation to the impaired state. This requires that the system is capable of detecting and diagnosing faults at runtime and of making an appropriate adaptation decision.

Altogether, these approaches target non-reconfigurable systems that are implemented using a reconfigurable fabric. The reconfigurability can be used to increase the reliability in these cases. However, techniques are missing for runtime reconfigurable systems that exploit partial runtime reconfiguration as part of their normal operation. Here, it is crucial to ensure a reliable runtime reconfiguration that can be applied efficiently with low overhead, online in the field, and with a limited number of hardware resources. Therefore, the OTERA project aims to realize reliable reconfiguration and system adaptation that additionally provide high resource efficiency, for instance by utilizing partially faulty FPGA structures: application modules that do not need the faulty parts are selectively reconfigured into a partially faulty area.
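The idea of using partially faulty areas can be illustrated with a minimal sketch. It is not the OTERA runtime system: the names `Region`, `Module`, `can_host`, and `place`, the representation of resources as integer IDs, and the greedy heuristic are all assumptions made purely for illustration. A module may be placed into a region only if none of the resources it requires are marked faulty there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical reconfigurable region; `faulty` holds IDs of defective resources."""
    name: str
    faulty: frozenset = frozenset()


@dataclass(frozen=True)
class Module:
    """Hypothetical application module; `required` holds IDs of resources it occupies."""
    name: str
    required: frozenset


def can_host(region: Region, module: Module) -> bool:
    # A module fits if none of the resources it needs are faulty in this region.
    return region.faulty.isdisjoint(module.required)


def place(modules, regions):
    """Greedily assign each module to a free region that can host it.

    Assumed heuristic (illustrative only): modules with the most resource
    demands are placed first, and each one takes the usable region with the
    most faults, so that fault-free regions stay available for modules that
    need them. Returns a dict mapping module name to region name; modules
    that cannot be placed are omitted.
    """
    free = sorted(regions, key=lambda r: len(r.faulty), reverse=True)
    placement = {}
    for module in sorted(modules, key=lambda m: len(m.required), reverse=True):
        for region in free:
            if can_host(region, module):
                placement[module.name] = region.name
                free.remove(region)  # safe: we leave the inner loop immediately
                break
    return placement


# Example: region R1 has a defect in resource 3, region R0 is fault-free.
regions = [Region("R0"), Region("R1", frozenset({3}))]
modules = [
    Module("fft", frozenset({1, 2, 3})),  # needs the faulty resource
    Module("aes", frozenset({1, 2})),     # does not need resource 3
]
print(place(modules, regions))  # {'fft': 'R0', 'aes': 'R1'}
```

In this example the module that does not need the defective resource is mapped to the partially faulty region, which keeps the fault-free region available for the more demanding module. A greedy assignment like this is not guaranteed to find a placement whenever one exists; it only shows the compatibility check.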

Goals

This project aims to increase the dependability of reconfigurable systems by a novel online system-level strategy for reliable runtime reconfiguration and system adaptation. Over all three phases (3 x 2 years), the approach ensures that:

  • Errors are detected concurrently and can be contained (do not spread system-wide),
  • Faults are detected in the reconfigurable fabric and reconfigured application logic to ensure correct completion of a reconfiguration,
  • Root causes of detected errors are determined by diagnosis,
  • Potential future errors are predicted (based on recent errors, online monitoring, and system load),
  • Reliable system operation is achieved by the runtime system, which dynamically schedules test routines (trading off test coverage against the system stress caused by testing), and
  • Adaptation to an impaired state is managed by the runtime system with minimal impact on application performance.

This work is supported by the German Research Foundation (DFG) under grant WU 245/10-1 (2011-2012), WU 245/10-2 (2013-2014), and WU 245/10-3 (2015-2016).

Books and Book Chapters

  1. 2019

    1. Advances in Hardware Reliability of Reconfigurable Many-core Embedded Systems. Lars Bauer; Hongyan Zhang; Michael A. Kochte; Eric Schneider; Hans-Joachim Wunderlich and Jörg Henkel. In Many-Core Computing: Hardware and software, B. M. Al-Hashimi and G. V. Merrett (eds.). Institution of Engineering and Technology (IET), 2019, pp. 395--416. DOI: https://doi.org/10.1049/PBPC022E_ch16

Journals and Conference Proceedings

  1. 2017

    1. Aging Resilience and Fault Tolerance in Runtime Reconfigurable Architectures. Hongyan Zhang; Lars Bauer; Michael A. Kochte; Eric Schneider; Hans-Joachim Wunderlich and Jörg Henkel. IEEE Transactions on Computers 66, 6 (2017), pp. 957--970. DOI: https://doi.org/10.1109/TC.2016.2616405
  2. 2016

    1. Functional Diagnosis for Graceful Degradation of NoC Switches. Atefe Dalirsani and Hans-Joachim Wunderlich. In Proceedings of the 25th IEEE Asian Test Symposium (ATS’16), Hiroshima, Japan, 2016, pp. 246--251. DOI: https://doi.org/10.1109/ATS.2016.18
  3. 2015

    1. STRAP: Stress-Aware Placement for Aging Mitigation in Runtime Reconfigurable Architectures. Hongyan Zhang; Michael A. Kochte; Eric Schneider; Lars Bauer; Hans-Joachim Wunderlich and Jörg Henkel. In Proceedings of the 34th IEEE/ACM International Conference on Computer-Aided Design (ICCAD’15), Austin, Texas, USA, 2015, pp. 38--45.
    2. Adaptive Multi-Layer Techniques for Increased System Dependability. Lars Bauer; Jörg Henkel; Andreas Herkersdorf; Michael A. Kochte; Johannes M. Kühn; Wolfgang Rosenstiel; Thomas Schweizer; Stefan Wallentowitz; Volker Wenzel; Thomas Wild; Hans-Joachim Wunderlich and Hongyan Zhang. it - Information Technology 57, 3 (2015), pp. 149--158. DOI: https://doi.org/10.1515/itit-2014-1082
  4. 2014

    1. Resilience Articulation Point (RAP): Cross-layer Dependability Modeling for Nanometer System-on-chip Resilience. Andreas Herkersdorf; Hananeh Aliee; Michael Engel; Michael Glaß; Christina Gimmler-Dumont; Jörg Henkel; Veit B. Kleeberger; Michael A. Kochte; Johannes M. Kühn; Daniel Mueller-Gritschneder; Sani R. Nassif; Holm Rauchfuss; Wolfgang Rosenstiel; Ulf Schlichtmann; Muhammad Shafique; Mehdi B. Tahoori; Jürgen Teich; Norbert Wehn; Christian Weis and Hans-Joachim Wunderlich. Elsevier Microelectronics Reliability Journal 54, 6--7 (2014), pp. 1066--1074. DOI: https://doi.org/10.1016/j.microrel.2013.12.012
    2. GUARD: GUAranteed Reliability in Dynamically Reconfigurable Systems. Hongyan Zhang; Michael A. Kochte; Michael E. Imhof; Lars Bauer; Hans-Joachim Wunderlich and Jörg Henkel. In Proceedings of the 51st ACM/EDAC/IEEE Design Automation Conference (DAC’14), San Francisco, California, USA, 2014, pp. 1--6. DOI: https://doi.org/10.1145/2593069.2593146
  5. 2013

    1. Module Diversification: Fault Tolerance and Aging Mitigation for Runtime Reconfigurable Architectures. Hongyan Zhang; Lars Bauer; Michael A. Kochte; Eric Schneider; Claus Braun; Michael E. Imhof; Hans-Joachim Wunderlich and Jörg Henkel. In Proceedings of the IEEE International Test Conference (ITC’13), Anaheim, California, USA, 2013. DOI: https://doi.org/10.1109/TEST.2013.6651926
    2. SAT-based Code Synthesis for Fault-Secure Circuits. Atefe Dalirsani; Michael A. Kochte and Hans-Joachim Wunderlich. In Proceedings of the 16th IEEE Symp. Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT’13), New York City, NY, USA, 2013, pp. 38--44. DOI: https://doi.org/10.1109/DFT.2013.6653580
    3. Test Strategies for Reliable Runtime Reconfigurable Architectures. Lars Bauer; Claus Braun; Michael E. Imhof; Michael A. Kochte; Eric Schneider; Hongyan Zhang; Jörg Henkel and Hans-Joachim Wunderlich. IEEE Transactions on Computers 62, 8 (2013), pp. 1494--1507. DOI: https://doi.org/10.1109/TC.2013.53
  6. 2012

    1. OTERA: Online Test Strategies for Reliable Reconfigurable Architectures. Lars Bauer; Claus Braun; Michael E. Imhof; Michael A. Kochte; Hongyan Zhang; Hans-Joachim Wunderlich and Jörg Henkel. In Proceedings of the NASA/ESA Conference on Adaptive Hardware and Systems (AHS’12), Erlangen, Germany, 2012, pp. 38--45. DOI: https://doi.org/10.1109/AHS.2012.6268667
    2. Transparent Structural Online Test for Reconfigurable Systems. Mohamed S. Abdelfattah; Lars Bauer; Claus Braun; Michael E. Imhof; Michael A. Kochte; Hongyan Zhang; Jörg Henkel and Hans-Joachim Wunderlich. In Proceedings of the 18th IEEE International On-Line Testing Symposium (IOLTS’12), Sitges, Spain, 2012, pp. 37--42. DOI: https://doi.org/10.1109/IOLTS.2012.6313838
  7. 2011

    1. Efficient BDD-based Fault Simulation in Presence of Unknown Values. Michael A. Kochte; S. Kundu; Kohei Miyase; Xiaoqing Wen and Hans-Joachim Wunderlich. In Proceedings of the 20th IEEE Asian Test Symposium (ATS’11), New Delhi, India, 2011, pp. 383--388. DOI: https://doi.org/10.1109/ATS.2011.52
    2. Design and Architectures for Dependable Embedded Systems. Jörg Henkel; Lars Bauer; Joachim Becker; Oliver Bringmann; Uwe Brinkschulte; Samarjit Chakraborty; Michael Engel; Rolf Ernst; Hermann Härtig; Lars Hedrich; Andreas Herkersdorf; Rüdiger Kapitza; Daniel Lohmann; Peter Marwedel; Marco Platzner; Wolfgang Rosenstiel; Ulf Schlichtmann; Olaf Spinczyk; Mehdi Tahoori; Jürgen Teich; Norbert Wehn and Hans-Joachim Wunderlich. In Proceedings of the 9th IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS’11), Taipei, Taiwan, 2011, pp. 69--78. DOI: https://doi.org/10.1145/2039370.2039384

Workshop Contributions

  1. 2013

    1. Cross-Layer Dependability Modeling and Abstraction in Systems on Chip. Andreas Herkersdorf; Michael Engel; Michael Glaß; Jörg Henkel; Veit B. Kleeberger; Michael A. Kochte; Johannes M. Kühn; Sani R. Nassif; Holm Rauchfuss; Wolfgang Rosenstiel; Ulf Schlichtmann; Muhammad Shafique; Mehdi B. Tahoori; Jürgen Teich; Norbert Wehn; Christian Weis and Hans-Joachim Wunderlich. In SELSE-9: The 9th Workshop on Silicon Errors in Logic - System Effects, Stanford, California, USA, 2013.
  2. 2012

    1. Fault Modeling in Testing. Stefan Holst; Michael A. Kochte and Hans-Joachim Wunderlich. In RAP Day Workshop, DFG SPP 1500, Munich, Germany, 2012.

Hans-Joachim Wunderlich (retired)
Prof. Dr. rer. nat. habil.

Heading the Research Group Computer Architecture
