
Claus Braun

Name:

Dr. rer. nat. Claus Braun
Group leader: Reliable Computing on Heterogeneous and Approximate Architectures

Address:

University of Stuttgart
Institute of Computer Architecture and Computer Engineering
Pfaffenwaldring 47
D-70569 Stuttgart
Germany

Room:

3.171

Phone:

(+49) (0)711 / 685-88407

Fax:

(+49) (0)711 / 685-88288

E-Mail:

claus.braun@iti.uni-stuttgart.de


 

Projects

PARSIVAL: Parallel High-Throughput Simulations for Efficient Nanoelectronic Design and Test Validation

Project page: PARSIVAL: Parallel High-Throughput Simulations for Efficient Nanoelectronic Design and Test Validation

Design and test validation is one of the most important and complex tasks within modern semiconductor product development cycles. The design validation process analyzes and assesses a developed design with respect to certain validation targets to ensure its compliance with given specifications and customer requirements. Test validation evaluates the defect coverage obtained by certain test strategies and assesses the quality of the products tested and delivered. The validation targets include both functional and non-functional properties, as well as the complex interactions and interdependencies between them. Validation relies mainly on compute-intensive simulations, which increasingly require highly parallel hardware acceleration.

In this project, novel methods for versatile simulation-based VLSI design and test validation on high-throughput data-parallel architectures are developed, enabling a wide range of important state-of-the-art validation tasks for large circuits. Due to the nature of the design validation process and the massive amount of data involved, parallelism and throughput optimization are the keys to making design validation feasible for future industrial-sized designs. The main focus lies on the structure of the simulation model, the abstraction level, and the algorithms used, as well as their parallelization on data-parallel architectures. The simulation algorithms should be kept simple to run fast, yet accurate enough to produce valuable data for cross-layer validation of complex digital systems.
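High-throughput simulation of this kind typically relies on data-parallel evaluation. As a strongly simplified, hypothetical illustration (not the project's actual models or algorithms), bit-parallel logic simulation packs many input patterns into one machine word, so a single pass over the netlist evaluates all of them at once:

```python
# Illustrative sketch of bit-parallel gate-level logic simulation:
# each 64-bit integer carries up to 64 independent input patterns,
# so one netlist traversal simulates 64 patterns in parallel.

MASK = (1 << 64) - 1  # keep results within 64 bits after negation

def simulate(netlist, inputs):
    """netlist: list of (gate, out, in1, in2); inputs: signal -> bit-packed word."""
    values = dict(inputs)
    for gate, out, a, b in netlist:
        if gate == "AND":
            values[out] = values[a] & values[b]
        elif gate == "OR":
            values[out] = values[a] | values[b]
        elif gate == "NAND":
            values[out] = ~(values[a] & values[b]) & MASK
    return values

# Tiny example circuit: y = (a AND b) OR c, four patterns in parallel.
netlist = [("AND", "t", "a", "b"), ("OR", "y", "t", "c")]
patterns = {"a": 0b1100, "b": 0b1010, "c": 0b0001}
result = simulate(netlist, patterns)
```

The same idea scales to GPUs by assigning words (or word blocks) to threads; the gate types, data layout, and signal names here are assumptions chosen only for the example.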

Since 10/2014, DFG project WU 245/16-1

Simulation on Reconfigurable Heterogeneous Computer Architectures

Project page: Simulation on Reconfigurable Heterogeneous Computer Architectures

Since the beginning of the DFG Cluster of Excellence "Simulation Technology" (SimTech) at the University of Stuttgart in 2008, the Institute of Computer Architecture and Computer Engineering (ITI, RA) has been an active part of the research within the Stuttgart Research Center for Simulation Technology (SRC SimTech). The institute's research includes the development of fault-tolerant simulation algorithms for new, tightly-coupled many-core computer architectures like GPUs, the acceleration of existing simulations on such architectures, and the mapping of complex simulation applications to innovative reconfigurable heterogeneous computer architectures.

Within the research cluster, Hans-Joachim Wunderlich acts as a principal investigator (PI) and he co-coordinates the research activities of the SimTech Project Network PN2 "High-Performance Simulation across Computer Architectures". This project network is unique in terms of its interdisciplinary nature and its interfaces between the participating researchers and projects. Scientists from computer science, chemistry, physics and chemical engineering work together to develop and provide new solutions for some of the major challenges in simulation technology. The classes of computational problems treated within project network PN2 comprise quantum mechanics, molecular mechanics, electronic structure methods, molecular dynamics, Markov-chain Monte-Carlo simulations and polarizable force fields.

Since 06/2008, SimTech Cluster of Excellence

OTERA: Online Test Strategies for Reliable Reconfigurable Architectures

Project page: Online Test Strategies for Reliable Reconfigurable Architectures

Dynamically reconfigurable architectures enable a significant acceleration of various applications by adapting and optimizing the structure of the system at runtime. Permanent and transient faults threaten the reliable operation of such architectures. This project aims to increase the dependability of runtime-reconfigurable systems through a novel system-level strategy for online testing and online adaptation to faults. This is achieved by (a) scheduling, so that tests for reconfigurable resources are executed with minimal impact on performance, (b) resource management, so that partially defective resources are used for components that do not use the faulty parts, and (c) online monitoring and error checking. To ensure reliable reconfiguration at runtime, each reconfiguration process is thoroughly tested by a novel and efficient combination of online structural and functional tests. Compared to previous fault-tolerance concepts, this approach avoids the high hardware cost of structural redundancy. The saved resources can be used to further accelerate the applications. Nevertheless, the proposed method covers faults in the reconfigurable resources, faults in the application logic, and errors in the reconfiguration process.

Since 10/2010, DFG project WU 245/10-1, 10-2, 10-3

Publications

Journals and Conference Proceedings
18. Energy-efficient and Error-resilient Iterative Solvers for Approximate Computing
Schöll, A., Braun, C. and Wunderlich, H.-J.
Proceedings of the 23rd IEEE International Symposium on On-Line Testing and Robust System Design (IOLTS'17), Thessaloniki, Greece, 3-5 July 2017, pp. 237-239
2017
Keywords: Approximate Computing, Energy-efficiency, Fault Tolerance, Quality Monitoring
Abstract: Iterative solvers like the Preconditioned Conjugate Gradient (PCG) method are widely-used in compute-intensive domains including science and engineering that often impose tight accuracy demands on computational results. At the same time, the error resilience of such solvers may change in the course of the iterations, which requires careful adaption of the induced approximation errors to reduce the energy demand while avoiding unacceptable results. A novel adaptive method is presented that enables iterative Preconditioned Conjugate Gradient (PCG) solvers on Approximate Computing hardware with high energy efficiency while still providing correct results. The method controls the underlying precision at runtime using a highly efficient fault tolerance technique that monitors the induced error and the quality of intermediate computational results.
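The idea of runtime precision control can be sketched in a hypothetical toy form (the paper's monitoring technique and hardware model are not reproduced here): emulate reduced precision by rounding to a limited number of significand bits, and spend more bits whenever the monitored residual stops improving.

```python
# Hedged sketch of adaptive precision control (illustration only):
# a scalar fixed-point iteration for a*x = b under emulated reduced
# precision; precision is raised when the residual stalls.
import math

def quantize(x, bits):
    """Round x to `bits` significand bits (crude precision emulation)."""
    if x == 0.0:
        return 0.0
    e = math.floor(math.log2(abs(x)))
    scale = 2.0 ** (e - bits + 1)
    return round(x / scale) * scale

def solve_adaptive(a, b, bits=8, max_bits=40, iters=100):
    x, prev = 0.0, float("inf")
    for _ in range(iters):
        r = b - a * x                      # monitored residual
        if abs(r) >= prev and bits < max_bits:
            bits += 8                      # progress stalled: spend more bits
        prev = abs(r)
        x = quantize(x + r / a, bits)      # update at current precision
    return x, bits

x, bits_used = solve_adaptive(3.0, 2.0)    # converges toward 2/3
```

The quantization helper, step sizes, and stall heuristic are assumptions made for this sketch; the point is only that precision can be raised selectively instead of running everything at full accuracy.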
BibTeX:
@inproceedings{SchoeBW2017,
  author = {Schöll, Alexander and Braun, Claus and Wunderlich, Hans-Joachim},
  title = {{Energy-efficient and Error-resilient Iterative Solvers for Approximate Computing}},
  booktitle = {Proceedings of the 23rd IEEE International Symposium on On-Line Testing and Robust System Design (IOLTS'17)},
  year = {2017},
  pages = {237--239},
  keywords = {Approximate Computing, Energy-efficiency, Fault Tolerance, Quality Monitoring},
  abstract = {Iterative solvers like the Preconditioned Conjugate Gradient (PCG) method are widely-used in compute-intensive domains including science and engineering that often impose tight accuracy demands on computational results. At the same time, the error resilience of such solvers may change in the course of the iterations, which requires careful adaption of the induced approximation errors to reduce the energy demand while avoiding unacceptable results. A novel adaptive method is presented that enables iterative Preconditioned Conjugate Gradient (PCG) solvers on Approximate Computing hardware with high energy efficiency while still providing correct results. The method controls the underlying precision at runtime using a highly efficient fault tolerance technique that monitors the induced error and the quality of intermediate computational results.},
  doi = {http://dx.doi.org/10.1109/IOLTS.2017.8046244},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2017/IOLTS_SchoeBW2017.pdf}
}
17. Applying Efficient Fault Tolerance to Enable the Preconditioned Conjugate Gradient Solver on Approximate Computing Hardware
Schöll, A., Braun, C. and Wunderlich, H.-J.
Proceedings of the IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT'16), University of Connecticut, USA, 19-20 September 2016, pp. 21-26
DFT 2016 Best Paper Award
2016
Keywords: Approximate Computing, Fault Tolerance, Sparse Linear System Solving, Preconditioned Conjugate Gradient
Abstract: A new technique is presented that allows to execute the preconditioned conjugate gradient (PCG) solver on approximate hardware while ensuring correct solver results. This technique expands the scope of approximate computing to scientific and engineering applications. The changing error resilience of PCG during the solving process is exploited by different levels of approximation which trade off numerical accuracy and hardware utilization. Such approximation levels are determined at runtime by periodically estimating the error resilience. An efficient fault tolerance technique allows reductions in hardware utilization by ensuring the continued exploitation of maximum allowed energy-accuracy trade-offs. Experimental results show that the hardware utilization is reduced on average by 14.5% and by up to 41.0% compared to executing PCG on accurate hardware.
BibTeX:
@inproceedings{SchoeBW2016,
  author = {Schöll, Alexander and Braun, Claus and Wunderlich, Hans-Joachim},
  title = {{Applying Efficient Fault Tolerance to Enable the Preconditioned Conjugate Gradient Solver on Approximate Computing Hardware}},
  booktitle = {Proceedings of the IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT'16)},
  year = {2016},
  pages = {21-26},
  keywords = {Approximate Computing, Fault Tolerance, Sparse Linear System Solving, Preconditioned Conjugate Gradient},
  abstract = {A new technique is presented that allows to execute the preconditioned conjugate gradient (PCG) solver on approximate hardware while ensuring correct solver results. This technique expands the scope of approximate computing to scientific and engineering applications. The changing error resilience of PCG during the solving process is exploited by different levels of approximation which trade off numerical accuracy and hardware utilization. Such approximation levels are determined at runtime by periodically estimating the error resilience. An efficient fault tolerance technique allows reductions in hardware utilization by ensuring the continued exploitation of maximum allowed energy-accuracy trade-offs. Experimental results show that the hardware utilization is reduced on average by 14.5% and by up to 41.0% compared to executing PCG on accurate hardware.},
  doi = {http://dx.doi.org/10.1109/DFT.2016.7684063},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2016/DFT_SchoeBW2016.pdf}
}
16. Pushing the Limits: How Fault Tolerance Extends the Scope of Approximate Computing
Wunderlich, H.-J., Braun, C. and Schöll, A.
Proceedings of the 22nd IEEE International Symposium on On-Line Testing and Robust System Design (IOLTS'16), Sant Feliu de Guixols, Catalunya, Spain, 4-6 July 2016, pp. 133-136
2016
Keywords: Approximate Computing, Variable Precision, Metrics, Characterization, Fault Tolerance
Abstract: Approximate computing in hardware and software promises significantly improved computational performance combined with very low power and energy consumption. This goal is achieved by both relaxing strict requirements on accuracy and precision, and by allowing a deviating behavior from exact Boolean specifications to a certain extent. Today, approximate computing is often limited to applications with a certain degree of inherent error tolerance, where perfect computational results are not always required. However, in order to fully utilize its benefits, the scope of applications has to be significantly extended to other compute-intensive domains including science and engineering. To meet the often rather strict quality and reliability requirements for computational results in these domains, the use of appropriate characterization and fault tolerance measures is highly required. In this paper, we evaluate some of the available techniques and how they may extend the scope of application for approximate computing.
BibTeX:
@inproceedings{WundeBS2016,
  author = {Wunderlich, Hans-Joachim and Braun, Claus and Schöll, Alexander},
  title = {{Pushing the Limits: How Fault Tolerance Extends the Scope of Approximate Computing}},
  booktitle = {Proceedings of the 22nd IEEE International Symposium on On-Line Testing and Robust System Design (IOLTS'16)},
  year = {2016},
  pages = {133--136},
  keywords = {Approximate Computing, Variable Precision, Metrics, Characterization, Fault Tolerance},
  abstract = {Approximate computing in hardware and software promises significantly improved computational performance combined with very low power and energy consumption. This goal is achieved by both relaxing strict requirements on accuracy and precision, and by allowing a deviating behavior from exact Boolean specifications to a certain extent. Today, approximate computing is often limited to applications with a certain degree of inherent error tolerance, where perfect computational results are not always required. However, in order to fully utilize its benefits, the scope of applications has to be significantly extended to other compute-intensive domains including science and engineering. To meet the often rather strict quality and reliability requirements for computational results in these domains, the use of appropriate characterization and fault tolerance measures is highly required. In this paper, we evaluate some of the available techniques and how they may extend the scope of application for approximate computing.},
  doi = {http://dx.doi.org/10.1109/IOLTS.2016.7604686},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2016/IOLTS_WundeBS2016.pdf}
}
15. Efficient Algorithm-Based Fault Tolerance for Sparse Matrix Operations
Schöll, A., Braun, C., Kochte, M.A. and Wunderlich, H.-J.
Proceedings of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'16), Toulouse, France, 28 June-1 July 2016, pp. 251-262
2016
Keywords: Fault Tolerance, Sparse Linear Algebra, ABFT, Online Error Localization
Abstract: We propose a fault tolerance approach for sparse matrix operations that detects and implicitly locates errors in the results for efficient local correction. This approach reduces the runtime overhead for fault tolerance and provides high error coverage. Existing algorithm-based fault tolerance approaches for sparse matrix operations detect and correct errors, but they often rely on expensive error localization steps. General checkpointing schemes can induce large recovery cost for high error rates. For sparse matrix-vector multiplications, experimental results show an average reduction in runtime overhead of 43.8%, while the error coverage is on average improved by 52.2% compared to related work. The practical applicability is demonstrated in a case study using the iterative Preconditioned Conjugate Gradient solver. When scaling the error rate by four orders of magnitude, the average runtime overhead increases only by 31.3% compared to low error rates.
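The underlying checksum idea for sparse matrix-vector products can be sketched with a simple column-sum invariant (illustrative only; the paper's contribution, implicit error localization for efficient local correction, is not reproduced here): for y = A x, the sum of y must equal the precomputed vector of column sums of A applied to x.

```python
# Hedged sketch of checksum-based error detection for SpMV:
# invariant sum(y) == (1^T A) x, where 1^T A is precomputed once.

def spmv(rows, x):
    """rows: list of lists of (col, val) pairs (CSR-like sparse format)."""
    return [sum(v * x[c] for c, v in row) for row in rows]

def column_checksum(rows, n):
    """Precompute w = 1^T A, the column sums of the sparse matrix."""
    w = [0.0] * n
    for row in rows:
        for c, v in row:
            w[c] += v
    return w

rows = [[(0, 2.0), (2, 1.0)],      # sparse 3x3 matrix
        [(1, 3.0)],
        [(0, 1.0), (2, 4.0)]]
x = [1.0, 2.0, 3.0]
w = column_checksum(rows, 3)
y = spmv(rows, x)
# Detection: compare the output sum with the precomputed checksum.
# The tolerance absorbs floating-point rounding error.
ok = abs(sum(y) - sum(wi * xi for wi, xi in zip(w, x))) < 1e-9
```

The data layout and tolerance are assumptions for the example; production ABFT schemes choose rounding-error-aware bounds rather than a fixed constant.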
BibTeX:
@inproceedings{SchoeBKW2016,
  author = {Schöll, Alexander and Braun, Claus and Kochte, Michael A. and Wunderlich, Hans-Joachim},
  title = {{Efficient Algorithm-Based Fault Tolerance for Sparse Matrix Operations}},
  booktitle = {Proceedings of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'16)},
  year = {2016},
  pages = {251--262},
  keywords = {Fault Tolerance, Sparse Linear Algebra, ABFT, Online Error Localization},
  abstract = {We propose a fault tolerance approach for sparse matrix operations that detects and implicitly locates errors in the results for efficient local correction. This approach reduces the runtime overhead for fault tolerance and provides high error coverage. Existing algorithm-based fault tolerance approaches for sparse matrix operations detect and correct errors, but they often rely on expensive error localization steps. General checkpointing schemes can induce large recovery cost for high error rates. For sparse matrix-vector multiplications, experimental results show an average reduction in runtime overhead of 43.8%, while the error coverage is on average improved by 52.2% compared to related work. The practical applicability is demonstrated in a case study using the iterative Preconditioned Conjugate Gradient solver. When scaling the error rate by four orders of magnitude, the average runtime overhead increases only by 31.3% compared to low error rates.},
  doi = {http://dx.doi.org/10.1109/DSN.2016.31},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2016/DSN_SchoeBKW2016.pdf}
}
14. Fault Tolerance of Approximate Compute Algorithms
Wunderlich, H.-J., Braun, C. and Schöll, A.
Proceedings of the 34th VLSI Test Symposium (VTS'16), Caesars Palace, Las Vegas, Nevada, USA, 25-27 April 2016
2016
Abstract: Approximate computing algorithms cover a wide range of different applications and the boundaries to domains like variable-precision computing, where the precision of the computations can be online adapted to the needs of the application [1, 2], as well as probabilistic and stochastic computing [3], which incorporate stochastic processes and probability distributions in the target computations, are sometimes blurred. The central idea of purely algorithm-based approximate computing is to transform algorithms, without necessarily requiring approximate hardware, to trade off accuracy against energy. Early termination of algorithms that exhibit incremental refinement [4] reduces iterations at the cost of accuracy. Loop perforation [5] approximates iteratively-computed results by identifying and reducing loops that contribute only insignificantly to the solution. Another group of approximate algorithms is represented by neural networks, which can be trained to mimic certain algorithms and to compute approximate results [6]. Today, approximate computing is predominantly proposed for applications in multimedia and signal processing with a certain degree of inherent error tolerance. However, in order to fully utilize the benefits of these architectures, the scope of applications has to be significantly extended to other compute-intensive tasks, for instance, in science and engineering. Such an extension requires that the allowed error or the required minimum precision of the application is either known beforehand or reliably determined online to deliver trustworthy and useful results. Errors outside the allowed range have to be reliably detected and tackled by appropriate fault tolerance measures.
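Loop perforation, mentioned in the abstract above, can be sketched in a few lines (a hypothetical toy example, not taken from the cited work): skip a fraction of the loop iterations, rescale, and accept the resulting approximation error in exchange for less work.

```python
# Hedged sketch of loop perforation: approximate a mean by executing
# only every k-th loop iteration, trading accuracy for speed.

def mean_exact(xs):
    return sum(xs) / len(xs)

def mean_perforated(xs, k=2):
    sampled = xs[::k]              # keep every k-th iteration only
    return sum(sampled) / len(sampled)

xs = list(range(1000))
exact = mean_exact(xs)             # full loop
approx = mean_perforated(xs, 4)    # ~1/4 of the iterations
```

Real perforation frameworks select which loops to perforate based on profiling and an acceptable quality loss; the uniform stride here is the simplest possible policy.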
BibTeX:
@inproceedings{WundeBS2016a,
  author = {Wunderlich, Hans-Joachim and Braun, Claus and Schöll, Alexander},
  title = {{Fault Tolerance of Approximate Compute Algorithms}},
  booktitle = {Proceedings of the 34th VLSI Test Symposium (VTS'16)},
  year = {2016},
  abstract = {Approximate computing algorithms cover a wide range of different applications and the boundaries to domains like variable-precision computing, where the precision of the computations can be online adapted to the needs of the application [1, 2], as well as probabilistic and stochastic computing [3], which incorporate stochastic processes and probability distributions in the target computations, are sometimes blurred. The central idea of purely algorithm-based approximate computing is to transform algorithms, without necessarily requiring approximate hardware, to trade-off accuracy against energy. Early termination of algorithms that exhibit incremental refinement [4] reduces iterations at the cost of accuracy. Loop perforation [5] approximates iteratively-computed results by identifying and reducing loops that contribute only insignificantly to the solution. Another group of approximate algorithms is represented by neural networks, which can be trained to mimic certain algorithms and to compute approximate results [6]. Today, approximate computing is predominantly proposed for applications in multimedia and signal processing with a certain degree of inherent error tolerance. However, in order to fully utilize the benefits of these architectures, the scope of applications has to be significantly extended to other computeintensive tasks, for instance, in science and engineering. Such an extension requires that the allowed error or the required minimum precision of the application is either known beforehand or reliably determined online to deliver trustworthy and useful results. Errors outside the allowed range have to be reliably detected and tackled by appropriate fault tolerance measures.},
  doi = {http://dx.doi.org/10.1109/VTS.2016.7477307},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2016/VTS_WundeBS2016.pdf}
}
13. Low-Overhead Fault-Tolerance for the Preconditioned Conjugate Gradient Solver
Schöll, A., Braun, C., Kochte, M.A. and Wunderlich, H.-J.
Proceedings of the International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT'15), Amherst, Massachusetts, USA, 12-14 October 2015, pp. 60-65
2015
Keywords: Fault Tolerance, Sparse Linear System Solving, Preconditioned Conjugate Gradient, ABFT
Abstract: Linear system solvers are an integral part for many different compute-intensive applications and they benefit from the compute power of heterogeneous computer architectures. However, the growing spectrum of reliability threats for such nano-scaled CMOS devices makes the integration of fault tolerance mandatory. The preconditioned conjugate gradient (PCG) method is one widely used solver as it finds solutions typically faster compared to direct methods. Although this iterative approach is able to tolerate certain errors, latest research shows that the PCG solver is still vulnerable to transient effects. Even single errors, for instance, caused by marginal hardware, harsh environments, or particle radiation, can considerably affect execution times, or lead to silent data corruption. In this work, a novel fault-tolerant PCG solver with extremely low runtime overhead is proposed. Since the error detection method does not involve expensive operations, it scales very well with increasing problem sizes. In case of errors, the method selects between three different correction methods according to the identified error. Experimental results show a runtime overhead for error detection ranging only from 0.04% to 1.70%.
BibTeX:
@inproceedings{SchoeBKW2015a,
  author = {Schöll, Alexander and Braun, Claus and Kochte, Michael A. and Wunderlich, Hans-Joachim},
  title = {{Low-Overhead Fault-Tolerance for the Preconditioned Conjugate Gradient Solver}},
  booktitle = {Proceedings of the International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT'15)},
  year = {2015},
  pages = {60-65},
  keywords = { Fault Tolerance, Sparse Linear System Solving, Preconditioned Conjugate Gradient, ABFT },
  abstract = {Linear system solvers are an integral part for many different compute-intensive applications and they benefit from the compute power of heterogeneous computer architectures. However, the growing spectrum of reliability threats for such nano-scaled CMOS devices makes the integration of fault tolerance mandatory. The preconditioned conjugate gradient (PCG) method is one widely used solver as it finds solutions typically faster compared to direct methods. Although this iterative approach is able to tolerate certain errors, latest research shows that the PCG solver is still vulnerable to transient effects. Even single errors, for instance, caused by marginal hardware, harsh environments, or particle radiation, can considerably affect execution times, or lead to silent data corruption. In this work, a novel fault-tolerant PCG solver with extremely low runtime overhead is proposed. Since the error detection method does not involve expensive operations, it scales very well with increasing problem sizes. In case of errors, the method selects between three different correction methods according to the identified error. Experimental results show a runtime overhead for error detection ranging only from 0.04% to 1.70%. },
  doi = {http://dx.doi.org/10.1109/DFT.2015.7315136},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2015/DFTS_SchoeBKW2015.pdf}
}
12. Efficient On-Line Fault-Tolerance for the Preconditioned Conjugate Gradient Method
Schöll, A., Braun, C., Kochte, M.A. and Wunderlich, H.-J.
Proceedings of the 21st IEEE International On-Line Testing Symposium (IOLTS'15), Elia, Halkidiki, Greece, 6-8 July 2015, pp. 95-100
2015
Keywords: Sparse Linear System Solving, Fault Tolerance, Preconditioned Conjugate Gradient, ABFT
Abstract: Linear system solvers are key components of many scientific applications and they can benefit significantly from modern heterogeneous computer architectures. However, such nano-scaled CMOS devices face an increasing number of reliability threats, which make the integration of fault tolerance mandatory. The preconditioned conjugate gradient method (PCG) is a very popular solver since it typically finds solutions faster than direct methods, and it is less vulnerable to transient effects. However, as latest research shows, the vulnerability is still considerable. Even single errors caused, for instance, by marginal hardware, harsh operating conditions or particle radiation can increase execution times considerably or corrupt solutions without indication. In this work, a novel and highly efficient fault-tolerant PCG method is presented. The method applies only two inner products to reliably detect errors. In case of errors, the method automatically selects between roll-back and efficient on-line correction. This significantly reduces the error detection overhead and expensive re-computations.
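The general principle of such online checks can be sketched as follows. This is an illustrative stand-in, not the paper's method: it periodically recomputes the true residual b - A x and compares it against the recursively updated residual, which is simpler but costlier than the two-inner-product scheme described above.

```python
# Hedged sketch of a conjugate gradient solver with a periodic
# residual consistency check (for SPD matrices): the recursively
# updated residual r must agree with the recomputed b - A x.

def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cg_checked(A, b, iters=50, check_every=5, tol=1e-12):
    n = len(b)
    x = [0.0] * n
    r = b[:]                       # residual, updated recursively
    p = r[:]
    rr = dot(r, r)
    for k in range(iters):
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if (k + 1) % check_every == 0:
            true_r = [bi - ti for bi, ti in zip(b, matvec(A, x))]
            drift = max(abs(ri - ti) for ri, ti in zip(r, true_r))
            if drift > 1e-6:       # error detected:
                r = true_r         # on-line correction of the residual
        rr_new = dot(r, r)
        if rr_new < tol:
            break
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]       # small SPD example
b = [1.0, 2.0]
x = cg_checked(A, b)               # solution is (1/11, 7/11)
```

The check interval and drift threshold are assumptions for the example; the appeal of the paper's approach is precisely that it avoids the extra matrix-vector product used here.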
BibTeX:
@inproceedings{SchoeBKW2015,
  author = {Schöll, Alexander and Braun, Claus and Kochte, Michael A. and Wunderlich, Hans-Joachim},
  title = {{Efficient On-Line Fault-Tolerance for the Preconditioned Conjugate Gradient Method}},
  booktitle = {Proceedings of the 21st IEEE International On-Line Testing Symposium (IOLTS'15)},
  year = {2015},
  pages = {95--100},
  keywords = {Sparse Linear System Solving, Fault Tolerance, Preconditioned Conjugate Gradient, ABFT},
  abstract = {Linear system solvers are key components of many scientific applications and they can benefit significantly from modern heterogeneous computer architectures. However, such nano-scaled CMOS devices face an increasing number of reliability threats, which make the integration of fault tolerance mandatory. The preconditioned conjugate gradient method (PCG) is a very popular solver since it typically finds solutions faster than direct methods, and it is less vulnerable to transient effects. However, as latest research shows, the vulnerability is still considerable. Even single errors caused, for instance, by marginal hardware, harsh operating conditions or particle radiation can increase execution times considerably or corrupt solutions without indication. In this work, a novel and highly efficient fault-tolerant PCG method is presented. The method applies only two inner products to reliably detect errors. In case of errors, the method automatically selects between roll-back and efficient on-line correction. This significantly reduces the error detection overhead and expensive re-computations.},
  doi = {http://dx.doi.org/10.1109/IOLTS.2015.7229839},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2015/IOLTS_SchoeBKW2015.pdf}
}
11. Adaptive Parallel Simulation of a Two-Timescale-Model for Apoptotic Receptor-Clustering on GPUs
Schöll, A., Braun, C., Daub, M., Schneider, G. and Wunderlich, H.-J.
Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM'14), Belfast, United Kingdom, 2-5 November 2014, pp. 424-431
SimTech Best Paper Award
2014
Keywords: Heterogeneous computing, GPU computing, parallel particle simulation, multi-timescale model, adaptive Euler-Maruyama approximation, ligand-receptor aggregation
Abstract: Computational biology contributes important solutions for major biological challenges. Unfortunately, most applications in computational biology are highly compute-intensive and associated with extensive computing times. Biological problems of interest are often not treatable with traditional simulation models on conventional multi-core CPU systems. This interdisciplinary work introduces a new multi-timescale simulation model for apoptotic receptor-clustering and a new parallel evaluation algorithm that exploits the computational performance of heterogeneous CPU-GPU computing systems. For this purpose, the different dynamics involved in receptor-clustering are separated and simulated on two timescales. Additionally, the time step sizes are adaptively refined on each timescale independently.
This new approach improves the simulation performance significantly and reduces computing times from months to hours for observation times of several seconds.
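The basic Euler-Maruyama scheme underlying the simulation can be sketched as follows (a fixed-step, scalar toy SDE chosen for illustration; the paper's adaptive two-timescale variant and its particle model are not reproduced here):

```python
# Hedged sketch of Euler-Maruyama integration of the scalar SDE
# dX = -a*X dt + s dW (an Ornstein-Uhlenbeck-type toy example).
import random

def euler_maruyama(x0, a, s, dt, steps, rng):
    x = x0
    for _ in range(steps):
        dW = rng.gauss(0.0, dt ** 0.5)    # Brownian increment ~ N(0, dt)
        x = x + (-a * x) * dt + s * dW
    return x

rng = random.Random(42)
# With s = 0 the scheme reduces to deterministic Euler for dx/dt = -a x,
# so after t = 1 the result is close to exp(-1).
x_det = euler_maruyama(1.0, 1.0, 0.0, 0.001, 1000, rng)
x_sto = euler_maruyama(1.0, 1.0, 0.2, 0.001, 1000, rng)
```

An adaptive variant would shrink dt where the dynamics are fast and grow it where they are slow; doing this independently per timescale is the refinement the paper describes.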
BibTeX:
@inproceedings{SchoeBDSW2014,
  author = {Schöll, Alexander and Braun, Claus and Daub, Markus and Schneider, Guido and Wunderlich, Hans-Joachim},
  title = {{Adaptive Parallel Simulation of a Two-Timescale-Model for Apoptotic Receptor-Clustering on GPUs}},
  booktitle = {Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM'14)},
  year = {2014},
  pages = {424--431},
  keywords = {Heterogeneous computing, GPU computing, parallel particle simulation, multi-timescale model, adaptive Euler-Maruyama approximation, ligand-receptor aggregation},
  abstract = {Computational biology contributes important solutions for major biological challenges. Unfortunately, most applications in computational biology are highly computeintensive and associated with extensive computing times. Biological problems of interest are often not treatable with traditional simulation models on conventional multi-core CPU systems. This interdisciplinary work introduces a new multi-timescale simulation model for apoptotic receptor-clustering and a new parallel evaluation algorithm that exploits the computational performance of heterogeneous CPU-GPU computing systems. For this purpose, the different dynamics involved in receptor-clustering are separated and simulated on two timescales. Additionally, the time step sizes are adaptively refined on each timescale independently.
This new approach improves the simulation performance significantly and reduces computing times from months to hours for observation times of several seconds.},
  doi = {http://dx.doi.org/10.1109/BIBM.2014.6999195},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2014/BIBM_SchoeBDSW2014.pdf}
}
10. A-ABFT: Autonomous Algorithm-Based Fault Tolerance for Matrix Multiplications on Graphics Processing Units
Braun, C., Halder, S. and Wunderlich, H.-J.
Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'14), Atlanta, Georgia, USA, 23-26 June 2014, pp. 443-454
2014
DOI PDF 
Keywords: Algorithm-Based Fault Tolerance, Rounding Error Estimation, GPU, Matrix Multiplication
Abstract: Graphics processing units (GPUs) enable large-scale scientific applications and simulations on the desktop. To allow scientific computing on GPUs with high performance and reliability requirements, the application of software-based fault tolerance is attractive. Algorithm-Based Fault Tolerance (ABFT) protects important scientific operations like matrix multiplications. However, the application to floating-point operations necessitates the runtime classification of errors into inevitable rounding errors, allowed compute errors in the magnitude of such rounding errors, and into critical errors that are larger than those and not tolerable. Hence, an ABFT scheme needs suitable rounding error bounds to detect errors reliably. The determination of such error bounds is a highly challenging task, especially since it has to be integrated tightly into the algorithm and executed autonomously with low performance overhead.
In this work, A-ABFT for matrix multiplications on GPUs is introduced, which is a new, parallel ABFT scheme that determines rounding error bounds autonomously at runtime with low performance overhead and high error coverage.
BibTeX:
@inproceedings{BraunHW2014,
  author = {Braun, Claus and Halder, Sebastian and Wunderlich, Hans-Joachim},
  title = {{A-ABFT: Autonomous Algorithm-Based Fault Tolerance for Matrix Multiplications on Graphics Processing Units}},
  booktitle = {Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'14)},
  year = {2014},
  pages = {443--454},
  keywords = {Algorithm-Based Fault Tolerance, Rounding Error Estimation, GPU, Matrix Multiplication },
  abstract = {Graphics processing units (GPUs) enable large-scale scientific applications and simulations on the desktop. To allow scientific computing on GPUs with high performance and reliability requirements, the application of software-based fault tolerance is attractive. Algorithm-Based Fault Tolerance (ABFT) protects important scientific operations like matrix multiplications. However, the application to floating-point operations necessitates the runtime classification of errors into inevitable rounding errors, allowed compute errors in the magnitude of such rounding errors, and into critical errors that are larger than those and not tolerable. Hence, an ABFT scheme needs suitable rounding error bounds to detect errors reliably. The determination of such error bounds is a highly challenging task, especially since it has to be integrated tightly into the algorithm and executed autonomously with low performance overhead.
In this work, A-ABFT for matrix multiplications on GPUs is introduced, which is a new, parallel ABFT scheme that determines rounding error bounds autonomously at runtime with low performance overhead and high error coverage.},
  doi = {http://dx.doi.org/10.1109/DSN.2014.48},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2014/DSN_BraunHH2014.pdf}
}
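ABFT for matrix multiplication classically encodes the inputs with checksum rows and columns, so that errors in the product show up as checksum mismatches. Below is a minimal sketch of that classical encoding with a fixed tolerance; note that the paper's actual contribution is the autonomous runtime derivation of suitable rounding error bounds, which this sketch deliberately does not attempt (function names are illustrative):

```python
def matmul(A, B):
    """Plain triple-loop product of row-major list-of-lists matrices."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def abft_matmul(A, B, tol=1e-9):
    """Checksum-encoded multiplication: append a column-sum row to A and a
    row-sum column to B; the encoded product then carries checksums that
    must match the sums of its data part (up to rounding).
    Returns (C, checks_ok)."""
    n, p = len(A), len(B[0])
    Ac = [row[:] for row in A]
    Ac.append([sum(col) for col in zip(*A)])   # column-checksum row
    Bc = [row + [sum(row)] for row in B]       # row-checksum column
    Cc = matmul(Ac, Bc)
    ok = all(abs(Cc[n][j] - sum(Cc[i][j] for i in range(n))) <= tol
             for j in range(p))
    ok = ok and all(abs(Cc[i][p] - sum(Cc[i][j] for j in range(p))) <= tol
                    for i in range(n))
    return [row[:p] for row in Cc[:n]], ok

C, ok = abft_matmul([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])
```

Choosing `tol` is exactly the hard part: too tight and rounding noise raises false alarms, too loose and real compute errors slip through, which is why A-ABFT derives the bound at runtime.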
9. Module Diversification: Fault Tolerance and Aging Mitigation for Runtime Reconfigurable Architectures
Zhang, H., Bauer, L., Kochte, M.A., Schneider, E., Braun, C., Imhof, M.E., Wunderlich, H.-J. and Henkel, J.
Proceedings of the IEEE International Test Conference (ITC'13), Anaheim, California, USA, 10-12 September 2013
2013
DOI URL PDF 
Keywords: Reliability, online test, fault-tolerance, aging mitigation, partial runtime reconfiguration, FPGA
Abstract: Runtime reconfigurable architectures based on Field-Programmable Gate Arrays (FPGAs) are attractive for realizing complex applications. However, being manufactured in latest semiconductor process technologies, FPGAs are increasingly prone to aging effects, which reduce the reliability of such systems and must be tackled by aging mitigation and application of fault tolerance techniques. This paper presents module diversification, a novel design method that creates different configurations for runtime reconfigurable modules. Our method provides fault tolerance by creating the minimal number of configurations such that for any faulty Configurable Logic Block (CLB) there is at least one configuration that does not use that CLB. Additionally, we determine the fraction of time that each configuration should be used to balance the stress and to mitigate the aging process in FPGA-based runtime reconfigurable systems. The generated configurations significantly improve reliability by fault-tolerance and aging mitigation.
BibTeX:
@inproceedings{ZhangBKSBIWH2013,
  author = {Zhang, Hongyan and Bauer, Lars and Kochte, Michael A. and Schneider, Eric and Braun, Claus and Imhof, Michael E. and Wunderlich, Hans-Joachim and Henkel, Jörg},
  title = {{Module Diversification: Fault Tolerance and Aging Mitigation for Runtime Reconfigurable Architectures}},
  booktitle = {Proceedings of the IEEE International Test Conference (ITC'13)},
  year = {2013},
  keywords = {Reliability, online test, fault-tolerance, aging mitigation, partial runtime reconfiguration, FPGA},
  abstract = {Runtime reconfigurable architectures based on Field-Programmable Gate Arrays (FPGAs) are attractive for realizing complex applications. However, being manufactured in latest semiconductor process technologies, FPGAs are increasingly prone to aging effects, which reduce the reliability of such systems and must be tackled by aging mitigation and application of fault tolerance techniques. This paper presents module diversification, a novel design method that creates different configurations for runtime reconfigurable modules. Our method provides fault tolerance by creating the minimal number of configurations such that for any faulty Configurable Logic Block (CLB) there is at least one configuration that does not use that CLB. Additionally, we determine the fraction of time that each configuration should be used to balance the stress and to mitigate the aging process in FPGA-based runtime reconfigurable systems. The generated configurations significantly improve reliability by fault-tolerance and aging mitigation.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6651926},
  doi = {http://dx.doi.org/10.1109/TEST.2013.6651926},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2013/ITC_ZhangBKSBIWH2013.pdf}
}
8. Test Strategies for Reliable Runtime Reconfigurable Architectures
Bauer, L., Braun, C., Imhof, M.E., Kochte, M.A., Schneider, E., Zhang, H., Henkel, J. and Wunderlich, H.-J.
IEEE Transactions on Computers
Vol. 62(8), Los Alamitos, California, USA, August 2013, pp. 1494-1507
2013
DOI URL PDF 
Keywords: FPGA, Reconfigurable Architectures, Online Test
Abstract: FPGA-based reconfigurable systems allow the online adaptation to dynamically changing runtime requirements. The reliability of FPGAs, being manufactured in latest technologies, is threatened by soft errors, as well as aging effects and latent defects. To ensure reliable reconfiguration, it is mandatory to guarantee the correct operation of the reconfigurable fabric. This can be achieved by periodic or on-demand online testing. This paper presents a reliable system architecture for runtime-reconfigurable systems, which integrates two non-concurrent online test strategies: Pre-configuration online tests (PRET) and post-configuration online tests (PORT). The PRET checks that the reconfigurable hardware is free of faults by periodic or on-demand tests. The PORT has two objectives: It tests reconfigured hardware units after reconfiguration to check that the configuration process completed correctly and it validates the expected functionality. During operation, PORT is used to periodically check the reconfigured hardware units for malfunctions in the programmable logic. Altogether, this paper presents PRET, PORT, and the system integration of such test schemes into a runtime-reconfigurable system, including the resource management and test scheduling. Experimental results show that the integration of online testing in reconfigurable systems incurs only minimal impact on performance while delivering high fault coverage and low test latency.
BibTeX:
@article{BauerBIKSZHW2013,
  author = {Bauer, Lars and Braun, Claus and Imhof, Michael E. and Kochte, Michael A. and Schneider, Eric and Zhang, Hongyan and Henkel, Jörg and Wunderlich, Hans-Joachim},
  title = {{Test Strategies for Reliable Runtime Reconfigurable Architectures}},
  journal = {IEEE Transactions on Computers},
  publisher = {IEEE Computer Society},
  year = {2013},
  volume = {62},
  number = {8},
  pages = {1494--1507},
  keywords = {FPGA, Reconfigurable Architectures, Online Test},
  abstract = {FPGA-based reconfigurable systems allow the online adaptation to dynamically changing runtime requirements. The reliability of FPGAs, being manufactured in latest technologies, is threatened by soft errors, as well as aging effects and latent defects. To ensure reliable reconfiguration, it is mandatory to guarantee the correct operation of the reconfigurable fabric. This can be achieved by periodic or on-demand online testing. This paper presents a reliable system architecture for runtime-reconfigurable systems, which integrates two non-concurrent online test strategies: Pre-configuration online tests (PRET) and post-configuration online tests (PORT). The PRET checks that the reconfigurable hardware is free of faults by periodic or on-demand tests. The PORT has two objectives: It tests reconfigured hardware units after reconfiguration to check that the configuration process completed correctly and it validates the expected functionality. During operation, PORT is used to periodically check the reconfigured hardware units for malfunctions in the programmable logic. Altogether, this paper presents PRET, PORT, and the system integration of such test schemes into a runtime-reconfigurable system, including the resource management and test scheduling. Experimental results show that the integration of online testing in reconfigurable systems incurs only minimal impact on performance while delivering high fault coverage and low test latency.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6475939},
  doi = {http://dx.doi.org/10.1109/TC.2013.53},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2013/TC_BauerBIKSZHW2013.pdf}
}
7. Efficacy and Efficiency of Algorithm-Based Fault Tolerance on GPUs
Wunderlich, H.-J., Braun, C. and Halder, S.
Proceedings of the IEEE International On-Line Testing Symposium (IOLTS'13), Crete, Greece, 8-10 July 2013, pp. 240-243
2013
DOI PDF 
Keywords: Scientific Computing, GPGPU, Soft Errors, Fault Simulation, Algorithm-based Fault Tolerance
Abstract: Computer simulations drive innovations in science and industry, and they are gaining more and more importance. However, their high computational demand generates extraordinary challenges for computing systems. Typical high-performance computing systems, which provide sufficient performance and high reliability, are extremely expensive.
Modern GPUs offer high performance at very low costs, and they enable simulation applications on the desktop. However, they are increasingly prone to transient effects and other reliability threats. To fulfill the strict reliability requirements in scientific computing and simulation technology, appropriate fault tolerance measures have to be integrated into simulation applications for GPUs. Algorithm-Based Fault Tolerance on GPUs has the potential to meet these requirements.
In this work we investigate the efficiency and the efficacy of ABFT for matrix operations on GPUs. We compare ABFT against fault tolerance schemes that are based on redundant computations and we evaluate its error detection capabilities.
BibTeX:
@inproceedings{WundeBH2013,
  author = {Wunderlich, Hans-Joachim and Braun, Claus and Halder, Sebastian},
  title = {{Efficacy and Efficiency of Algorithm-Based Fault Tolerance on GPUs}},
  booktitle = {Proceedings of the IEEE International On-Line Testing Symposium (IOLTS'13)},
  year = {2013},
  pages = {240--243},
  keywords = {Scientific Computing, GPGPU, Soft Errors, Fault Simulation, Algorithm-based Fault Tolerance},
  abstract = {Computer simulations drive innovations in science and industry, and they are gaining more and more importance. However, their high computational demand generates extraordinary challenges for computing systems. Typical high-performance computing systems, which provide sufficient performance and high reliability, are extremely expensive.
Modern GPUs offer high performance at very low costs, and they enable simulation applications on the desktop. However, they are increasingly prone to transient effects and other reliability threats. To fulfill the strict reliability requirements in scientific computing and simulation technology, appropriate fault tolerance measures have to be integrated into simulation applications for GPUs. Algorithm-Based Fault Tolerance on GPUs has the potential to meet these requirements.
In this work we investigate the efficiency and the efficacy of ABFT for matrix operations on GPUs. We compare ABFT against fault tolerance schemes that are based on redundant computations and we evaluate its error detection capabilities.},
  doi = {http://dx.doi.org/10.1109/IOLTS.2013.6604090},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2013/IOLTS_WundeBH2013.pdf}
}
6. Parallel Simulation of Apoptotic Receptor-Clustering on GPGPU Many-Core Architectures
Braun, C., Daub, M., Schöll, A., Schneider, G. and Wunderlich, H.-J.
Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM'12), Philadelphia, Pennsylvania, USA, 4-7 October 2012, pp. 1-6
2012
DOI PDF 
Keywords: GPGPU; parallel particle simulation; numerical modeling; apoptosis; receptor-clustering
Abstract: Apoptosis, the programmed cell death, is a physiological process that handles the removal of unwanted or damaged cells in living organisms. The process itself is initiated by signaling through tumor necrosis factor (TNF) receptors and ligands, which form clusters on the cell membrane. The exact function of this process is not yet fully understood and currently subject of basic research. Different mathematical models have been developed to describe and simulate the apoptotic receptor-clustering.
In this interdisciplinary work, a previously introduced model of the apoptotic receptor-clustering has been extended by a new receptor type to allow a more precise description and simulation of the signaling process. Due to the high computational requirements of the model, an efficient algorithmic mapping to a modern many-core GPGPU architecture has been developed. Such architectures enable high-performance computing (HPC) simulation tasks on the desktop at low costs. The developed mapping reduces average simulation times from months to days (peak speedup of 256x), allowing the productive use of the model in research.
BibTeX:
@inproceedings{BraunDSSW2012,
  author = {Braun, Claus and Daub, Markus and Schöll, Alexander and Schneider, Guido and Wunderlich, Hans-Joachim},
  title = {{Parallel Simulation of Apoptotic Receptor-Clustering on GPGPU Many-Core Architectures}},
  booktitle = {Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM'12)},
  year = {2012},
  pages = {1--6},
  keywords = {GPGPU; parallel particle simulation; numerical modeling; apoptosis; receptor-clustering},
  abstract = {Apoptosis, the programmed cell death, is a physiological process that handles the removal of unwanted or damaged cells in living organisms. The process itself is initiated by signaling through tumor necrosis factor (TNF) receptors and ligands, which form clusters on the cell membrane. The exact function of this process is not yet fully understood and currently subject of basic research. Different mathematical models have been developed to describe and simulate the apoptotic receptor-clustering.
In this interdisciplinary work, a previously introduced model of the apoptotic receptor-clustering has been extended by a new receptor type to allow a more precise description and simulation of the signaling process. Due to the high computational requirements of the model, an efficient algorithmic mapping to a modern many-core GPGPU architecture has been developed. Such architectures enable high-performance computing (HPC) simulation tasks on the desktop at low costs. The developed mapping reduces average simulation times from months to days (peak speedup of 256x), allowing the productive use of the model in research.},
  doi = {http://dx.doi.org/10.1109/BIBM.2012.6392661},
  file = {http://www.iti.uni-stuttgart.de//fileadmin/rami/files/publications/2012/BIBM_BraunDSSW2012.pdf}
}
5. Acceleration of Monte-Carlo Molecular Simulations on Hybrid Computing Architectures
Braun, C., Holst, S., Wunderlich, H.-J., Castillo, J.M. and Gross, J.
Proceedings of the 30th IEEE International Conference on Computer Design (ICCD'12), Montreal, Canada, 30 September-3 October 2012, pp. 207-212
2012
DOI PDF 
Keywords: Hybrid Computer Architectures; GPGPU; Markov-Chain Monte-Carlo; Molecular Simulation; Thermodynamics
Abstract: Markov-Chain Monte-Carlo (MCMC) methods are an important class of simulation techniques, which execute a sequence of simulation steps, where each new step depends on the previous ones. Due to this fundamental dependency, MCMC methods are inherently hard to parallelize on any architecture. The upcoming generations of hybrid CPU/GPGPU architectures with their multi-core CPUs and tightly coupled many-core GPGPUs provide new acceleration opportunities especially for MCMC methods, if the new degrees of freedom are exploited correctly.
In this paper, the outcomes of an interdisciplinary collaboration are presented, which focused on the parallel mapping of a MCMC molecular simulation from thermodynamics to hybrid CPU/GPGPU computing systems. While the mapping is designed for upcoming hybrid architectures, the implementation of this approach on an NVIDIA Tesla system already leads to a substantial speedup of more than 87x despite the additional communication overheads.
BibTeX:
@inproceedings{BraunHWCG2012,
  author = {Braun, Claus and Holst, Stefan and Wunderlich, Hans-Joachim and Castillo, Juan Manuel and Gross, Joachim},
  title = {{Acceleration of Monte-Carlo Molecular Simulations on Hybrid Computing Architectures}},
  booktitle = {Proceedings of the 30th IEEE International Conference on Computer Design (ICCD'12)},
  publisher = {IEEE Computer Society},
  year = {2012},
  pages = {207--212},
  keywords = {Hybrid Computer Architectures; GPGPU; Markov-Chain Monte-Carlo; Molecular Simulation; Thermodynamics},
  abstract = {Markov-Chain Monte-Carlo (MCMC) methods are an important class of simulation techniques, which execute a sequence of simulation steps, where each new step depends on the previous ones. Due to this fundamental dependency, MCMC methods are inherently hard to parallelize on any architecture. The upcoming generations of hybrid CPU/GPGPU architectures with their multi-core CPUs and tightly coupled many-core GPGPUs provide new acceleration opportunities especially for MCMC methods, if the new degrees of freedom are exploited correctly. 
In this paper, the outcomes of an interdisciplinary collaboration are presented, which focused on the parallel mapping of a MCMC molecular simulation from thermodynamics to hybrid CPU/GPGPU computing systems. While the mapping is designed for upcoming hybrid architectures, the implementation of this approach on an NVIDIA Tesla system already leads to a substantial speedup of more than 87x despite the additional communication overheads.},
  doi = {http://dx.doi.org/10.1109/ICCD.2012.6378642},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2012/ICCD_BraunHWCG2012.pdf}
}
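The sequential dependency that makes MCMC hard to parallelize is visible even in the smallest Metropolis sampler: every proposal is generated from, and accepted relative to, the previous state. The sketch below is a generic one-dimensional illustration of that dependency, not the paper's thermodynamic molecular simulation; all names and the target density are illustrative:

```python
import math
import random

def metropolis_chain(log_density, x0, n_steps, step_size=0.5, seed=1):
    """Minimal Metropolis sampler: propose x' = x + U(-s, s) and accept
    with probability min(1, p(x')/p(x)).  Step n+1 reads chain[-1], i.e.
    the result of step n -- the inherent serial dependency of MCMC."""
    rng = random.Random(seed)
    chain = [x0]
    for _ in range(n_steps):
        x = chain[-1]
        proposal = x + rng.uniform(-step_size, step_size)
        # Accept/reject in log space to avoid overflow for peaked densities.
        if math.log(rng.random()) < log_density(proposal) - log_density(x):
            chain.append(proposal)
        else:
            chain.append(x)
    return chain

# Sample a standard normal: log p(x) = -x^2 / 2 (up to an additive constant).
samples = metropolis_chain(lambda x: -x * x / 2.0, x0=0.0, n_steps=20000)
```

Parallelization therefore cannot simply split the chain across cores; approaches like the one in the paper instead exploit parallelism within each step (e.g. energy evaluations over many particles) or run coordinated sub-computations on the GPU.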
4. Transparent Structural Online Test for Reconfigurable Systems
Abdelfattah, M.S., Bauer, L., Braun, C., Imhof, M.E., Kochte, M.A., Zhang, H., Henkel, J. and Wunderlich, H.-J.
Proceedings of the 18th IEEE International On-Line Testing Symposium (IOLTS'12), Sitges, Spain, 27-29 June 2012, pp. 37-42
2012
DOI PDF 
Keywords: FPGA; Reconfigurable Architectures; Online Test
Abstract: FPGA-based reconfigurable systems allow the online adaptation to dynamically changing runtime requirements. However, the reliability of modern FPGAs is threatened by latent defects and aging effects. Hence, it is mandatory to ensure the reliable operation of the FPGA’s reconfigurable fabric. This can be achieved by periodic or on-demand online testing. In this paper, a system-integrated, transparent structural online test method for runtime reconfigurable systems is proposed. The required tests are scheduled like functional workloads, and thorough optimizations of the test overhead reduce the performance impact. The proposed scheme has been implemented on a reconfigurable system. The results demonstrate that thorough testing of the reconfigurable fabric can be achieved at negligible performance impact on the application.
BibTeX:
@inproceedings{AbdelBBIKZHW2012,
  author = {Abdelfattah, Mohamed S. and Bauer, Lars and Braun, Claus and Imhof, Michael E. and Kochte, Michael A. and Zhang, Hongyan and Henkel, Jörg and Wunderlich, Hans-Joachim},
  title = {{Transparent Structural Online Test for Reconfigurable Systems}},
  booktitle = {Proceedings of the 18th IEEE International On-Line Testing Symposium (IOLTS'12)},
  publisher = {IEEE Computer Society},
  year = {2012},
  pages = {37--42},
  keywords = {FPGA; Reconfigurable Architectures; Online Test},
  abstract = {FPGA-based reconfigurable systems allow the online adaptation to dynamically changing runtime requirements. However, the reliability of modern FPGAs is threatened by latent defects and aging effects. Hence, it is mandatory to ensure the reliable operation of the FPGA’s reconfigurable fabric. This can be achieved by periodic or on-demand online testing. In this paper, a system-integrated, transparent structural online test method for runtime reconfigurable systems is proposed. The required tests are scheduled like functional workloads, and thorough optimizations of the test overhead reduce the performance impact. The proposed scheme has been implemented on a reconfigurable system. The results demonstrate that thorough testing of the reconfigurable fabric can be achieved at negligible performance impact on the application.},
  doi = {http://dx.doi.org/10.1109/IOLTS.2012.6313838},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2012/IOLTS_AbdelBBIKZHW2012.pdf}
}
3. OTERA: Online Test Strategies for Reliable Reconfigurable Architectures
Bauer, L., Braun, C., Imhof, M.E., Kochte, M.A., Zhang, H., Wunderlich, H.-J. and Henkel, J.
Proceedings of the NASA/ESA Conference on Adaptive Hardware and Systems (AHS'12), Erlangen, Germany, 25-28 June 2012, pp. 38-45
2012
DOI PDF 
Abstract: FPGA-based reconfigurable systems allow the online adaptation to dynamically changing runtime requirements. However, the reliability of FPGAs, which are manufactured in latest technologies, is threatened not only by soft errors, but also by aging effects and latent defects. To ensure reliable reconfiguration, it is mandatory to guarantee the correct operation of the underlying reconfigurable fabric. This can be achieved by periodic or on-demand online testing. The OTERA project develops and evaluates components and strategies for reconfigurable systems that feature reliable reconfiguration. The research focus ranges from structural online tests for the FPGA infrastructure and functional online tests for the configured functionality up to the resource management and test scheduling. This paper gives an overview of the project tasks and presents first results.
BibTeX:
@inproceedings{BauerBIKZWH2012,
  author = {Bauer, Lars and Braun, Claus and Imhof, Michael E. and Kochte, Michael A. and Zhang, Hongyan and Wunderlich, Hans-Joachim and Henkel, Jörg},
  title = {{OTERA: Online Test Strategies for Reliable Reconfigurable Architectures}},
  booktitle = {Proceedings of the NASA/ESA Conference on Adaptive Hardware and Systems (AHS'12)},
  publisher = {IEEE Computer Society},
  year = {2012},
  pages = {38--45},
  abstract = {FPGA-based reconfigurable systems allow the online adaptation to dynamically changing runtime requirements. However, the reliability of FPGAs, which are manufactured in latest technologies, is threatened not only by soft errors, but also by aging effects and latent defects. To ensure reliable reconfiguration, it is mandatory to guarantee the correct operation of the underlying reconfigurable fabric. This can be achieved by periodic or on-demand online testing. The OTERA project develops and evaluates components and strategies for reconfigurable systems that feature reliable reconfiguration. The research focus ranges from structural online tests for the FPGA infrastructure and functional online tests for the configured functionality up to the resource management and test scheduling. This paper gives an overview of the project tasks and presents first results.},
  doi = {http://dx.doi.org/10.1109/AHS.2012.6268667},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2012/AHS_BauerBIKZWH2012.pdf}
}
2. Algorithmen-basierte Fehlertoleranz für Many-Core-Architekturen;
Algorithm-based Fault-Tolerance on Many-Core Architectures

Braun, C. and Wunderlich, H.-J.
it - Information Technology
Vol. 52(4), August 2010, pp. 209-215
2010
DOI  
Keywords: Zuverlässigkeit; Fehlertoleranz; parallele Architekturen; parallele Programmierung
Abstract: Moderne Many-Core-Architekturen bieten ein sehr hohes Potenzial an Rechenleistung. Dies macht sie besonders für Anwendungen aus dem Bereich des wissenschaftlichen Hochleistungsrechnens und der Simulationstechnik attraktiv. Die Architekturen folgen dabei einem Ausführungsparadigma, das sich am besten durch den Begriff „Many-Threading“ beschreiben lässt. Wie alle nanoelektronischen Halbleiterschaltungen leiden auch Many-Core-Prozessoren potentiell unter störenden Einflüssen von transienten Fehlern (soft errors) und diversen Arten von Variationen. Diese Faktoren können die Zuverlässigkeit von Systemen negativ beeinflussen und erfordern Fehlertoleranz auf allen Ebenen, von der Hardware bis zur Software. Auf der Softwareseite stellt die Algorithmen-basierte Fehlertoleranz (ABFT) eine ausgereifte Technik zur Verbesserung der Zuverlässigkeit dar. Der Aufwand für die Anpassung dieser Technik an moderne Many-Threading-Architekturen darf jedoch keinesfalls unterschätzt werden. In diesem Beitrag wird eine effiziente und fehlertolerante Abbildung der Matrixmultiplikation auf eine moderne Many-Core-Architektur präsentiert. Die Fehlertoleranz ist dabei integraler Bestandteil der Abbildung und wird durch ein ABFT-Schema realisiert, das die Leistung nur unwesentlich beeinträchtigt.

Modern many-core architectures provide a high computational potential, which makes them particularly interesting for applications from the fields of scientific high-performance computing and simulation technology. The execution paradigm of these architectures is best described as “Many-Threading”. Like all nano-scaled semiconductor devices, many-core processors are prone to transient errors (soft errors) and different kinds of variations that can have severe impact on the reliability of such systems. Therefore, fault-tolerance has to be incorporated at all levels, from the hardware up to the software. On the software side, Algorithm-based Fault Tolerance (ABFT) is a mature technique to improve the reliability. However, significant effort is required to adapt this technique to modern many-threading architectures. In this article, an efficient and fault-tolerant mapping of the matrix multiplication to a modern many-core architecture is presented. Fault-tolerance is thereby an integral part of the mapping and implemented through an ABFT scheme with marginal impact on the overall performance.

BibTeX:
@article{BraunW2010a,
  author = {Braun, Claus and Wunderlich, Hans-Joachim},
  title = {{Algorithmen-basierte Fehlertoleranz für Many-Core-Architekturen;
Algorithm-based Fault-Tolerance on Many-Core Architectures}},
  journal = {it - Information Technology},
  publisher = {Oldenbourg Wissenschaftsverlag},
  year = {2010},
  volume = {52},
  number = {4},
  pages = {209--215},
  keywords = {Zuverlässigkeit; Fehlertoleranz; parallele Architekturen; parallele Programmierung},
  abstract = {Moderne Many-Core-Architekturen bieten ein sehr hohes Potenzial an Rechenleistung. Dies macht sie besonders für Anwendungen aus dem Bereich des wissenschaftlichen Hochleistungsrechnens und der Simulationstechnik attraktiv. Die Architekturen folgen dabei einem Ausführungsparadigma, das sich am besten durch den Begriff „Many-Threading“ beschreiben lässt. Wie alle nanoelektronischen Halbleiterschaltungen leiden auch Many-Core-Prozessoren potentiell unter störenden Einflüssen von transienten Fehlern (soft errors) und diversen Arten von Variationen. Diese Faktoren können die Zuverlässigkeit von Systemen negativ beeinflussen und erfordern Fehlertoleranz auf allen Ebenen, von der Hardware bis zur Software. Auf der Softwareseite stellt die Algorithmen-basierte Fehlertoleranz (ABFT) eine ausgereifte Technik zur Verbesserung der Zuverlässigkeit dar. Der Aufwand für die Anpassung dieser Technik an moderne Many-Threading-Architekturen darf jedoch keinesfalls unterschätzt werden. In diesem Beitrag wird eine effiziente und fehlertolerante Abbildung der Matrixmultiplikation auf eine moderne Many-Core-Architektur präsentiert. Die Fehlertoleranz ist dabei integraler Bestandteil der Abbildung und wird durch ein ABFT-Schema realisiert, das die Leistung nur unwesentlich beeinträchtigt.

Modern many-core architectures provide a high computational potential, which makes them particularly interesting for applications from the fields of scientific high-performance computing and simulation technology. The execution paradigm of these architectures is best described as “Many-Threading”. Like all nano-scaled semiconductor devices, many-core processors are prone to transient errors (soft errors) and different kinds of variations that can have severe impact on the reliability of such systems. Therefore, fault-tolerance has to be incorporated at all levels, from the hardware up to the software. On the software side, Algorithm-based Fault Tolerance (ABFT) is a mature technique to improve the reliability. However, significant effort is required to adapt this technique to modern many-threading architectures. In this article, an efficient and fault-tolerant mapping of the matrix multiplication to a modern many-core architecture is presented. Fault-tolerance is thereby an integral part of the mapping and implemented through an ABFT scheme with marginal impact on the overall performance.},
  doi = {http://dx.doi.org/10.1524/itit.2010.0593}
}

1. Algorithm-Based Fault Tolerance for Many-Core Architectures
Braun, C. and Wunderlich, H.-J.
Proceedings of the 15th IEEE European Test Symposium (ETS'10), Praha, Czech Republic, 24-28 May 2010, pp. 253-253
2010
DOI PDF 
Abstract: Modern many-core architectures with hundreds of cores provide a high computational potential. This makes them particularly interesting for scientific high-performance computing and simulation technology. Like all nano-scaled semiconductor devices, many-core processors are prone to reliability-harming factors like variations and soft errors. One way to improve the reliability of such systems is software-based hardware fault tolerance. Here, the software is able to detect and correct errors introduced by the hardware. In this work, we propose a software-based approach to improve the reliability of matrix operations on many-core processors. These operations are key components in many scientific applications.
BibTeX:
@inproceedings{BraunW2010,
  author = {Braun, Claus and Wunderlich, Hans-Joachim},
  title = {{Algorithm-Based Fault Tolerance for Many-Core Architectures}},
  booktitle = {Proceedings of the 15th IEEE European Test Symposium (ETS'10)},
  publisher = {IEEE Computer Society},
  year = {2010},
  pages = {253--253},
  abstract = {Modern many-core architectures with hundreds of cores provide high computational potential. This makes them particularly interesting for scientific high-performance computing and simulation technology. Like all nano-scaled semiconductor devices, many-core processors are prone to reliability-harming factors such as variations and soft errors. One way to improve the reliability of such systems is software-based hardware fault tolerance, where the software detects and corrects errors introduced by the hardware. In this work, we propose a software-based approach to improve the reliability of matrix operations on many-core processors. These operations are key components in many scientific applications.},
  doi = {http://dx.doi.org/10.1109/ETSYM.2010.5512738},
  file = {http://www.iti.uni-stuttgart.de//fileadmin/rami/files/publications/2010/ETS_BraunW2010.pdf}
}
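The abstract above describes protecting matrix operations through software-based hardware fault tolerance. As an illustration only (the paper itself does not include code), the following is a minimal pure-Python sketch of the classic checksum-based ABFT scheme for matrix multiplication in the style of Huang and Abraham, on which this line of work builds. All function names are illustrative, and integer matrices are used so that checksums can be compared exactly:

```python
# Sketch of checksum-based ABFT for C = A * B:
# augment A with a column-checksum row and B with a row-checksum column;
# the product then carries checksums that locate and correct a single error.

def matmul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def with_column_checksum(A):
    # append a row holding the sum of each column of A
    cols = [sum(row[j] for row in A) for j in range(len(A[0]))]
    return A + [cols]

def with_row_checksum(B):
    # append to each row of B the sum of that row
    return [row + [sum(row)] for row in B]

def abft_matmul(A, B):
    # result is (n+1) x (p+1); last row/column are checksums of C
    return matmul(with_column_checksum(A), with_row_checksum(B))

def check_and_correct(Cf):
    n, p = len(Cf) - 1, len(Cf[0]) - 1
    bad_rows = [i for i in range(n) if sum(Cf[i][:p]) != Cf[i][p]]
    bad_cols = [j for j in range(p)
                if sum(Cf[i][j] for i in range(n)) != Cf[n][j]]
    if not bad_rows and not bad_cols:
        return True  # all checksums consistent
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        # intersection of the failing row and column locates the error;
        # the row-checksum difference restores the correct value
        i, j = bad_rows[0], bad_cols[0]
        Cf[i][j] += Cf[i][p] - sum(Cf[i][:p])
        return True
    return False  # multiple errors: detected but not correctable here
```

A single corrupted element fails exactly one row checksum and one column checksum, which is what makes localization and correction possible. For floating-point data, the exact equality tests above must be replaced by comparisons against a roundoff-aware tolerance.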
Workshop Contributions
3. Hardware/Software Co-Characterization for Approximate Computing
Schöll, A., Braun, C. and Wunderlich, H.-J.
Workshop on Approximate Computing, Pittsburgh, Pennsylvania, USA, 06 October 2016
2016
 
BibTeX:
@inproceedings{SchoeBW2016,
  author = {Schöll, Alexander and Braun, Claus and Wunderlich, Hans-Joachim},
  title = {{Hardware/Software Co-Characterization for Approximate Computing}},
  booktitle = {Workshop on Approximate Computing},
  year = {2016}
}
2. ABFT with Probabilistic Error Bounds for Approximate and Adaptive-Precision Computing Applications
Braun, C. and Wunderlich, H.-J.
Workshop on Approximate Computing, Paderborn, Germany, 15-16 October 2015
2015
 
BibTeX:
@inproceedings{BraunW2015,
  author = {Braun, Claus and Wunderlich, Hans-Joachim},
  title = {{ABFT with Probabilistic Error Bounds for Approximate and Adaptive-Precision Computing Applications}},
  booktitle = {Workshop on Approximate Computing},
  year = {2015}
}
1. A-ABFT: Autonomous Algorithm-Based Fault Tolerance on GPUs
Braun, C., Halder, S. and Wunderlich, H.-J.
International Workshop on Dependable GPU Computing, in conjunction with the ACM/IEEE DATE'14 Conference, Dresden, Germany, 28 March 2014
2014
 
Keywords: Algorithm-Based Fault Tolerance, Graphics Processing Units, Scientific Computing, Simulation Technology, Floating-Point Arithmetic, Roundoff Error Analysis, Error Tolerance Determination
Abstract: General-purpose computations on graphics processing units (GPUs) enable large-scale scientific applications and simulations on the desktop. Such applications typically have high performance and reliability requirements. For GPUs, which are still designed for the graphics mass-market, hardware-based fault tolerance measures often do not have the highest priority, which makes the application of appropriate software-based fault tolerance mandatory.
Algorithm-based Fault Tolerance (ABFT) allows the efficient and effective protection of important kernels from scientific computing. Some ABFT schemes have already been adapted for GPU architectures. However, due to roundoff error introduced by floating-point arithmetic, ABFT requires the determination of tight error bounds for the error detection. The determination of such error bounds is a highly challenging task.
In this work, we introduce A-ABFT for GPUs, a new parallel ABFT scheme that determines appropriate error bounds for the checksum comparison step autonomously and which therefore enables the transparent operation of ABFT without any user interaction.
BibTeX:
@inproceedings{BraunHW2014,
  author = {Braun, Claus and Halder, Sebastian and Wunderlich, Hans-Joachim},
  title = {{A-ABFT: Autonomous Algorithm-Based Fault Tolerance on GPUs}},
  booktitle = {International Workshop on Dependable GPU Computing, in conjunction with the ACM/IEEE DATE'14 Conference},
  year = {2014},
  keywords = {Algorithm-Based Fault Tolerance, Graphics Processing Units, Scientific Computing, Simulation Technology, Floating-Point Arithmetic, Roundoff Error Analysis, Error Tolerance Determination},
  abstract = {General-purpose computations on graphics processing units (GPUs) enable large-scale scientific applications and simulations on the desktop. Such applications typically have high performance and reliability requirements. For GPUs, which are still designed for the graphics mass-market, hardware-based fault tolerance measures often do not have the highest priority, which makes the application of appropriate software-based fault tolerance mandatory.
Algorithm-based Fault Tolerance (ABFT) allows the efficient and effective protection of important kernels from scientific computing. Some ABFT schemes have already been adapted for GPU architectures. However, due to roundoff error introduced by floating-point arithmetic, ABFT requires the determination of tight error bounds for the error detection. The determination of such error bounds is a highly challenging task.
In this work, we introduce A-ABFT for GPUs, a new parallel ABFT scheme that determines appropriate error bounds for the checksum comparison step autonomously and which therefore enables the transparent operation of ABFT without any user interaction.} }
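The abstract above points out that floating-point roundoff forces ABFT to compare checksums against a tolerance rather than for exact equality, and that deriving tight bounds is the hard part. A-ABFT determines such bounds autonomously; as a much simpler illustration (not the paper's method), the sketch below uses the standard worst-case summation bound n·u·Σ|xᵢ|, where u is the unit roundoff, as the tolerance. The helper names are hypothetical:

```python
import sys

def checksum_tolerance(values):
    # Worst-case rounding bound for summing n floats:
    # |computed_sum - exact_sum| <= n * u * sum(|x_i|),
    # where u is the unit roundoff (half the machine epsilon).
    u = sys.float_info.epsilon / 2.0
    return len(values) * u * sum(abs(v) for v in values)

def checksum_consistent(values, checksum):
    # Both the carried checksum and the recomputed sum are subject to
    # roundoff, so allow twice the single-summation bound. A larger
    # discrepancy indicates an actual error, not roundoff.
    return abs(sum(values) - checksum) <= 2.0 * checksum_tolerance(values)
```

For example, `checksum_consistent([0.1] * 10, 1.0)` accepts the roundoff-sized discrepancy between the ten-term sum and 1.0, while an injected error of 1e-6 is flagged. Worst-case bounds like this one are pessimistic for large n, which is precisely why the paper pursues tighter, probabilistic bounds instead.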
Created by JabRef on 06/12/2017.

 

Teaching

Lectures, Exercises, and Seminars

SS 2016

Into Darkness: Challenges in the Dark Silicon Era

WS 2015/16

Hardware-Based Fault Tolerance

SS 2015

Grundlagen der Rechnerarchitektur, Advanced Processor Architecture

Design, Test and Application of Emerging Computer Architectures

WS 2013/14

Hardware Infrastructure for Safety and Security

SS 2012

Reconfigurable Hardware Architectures

SS 2011

Grundlagen der Rechnerarchitektur, Advanced Processor Architecture

Safety-critical Hardware/Software Systems

Parallel Programming (SimTech GS Seminar)

WS 2009/10

Grundlagen der Rechnerarchitektur, Advanced Processor Architecture

SS 2009

Reliable Networks-On-Chip in the Many-Core Era

WS 2008/09

Grundlagen der Rechnerarchitektur, Advanced Processor Architecture

Master's Theses, Diploma Theses, Student Research Projects, and Project Work

2013

Integration of Algorithm-Based Fault Tolerance into Fundamental Linear Algebra Operations on GPGPUs
Diploma Thesis, S. Halder, 16.10.2013 - 17.04.2014

2012

Online Self-Test Wrapper for Runtime-Reconfigurable Systems
Master's Thesis, J. Wang, 03.12.2012 - 02.06.2013


Framework for Accelerated Monte Carlo Molecular Simulations on Hybrid Architectures
Student Research Project, S. Halder, 01.06.2012 - 01.12.2012


Efficient Multi-Valued Logic Simulation of Delay-Annotated Circuits on Data-Parallel Architectures
Diploma Thesis, A. Schöll, 01.06.2012 - 01.12.2012


Cross-Level Simulation of the HaPra Processor
Software Lab Project, A. Milutinovic

2011

Parallel Particle Simulation on GPGPU Architectures for the Evaluation of Apoptosis Signaling Pathways
Student Research Project, A. Schöll, 01.09.2011 - 02.03.2012


Evaluation of Advanced Techniques for Structural FPGA Self-Test
Master's Thesis, M. Abdelfattah, 01.03.2011 - 31.08.2011

Implementing Density Functional Theory Methods on GPGPU Accelerators
Master's Thesis, B. M. Gosswami, 01.05.2011 - 31.10.2011

2009

Algorithm-Based Fault Tolerance in Many-Core Systems
Software Lab Project, D. Pfander, S. Kanis, 01.08.2009 - 28.02.2010


(Disclaimer: the respective users themselves are responsible for the contents of the material presented on their pages. Statements or opinions on these pages are by no means expressed on behalf of the University or of its departments.)