Heterogeneous Computing

Simulation on Reconfigurable Heterogeneous Computer Architectures

06.2008 - 10.2017, SimTech Cluster of Excellence

Overview

Since the start of the DFG Cluster of Excellence "Simulation Technology" (SimTech) (EXC 310/1) at the University of Stuttgart in 2008, the Institute of Computer Architecture and Computer Engineering (ITI, RA) has been an active part of the research within the Stuttgart Research Center for Simulation Technology (SRC SimTech). The institute's research includes the development of fault-tolerant simulation algorithms for new, tightly coupled many-core computer architectures like GPUs, the acceleration of existing simulations on such architectures, and the mapping of complex simulation applications to innovative reconfigurable heterogeneous computer architectures.

Within the research cluster, Hans-Joachim Wunderlich acts as a principal investigator (PI) and co-coordinates the research activities of the SimTech Project Network PN2 "High-Performance Simulation across Computer Architectures". This project network is unique in its interdisciplinary nature and in the interfaces between the participating researchers and projects. Scientists from computer science, chemistry, physics and chemical engineering work together to develop and provide new solutions for some of the major challenges in simulation technology. The classes of computational problems treated within project network PN2 comprise quantum mechanics, molecular mechanics, electronic structure methods, molecular dynamics, Markov-chain Monte-Carlo simulations and polarizable force fields.

Project Overview

The ongoing semiconductor technology scaling impels the integration of highly diversified computer architectures with different kinds of processing cores, communication channels and embedded memories. Besides classic CPU and data-parallel GPU cores, runtime-reconfigurable units are emerging as an integrated part of such architectures.

Simulation technology will benefit significantly from these emerging computer architectures since they will close the gap between serial or coarse-grained parallel tasks on CPU cores and highly data-parallel tasks on GPU cores. Reconfigurable units can change their functionality at runtime and hence adapt dynamically to the needs of simulation applications. However, the upcoming architectural advances will be accompanied by a significant increase in complexity on the software side. For example, the shift from serial programming to parallel programming of multiple processors (CPUs) and the use of graphics processing units (GPUs) introduce new programming paradigms, which increasingly reflect and expose particular aspects of the underlying hardware structures.

Consequently, algorithms have to be analyzed with a much stronger focus on the available hardware structures. Furthermore, algorithmic parts have to be identified and isolated to derive compute modules for the best-matching architectures. The combination of different computing resources in a reconfigurable heterogeneous architecture demands sophisticated load balancing and adaptation to changing system conditions (e.g. changing availability of computing resources).

In this project, we develop new methods that enable the direct mapping of simulation applications to innovative reconfigurable heterogeneous computer architectures. This includes methods for the assisted analysis and partitioning of algorithms, the derivation and design of compute modules, and an integrated software infrastructure for runtime load balancing and adaptation.

Project Overview

Computer simulations drive innovations in science and industry, and they are gaining more and more importance. However, their extraordinarily high computational demand generates significant challenges for contemporary computing systems. Typical high-performance computing systems, which provide sufficient performance and high reliability, are extremely expensive.

Modern many-core processor architectures like graphics processors (GPUs) offer high computational performance at very low costs, and they enable scientific simulation applications on the researcher's desktop. However, being designed for the graphics mass-market, GPUs offer only limited fault tolerance measures (e.g. ECC-protected memory) to cope with the increasing vulnerability to transient effects (soft errors) and other reliability threats. To fulfill the strict reliability requirements in scientific computing and simulation technology, appropriate fault tolerance measures have to be integrated into simulation algorithms and applications on GPUs. Algorithm-Based Fault Tolerance has the potential to meet these requirements.

The research within the first project phase (Mapping Simulation Algorithms to NoC-MPSoC Computers) concentrates on the development of fault-tolerant algorithms for GPU architectures and their integration into scientific simulation applications. Moreover, sophisticated simulation tasks from partners within the Cluster and Project Network PN2 are analyzed and adapted or re-designed for GPU architectures.
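To make the idea behind Algorithm-Based Fault Tolerance more concrete, the following is a minimal sketch of the classic checksum scheme for matrix multiplication, written in plain Python rather than GPU code. It is an illustration under stated assumptions, not the implementation developed in the project: the function names are hypothetical, and the fixed tolerance `tol` stands in for the rounding error bounds that a real floating-point ABFT scheme has to determine.

```python
def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def with_column_checksum(a):
    """Append one extra row that holds the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]


def with_row_checksum(b):
    """Append to every row of b one extra entry that holds the row sum."""
    return [row + [sum(row)] for row in b]


def check(c_full, tol=1e-9):
    """Compare the data part of c_full against its checksum row and column.

    Returns (bad_rows, bad_cols). A single corrupted element shows up as
    exactly one bad row and one bad column, which locates it.
    tol is a fixed, assumed tolerance for rounding errors.
    """
    n, m = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n)
                if abs(sum(c_full[i][:m]) - c_full[i][m]) > tol]
    bad_cols = [j for j in range(m)
                if abs(sum(c_full[i][j] for i in range(n)) - c_full[n][j]) > tol]
    return bad_rows, bad_cols


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]

# The product of the checksum-extended operands carries its own checksums.
c_full = matmul(with_column_checksum(a), with_row_checksum(b))
clean = check(c_full)

c_full[0][1] += 0.5          # inject a single error into the result
faulty = check(c_full)
```

In this sketch the checksums cost one extra row and one extra column in the product, which is small compared with recomputing the whole multiplication to detect an error.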

Acceleration of Monte-Carlo Molecular Simulations on Hybrid Computing Architectures

Stochastic simulation methods play an important role because they can solve problems that tend to be very hard for deterministic algorithms. For search and optimization problems, evolutionary and genetic algorithms have been applied, and simulated annealing has been used to locate globally optimal solutions. One of the most important classes of such techniques is that of Monte Carlo (MC) methods, which approximate solutions to quantitative problems with multiple coupled degrees of freedom by random sampling. The problem targeted in this work is the parallelization of molecular simulations of the grand canonical ensemble, from the field of thermodynamics, on hybrid computing systems.

It can be shown that these simulations are an instance of a special case of MC methods, the Markov-Chain Monte-Carlo (MCMC) simulation. As the core of many tasks in thermodynamics, Monte-Carlo molecular simulations often form the major bottleneck, which is typically tackled by coarse-grained parallelization and distribution of simulation instances on clusters or workstation grids. This commonly involves considerable overhead and cost. In our interdisciplinary collaboration with the Institute of Thermodynamics and Thermal Process Engineering, we developed new methods for the parallel mapping and implementation of Markov-Chain Monte-Carlo molecular simulations on hybrid CPU-GPGPU systems. The mapping is characterized by data-parallel energy calculations and speculative computations in each Monte-Carlo step, and it directly utilizes the different architectural characteristics of hybrid computing systems.

It was shown that the parallel mapping achieves speedups of more than 87x. This significant speedup enables MCMC molecular simulations at workstation level and the investigation of problem sizes that previously required computing clusters or grid-based systems.
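The structure of a single Markov-Chain Monte-Carlo step can be sketched as follows. This is a deliberately simplified, serial illustration and not the mapping described above: it uses a displacement move with the standard Metropolis acceptance rule instead of the insertion and deletion moves of the grand canonical ensemble, it does not model speculative computations, and the pair potential, the one-dimensional positions and all names are assumptions chosen for brevity. The sum in `particle_energy` consists of independent pair terms, which is the kind of work that lends itself to data-parallel evaluation.

```python
import math
import random


def pair_energy(r):
    """Hypothetical soft repulsive pair potential (assumption for this sketch)."""
    return 1.0 / (r * r + 0.1)


def particle_energy(positions, i, xi):
    """Energy of particle i if placed at xi.

    Every term of the sum is independent of the others, so the terms
    could be evaluated in parallel and then reduced.
    """
    return sum(pair_energy(xi - xj)
               for j, xj in enumerate(positions) if j != i)


def accept(delta, beta, u):
    """Metropolis rule: always accept downhill moves, accept uphill moves
    with probability exp(-beta * delta). u is a uniform sample in [0, 1)."""
    return delta <= 0.0 or u < math.exp(-beta * delta)


def metropolis_step(positions, beta, max_shift, rng):
    """Try to displace one randomly chosen particle; return True if accepted."""
    i = rng.randrange(len(positions))
    trial = positions[i] + rng.uniform(-max_shift, max_shift)
    delta = (particle_energy(positions, i, trial)
             - particle_energy(positions, i, positions[i]))
    if accept(delta, beta, rng.random()):
        positions[i] = trial
        return True
    return False


rng = random.Random(42)
positions = [float(k) for k in range(8)]
accepted = sum(metropolis_step(positions, beta=1.0, max_shift=0.5, rng=rng)
               for _ in range(1000))
```

Because each step depends on the outcome of the previous one, the chain itself is sequential; the parallelism in this sketch is confined to the energy evaluation inside a step.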

Evaluation of the Apoptotic Receptor-Clustering Process

Apoptosis, the prototype of programmed cell death, allows multicellular organisms to remove damaged or infected cells. Controlling cell death requires a profound understanding of the molecular mechanisms involved in this important physiological process, in particular of how the apoptotic signaling pathways are initiated. One of these signaling pathways is the extrinsic pro-apoptotic signaling pathway, which is initiated by signal-competent clusters of, for example, tumor necrosis factor (TNF) receptors and the corresponding TNF ligands.

In recent years, different mathematical models have been developed to describe and simulate the formation of signal-competent clusters consisting of receptors and their ligands. In our interdisciplinary collaboration with the Institute of Analysis, Dynamics, and Modeling and the Institute of Cell Biology and Immunology, we developed an efficient, parallel mapping of a novel mathematical model to a modern GPGPU many-core architecture. This model evaluates the apoptotic receptor clustering on the cell membrane. Besides the translation of the receptors and ligands, the model additionally incorporates rotations. The model is based on a derivation of a nonlinearly coupled system of stochastic differential equations for the motion of the receptors and ligands. The system is solved by an Euler-Maruyama approximation. Because the simulation is computationally expensive, tailoring it to GPU many-core architectures was essential. Our efficient, parallel mapping exploits fine-grained intra-GPU parallelism with multiple active simulation instances per GPGPU device, as well as coarse-grained inter-GPU parallelism by utilizing all available GPGPU devices within a system.

The parallel evaluation algorithm for the mathematical model yields peak speedups of up to 400x relative to a grid-based implementation on a multi-core CPU. This reduces computation times from months to days or hours.
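For readers unfamiliar with the Euler-Maruyama approximation mentioned above, the following minimal sketch shows the scheme for a single scalar stochastic differential equation dX = a(X) dt + b(X) dW. It is only an illustration: the receptor-clustering model is a nonlinearly coupled system with translational and rotational components, whereas the drift, diffusion, step size and function names used here are assumptions picked to keep the example short.

```python
import math
import random


def euler_maruyama(drift, diffusion, x0, dt, steps, rng):
    """Integrate dX = drift(X) dt + diffusion(X) dW with fixed step dt.

    Each step adds the deterministic Euler increment and a Gaussian
    increment dW with mean 0 and standard deviation sqrt(dt).
    Returns the list of states, including the initial state.
    """
    x = x0
    path = [x]
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + drift(x) * dt + diffusion(x) * dw
        path.append(x)
    return path


# Illustrative example (assumption): mean-reverting drift with constant noise.
rng = random.Random(0)
noisy = euler_maruyama(lambda x: -x, lambda x: 0.3, 1.0, 0.01, 1000, rng)

# With the noise switched off, the scheme reduces to the explicit Euler method.
deterministic = euler_maruyama(lambda x: -x, lambda x: 0.0, 1.0, 0.01, 100, rng)
```

Independent simulation instances of this kind differ only in their random increments, which is one reason why running many of them side by side on one device is a natural fit.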

Activities

H.-J. Wunderlich: "Fault Tolerance Meets Diagnosis", Keynote at the 21st IEEE International On-Line Testing Symposium (IOLTS), Elia, Halkidiki, Greece, July 6-8, 2015

Journals and Conference Proceedings

2017
1. Energy-efficient and Error-resilient Iterative Solvers for Approximate Computing. Alexander Schöll; Claus Braun and Hans-Joachim Wunderlich. In Proceedings of the 23rd IEEE International Symposium on On-Line Testing and Robust System Design (IOLTS′17), Thessaloniki, Greece, 2017, pp. 237–239. DOI: https://doi.org/10.1109/IOLTS.2017.8046244
  Abstract
  Iterative solvers like the Preconditioned Conjugate Gradient (PCG) method are widely-used in compute-intensive domains including science and engineering that often impose tight accuracy demands on computational results. At the same time, the error resilience of such solvers may change in the course of the iterations, which requires careful adaption of the induced approximation errors to reduce the energy demand while avoiding unacceptable results. A novel adaptive method is presented that enables iterative Preconditioned Conjugate Gradient (PCG) solvers on Approximate Computing hardware with high energy efficiency while still providing correct results. The method controls the underlying precision at runtime using a highly efficient fault tolerance technique that monitors the induced error and the quality of intermediate computational results.
  BibTeX
  @inproceedings{SchoeBW2017, abstract = {Iterative solvers like the Preconditioned Conjugate Gradient (PCG) method are widely-used in compute-intensive domains including science and engineering that often impose tight accuracy demands on computational results. At the same time, the error resilience of such solvers may change in the course of the iterations, which requires careful adaption of the induced approximation errors to reduce the energy demand while avoiding unacceptable results. A novel adaptive method is presented that enables iterative Preconditioned Conjugate Gradient (PCG) solvers on Approximate Computing hardware with high energy efficiency while still providing correct results. The method controls the underlying precision at runtime using a highly efficient fault tolerance technique that monitors the induced error and the quality of intermediate computational results.}, abteilung = {ra}, address = {Thessaloniki, Greece}, author = {Sch{\"o}ll, Alexander and Braun, Claus and Wunderlich, Hans-Joachim}, booktitle = {Proceedings of the 23rd IEEE International Symposium on On-Line Testing and Robust System Design (IOLTS'17)}, day = {3--5}, doi = {10.1109/IOLTS.2017.8046244}, location = {Thessaloniki, Greece}, month = {07}, owner = {hellmelr}, pages = {237--239}, project = {SimTech}, title = {{Energy-efficient and Error-resilient Iterative Solvers for Approximate Computing}}, url = {https://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2017/IOLTS_SchoeBW2017.pdf}, year = 2017 }
  Link
  https://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2017/IOLTS_SchoeBW2017.pdf
  DOI
  10.1109/IOLTS.2017.8046244
2016
1. Applying Efficient Fault Tolerance to Enable the Preconditioned Conjugate Gradient Solver on Approximate Computing Hardware. Alexander Schöll; Claus Braun and Hans-Joachim Wunderlich. In Proceedings of the IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT′16), University of Connecticut, USA, 2016, pp. 21–26. DOI: https://doi.org/10.1109/DFT.2016.7684063
  Abstract
  A new technique is presented that allows to execute the preconditioned conjugate gradient (PCG) solver on approximate hardware while ensuring correct solver results. This technique expands the scope of approximate computing to scientific and engineering applications. The changing error resilience of PCG during the solving process is exploited by different levels of approximation which trade off numerical accuracy and hardware utilization. Such approximation levels are determined at runtime by periodically estimating the error resilience. An efficient fault tolerance technique allows reductions in hardware utilization by ensuring the continued exploitation of maximum allowed energy-accuracy trade-offs. Experimental results show that the hardware utilization is reduced on average by 14.5% and by up to 41.0% compared to executing PCG on accurate hardware.
  BibTeX
  @inproceedings{SchoeBW2016, abstract = {A new technique is presented that allows to execute the preconditioned conjugate gradient (PCG) solver on approximate hardware while ensuring correct solver results. This technique expands the scope of approximate computing to scientific and engineering applications. The changing error resilience of PCG during the solving process is exploited by different levels of approximation which trade off numerical accuracy and hardware utilization. Such approximation levels are determined at runtime by periodically estimating the error resilience. An efficient fault tolerance technique allows reductions in hardware utilization by ensuring the continued exploitation of maximum allowed energy-accuracy trade-offs. Experimental results show that the hardware utilization is reduced on average by 14.5% and by up to 41.0% compared to executing PCG on accurate hardware.}, abteilung = {ra}, address = {University of Connecticut, USA}, author = {Sch{\"o}ll, Alexander and Braun, Claus and Wunderlich, Hans-Joachim}, award = {DFT 2016 Best Paper Award}, booktitle = {Proceedings of the IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT'16)}, day = {19--20}, doi = {10.1109/DFT.2016.7684063}, location = {University of Connecticut, USA}, month = {09}, owner = {hellmelr}, pages = {21-26}, project = {SimTech}, title = {{Applying Efficient Fault Tolerance to Enable the Preconditioned Conjugate Gradient Solver on Approximate Computing Hardware}}, url = {https://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2016/DFT_SchoeBW2016.pdf}, year = 2016 }
  Link
  https://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2016/DFT_SchoeBW2016.pdf
  DOI
  10.1109/DFT.2016.7684063
2. Pushing the Limits: How Fault Tolerance Extends the Scope of Approximate Computing. Hans-Joachim Wunderlich; Claus Braun and Alexander Schöll. In Proceedings of the 22nd IEEE International Symposium on On-Line Testing and Robust System Design (IOLTS′16), Sant Feliu de Guixols, Catalunya, Spain, 2016, pp. 133–136. DOI: https://doi.org/10.1109/IOLTS.2016.7604686
  Abstract
  Approximate computing in hardware and software promises significantly improved computational performance combined with very low power and energy consumption. This goal is achieved by both relaxing strict requirements on accuracy and precision, and by allowing a deviating behavior from exact Boolean specifications to a certain extent. Today, approximate computing is often limited to applications with a certain degree of inherent error tolerance, where perfect computational results are not always required. However, in order to fully utilize its benefits, the scope of applications has to be significantly extended to other compute-intensive domains including science and engineering. To meet the often rather strict quality and reliability requirements for computational results in these domains, the use of appropriate characterization and fault tolerance measures is highly required. In this paper, we evaluate some of the available techniques and how they may extend the scope of application for approximate computing.
  BibTeX
  @inproceedings{WundeBS2016, abstract = {Approximate computing in hardware and software promises significantly improved computational performance combined with very low power and energy consumption. This goal is achieved by both relaxing strict requirements on accuracy and precision, and by allowing a deviating behavior from exact Boolean specifications to a certain extent. Today, approximate computing is often limited to applications with a certain degree of inherent error tolerance, where perfect computational results are not always required. However, in order to fully utilize its benefits, the scope of applications has to be significantly extended to other compute-intensive domains including science and engineering. To meet the often rather strict quality and reliability requirements for computational results in these domains, the use of appropriate characterization and fault tolerance measures is highly required. In this paper, we evaluate some of the available techniques and how they may extend the scope of application for approximate computing.}, abteilung = {ra}, address = {Sant Feliu de Guixols, Catalunya, Spain}, author = {Wunderlich, Hans-Joachim and Braun, Claus and Sch{\"o}ll, Alexander}, booktitle = {Proceedings of the 22nd IEEE International Symposium on On-Line Testing and Robust System Design (IOLTS'16)}, day = {4--6}, doi = {10.1109/IOLTS.2016.7604686}, location = {Sant Feliu de Guixols, Catalunya, Spain}, month = {07}, owner = {hellmelr}, pages = {133--136}, project = {SimTech}, title = {{Pushing the Limits: How Fault Tolerance Extends the Scope of Approximate Computing}}, url = {https://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2016/IOLTS_WundeBS2016.pdf}, year = 2016 }
  Link
  https://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2016/IOLTS_WundeBS2016.pdf
  DOI
  10.1109/IOLTS.2016.7604686
3. Efficient Algorithm-Based Fault Tolerance for Sparse Matrix Operations. Alexander Schöll; Claus Braun; Michael A. Kochte and Hans-Joachim Wunderlich. In Proceedings of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN′16), Toulouse, France, 2016, pp. 251–262. DOI: https://doi.org/10.1109/DSN.2016.31
  Abstract
  We propose a fault tolerance approach for sparse matrix operations that detects and implicitly locates errors in the results for efficient local correction. This approach reduces the runtime overhead for fault tolerance and provides high error coverage. Existing algorithm-based fault tolerance approaches for sparse matrix operations detect and correct errors, but they often rely on expensive error localization steps. General checkpointing schemes can induce large recovery cost for high error rates. For sparse matrix-vector multiplications, experimental results show an average reduction in runtime overhead of 43.8%, while the error coverage is on average improved by 52.2% compared to related work. The practical applicability is demonstrated in a case study using the iterative Preconditioned Conjugate Gradient solver. When scaling the error rate by four orders of magnitude, the average runtime overhead increases only by 31.3% compared to low error rates.
  BibTeX
  @inproceedings{SchoeBKW2016, abstract = {We propose a fault tolerance approach for sparse matrix operations that detects and implicitly locates errors in the results for efficient local correction. This approach reduces the runtime overhead for fault tolerance and provides high error coverage. Existing algorithm-based fault tolerance approaches for sparse matrix operations detect and correct errors, but they often rely on expensive error localization steps. General checkpointing schemes can induce large recovery cost for high error rates. For sparse matrix-vector multiplications, experimental results show an average reduction in runtime overhead of 43.8%, while the error coverage is on average improved by 52.2% compared to related work. The practical applicability is demonstrated in a case study using the iterative Preconditioned Conjugate Gradient solver. When scaling the error rate by four orders of magnitude, the average runtime overhead increases only by 31.3% compared to low error rates.}, abteilung = {ra}, address = {Toulouse, France}, author = {Sch{\"o}ll, Alexander and Braun, Claus and Kochte, Michael A. and Wunderlich, Hans-Joachim}, booktitle = {Proceedings of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'16)}, day = {28 June--1 July}, doi = {10.1109/DSN.2016.31}, location = {Toulouse, France}, owner = {hellmelr}, pages = {251--262}, project = {SimTech}, title = {{Efficient Algorithm-Based Fault Tolerance for Sparse Matrix Operations}}, url = {https://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2016/DSN_SchoeBKW2016.pdf}, year = 2016 }
  Link
  https://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2016/DSN_SchoeBKW2016.pdf
  DOI
  10.1109/DSN.2016.31
2015
1. Low-Overhead Fault-Tolerance for the Preconditioned Conjugate Gradient Solver. Alexander Schöll; Claus Braun; Michael A. Kochte and Hans-Joachim Wunderlich. In Proceedings of the International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT′15), Amherst, Massachusetts, USA, 2015, pp. 60–65. DOI: https://doi.org/10.1109/DFT.2015.7315136
  Abstract
  Linear system solvers are an integral part for many different compute-intensive applications and they benefit from the compute power of heterogeneous computer architectures. However, the growing spectrum of reliability threats for such nano-scaled CMOS devices makes the integration of fault tolerance mandatory. The preconditioned conjugate gradient (PCG) method is one widely used solver as it finds solutions typically faster compared to direct methods. Although this iterative approach is able to tolerate certain errors, latest research shows that the PCG solver is still vulnerable to transient effects. Even single errors, for instance, caused by marginal hardware, harsh environments, or particle radiation, can considerably affect execution times, or lead to silent data corruption. In this work, a novel fault-tolerant PCG solver with extremely low runtime overhead is proposed. Since the error detection method does not involve expensive operations, it scales very well with increasing problem sizes. In case of errors, the method selects between three different correction methods according to the identified error. Experimental results show a runtime overhead for error detection ranging only from 0.04% to 1.70%.
  BibTeX
  @inproceedings{SchoeBKW2015a, abstract = {Linear system solvers are an integral part for many different compute-intensive applications and they benefit from the compute power of heterogeneous computer architectures. However, the growing spectrum of reliability threats for such nano-scaled CMOS devices makes the integration of fault tolerance mandatory. The preconditioned conjugate gradient (PCG) method is one widely used solver as it finds solutions typically faster compared to direct methods. Although this iterative approach is able to tolerate certain errors, latest research shows that the PCG solver is still vulnerable to transient effects. Even single errors, for instance, caused by marginal hardware, harsh environments, or particle radiation, can considerably affect execution times, or lead to silent data corruption. In this work, a novel fault-tolerant PCG solver with extremely low runtime overhead is proposed. Since the error detection method does not involve expensive operations, it scales very well with increasing problem sizes. In case of errors, the method selects between three different correction methods according to the identified error. Experimental results show a runtime overhead for error detection ranging only from 0.04% to 1.70%. }, abteilung = {ra}, address = {Amherst, Massachusetts, USA}, author = {Sch{\"o}ll, Alexander and Braun, Claus and Kochte, Michael A. and Wunderlich, Hans-Joachim}, booktitle = {Proceedings of the International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT'15)}, day = {12--14}, doi = {10.1109/DFT.2015.7315136}, location = {Amherst, Massachusetts, USA}, month = {10}, owner = {hellmelr}, pages = {60-65}, project = {SimTech}, title = {{Low-Overhead Fault-Tolerance for the Preconditioned Conjugate Gradient Solver}}, url = {https://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2015/DFTS_SchoeBKW2015.pdf}, year = 2015 }
  Link
  https://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2015/DFTS_SchoeBKW2015.pdf
  DOI
  10.1109/DFT.2015.7315136
2. Efficient On-Line Fault-Tolerance for the Preconditioned Conjugate Gradient Method. Alexander Schöll; Claus Braun; Michael A. Kochte and Hans-Joachim Wunderlich. In Proceedings of the 21st IEEE International On-Line Testing Symposium (IOLTS′15), Elia, Halkidiki, Greece, 2015, pp. 95–100. DOI: https://doi.org/10.1109/IOLTS.2015.7229839
  Abstract
  Linear system solvers are key components of many scientific applications and they can benefit significantly from modern heterogeneous computer architectures. However, such nano-scaled CMOS devices face an increasing number of reliability threats, which make the integration of fault tolerance mandatory. The preconditioned conjugate gradient method (PCG) is a very popular solver since it typically finds solutions faster than direct methods, and it is less vulnerable to transient effects. However, as latest research shows, the vulnerability is still considerable. Even single errors caused, for instance, by marginal hardware, harsh operating conditions or particle radiation can increase execution times considerably or corrupt solutions without indication. In this work, a novel and highly efficient fault-tolerant PCG method is presented. The method applies only two inner products to reliably detect errors. In case of errors, the method automatically selects between roll-back and efficient on-line correction. This significantly reduces the error detection overhead and expensive re-computations.
  BibTeX
  @inproceedings{SchoeBKW2015, abstract = {Linear system solvers are key components of many scientific applications and they can benefit significantly from modern heterogeneous computer architectures. However, such nano-scaled CMOS devices face an increasing number of reliability threats, which make the integration of fault tolerance mandatory. The preconditioned conjugate gradient method (PCG) is a very popular solver since it typically finds solutions faster than direct methods, and it is less vulnerable to transient effects. However, as latest research shows, the vulnerability is still considerable. Even single errors caused, for instance, by marginal hardware, harsh operating conditions or particle radiation can increase execution times considerably or corrupt solutions without indication. In this work, a novel and highly efficient fault-tolerant PCG method is presented. The method applies only two inner products to reliably detect errors. In case of errors, the method automatically selects between roll-back and efficient on-line correction. This significantly reduces the error detection overhead and expensive re-computations.}, abteilung = {ra}, address = {Elia, Halkidiki, Greece}, author = {Sch{\"o}ll, Alexander and Braun, Claus and Kochte, Michael A. and Wunderlich, Hans-Joachim}, booktitle = {Proceedings of the 21st IEEE International On-Line Testing Symposium (IOLTS'15)}, day = {6--8}, doi = {10.1109/IOLTS.2015.7229839}, location = {Elia, Halkidiki, Greece}, month = {07}, owner = {hellmelr}, pages = {95--100}, project = {SimTech}, title = {{Efficient On-Line Fault-Tolerance for the Preconditioned Conjugate Gradient Method}}, url = {https://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2015/IOLTS_SchoeBKW2015.pdf}, year = 2015 }
  Link
  https://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2015/IOLTS_SchoeBKW2015.pdf
  DOI
  10.1109/IOLTS.2015.7229839
2014
1. A-ABFT: Autonomous Algorithm-Based Fault Tolerance for Matrix Multiplications on Graphics Processing Units. Claus Braun; Sebastian Halder and Hans-Joachim Wunderlich. In Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN′14), Atlanta, Georgia, USA, 2014, pp. 443–454. DOI: https://doi.org/10.1109/DSN.2014.48
  Abstract
  Graphics processing units (GPUs) enable large-scale scientific applications and simulations on the desktop. To allow scientific computing on GPUs with high performance and reliability requirements, the application of software-based fault tolerance is attractive. Algorithm-Based Fault Tolerance (ABFT) protects important scientific operations like matrix multiplications. However, the application to floating-point operations necessitates the runtime classification of errors into inevitable rounding errors, allowed compute errors in the magnitude of such rounding errors, and into critical errors that are larger than those and not tolerable. Hence, an ABFT scheme needs suitable rounding error bounds to detect errors reliably. The determination of such error bounds is a highly challenging task, especially since it has to be integrated tightly into the algorithm and executed autonomously with low performance overhead. In this work, A-ABFT for matrix multiplications on GPUs is introduced, which is a new, parallel ABFT scheme that determines rounding error bounds autonomously at runtime with low performance overhead and high error coverage.
  BibTeX
  @inproceedings{BraunHW2014, abstract = {Graphics processing units (GPUs) enable large-scale scientific applications and simulations on the desktop. To allow scientific computing on GPUs with high performance and reliability requirements, the application of software-based fault tolerance is attractive. Algorithm-Based Fault Tolerance (ABFT) protects important scientific operations like matrix multiplications. However, the application to floating-point operations necessitates the runtime classification of errors into inevitable rounding errors, allowed compute errors in the magnitude of such rounding errors, and into critical errors that are larger than those and not tolerable. Hence, an ABFT scheme needs suitable rounding error bounds to detect errors reliably. The determination of such error bounds is a highly challenging task, especially since it has to be integrated tightly into the algorithm and executed autonomously with low performance overhead. In this work, A-ABFT for matrix multiplications on GPUs is introduced, which is a new, parallel ABFT scheme that determines rounding error bounds autonomously at runtime with low performance overhead and high error coverage.}, abteilung = {ra}, address = {Atlanta, Georgia, USA}, author = {Braun, Claus and Halder, Sebastian and Wunderlich, Hans-Joachim}, booktitle = {Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'14)}, day = {23--26}, doi = {10.1109/DSN.2014.48}, location = {Atlanta, Georgia, USA}, month = {06}, owner = {kakaraaa}, pages = {443--454}, project = {SimTech}, title = {{A-ABFT: Autonomous Algorithm-Based Fault Tolerance for Matrix Multiplications on Graphics Processing Units}}, url = {https://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2014/DSN_BraunHH2014.pdf}, year = 2014 }
  Link
  https://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2014/DSN_BraunHH2014.pdf
  DOI
  10.1109/DSN.2014.48
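The checksum idea behind ABFT for matrix multiplications, which the abstract above refers to, can be illustrated with a minimal sketch. It assumes the classic row/column checksum encoding of the operands and runs in plain Python on the CPU, not on a GPU. The names `abft_matmul` and `check` and the fixed tolerance `TOL` are hypothetical and are not taken from the paper. In particular, A-ABFT determines its rounding error bounds autonomously at runtime, which this sketch does not attempt.

```python
def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def abft_matmul(a, b):
    """Multiply a column-checksum A by a row-checksum B.

    The result carries an extra checksum column and an extra checksum row.
    """
    a_c = a + [[sum(col) for col in zip(*a)]]   # append row of column sums
    b_r = [row + [sum(row)] for row in b]       # append column of row sums
    return matmul(a_c, b_r)


def check(c_full, tol):
    """Return (bad_rows, bad_cols) whose checksums deviate by more than tol."""
    n, m = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n)
                if abs(sum(c_full[i][:m]) - c_full[i][m]) > tol]
    bad_cols = [j for j in range(m)
                if abs(sum(c_full[i][j] for i in range(n)) - c_full[n][j]) > tol]
    return bad_rows, bad_cols


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[0.5, 1.5], [2.5, 3.5]]
TOL = 1e-9  # assumption: a fixed, hand-picked bound, unlike A-ABFT's runtime bounds

c_full = abft_matmul(a, b)
clean = check(c_full, TOL)      # no mismatch expected for the fault-free product

c_full[0][1] += 1.0             # inject a single erroneous result element
faulty = check(c_full, TOL)     # the mismatching row and column locate the error
```

The intersection of the flagged row and column identifies the corrupted element. Choosing `tol` is the hard part in floating-point arithmetic: too small a bound flags ordinary rounding errors, too large a bound hides critical ones.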
2013
1. Efficacy and Efficiency of Algorithm-Based Fault Tolerance on GPUs. Hans-Joachim Wunderlich; Claus Braun and Sebastian Halder. In Proceedings of the IEEE International On-Line Testing Symposium (IOLTS′13), Crete, Greece, 2013, pp. 240–243. DOI: https://doi.org/10.1109/IOLTS.2013.6604090
  Abstract
  Computer simulations drive innovations in science and industry, and they are gaining more and more importance. However, their high computational demand generates extraordinary challenges for computing systems. Typical high-performance computing systems, which provide sufficient performance and high reliability, are extremely expensive. Modern GPUs offer high performance at very low costs, and they enable simulation applications on the desktop. However, they are increasingly prone to transient effects and other reliability threats. To fulfill the strict reliability requirements in scientific computing and simulation technology, appropriate fault tolerance measures have to be integrated into simulation applications for GPUs. Algorithm-Based Fault Tolerance on GPUs has the potential to meet these requirements. In this work we investigate the efficiency and the efficacy of ABFT for matrix operations on GPUs. We compare ABFT against fault tolerance schemes that are based on redundant computations and we evaluate its error detection capabilities.
  BibTeX
  @inproceedings{WundeBH2013, abstract = {Computer simulations drive innovations in science and industry, and they are gaining more and more importance. However, their high computational demand generates extraordinary challenges for computing systems. Typical high-performance computing systems, which provide sufficient performance and high reliability, are extremely expensive. Modern GPUs offer high performance at very low costs, and they enable simulation applications on the desktop. However, they are increasingly prone to transient effects and other reliability threats. To fulfill the strict reliability requirements in scientific computing and simulation technology, appropriate fault tolerance measures have to be integrated into simulation applications for GPUs. Algorithm-Based Fault Tolerance on GPUs has the potential to meet these requirements. In this work we investigate the efficiency and the efficacy of ABFT for matrix operations on GPUs. We compare ABFT against fault tolerance schemes that are based on redundant computations and we evaluate its error detection capabilities.}, abteilung = {ra}, address = {Crete, Greece}, author = {Wunderlich, Hans-Joachim and Braun, Claus and Halder, Sebastian}, booktitle = {Proceedings of the IEEE International On-Line Testing Symposium (IOLTS'13)}, day = {8--10}, doi = {10.1109/IOLTS.2013.6604090}, location = {Crete, Greece}, month = {07}, owner = {kakaraaa}, pages = {240--243}, project = {SimTech}, title = {{Efficacy and Efficiency of Algorithm-Based Fault Tolerance on GPUs}}, url = {https://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2013/IOLTS_WundeBH2013.pdf}, year = 2013 }
  Link
  https://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2013/IOLTS_WundeBH2013.pdf
  DOI
  10.1109/IOLTS.2013.6604090
2012
1. Parallel Simulation of Apoptotic Receptor-Clustering on GPGPU Many-Core Architectures. Claus Braun; Markus Daub; Alexander Schöll; Guido Schneider and Hans-Joachim Wunderlich. In Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM′12), Philadelphia, Pennsylvania, USA, 2012, pp. 1–6. DOI: https://doi.org/10.1109/BIBM.2012.6392661
  Abstract
  Apoptosis, the programmed cell death, is a physiological process that handles the removal of unwanted or damaged cells in living organisms. The process itself is initiated by signaling through tumor necrosis factor (TNF) receptors and ligands, which form clusters on the cell membrane. The exact function of this process is not yet fully understood and currently subject of basic research. Different mathematical models have been developed to describe and simulate the apoptotic receptor-clustering. In this interdisciplinary work, a previously introduced model of the apoptotic receptor-clustering has been extended by a new receptor type to allow a more precise description and simulation of the signaling process. Due to the high computational requirements of the model, an efficient algorithmic mapping to a modern many-core GPGPU architecture has been developed. Such architectures enable high-performance computing (HPC) simulation tasks on the desktop at low costs. The developed mapping reduces average simulation times from months to days (peak speedup of 256x), allowing the productive use of the model in research.
  BibTeX
  @inproceedings{BraunDSSW2012, abstract = {Apoptosis, the programmed cell death, is a physiological process that handles the removal of unwanted or damaged cells in living organisms. The process itself is initiated by signaling through tumor necrosis factor (TNF) receptors and ligands, which form clusters on the cell membrane. The exact function of this process is not yet fully understood and currently subject of basic research. Different mathematical models have been developed to describe and simulate the apoptotic receptor-clustering. In this interdisciplinary work, a previously introduced model of the apoptotic receptor-clustering has been extended by a new receptor type to allow a more precise description and simulation of the signaling process. Due to the high computational requirements of the model, an efficient algorithmic mapping to a modern many-core GPGPU architecture has been developed. Such architectures enable high-performance computing (HPC) simulation tasks on the desktop at low costs. The developed mapping reduces average simulation times from months to days (peak speedup of 256x), allowing the productive use of the model in research.}, abteilung = {ra}, address = {Philadelphia, Pennsylvania, USA}, author = {Braun, Claus and Daub, Markus and Sch{\"o}ll, Alexander and Schneider, Guido and Wunderlich, Hans-Joachim}, booktitle = {Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM'12)}, day = {4--7}, doi = {10.1109/BIBM.2012.6392661}, language = {English}, location = {Philadelphia, Pennsylvania, USA}, month = {10}, owner = {Thomas}, pages = {1--6}, project = {SimTech}, title = {{Parallel Simulation of Apoptotic Receptor-Clustering on GPGPU Many-Core Architectures}}, url = {https://www.iti.uni-stuttgart.de//fileadmin/rami/files/publications/2012/BIBM_BraunDSSW2012.pdf}, year = 2012 }
  Link
  https://www.iti.uni-stuttgart.de//fileadmin/rami/files/publications/2012/BIBM_BraunDSSW2012.pdf
  DOI
  10.1109/BIBM.2012.6392661
2. Acceleration of Monte-Carlo Molecular Simulations on Hybrid Computing Architectures. Claus Braun; Stefan Holst; Hans-Joachim Wunderlich; Juan Manuel Castillo and Joachim Gross. In Proceedings of the 30th IEEE International Conference on Computer Design (ICCD′12), Montreal, Canada, 2012, pp. 207–212. DOI: https://doi.org/10.1109/ICCD.2012.6378642
  Abstract
  Markov-Chain Monte-Carlo (MCMC) methods are an important class of simulation techniques, which execute a sequence of simulation steps, where each new step depends on the previous ones. Due to this fundamental dependency, MCMC methods are inherently hard to parallelize on any architecture. The upcoming generations of hybrid CPU/GPGPU architectures with their multi-core CPUs and tightly coupled many-core GPGPUs provide new acceleration opportunities especially for MCMC methods, if the new degrees of freedom are exploited correctly. In this paper, the outcomes of an interdisciplinary collaboration are presented, which focused on the parallel mapping of an MCMC molecular simulation from thermodynamics to hybrid CPU/GPGPU computing systems. While the mapping is designed for upcoming hybrid architectures, the implementation of this approach on an NVIDIA Tesla system already leads to a substantial speedup of more than 87x despite the additional communication overheads.
  BibTeX
  @inproceedings{BraunHWCG2012, abstract = {Markov-Chain Monte-Carlo (MCMC) methods are an important class of simulation techniques, which execute a sequence of simulation steps, where each new step depends on the previous ones. Due to this fundamental dependency, MCMC methods are inherently hard to parallelize on any architecture. The upcoming generations of hybrid CPU/GPGPU architectures with their multi-core CPUs and tightly coupled many-core GPGPUs provide new acceleration opportunities especially for MCMC methods, if the new degrees of freedom are exploited correctly. In this paper, the outcomes of an interdisciplinary collaboration are presented, which focused on the parallel mapping of a MCMC molecular simulation from thermodynamics to hybrid CPU/GPGPU computing systems. While the mapping is designed for upcoming hybrid architectures, the implementation of this approach on an NVIDIA Tesla system already leads to a substantial speedup of more than 87x despite the additional communication overheads.}, abteilung = {ra}, address = {Montreal, Canada}, author = {Braun, Claus and Holst, Stefan and Wunderlich, Hans-Joachim and Castillo, Juan Manuel and Gross, Joachim}, booktitle = {Proceedings of the 30th IEEE International Conference on Computer Design (ICCD'12)}, day = {30 September--3 October}, doi = {10.1109/ICCD.2012.6378642}, language = {English}, location = {Montreal, Canada}, owner = {Thomas}, pages = {207--212}, project = {SimTech}, publisher = {IEEE Computer Society}, title = {{Acceleration of Monte-Carlo Molecular Simulations on Hybrid Computing Architectures}}, url = {https://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2012/ICCD_BraunHWCG2012.pdf}, year = 2012 }
  Link
  https://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2012/ICCD_BraunHWCG2012.pdf
  DOI
  10.1109/ICCD.2012.6378642
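The step-to-step dependency that makes MCMC methods hard to parallelize, as noted in the abstract above, is visible in even the simplest sampler. The following is a minimal sketch of a random-walk Metropolis sampler for a one-dimensional standard normal target. It is not the molecular simulation from the paper; the function names, the target density, and all parameter values are assumptions chosen purely for illustration.

```python
import math
import random


def metropolis(log_density, x0, n_steps, step_size, rng):
    """Random-walk Metropolis sampler with a symmetric uniform proposal.

    Every iteration reads the state produced by the previous one, so the
    loop over steps cannot simply be split across independent threads.
    """
    chain = [x0]
    x = x0
    for _ in range(n_steps):
        proposal = x + rng.uniform(-step_size, step_size)
        log_ratio = log_density(proposal) - log_density(x)
        # Accept uphill moves always, downhill moves with probability exp(log_ratio).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        chain.append(x)  # a rejected proposal repeats the current state
    return chain


def log_std_normal(x):
    """Log-density of a standard normal distribution, up to a constant."""
    return -0.5 * x * x


# Hypothetical settings: 20000 steps, proposal half-width 1.0, fixed seed.
chain = metropolis(log_std_normal, 0.0, 20000, 1.0, random.Random(0))
mean = sum(chain) / len(chain)
```

Parallel work therefore has to be found inside a single step, for example in the evaluation of the density or energy, rather than across steps of the chain.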
2010
1. Algorithmen-basierte Fehlertoleranz für Many-Core-Architekturen; Algorithm-based Fault-Tolerance on Many-Core Architectures. Claus Braun and Hans-Joachim Wunderlich. it - Information Technology 52, (August 2010), pp. 209–215. DOI: https://doi.org/10.1524/itit.2010.0593
  Abstract
  Modern many-core architectures provide a high computational potential, which makes them particularly interesting for applications from the fields of scientific high-performance computing and simulation technology. The execution paradigm of these architectures is best described as “Many-Threading”. Like all nano-scaled semiconductor devices, many-core processors are prone to transient errors (soft errors) and different kinds of variations that can have a severe impact on the reliability of such systems. Therefore, fault tolerance has to be incorporated at all levels, from the hardware up to the software. On the software side, Algorithm-based Fault Tolerance (ABFT) is a mature technique to improve reliability. However, significant effort is required to adapt this technique to modern many-threading architectures. In this article, an efficient and fault-tolerant mapping of the matrix multiplication to a modern many-core architecture is presented. Fault tolerance is an integral part of the mapping and is implemented through an ABFT scheme with marginal impact on the overall performance.
  BibTeX
  @article{BraunW2010a, abstract = {Moderne Many-Core-Architekturen bieten ein sehr hohes Potenzial an Rechenleistung. Dies macht sie besonders f{\"u}r Anwendungen aus dem Bereich des wissenschaftlichen Hochleistungsrechnens und der Simulationstechnik attraktiv. Die Architekturen folgen dabei einem Ausf{\"u}hrungsparadigma, das sich am besten durch den Begriff “Many-Threading” beschreiben l{\"a}sst. Wie alle nanoelektronischen Halbleiterschaltungen leiden auch Many-Core-Prozessoren potentiell unter st{\"o}renden Einfl{\"u}ssen von transienten Fehlern (soft errors) und diversen Arten von Variationen. Diese Faktoren k{\"o}nnen die Zuverl{\"a}ssigkeit von Systemen negativ beeinflussen und erfordern Fehlertoleranz auf allen Ebenen, von der Hardware bis zur Software. Auf der Softwareseite stellt die Algorithmen-basierte Fehlertoleranz (ABFT) eine ausgereifte Technik zur Verbesserung der Zuverl{\"a}ssigkeit dar. Der Aufwand f{\"u}r die Anpassung dieser Technik an moderne Many-Threading-Architekturen darf jedoch keinesfalls untersch{\"a}tzt werden. In diesem Beitrag wird eine effiziente und fehlertolerante Abbildung der Matrixmultiplikation auf eine moderne Many-Core-Architektur pr{\"a}sentiert. Die Fehlertoleranz ist dabei integraler Bestandteil der Abbildung und wird durch ein ABFT-Schema realisiert, das die Leistung nur unwesentlich beeintr{\"a}chtigt. Modern many-core architectures provide a high computational potential, which makes them particularly interesting for applications from the fields of scientific high-performance computing and simulation technology. The execution paradigm of these architectures is best described as “Many-Threading”. Like all nano-scaled semiconductor devices, many-core processors are prone to transient errors (soft errors) and different kinds of variations that can have severe impact on the reliability of such systems. Therefore, fault-tolerance has to be incorporated at all levels, from the hardware up to the software. 
On the software side, Algorithm-based Fault Tolerance (ABFT) is a mature technique to improve the reliability. However, significant effort is required to adapt this technique to modern many-threading architectures. In this article, an efficient and fault-tolerant mapping of the matrix multiplication to a modern many-core architecture is presented. Fault-tolerance is thereby an integral part of the mapping and implemented through an ABFT scheme with marginal impact on the overall performance.}, abteilung = {ra}, author = {Braun, Claus and Wunderlich, Hans-Joachim}, doi = {10.1524/itit.2010.0593}, issn = {1611-2776}, journal = {it - Information Technology}, language = {German}, month = {08}, number = 4, owner = {Thomas}, pages = {209--215}, project = {SimTech}, publisher = {Oldenbourg Wissenschaftsverlag}, title = {{Algorithmen-basierte Fehlertoleranz f{\"u}r Many-Core-Architekturen}; {Algorithm-based Fault-Tolerance on Many-Core Architectures}}, volume = 52, year = 2010 }
  DOI
  10.1524/itit.2010.0593
2. Algorithm-Based Fault Tolerance for Many-Core Architectures. Claus Braun and Hans-Joachim Wunderlich. In Proceedings of the 15th IEEE European Test Symposium (ETS′10), Praha, Czech Republic, 2010, pp. 253. DOI: https://doi.org/10.1109/ETSYM.2010.5512738
  Abstract
  Modern many-core architectures with hundreds of cores provide a high computational potential. This makes them particularly interesting for scientific high-performance computing and simulation technology. Like all nano-scaled semiconductor devices, many-core processors are prone to reliability-harming factors like variations and soft errors. One way to improve the reliability of such systems is software-based hardware fault tolerance. Here, the software is able to detect and correct errors introduced by the hardware. In this work, we propose a software-based approach to improve the reliability of matrix operations on many-core processors. These operations are key components in many scientific applications.
  BibTeX
  @inproceedings{BraunW2010, abstract = {Modern many-core architectures with hundreds of cores provide a high computational potential. This makes them particularly interesting for scientific high-performance computing and simulation technology. Like all nano scaled semiconductor devices, many-core processors are prone to reliability harming factors like variations and soft errors. One way to improve the reliability of such systems is software-based hardware fault tolerance. Here, the software is able to detect and correct errors introduced by the hardware. In this work, we propose a software-based approach to improve the reliability of matrix operations on many-core processors. These operations are key components in many scientific applications.}, abteilung = {ra}, address = {Praha, Czech Republic}, author = {Braun, Claus and Wunderlich, Hans-Joachim}, booktitle = {Proceedings of the 15th IEEE European Test Symposium (ETS'10)}, comment = {IEEE Catalog Number: CFP10216-USB}, cr-category = {B.8.1 Reliability, Testing, and Fault-Tolerance, C.1.4 Processor Architectures, Parallel Architectures, C.4 Performance of Systems, D.1.3 Concurrent Programming}, day = {24--28}, department = {Universit{\"a}t Stuttgart, Institut f{\"u}r Technische Informatik, Rechnerarchitektur}, doi = {10.1109/ETSYM.2010.5512738}, institution = {Universit{\"a}t Stuttgart, Fakult{\"a}t Informatik, Elektrotechnik und Informationstechnik, Germany}, isbn = {978-1-4244-5834-9}, issn = {1530-1877}, language = {English}, location = {Praha, Czech Republic}, month = {05}, owner = {Thomas}, pages = {253--253}, privnote = {IEEE Catalog Number: CFP10216-USB}, project = {SimTech}, publisher = {IEEE Computer Society}, title = {{Algorithm-Based Fault Tolerance for Many-Core Architectures}}, type = {Konferenz-Beitrag}, url = {https://www.iti.uni-stuttgart.de//fileadmin/rami/files/publications/2010/ETS_BraunW2010.pdf}, year = 2010 }
  Link
  https://www.iti.uni-stuttgart.de//fileadmin/rami/files/publications/2010/ETS_BraunW2010.pdf
  DOI
  10.1109/ETSYM.2010.5512738

Workshop Contributions

2016
1. Hardware/Software Co-Characterization for Approximate Computing. Alexander Schöll; Claus Braun and Hans-Joachim Wunderlich. In Workshop on Approximate Computing, Pittsburgh, Pennsylvania, USA, 2016.
  BibTeX
  @inproceedings{SchoeBW2016, abteilung = {rawork}, address = {Pittsburgh, Pennsylvania, USA}, author = {Schöll, Alexander and Braun, Claus and Wunderlich, Hans-Joachim}, booktitle = {Workshop on Approximate Computing}, day = 06, location = {Pittsburgh, Pennsylvania, USA}, month = {10}, owner = {hellmelr}, project = {SimTech}, title = {{Hardware/Software Co-Characterization for Approximate Computing}}, year = 2016 }
2015
1. ABFT with Probabilistic Error Bounds for Approximate and Adaptive-Precision Computing Applications. Claus Braun and Hans-Joachim Wunderlich. In Workshop on Approximate Computing, Paderborn, Germany, 2015.
  BibTeX
  @inproceedings{BraunW2015, abteilung = {rawork}, address = {Paderborn, Germany}, author = {Braun, Claus and Wunderlich, Hans-Joachim}, booktitle = {Workshop on Approximate Computing}, day = {15--16}, location = {Paderborn, Germany}, month = {10}, owner = {haefneht}, project = {SimTech}, title = {{ABFT with Probabilistic Error Bounds for Approximate and Adaptive-Precision Computing Applications}}, year = 2015 }
2014
1. A-ABFT: Autonomous Algorithm-Based Fault Tolerance on GPUs. Claus Braun; Sebastian Halder and Hans-Joachim Wunderlich. In International Workshop on Dependable GPU Computing, in conjunction with the ACM/IEEE DATE′14 Conference, Dresden, Germany, 2014.
  Abstract
  General-purpose computations on graphics processing units (GPUs) enable large-scale scientific applications and simulations on the desktop. Such applications typically have high performance and reliability requirements. For GPUs, which are still designed for the graphics mass-market, hardware-based fault tolerance measures often do not have the highest priority, which makes the application of appropriate software-based fault tolerance mandatory. Algorithm-based Fault Tolerance (ABFT) allows the efficient and effective protection of important kernels from scientific computing. Some ABFT schemes have already been adapted for GPU architectures. However, due to roundoff error introduced by floating-point arithmetic, ABFT requires the determination of tight error bounds for the error detection. The determination of such error bounds is a highly challenging task. In this work, we introduce A-ABFT for GPUs, a new parallel ABFT scheme that determines appropriate error bounds for the checksum comparison step autonomously and which therefore enables the transparent operation of ABFT without any user interaction.
  BibTeX
  @inproceedings{BraunHW2014, abstract = {General-purpose computations on graphics processing units (GPUs) enable large-scale scientific applications and simulations on the desktop. Such applications typically have high performance and reliability requirements. For GPUs, which are still designed for the graphics mass-market, hardware-based fault tolerance measures often do not have the highest priority, which makes the application of appropriate software-based fault tolerance mandatory. Algorithm-based Fault Tolerance (ABFT) allows the efficient and effective protection of important kernels from scientific computing. Some ABFT schemes have already been adapted for GPU architectures. However, due to roundoff error introduced by floating-point arithmetic, ABFT requires the determination of tight error bounds for the error detection. The determination of such error bounds is a highly challenging task. In this work, we introduce A-ABFT for GPUs, a new parallel ABFT scheme that determines appropriate error bounds for the checksum comparison step autonomously and which therefore enables the transparent operation of ABFT without any user interaction.}, abteilung = {rawork}, address = {Dresden, Germany}, author = {Braun, Claus and Halder, Sebastian and Wunderlich, Hans-Joachim}, booktitle = {International Workshop on Dependable GPU Computing, in conjunction with the ACM/IEEE DATE'14 Conference}, day = 28, location = {Dresden, Germany}, month = {03}, owner = {hellmelr}, project = {SimTech}, title = {{A-ABFT: Autonomous Algorithm-Based Fault Tolerance on GPUs}}, year = 2014 }
