Exploiting the Error Resilience of the Preconditioned Conjugate Gradient Method for Energy and Delay Optimization. Natalia Lylina; Stefan Holst; Hanieh Jafarzadeh; Alexandra Kourfali and Hans-Joachim Wunderlich. In IEEE 29th International On-Line Testing Symposium (IOLTS'23), Chania (Crete), Greece, 2023, pp. 1–6.
Abstract
The Preconditioned Conjugate Gradient (PCG)
method is well established for solving systems of linear equations.
Running the PCG method on a hardware accelerator ensures
fast and efficient computation. At the same time, each hardware
accelerator may be slightly different due to process variability or
aging. To handle the variability, a rather pessimistic frequency
selection for the whole population of accelerators is often utilized.
Increasing the frequency may improve the performance but
may also increase the risk of computational errors, affect the
convergence of PCG or even corrupt the PCG results.
In this paper, we present a method to determine the frequency
for each hardware accelerator instance which optimizes the
execution time and the energy efficiency of the PCG method.
First, a technique is presented to analyze the error resilience of
a PCG algorithm to overclocking. Based on the analysis results,
we increase the frequency to speed up the convergence while
keeping the error rate below the required threshold.
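For readers unfamiliar with the method, the following is a minimal sketch of a PCG iteration in pure Python. It is not the implementation from the paper: the Jacobi (diagonal) preconditioner, the dense list-of-lists matrix format, the relative residual stopping rule, and the names `pcg`, `tol` and `max_iter` are all assumptions chosen for illustration. The iteration count it returns is the quantity that would be expected to grow when computational errors disturb convergence.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A.

    Illustrative sketch only: A is a dense list of lists and the
    preconditioner is the inverse of the diagonal of A (Jacobi).
    Returns the approximate solution and the number of iterations used.
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner

    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5

    for k in range(max_iter):
        # Stop once the residual is small relative to the right-hand side.
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)                   # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz                        # keeps directions conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Small example system (values chosen only for illustration).
A_example = [[4.0, 1.0], [1.0, 3.0]]
b_example = [1.0, 2.0]
x_example, iterations_example = pcg(A_example, b_example)
```

Calling `pcg(A_example, b_example)` returns an approximation of the exact solution (1/11, 7/11) together with the number of iterations that were needed.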
Test Aspects of System Health State Monitoring. Hans-Joachim Wunderlich; Hanieh Jafarzadeh; Alexandra Kourfali; Natalia Lylina and Zahra Paria Najafi-Haghi. In IEEE 24th Latin American Test Symposium (LATS'23), Veracruz, Mexico, 2023, pp. 1–2.
Abstract
System health monitoring is an integral concept
that involves observing, evaluating, and adapting the system
behavior under varying operating conditions. The data can be
collected from embedded instruments throughout the lifetime.
Various techniques, including machine learning, have to be used
to analyze the data and adapt the underlying system behavior.
At the same time, the behavior of modern devices is affected
by different types of variations. In order to develop an efficient
and precise health monitoring scheme, the underlying analysis
and adaptation techniques must be robust even in the presence
of those variations. This contribution explores various strategies
for overcoming this challenge across the system stack.
Guardband Optimization for the Preconditioned Conjugate Gradient Algorithm. Natalia Lylina; Stefan Holst; Hanieh Jafarzadeh; Alexandra Kourfali and Hans-Joachim Wunderlich. In International Conference on Dependable Systems and Networks (DSN'23), Porto, Portugal, 2023.
Abstract
Many applications from Artificial Intelligence (AI)
and Scientific Computing rely on efficient algorithms for solving
large systems of linear equations. The Preconditioned Conjugate
Gradient (PCG) algorithm is a promising option and it is
a perfect candidate to be executed on specialized hardware
accelerators widely used in AI. Hardware accelerators, like other
modern devices, are prone to process variations. A conventional
approach to handle the variability is to use pessimistic guardbands for
all the devices within the population, which implies
that the best and even the average accelerators are slowed down
significantly. Since the PCG algorithm is inherently error resilient
to some extent, it may also tolerate an error rate increase due
to overclocking. On the other hand, increasing the frequency may
increase the total execution time if more arithmetic operations are
needed until convergence. This paper presents a method to
ensure efficient computing on each hardware accelerator instance
running the PCG algorithm. A cross-layer approach identifies an
optimized frequency that minimizes the total time to complete
the PCG algorithm. Simple high-level checks ensure the quality
of the solution. Experimental results validate the feasibility of
the developed approach for large systems of linear equations.
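The trade-off described in this abstract can be illustrated with a toy model in which the total time is the number of iterations times the cycles per iteration, divided by the clock frequency. This is only a sketch under stated assumptions and not the cross-layer method of the paper: the function names, the profile format, the cycle count and every number below are hypothetical and do not come from the publication.

```python
def total_time(iterations, cycles_per_iteration, frequency_hz):
    """Execution time in seconds for a run with a fixed cost per iteration."""
    return iterations * cycles_per_iteration / frequency_hz


def best_frequency(profile, cycles_per_iteration):
    """Pick the frequency with the smallest total time.

    `profile` maps a frequency in Hz to the number of iterations needed to
    converge at that frequency, or to None if the run failed its quality
    check (for example, the final residual was too large).
    """
    times = {
        f: total_time(n, cycles_per_iteration, f)
        for f, n in profile.items()
        if n is not None
    }
    if not times:
        raise ValueError("no frequency produced an acceptable solution")
    f_best = min(times, key=times.get)
    return f_best, times[f_best]


# Hypothetical profile of one accelerator instance (made-up numbers).
example_profile = {
    1.0e9: 100,   # nominal frequency, error-free convergence
    1.2e9: 105,   # mild overclocking, a few extra iterations
    1.4e9: 150,   # errors slow convergence more than the clock gains
    1.6e9: None,  # solution rejected by the quality check
}
example_choice = best_frequency(example_profile, cycles_per_iteration=1.0e6)
```

With these made-up values the moderate overclock at 1.2 GHz gives the shortest run, because at 1.4 GHz the extra iterations outweigh the faster clock.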