PARSIVAL: Parallel High-Throughput Simulations for Efficient Nanoelectronic Design and Test Validation
In this project, novel methods for versatile simulation-based VLSI design and test validation on high-throughput data-parallel architectures will be developed, enabling a wide range of important state-of-the-art validation tasks for large circuits. Due to the nature of the design validation processes and the massive amount of data involved, parallelism and throughput optimization are the keys to making design validation feasible for future industrial-sized designs. The main focus lies on the structure of the simulation model, the abstraction level and the algorithms used, as well as their parallelization on data-parallel architectures. The simulation algorithms should be kept simple enough to run fast, yet accurate enough to produce valuable data for cross-layer validation of complex digital systems.
Design and test validation is one of the most important and complex tasks within modern semiconductor product development cycles. The design validation processes analyze and assess a developed design with respect to certain validation targets to ensure compliance with given specifications and customer requirements. With thorough validation, test strategies can be assessed to increase test quality and the quality of the products delivered. The specifications and requirements range from the abstract high-level functional behavior of the circuit to constraints on parameters at lower levels, such as peak power consumption or transistor stress. With continued process scaling, more complex defect mechanisms arise and more and more low-level parameters have to be considered, which is why validation at lower levels has become an essential part of current manufacturing processes. Yet, state-of-the-art algorithms rely on compute-intensive simulations that fail to scale on traditional computing architectures, given the increasing complexity of the designs and the required model accuracy. Over the past years, data-parallel architectures, such as Graphics Processing Units (GPUs), have evolved and introduced the many-core paradigm, which is now well established in general-purpose computing. Current architectures provide thousands of processors on a single chip and achieve a massive computational throughput of several teraflops. By exploiting highly parallel hardware acceleration and careful abstraction, we strive for maximum throughput in order to make a wide range of complex electronic design automation (EDA) applications applicable to industrial-sized designs.
Development of Simulation Models for Accurate Validation
The abstraction levels to be considered during the first project phase cover descriptions from the gate level down to the electrical level and incorporate all the information required for an accurate evaluation of the circuit behavior and its functional properties. The evaluation models must be sufficiently comprehensive to describe all significant electrical parameters that impact logic behavior and timing, but must still allow for extremely efficient parallel algorithms. Instead of relying on the continuous solution of differential equations, as is common in lower-level simulation tools such as SPICE, the algorithms use piecewise approximations of the electrical behavior through linearization in order to model functional properties and compute the signal values. This offers an attractive tradeoff between achievable precision and computational effort.
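To illustrate the principle, the following sketch approximates the exponential charging curve of a single RC node by a few piecewise-linear segments. The node parameters, breakpoint placement, and segment count are hypothetical choices for this example, not the project's actual model; the point is that evaluating the linearized model reduces to a lookup and a multiply-add, which maps far better onto data-parallel hardware than numerically solving differential equations.

```python
import numpy as np

# Assumed example values for the illustration (not from the project).
VDD, TAU = 1.0, 1e-10  # supply voltage [V] and RC time constant [s]

def v_exact(t):
    """Exact exponential charging curve v(t) = Vdd * (1 - exp(-t/tau))."""
    return VDD * (1.0 - np.exp(-t / TAU))

# Breakpoints fixed up front (e.g., at characterization time); between
# breakpoints the waveform is treated as a straight line.
t_break = np.linspace(0.0, 5 * TAU, 6)
v_break = v_exact(t_break)

def v_pwl(t):
    """Evaluate the piecewise-linear approximation at times t."""
    return np.interp(t, t_break, v_break)

t = np.linspace(0.0, 5 * TAU, 1000)
max_err = np.max(np.abs(v_pwl(t) - v_exact(t)))
print(f"max PWL error with 6 breakpoints: {max_err:.3f} V")
```

Adding breakpoints tightens the approximation, so the precision/effort tradeoff mentioned above becomes an explicit tuning knob.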
In addition to the ideal logic and timing behavior, the functional model has to be extended to account for the impact of parasitic and external parameters, including the power grid, crosstalk, temperature, and other environmental influences.
Non-functional properties (NFPs) have to be evaluated over a very wide range of time scales. Computing the vulnerability of circuit structures with respect to soft errors concerns single events in the range of picoseconds. Circuit robustness is related to noise, inductance, or signal integrity, whose effects last on the order of nanoseconds. Current and power consumption have to be evaluated at this scale as well, while temperature may be evaluated over the range of milliseconds due to the heat capacity and heat transfer characteristics of the device under evaluation. Reliability with respect to wear-out mechanisms and aging, however, has to be analyzed at the scale of months or years and requires a completely different approach. All these NFPs in turn directly affect the functional properties and therefore have to be evaluated in an integrated way.
Massively Parallel Validation Algorithms
The developed simulation models will be evaluated by algorithms optimized for massively parallel compute architectures such as GPUs. On such architectures, high throughput is achieved by parallelizing computations and maximizing the utilization of the computational resources at runtime. Exploiting parallelism in its various dimensions requires that the simulation algorithms be tuned for the targeted data-parallel architectures. For optimal results, this demands not only a thorough understanding of the underlying architecture and instruction sets, but also a general and flexible algorithm design.
Parallelism will be exploited in many ways during evaluation:
- Structural parallelism, induced by mutually independent nodes in a circuit, allows the concurrent evaluation of the independent nodes.
- Multiple faults can be simulated in parallel, e.g., when different fault propagation cones are involved such that interactions between the faults are prevented.
- Pattern parallelism, a type of data-parallelism, allows the evaluation of a circuit for different patterns at once.
- Instance parallelism takes advantage of circuit instances with different parameters, such as varying node delays, which can be evaluated at the same time for a given pattern or fault.
- Task parallelism allows different subtasks for an instance to be executed concurrently on the device, evaluating multiple parameters at the same time.
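Pattern parallelism in particular has a classic bit-parallel formulation: each bit position of a machine word holds the signal value under a different input pattern, so one bitwise instruction evaluates a gate for 64 patterns at once. The following minimal sketch (with arbitrarily chosen pattern values, and a NAND gate as a stand-in for any gate function) illustrates the idea; on a GPU, each thread would process such words, with thousands of words in flight.

```python
# 64 patterns packed into one word; bit i is the signal value under pattern i.
WORD_BITS = 64
MASK = (1 << WORD_BITS) - 1

def nand(a, b):
    """Evaluate a NAND gate for all packed patterns with two bitwise ops."""
    return ~(a & b) & MASK

# Two packed input signals (example pattern values chosen arbitrarily;
# only the low 8 of the 64 pattern slots are populated here).
a = 0b1010_1100
b = 0b0110_1010

y = nand(a, b)            # one evaluation covers all packed patterns
print(f"{y & 0xFF:08b}")  # low 8 patterns of the NAND output
```

The same packing applies to fault parallelism: a fault can be injected for a subset of the bit positions via a mask, so good and faulty machines share one simulation pass.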
The validation tasks can be significantly accelerated by optimally combining and exploiting these different dimensions of parallelism, yielding maximized throughput. However, this requires careful scheduling of the computational tasks and an elaborate utilization of the computational resources of the underlying many-core GPU architecture.
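One common scheduling ingredient, sketched here under simplifying assumptions, is levelization: gates are grouped by their longest topological distance from the primary inputs, so that all gates within one level are mutually independent (structural parallelism) and could be evaluated concurrently, e.g., one GPU kernel launch per level. The netlist below is a made-up toy example; a real flow would derive these levels from the actual circuit graph.

```python
from collections import defaultdict

# Toy combinational netlist: gate -> list of fan-in signals (hypothetical).
netlist = {
    "g1": ["a", "b"],
    "g2": ["b", "c"],
    "g3": ["g1", "g2"],
    "g4": ["g1", "c"],
    "g5": ["g3", "g4"],
}
primary_inputs = {"a", "b", "c"}

def levelize(netlist, primary_inputs):
    """Group gates by longest path length from a primary input."""
    level = {s: 0 for s in primary_inputs}

    def depth(g):
        # A gate sits one level above its deepest fan-in.
        if g not in level:
            level[g] = 1 + max(depth(f) for f in netlist[g])
        return level[g]

    for g in netlist:
        depth(g)

    groups = defaultdict(list)
    for g in netlist:
        groups[level[g]].append(g)
    return [sorted(groups[l]) for l in sorted(groups)]

for i, group in enumerate(levelize(netlist, primary_inputs), start=1):
    print(f"level {i}: {group}")  # gates within one group are independent
```

Within each level, the remaining dimensions (patterns, faults, instances, tasks) can then fill the GPU's resources, which is where the combination of parallelism dimensions pays off.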
This work is supported by the German Research Foundation (DFG) under grant WU 245/16-1.