
PARSIVAL: Parallel High-Throughput Simulations for Efficient Nanoelectronic Design and Test Validation

since 10/2014, DFG project WU 245/16-1

 

Project Description

In this project, novel methods for versatile simulation-based VLSI design and test validation on high-throughput data-parallel architectures are developed, enabling a wide range of important state-of-the-art validation tasks for large circuits. Due to the nature of the design validation processes and the massive amount of data involved, parallelism and throughput optimization are the key to making design validation feasible for future industrial-sized designs. The main focus lies on the structure of the simulation model, the abstraction level, and the algorithms used, as well as their parallelization on data-parallel architectures. The simulation algorithms should be kept simple enough to run fast, yet accurate enough to produce valuable data for the cross-layer validation of complex digital systems.

Design and test validation is one of the most important and complex tasks within modern semiconductor product development cycles. Design validation processes analyze and assess a developed design with respect to certain validation targets to ensure compliance with given specifications and customer requirements. With thorough validation, test strategies can be assessed to increase test quality and thus the quality of the delivered products. The specifications and requirements can range from the abstract high-level functional behavior of the circuit to constraints on low-level parameters, such as peak power consumption or transistor stress. With process scaling, more complex defect mechanisms arise and more and more low-level parameters have to be considered, which is why validation at lower abstraction levels has become an essential part of current manufacturing processes. Yet, state-of-the-art algorithms rely on compute-intensive simulations that do not scale on traditional computing architectures, given the increasing complexity of the designs and the required model accuracy. Over the past years, data-parallel architectures such as Graphics Processing Units (GPUs) have evolved and introduced the many-core paradigm, which is now well established in general-purpose computing. Current architectures provide thousands of processors on a single chip and are capable of a massive computational throughput of several teraflops. By exploiting highly parallel hardware acceleration and careful abstraction, we strive for maximum throughput in order to make a wide range of complex electronic design automation (EDA) applications applicable to industrial-sized designs.

Development of Simulation Models for Accurate Validation

The abstraction levels considered during the first project phase cover descriptions from gate level down to the electrical level and incorporate all the information required for an accurate evaluation of the circuit behavior and its functional properties. The evaluation models must be comprehensive enough to describe all significant electrical parameters that have an impact on logic behavior and timing, but must still support an extremely efficient parallel algorithmic environment. Instead of relying on the continuous solution of differential equations, as is common in lower-level simulation tools such as SPICE, the algorithms use piecewise approximations of the electrical behavior through linearization in order to model functional properties and to compute the signal values. This offers an attractive trade-off between achievable precision and computational effort.
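As a small illustration of the piecewise-linear idea (a sketch only, not the project's actual evaluation model), a signal transition can be stored as a handful of breakpoints and evaluated by interpolation, instead of numerically solving the underlying differential equation:

```python
from bisect import bisect_right

def pwl_eval(waveform, t):
    """Evaluate a piecewise-linear waveform at time t.

    waveform: list of (time, value) breakpoints, sorted by time.
    Values before the first / after the last breakpoint are held constant.
    """
    times = [p[0] for p in waveform]
    i = bisect_right(times, t)
    if i == 0:
        return waveform[0][1]       # before the first breakpoint
    if i == len(waveform):
        return waveform[-1][1]      # after the last breakpoint
    (t0, v0), (t1, v1) = waveform[i - 1], waveform[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

# A rising output transition approximated by three linear segments
# (hypothetical normalized time/voltage values):
rise = [(0.0, 0.0), (0.2, 0.1), (0.8, 0.9), (1.0, 1.0)]
```

Evaluating such a waveform is a few arithmetic operations per sample, which is what makes the model amenable to massive data-parallel execution.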

In addition to the ideal logic and timing behavior, the functional model has to be extended to account for the impact of parasitic and external parameters, including the power grid, crosstalk, and the influence of temperature and other environmental conditions.

Non-functional properties (NFPs) have to be evaluated over a very wide range of time scales. Computing the vulnerability of circuit structures with respect to soft errors concerns single events in the range of picoseconds. Circuit robustness is related to noise, inductance, and signal integrity, whose effects last on the order of nanoseconds. Current and power consumption have to be evaluated at this scale as well, while temperature may be evaluated over the range of milliseconds due to the heat capacity and heat transfer functions of the device under evaluation. Reliability with respect to wear-out mechanisms and aging, however, has to be analyzed at the scale of months or years and requires a completely different approach. All these NFPs in turn have a direct impact on the functional properties and have to be evaluated in an integrated way.

Massively Parallel Validation Algorithms

The developed simulation models will be evaluated by algorithms optimized for massively parallel compute structures such as GPUs. On such architectures, high throughput is achieved by parallelizing computations and maximizing the utilization of the computational resources at runtime. Exploiting parallelism in its various dimensions requires that the simulation algorithms be tuned for the targeted data-parallel architectures. For an optimal result, this comprises not only a thorough understanding of the underlying architecture and instruction sets, but also a general and flexible algorithm design.

 

Parallelism will be exploited in many ways during evaluation:

  • Structural parallelism, induced by mutually independent nodes in a circuit, allows the concurrent evaluation of these nodes.
  • Fault parallelism allows multiple faults to be simulated at once, e.g., when disjoint fault propagation cones prevent interactions between the faults.
  • Pattern parallelism, a form of data parallelism, allows the evaluation of a circuit for different input patterns at once.
  • Instance parallelism takes advantage of circuit instances with different parameters, such as varying node delays, which can be evaluated at the same time for a given pattern or fault.
  • Task parallelism allows different subtasks for an instance to be executed concurrently on the device, evaluating multiple parameters at the same time.
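How pattern and instance parallelism combine can be sketched in miniature (a hypothetical two-gate netlist with made-up delay values; on a GPU, each (instance, pattern) slot would map to one thread, since the slots are mutually independent):

```python
from itertools import product

# Two circuit instances with different gate delays (instance parallelism),
# e.g. variants drawn from a process-variation model:
delays = [{"g1": 1, "g2": 1}, {"g1": 2, "g2": 1}]

# Three input patterns (a, b, c) evaluated at once (pattern parallelism):
patterns = [(0, 0, 1), (1, 1, 0), (1, 1, 1)]

def evaluate(delay, pattern):
    """Evaluate the netlist g1 = AND(a, b), g2 = OR(g1, c) for one slot."""
    a, b, c = pattern
    g1 = a & b
    g2 = g1 | c
    # Output arrival time = longest structural path through the netlist.
    arrival = delay["g1"] + delay["g2"]
    return g2, arrival

# Every (instance, pattern) combination is an independent simulation slot:
slots = [evaluate(d, p) for d, p in product(delays, patterns)]
```

The sequential list comprehension stands in for a parallel launch over all slots; the total number of slots grows multiplicatively with each exploited dimension, which is where the throughput gain comes from.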

The validation tasks can be significantly accelerated by optimally combining and exploiting these different dimensions of parallelism, maximizing throughput. However, this requires careful scheduling of the computational tasks and efficient utilization of the computational resources of the underlying many-core GPU architecture.

Additional Information

This work is supported by the German Research Foundation (DFG) under grant WU 245/16-1.

 


 

Publications

Journals and Conference Proceedings
11. Multi-Level Timing Simulation on GPUs
Schneider, E., Kochte, M.A. and Wunderlich, H.-J.
to appear in Proceedings of the 23rd Asia and South Pacific Design Automation Conference (ASP-DAC'18), Jeju Island, Korea, 22-25 January 2018, pp. 1-6
2018
 
BibTeX:
@inproceedings{SchneKW2018,
  author = {Schneider, Eric and Kochte, Michael A. and Wunderlich, Hans-Joachim},
  title = {{Multi-Level Timing Simulation on GPUs}},
  booktitle = {to appear in Proceedings of the 23rd Asia and South Pacific Design Automation Conference (ASP-DAC'18)},
  year = {2018},
  pages = {1--6}
}
10. Analysis and Mitigation of IR-Drop Induced Scan Shift-Errors
Holst, S., Schneider, E., Kawagoe, K., Kochte, M.A., Miyase, K., Wunderlich, H.-J., Kajihara, S. and Wen, X.
to appear in Proceedings of the IEEE International Test Conference (ITC'17), Fort Worth, Texas, USA, 31 October-2 November 2017
2017
 
BibTeX:
@inproceedings{HolstSKKMWKW2017,
  author = {Holst, Stefan and Schneider, Eric and Kawagoe, Koshi and Kochte, Michael A. and Miyase, Kohei and Wunderlich, Hans-Joachim and Kajihara, Seiji and Wen, Xiaoqing},
  title = {{Analysis and Mitigation of IR-Drop Induced Scan Shift-Errors}},
  booktitle = {to appear in Proceedings of the IEEE International Test Conference (ITC'17)},
  year = {2017}
}
9. Probabilistic Sensitization Analysis for Variation-Aware Path Delay Fault Test Evaluation
Wagner, M. and Wunderlich, H.-J.
Proceedings of the 22nd IEEE European Test Symposium (ETS'17), Limassol, Cyprus, 22-26 May 2017, pp. 1-6
2017
DOI PDF 
Keywords: delay test, process variations, delay test quality
Abstract: With the ever increasing process variability in recent technology nodes, path delay fault testing of digital integrated circuits has become a major challenge. A randomly chosen long path often has no robust test and many of the existing non-robust tests are likely invalidated by process variations. To generate path delay fault tests that are more tolerant towards process variations, the delay test generation must evaluate different non-robust tests and only those tests that sensitize the target path with a sufficiently high probability in presence of process variations must be selected. This requires a huge number of probability computations for a large number of target paths and makes the development of very efficient approximation algorithms mandatory for any practical application. In this paper, a novel and efficient probabilistic sensitization analysis is presented which is used to extract a small subcircuit for a given test vector-pair. The probability that a target path is sensitized by the vector-pair is computed efficiently and without significant error by a Monte-Carlo simulation of the subcircuit.
BibTeX:
@inproceedings{WagneW2017,
  author = {Wagner, Marcus and Wunderlich, Hans-Joachim},
  title = {{Probabilistic Sensitization Analysis for Variation-Aware Path Delay Fault Test Evaluation}},
  booktitle = {Proceedings of the 22nd IEEE European Test Symposium (ETS'17)},
  year = {2017},
  pages = {1--6},
  keywords = {delay test, process variations, delay test quality},
  abstract = {With the ever increasing process variability in recent technology nodes, path delay fault testing of digital integrated circuits has become a major challenge. A randomly chosen long path often has no robust test and many of the existing non-robust tests are likely invalidated by process variations. To generate path delay fault tests that are more tolerant towards process variations, the delay test generation must evaluate different non-robust tests and only those tests that sensitize the target path with a sufficiently high probability in presence of process variations must be selected. This requires a huge number of probability computations for a large number of target paths and makes the development of very efficient approximation algorithms mandatory for any practical application. In this paper, a novel and efficient probabilistic sensitization analysis is presented which is used to extract a small subcircuit for a given test vector-pair. The probability that a target path is sensitized by the vector-pair is computed efficiently and without significant error by a Monte-Carlo simulation of the subcircuit.},
  doi = {http://dx.doi.org/10.1109/ETS.2017.7968226},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2017/ETS_WagneW2017.pdf}
}
8. GPU-Accelerated Simulation of Small Delay Faults
Schneider, E., Kochte, M.A., Holst, S., Wen, X. and Wunderlich, H.-J.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)
Vol. 36(5), May 2017, pp. 829-841
2017
DOI PDF 
Keywords: Circuit faults, Computational modeling, Delays, Instruction sets, Integrated circuit modeling, Logic gates, Fault simulation, graphics processing unit (GPU), parallel, process variation, small gate delay faults, timing-accurate, waveform
Abstract: Delay fault simulation is an essential task during test pattern generation and reliability assessment of electronic circuits. With the high sensitivity of current nano-scale designs towards even smallest delay deviations, the simulation of small gate delay faults has become extremely important. Since these faults have a subtle impact on the timing behavior, traditional fault simulation approaches based on abstract timing models are not sufficient. Furthermore, the detection of these faults is compromised by the ubiquitous variations in the manufacturing processes, which causes the actual fault coverage to vary from circuit instance to circuit instance, and makes the use of timing accurate methods mandatory. However, the application of timing accurate techniques quickly becomes infeasible for larger designs due to excessive computational requirements. In this work, we present a method for fast and waveform-accurate simulation of small delay faults on graphics processing units with exceptional computational performance. By exploiting multiple dimensions of parallelism from gates, faults, waveforms and circuit instances, the proposed approach allows for timing-accurate and exhaustive small delay fault simulation under process variation for designs with millions of gates.
BibTeX:
@article{SchneKHWW2016,
  author = {Schneider, Eric and Kochte, Michael A. and Holst, Stefan and Wen, Xiaoqing and Wunderlich, Hans-Joachim},
  title = {{GPU-Accelerated Simulation of Small Delay Faults}},
  journal = {IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)},
  year = {2017},
  volume = {36},
  number = {5},
  pages = {829--841},
  keywords = {Circuit faults, Computational modeling, Delays, Instruction sets, Integrated circuit modeling, Logic gates, Fault simulation, graphics processing unit (GPU), parallel, process variation, small gate delay faults, timing-accurate, waveform},
  abstract = {Delay fault simulation is an essential task during test pattern generation and reliability assessment of electronic circuits. With the high sensitivity of current nano-scale designs towards even smallest delay deviations, the simulation of small gate delay faults has become extremely important. Since these faults have a subtle impact on the timing behavior, traditional fault simulation approaches based on abstract timing models are not sufficient. Furthermore, the detection of these faults is compromised by the ubiquitous variations in the manufacturing processes, which causes the actual fault coverage to vary from circuit instance to circuit instance, and makes the use of timing accurate methods mandatory. However, the application of timing accurate techniques quickly becomes infeasible for larger designs due to excessive computational requirements. In this work, we present a method for fast and waveform-accurate simulation of small delay faults on graphics processing units with exceptional computational performance. By exploiting multiple dimensions of parallelism from gates, faults, waveforms and circuit instances, the proposed approach allows for timing-accurate and exhaustive small delay fault simulation under process variation for designs with millions of gates.},
  doi = {http://dx.doi.org/10.1109/TCAD.2016.2598560},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2016/TCAD_SchneKHWW2016.pdf}
}
7. Aging Monitor Reuse for Small Delay Fault Testing
Liu, C., Kochte, M.A. and Wunderlich, H.-J.
Proceedings of the 35th VLSI Test Symposium (VTS'17), Caesars Palace, Las Vegas, Nevada, USA, 9-12 April 2017
2017
DOI PDF 
Keywords: Delay monitoring, delay test, faster-than-at-speed test, stability checker, small delay fault, ATPG
Abstract: Small delay faults receive more and more attention, since they may indicate a circuit reliability marginality even if they do not violate the timing at the time of production. At-speed test and faster-than-at-speed test (FAST) are rather expensive tasks to test for such faults. The paper at hand avoids complex on-chip structures or expensive high-speed ATE for test response evaluation, if aging monitors which are integrated into the device under test anyway are reused. The main challenge in reusing aging monitors for FAST consists in possible false alerts at higher frequencies. While a certain test vector pair makes a delay fault observable at one monitor, it may also exceed the time slack in the fault free case at a different monitor which has to be masked. Therefore, a multidimensional optimizing problem has to be solved for minimizing the masking overhead and the number of test vectors while maximizing delay fault coverage.
BibTeX:
@inproceedings{LiuKW2017,
  author = {Liu, Chang and Kochte, Michael A. and Wunderlich, Hans-Joachim},
  title = {{Aging Monitor Reuse for Small Delay Fault Testing}},
  booktitle = {Proceedings of the 35th VLSI Test Symposium (VTS'17)},
  year = {2017},
  keywords = {Delay monitoring, delay test, faster-than-at-speed test, stability checker, small delay fault, ATPG},
  abstract = {Small delay faults receive more and more attention, since they may indicate a circuit reliability marginality even if they do not violate the timing at the time of production. At-speed test and faster-than-at-speed test (FAST) are rather expensive tasks to test for such faults. The paper at hand avoids complex on-chip structures or expensive high-speed ATE for test response evaluation, if aging monitors which are integrated into the device under test anyway are reused. The main challenge in reusing aging monitors for FAST consists in possible false alerts at higher frequencies. While a certain test vector pair makes a delay fault observable at one monitor, it may also exceed the time slack in the fault free case at a different monitor which has to be masked. Therefore, a multidimensional optimizing problem has to be solved for minimizing the masking overhead and the number of test vectors while maximizing delay fault coverage.},
  doi = {http://dx.doi.org/10.1109/VTS.2017.7928921},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2017/VTS_LiuKW2017.pdf}
}
6. Timing-Accurate Estimation of IR-Drop Impact on Logic- and Clock-Paths During At-Speed Scan Test
Holst, S., Schneider, E., Wen, X., Kajihara, S., Yamato, Y., Wunderlich, H.-J. and Kochte, M.A.
Proceedings of the 25th IEEE Asian Test Symposium (ATS'16), Hiroshima, Japan, 21-24 November 2016, pp. 19-24
2016
DOI PDF 
Abstract: IR-drop induced false capture failures and test clock stretch are severe problems in at-speed scan testing. We propose a new method to efficiently and accurately identify these problems. For the first time, our approach considers the additional dynamic power caused by glitches, the spatial and temporal distribution of all toggles, and their impact on both logic paths and the clock tree without time-consuming electrical simulations.
BibTeX:
@inproceedings{HolstSWKYWK2016,
  author = {Holst, Stefan and Schneider, Eric and Wen, Xiaoqing and Kajihara, Seiji and Yamato, Yuta and Wunderlich, Hans-Joachim and Kochte, Michael A.},
  title = {{Timing-Accurate Estimation of IR-Drop Impact on Logic- and Clock-Paths During At-Speed Scan Test}},
  booktitle = {Proceedings of the 25th IEEE Asian Test Symposium (ATS'16)},
  year = {2016},
  pages = {19--24},
  abstract = {IR-drop induced false capture failures and test clock stretch are severe problems in at-speed scan testing. We propose a new method to efficiently and accurately identify these problems. For the first time, our approach considers the additional dynamic power caused by glitches, the spatial and temporal distribution of all toggles, and their impact on both logic paths and the clock tree without time-consuming electrical simulations.},
  doi = {http://dx.doi.org/10.1109/ATS.2016.49},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2016/ATS_HolstSWKYWK2016.pdf}
}
5. High-Throughput Transistor-Level Fault Simulation on GPUs
Schneider, E. and Wunderlich, H.-J.
Proceedings of the 25th IEEE Asian Test Symposium (ATS'16), Hiroshima, Japan, 21-24 November 2016, pp. 150-155
2016
DOI PDF 
Keywords: fault simulation; transistor level; switch level; GPUs
Abstract: Deviations in the first-order parameters of CMOS cells can lead to severe errors in the functional and time domain. With increasing sensitivity of these parameters to manufacturing defects and variation, parametric and parasitic-aware fault simulation is becoming crucial in order to support test pattern generation. Traditional approaches based on gate-level models are not sufficient to represent and capture the impact of deviations in these parameters in either an efficient or accurate manner. Evaluation at electrical level, on the other hand, severely lacks execution speed and quickly becomes inapplicable to larger designs due to high computational demands. This work presents a novel fault simulation approach considering first-order parameters in CMOS circuits to explicitly capture CMOS-specific behavior in the functional and time domain with transistor granularity. The approach utilizes massive parallelization in order to achieve high-throughput acceleration on Graphics Processing Units (GPUs) by exploiting parallelism of cells, stimuli and faults. Despite the more precise level of abstraction, the simulator is able to process designs with millions of gates and even outperforms conventional simulation at logic level in terms of modeling accuracy and simulation speed.
BibTeX:
@inproceedings{SchneW2016,
  author = {Schneider, Eric and Wunderlich, Hans-Joachim},
  title = {{High-Throughput Transistor-Level Fault Simulation on GPUs}},
  booktitle = {Proceedings of the 25th IEEE Asian Test Symposium (ATS'16)},
  year = {2016},
  pages = {150--155},
  keywords = {fault simulation; transistor level; switch level; GPUs},
  abstract = {Deviations in the first-order parameters of CMOS cells can lead to severe errors in the functional and time domain. With increasing sensitivity of these parameters to manufacturing defects and variation, parametric and parasitic-aware fault simulation is becoming crucial in order to support test pattern generation. Traditional approaches based on gate-level models are not sufficient to represent and capture the impact of deviations in these parameters in either an efficient or accurate manner. Evaluation at electrical level, on the other hand, severely lacks execution speed and quickly becomes inapplicable to larger designs due to high computational demands. This work presents a novel fault simulation approach considering first-order parameters in CMOS circuits to explicitly capture CMOS-specific behavior in the functional and time domain with transistor granularity. The approach utilizes massive parallelization in order to achieve high-throughput acceleration on Graphics Processing Units (GPUs) by exploiting parallelism of cells, stimuli and faults. Despite the more precise level of abstraction, the simulator is able to process designs with millions of gates and even outperforms conventional simulation at logic level in terms of modeling accuracy and simulation speed.},
  doi = {http://dx.doi.org/10.1109/ATS.2016.9},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2016/ATS_SchneW2016.pdf}
}
4. Logic/Clock-Path-Aware At-Speed Scan Test Generation for Avoiding False Capture Failures and Reducing Clock Stretch
Asada, K., Wen, X., Holst, S., Miyase, K., Kajihara, S., Kochte, M.A., Schneider, E., Wunderlich, H.-J. and Qian, J.
Proceedings of the 24th IEEE Asian Test Symposium (ATS'15), Mumbai, India, 22-25 November 2015, pp. 103-108
ATS 2015 Best Paper Award
2015
DOI PDF 
Keywords: launch switching activity, IR-drop, logic path, clock path, false capture failure, test clock stretch, X-filling
Abstract: IR-drop induced by launch switching activity (LSA) in capture mode during at-speed scan testing increases delay along not only logic paths (LPs) but also clock paths (CPs). Excessive extra delay along LPs compromises test yields due to false capture failures, while excessive extra delay along CPs compromises test quality due to test clock stretch. This paper is the first to mitigate the impact of LSA on both LPs and CPs with a novel LCPA (Logic/Clock-Path-Aware) at-speed scan test generation scheme, featuring (1) a new metric for assessing the risk of false capture failures based on the amount of LSA around both LPs and CPs, (2) a procedure for avoiding false capture failures by reducing LSA around LPs or masking uncertain test responses, and (3) a procedure for reducing test clock stretch by reducing LSA around CPs. Experimental results demonstrate the effectiveness of the LCPA scheme in improving test yields and test quality.
BibTeX:
@inproceedings{AsadaWHMKKSWQ2015,
  author = {Asada, Koji and Wen, Xiaoqing and Holst, Stefan and Miyase, Kohei and Kajihara, Seiji and Kochte, Michael A. and Schneider, Eric and Wunderlich, Hans-Joachim and Qian, Jun},
  title = {{Logic/Clock-Path-Aware At-Speed Scan Test Generation for Avoiding False Capture Failures and Reducing Clock Stretch}},
  booktitle = {Proceedings of the 24th IEEE Asian Test Symposium (ATS'15)},
  year = {2015},
  pages = {103--108},
  keywords = {launch switching activity, IR-drop, logic path, clock path, false capture failure, test clock stretch, X-filling},
  abstract = {IR-drop induced by launch switching activity (LSA) in capture mode during at-speed scan testing increases delay along not only logic paths (LPs) but also clock paths (CPs). Excessive extra delay along LPs compromises test yields due to false capture failures, while excessive extra delay along CPs compromises test quality due to test clock stretch. This paper is the first to mitigate the impact of LSA on both LPs and CPs with a novel LCPA (Logic/Clock-Path-Aware) at-speed scan test generation scheme, featuring (1) a new metric for assessing the risk of false capture failures based on the amount of LSA around both LPs and CPs, (2) a procedure for avoiding false capture failures by reducing LSA around LPs or masking uncertain test responses, and (3) a procedure for reducing test clock stretch by reducing LSA around CPs. Experimental results demonstrate the effectiveness of the LCPA scheme in improving test yields and test quality.},
  doi = {http://dx.doi.org/10.1109/ATS.2015.25},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2015/ATS_AsadaWHMKKSWQ2015.pdf}
}
3. High-Throughput Logic Timing Simulation on GPGPUs
Holst, S., Imhof, M.E. and Wunderlich, H.-J.
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Vol. 20(3), Jun 2015, pp. 37:1-37:21
2015
DOI URL PDF 
Keywords: Verification, Performance, Gate-Level Simulation, General Purpose computing on Graphics Processing Unit (GP-GPU), Hazards, Parallel CAD, Pin-to-Pin Delay, Pulse-Filtering, Timing Simulation
Abstract: Many EDA tasks like test set characterization or the precise estimation of power consumption, power droop and temperature development, require a very large number of time-aware gate-level logic simulations. Until now, such characterizations have been feasible only for rather small designs or with reduced precision due to the high computational demands. The new simulation system presented here is able to accelerate such tasks by more than two orders of magnitude and provides for the first time fast and comprehensive timing simulations for industrial-sized designs. Hazards, pulse-filtering, and pin-to-pin delay are supported for the first time in a GPGPU accelerated simulator, and the system can easily be extended to even more realistic delay models and further applications. A sophisticated mapping with efficient memory utilization and access patterns as well as minimal synchronizations and control flow divergence is able to use the full potential of GPGPU architectures. To provide such a mapping, we combine for the first time the versatility of event-based timing simulation and multidimensional parallelism used in GPU-based gate-level simulators. The result is a throughput-optimized timing simulation algorithm, which runs many simulation instances in parallel and at the same time fully exploits gate-parallelism within the circuit.
BibTeX:
@article{HolstIW2015,
  author = {Holst, Stefan and Imhof, Michael E. and Wunderlich, Hans-Joachim},
  title = {{High-Throughput Logic Timing Simulation on GPGPUs}},
  journal = {ACM Transactions on Design Automation of Electronic Systems (TODAES)},
  year = {2015},
  volume = {20},
  number = {3},
  pages = {37:1--37:21},
  keywords = {Verification, Performance, Gate-Level Simulation, General Purpose computing on Graphics Processing Unit (GP-GPU), Hazards, Parallel CAD, Pin-to-Pin Delay, Pulse-Filtering, Timing Simulation},
  abstract = {Many EDA tasks like test set characterization or the precise estimation of power consumption, power droop and temperature development, require a very large number of time-aware gate-level logic simulations. Until now, such characterizations have been feasible only for rather small designs or with reduced precision due to the high computational demands. The new simulation system presented here is able to accelerate such tasks by more than two orders of magnitude and provides for the first time fast and comprehensive timing simulations for industrial-sized designs. Hazards, pulse-filtering, and pin-to-pin delay are supported for the first time in a GPGPU accelerated simulator, and the system can easily be extended to even more realistic delay models and further applications. A sophisticated mapping with efficient memory utilization and access patterns as well as minimal synchronizations and control flow divergence is able to use the full potential of GPGPU architectures. To provide such a mapping, we combine for the first time the versatility of event-based timing simulation and multidimensional parallelism used in GPU-based gate-level simulators. The result is a throughput-optimized timing simulation algorithm, which runs many simulation instances in parallel and at the same time fully exploits gate-parallelism within the circuit.},
  url = {http://dl.acm.org/citation.cfm?id=2714564},
  doi = {http://dx.doi.org/10.1145/2714564},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2015/TODAES_HolstIW2015.pdf}
}
2. GPU-Accelerated Small Delay Fault Simulation
Schneider, E., Holst, S., Kochte, M.A., Wen, X. and Wunderlich, H.-J.
Proceedings of the ACM/IEEE Conference on Design, Automation Test in Europe (DATE'15), Grenoble, France, 9-13 March 2015, pp. 1174-1179
Best Paper Candidate
2015
URL PDF 
Abstract: The simulation of delay faults is an essential task in design validation and reliability assessment of circuits. Due to the high sensitivity of current nano-scale designs against smallest delay deviations, small delay faults recently became the focus of test research. Because of the subtle delay impact, traditional fault simulation approaches based on abstract timing models are not sufficient for representing small delay faults. Hence, timing accurate simulation approaches have to be utilized, which quickly become inapplicable for larger designs due to high computational requirements. In this work we present a waveform-accurate approach for fast high-throughput small delay fault simulation on Graphics Processing Units (GPUs). By exploiting parallelism from gates, faults and patterns, the proposed approach enables accurate exhaustive small delay fault simulation even for multi-million gate designs without fault dropping for the first time.
BibTeX:
@inproceedings{SchneHKWW2015,
  author = {Schneider, Eric and Holst, Stefan and Kochte, Michael A. and Wen, Xiaoqing and Wunderlich, Hans-Joachim},
  title = {{GPU-Accelerated Small Delay Fault Simulation}},
  booktitle = {Proceedings of the ACM/IEEE Conference on Design, Automation Test in Europe (DATE'15)},
  year = {2015},
  pages = {1174--1179},
  abstract = {The simulation of delay faults is an essential task in design validation and reliability assessment of circuits. Due to the high sensitivity of current nano-scale designs against smallest delay deviations, small delay faults recently became the focus of test research. Because of the subtle delay impact, traditional fault simulation approaches based on abstract timing models are not sufficient for representing small delay faults. Hence, timing accurate simulation approaches have to be utilized, which quickly become inapplicable for larger designs due to high computational requirements. In this work we present a waveform-accurate approach for fast high-throughput small delay fault simulation on Graphics Processing Units (GPUs). By exploiting parallelism from gates, faults and patterns, the proposed approach enables accurate exhaustive small delay fault simulation even for multi-million gate designs without fault dropping for the first time.},
  url = { http://dl.acm.org/citation.cfm?id=2757084 },
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2015/DATE_SchneHKWW2015.pdf}
}
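The detection principle behind waveform-accurate small delay fault simulation can be illustrated in a few lines. The following is a simplified, serial Python sketch (all names are hypothetical, not taken from the paper): each signal is modeled as a list of toggle times, a small delay fault adds extra delay at one gate, and the fault is detected when the faulty and fault-free output values differ at the capture time. In the GPU approach described above, each (fault, pattern) pair would be evaluated by its own thread; this sketch omits that parallelism.

```python
# Minimal sketch of waveform-based small delay fault detection.
# A "waveform" is the list of times at which a signal toggles (initial value 0).

def value_at(waveform, t):
    """Logic value of a signal (initial value 0) at time t, given its toggle times."""
    return sum(1 for tog in waveform if tog <= t) % 2

def and_gate(wa, wb, delay):
    """Evaluate a 2-input AND over full input waveforms.

    Every change of the combined input value produces an output toggle,
    shifted by the gate delay. A small delay fault is injected simply by
    evaluating the gate with an increased delay.
    """
    times = sorted(set(wa) | set(wb))
    out, prev = [], 0
    for t in times:
        v = value_at(wa, t) & value_at(wb, t)
        if v != prev:
            out.append(t + delay)
            prev = v
    return out

# Fault-free gate (delay 1) vs. gate with a small delay fault (extra delay 2):
good = and_gate([2], [3], delay=1)   # output rises at t = 4
bad = and_gate([2], [3], delay=3)    # output rises at t = 6
capture_time = 5
detected = value_at(good, capture_time) != value_at(bad, capture_time)
```

With a capture time of 5, the fault-free output has already settled to 1 while the faulty one is still 0, so the fault is detected; a later capture time would mask it, which is exactly why accurate waveforms rather than abstract timing models are needed.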
1. Data-Parallel Simulation for Fast and Accurate Timing Validation of CMOS Circuits
Schneider, E., Holst, S., Wen, X. and Wunderlich, H.-J.
Proceedings of the 33rd IEEE/ACM International Conference on Computer-Aided Design (ICCAD'14), San Jose, California, USA, 3-6 November 2014, pp. 17-23
2014
URL PDF 
Abstract: Gate-level timing simulation of combinational CMOS circuits is the foundation of a whole array of important EDA tools such as timing analysis and power estimation, but the demand for higher simulation accuracy drastically increases the runtime complexity of the algorithms. Data-parallel accelerators such as Graphics Processing Units (GPUs) provide vast amounts of computing performance to tackle this problem, but require careful attention to control flow and memory access patterns. This paper proposes the novel High-Throughput Oriented Parallel Switch-level Simulator (HiTOPS), which is especially designed to take full advantage of GPUs and provides accurate timing simulation for multi-million-gate designs at an unprecedented throughput. HiTOPS models timing at transistor granularity and supports all major timing-related effects found in CMOS, including pattern-dependent delay, glitch filtering and transition ramps, while achieving speedups of up to two orders of magnitude compared to traditional gate-level simulators.
BibTeX:
@inproceedings{SchneHWW2014,
  author = {Schneider, Eric and Holst, Stefan and Wen, Xiaoqing and Wunderlich, Hans-Joachim},
  title = {{Data-Parallel Simulation for Fast and Accurate Timing Validation of CMOS Circuits}},
  booktitle = {Proceedings of the 33rd IEEE/ACM International Conference on Computer-Aided Design (ICCAD'14)},
  year = {2014},
  pages = {17--23},
  abstract = {Gate-level timing simulation of combinational CMOS circuits is the foundation of a whole array of important EDA tools such as timing analysis and power-estimation, but the demand for higher simulation accuracy drastically increases the runtime complexity of the algorithms. Data-parallel accelerators such as Graphics Processing Units (GPUs) provide vast amounts of computing performance to tackle this problem, but require careful attention to control-flow and memory access patterns. This paper proposes the novel High-Throughput Oriented Parallel Switch-level Simulator (HiTOPS), which is especially designed to take full advantage of GPUs and provides accurate timing simulation for multi-million gate designs at an unprecedented throughput. HiTOPS models timing at transistor granularity and supports all major timing-related effects found in CMOS including pattern-dependent delay, glitch filtering and transition ramps, while achieving speedups of up to two orders of magnitude compared to traditional gate-level simulators.},
  url = { http://dl.acm.org/citation.cfm?id=2691369 },
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2014/ICCAD_SchneHWW2014.pdf}
}
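Of the timing effects the abstract lists, glitch filtering is the easiest to illustrate. The following minimal Python sketch (the function name is hypothetical; this is not HiTOPS code) shows the common inertial-delay model: an output pulse shorter than the gate's inertial delay is cancelled rather than propagated.

```python
def filter_glitches(toggles, inertial_delay):
    """Inertial glitch filtering over a toggle-time waveform.

    A toggle arriving within the inertial delay of the previous one cancels
    that previous toggle instead of appearing at the output, so short pulses
    (glitches) are removed from the waveform.
    """
    out = []
    for t in toggles:
        if out and t - out[-1] < inertial_delay:
            out.pop()       # the short pulse collapses: both edges vanish
        else:
            out.append(t)
    return out
```

For example, a 0.4-time-unit pulse at t = 5 is filtered away by a gate with inertial delay 1, while well-separated transitions pass through unchanged. In a waveform-accurate simulator this filtering is applied at every gate evaluation, which is one reason such simulation is far more expensive than plain logic simulation.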
Created by JabRef on 20/10/2017.
Workshop Contributions
1. Hochbeschleunigte Simulation von Verzögerungsfehlern unter Prozessvariationen (Highly Accelerated Simulation of Delay Faults under Process Variations)
Schneider, E., Kochte, M.A. and Wunderlich, H.-J.
27th GI/GMM/ITG Workshop "Testmethoden und Zuverlässigkeit von Schaltungen und Systemen" (TuZ'15), Bad Urach, Germany, 1-3 March 2015
2015
 
Abstract (translated from German): The simulation of small delay faults is an important part of the validation of nano-electronic circuits. Process variations during manufacturing strongly influence the detection of these faults and must be taken into account in simulation. Compared to traditional logic simulation or static timing analysis, timing-accurate simulation of delay faults is very costly, and considering variations increases the computational complexity further. This work presents a highly parallel approach that uses graphics processing units for the accelerated parallel simulation of small delay faults under variation. The approach computes accurate signal waveforms in the circuit and enables Monte-Carlo-based statistical fault coverage estimation for industrial circuits under random as well as systematic variation.
BibTeX:
@inproceedings{SchneKW2015,
  author = {Schneider, Eric and Kochte, Michael A. and Wunderlich, Hans-Joachim},
  title = {{Hochbeschleunigte Simulation von Verzögerungsfehlern unter Prozessvariationen}},
  booktitle = {27th GI/GMM/ITG Workshop "Testmethoden und Zuverlässigkeit von Schaltungen und Systemen" (TuZ'15)},
  year = {2015},
  abstract = {Die Simulation kleiner Verzögerungsfehler ist ein wichtiger Bestandteil der Validierung nano-elektronischer Schaltungen. Prozessvariationen während der Herstellung haben großen Einfluss auf die Erkennung dieser Fehler und müssen bei der Simulation berücksichtigt werden. Die zeitgenaue Simulation von Verzögerungsfehlern ist verglichen mit traditioneller Logiksimulation oder statischer Zeitanalyse sehr aufwändig und die Rechenkomplexität steigt durch die Berücksichtigung von Variationen zusätzlich an. In dieser Arbeit wird ein hochparalleles Verfahren vorgestellt, welches Grafikprozessoren zur beschleunigten parallelen Simulation kleiner Verzögerungsfehler unter Variation anwendet. Das Verfahren berechnet akkurate Signalverläufe in der Schaltung und ermöglicht die Bestimmung einer Monte-Carlo-basierten statistischen Fehlererfassung für industrielle Schaltkreise unter zufälliger sowie systematischer Variation.}
}
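The Monte-Carlo-based statistical fault coverage mentioned in the abstract can be sketched as follows. This is a deliberately simplified, hypothetical Python model (not the authors' implementation): the delay of a single path is drawn from a Gaussian distribution to mimic process variation, and the fraction of samples in which the small delay fault flips the captured value estimates its detection probability.

```python
import random

def detection_probability(nominal_delay, extra_fault_delay, capture_time,
                          sigma, samples=10000, seed=0):
    """Monte-Carlo estimate of a small delay fault's detection probability.

    Assumes (as a simplification) a single path whose delay varies as a
    Gaussian around its nominal value. The fault is detected in a sample
    iff the fault-free and faulty outputs differ at the capture time,
    i.e. exactly one of the two settles after the capture time.
    """
    rng = random.Random(seed)
    detected = 0
    for _ in range(samples):
        path_delay = rng.gauss(nominal_delay, sigma)
        good_late = path_delay > capture_time
        bad_late = path_delay + extra_fault_delay > capture_time
        detected += (good_late != bad_late)
    return detected / samples
```

A fault adding 5 time units to a path with nominal delay 10 and capture time 12 is detected in nearly all variation samples, while a 0.1-unit fault on the same path is almost always masked by the timing slack. The GPU approach described above performs this sampling for all faults and patterns of an industrial circuit in parallel, with full waveforms instead of a single path delay.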

 

Contact