ES - Current Research Projects
NoC-Performability: Integrated Performance and Reliability Evaluation of Fault-Tolerant Networks-on-Chip
Technology scaling makes it possible to integrate hundreds of processing cores on a single chip, with thousands expected in the future. Communication in such systems is provided by Networks-on-Chip (NoCs). A downside of technology scaling is that NoC resources become more susceptible to failures during operation. Ensuring reliable operation despite such failures degrades NoC performance and may even cancel out the performance benefits expected from scaling. It is therefore not enough to analyze performance and reliability in isolation, as is usually done. Instead, we investigate how both aspects can be treated together using the concept of performability and its analysis with Markov reward models. In addition to developing modelling and analysis techniques, we demonstrate our methodology by applying it to compare various NoC topologies and fault-tolerant routing algorithms. We investigate how performability develops as NoCs scale to larger sizes, and we explore the limits of scaling by determining the break-even failure rates under which scaling can still achieve a net performability increase.
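To make the idea of a Markov reward model concrete, the following is a minimal sketch and not the project's actual models or tooling. It assumes a toy NoC with a handful of links that fail independently at a constant rate and are never repaired. The state is the number of failed links, and each state carries a reward equal to the fraction of links still working, used here as a hypothetical stand-in for a performance measure such as throughput. The function names, the parameter values, and the choice of uniformization as the transient solver are all illustrative assumptions.

```python
import math

def build_generator(n_links, fail_rate):
    """Generator matrix of a pure-failure CTMC; state k = number of failed links."""
    size = n_links + 1
    Q = [[0.0] * size for _ in range(size)]
    for k in range(n_links):
        rate = (n_links - k) * fail_rate  # any of the remaining links may fail next
        Q[k][k] = -rate
        Q[k][k + 1] = rate
    return Q

def transient_distribution(Q, p0, t, tol=1e-10, max_terms=10000):
    """State probabilities at time t, computed by uniformization."""
    n = len(Q)
    q = max(-Q[i][i] for i in range(n))  # uniformization rate
    if q == 0.0 or t == 0.0:
        return list(p0)
    # Transition matrix of the uniformized discrete-time chain: P = I + Q / q.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(n)]
         for i in range(n)]
    weight = math.exp(-q * t)  # Poisson probability of zero jumps
    acc = weight
    v = list(p0)
    result = [weight * x for x in v]
    k = 0
    # Add Poisson-weighted terms until the neglected probability mass is below tol.
    while 1.0 - acc > tol and k < max_terms:
        k += 1
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= q * t / k
        acc += weight
        result = [r + weight * x for r, x in zip(result, v)]
    return result

def expected_reward(Q, p0, rewards, t):
    """Expected reward rate at time t: sum over states of probability times reward."""
    pi = transient_distribution(Q, p0, t)
    return sum(p * r for p, r in zip(pi, rewards))

# Hypothetical example values, chosen only for illustration.
N_LINKS = 4
FAIL_RATE = 0.1  # failures per link per unit of time
Q = build_generator(N_LINKS, FAIL_RATE)
p0 = [1.0] + [0.0] * N_LINKS  # start with all links working
# Assumed reward: performance degrades linearly with the number of failed links.
rewards = [(N_LINKS - k) / N_LINKS for k in range(N_LINKS + 1)]
value = expected_reward(Q, p0, rewards, 5.0)
print(value)
```

For this particular toy model the result can be checked by hand: with independent exponential failures, the expected fraction of working links at time t is exp(-FAIL_RATE * t). Realistic models would differ mainly in the state space and in how the per-state reward is obtained, for example from a performance evaluation of the degraded network under a given topology and routing algorithm.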