B4: Generation of Distributed Monitors and Run-Time Verification of Invasive Applications

Principal Investigators:

Prof. U. Schlichtmann, Dr. D. Müller-Gritschneder

Scientific Researchers:

A. Listl, M. Mettler, B. Li, L. Zhang

Abstract

The TCRC 89 will direct its primary attention during the third funding phase towards runtime requirement enforcement and runtime requirement monitoring of invasive computing systems. Project B4 will contribute especially to the goals of runtime requirement monitoring. It will support this goal by investigating runtime verification and monitoring methods in hardware and software.

In the first funding phase, Project B4 developed concepts for monitoring invasive computing systems (both RISC and TCPA tiles). Specifically, concepts for monitoring power, temperature and ageing were investigated. Communication interfaces between the monitors and higher levels, as well as a control-loop concept for invasive computing systems, were explored. For the essential monitoring concepts, a method was developed to emulate them on an FPGA. The major challenge for FPGA emulation was that most monitors contain analogue circuits. With the achieved FPGA emulation, our concepts can be evaluated in the context of an entire invasive computing system even without an ASIC hardware implementation.

In the second funding phase, the focus of Project B4 was on "fix it before it breaks". By intelligently using the data of ageing, temperature and power monitors, the system predicts the point in its lifetime at which a component approaches a hardware failure. Here, a key challenge was identified: ageing impacts every fabricated integrated system differently due to different stress profiles and random process variations during manufacturing. Hence, because their timing margins differ, different fabricated systems will fail at different points in time and under different workloads and environmental conditions. We showed that such predictions require the designer to apply a combination of design-time analysis, circuit tuning after fabrication and run-time monitoring. Additionally, for prototyping, new ageing monitoring features were included in an enhanced FPGA-based monitor emulator. In cooperation with other partners, it was shown how monitoring data is used in the design of control loops in the architecture and software run-time environment of the system, e.g., for dark-silicon management.

In the third funding phase, Project B4 plans to evolve the monitors into a distributed monitoring system capable of supporting run-time verification of user-defined application properties such as latency, throughput, power, reliability and security. The run-time verification chain consists of probes, property checkers and event handlers. Probes are blocks that trace system events or states. Property checkers analyse the probed trace to generate a verdict on whether a certain property is violated or validated. Hardware property checkers may be watchdog timers; software property checkers may be additional software code that checks the values of variables. A hybrid property checker may be software code that compares the values of two hardware timers. The verdict is forwarded to an event handler, which can trigger a reaction or generate a log message. Key event handlers will be the run-time requirement enforcers (RREs) of Projects A1 and A4. These RREs will enable much closer control of non-functional program properties.

Synopsis

A major advantage of the invasive computing paradigm is its highly predictable run-time behaviour due to the exclusive assignment of resources to applications. This strong isolation between applications enables the use of the highly complex heterogeneous invasive multicore architecture in embedded domains that require high predictability. Yet, there remain some sources of unpredictability, which may impact run-time properties such as execution time, latencies, throughput or power consumption. Application input data may be only partly known at design time, e.g., the number of objects in a frame that must be identified by a robot. Access conflicts may also occur at the limited number of non-isolated hardware resources, such as main memory, that are still shared by invasive applications. Finally, in embedded domains, random hardware faults, e.g., due to radiation, voltage noise or silicon wear-out effects (ageing), may have an arbitrary impact on application behaviour.

In this context, Project B4 addresses the major research challenge of enabling run-time verification for invasive multi-tile architectures. This requires us to work on the following research questions: How can we generate a programmable hardware monitoring system with probes and property checkers and insert software probes and property checkers into the application code? How do we establish communication between all distributed system components? What are the trade-offs between implementing the property checking in hardware blocks, in software annotated to the application or in additional monitoring tasks? Also, at design time, the mapping of the application onto the platform is unknown. Moreover, even the exact resource types are unknown, as the run-time environment may select from a set of candidate operating points in the invade phase. Another question is therefore how to configure the system dynamically once the mapping information becomes available. To our knowledge, there exists no comparable approach that needs to tackle such a highly distributed and dynamic system for run-time verification.

Another important aspect is the usability of such a system, which leads to the following question: Which automation support is required for the configuration of the distributed monitoring system with user-defined properties? Here we plan to investigate two domain-specific languages (DSLs), a so-called Instrumentation Language (IL) and a Property Language (PL), as interfaces for the invasive compiler tool chain to define probing data and property checks. This IL/PL specification is derived as an intermediate format from the application source code, which includes information on non-functional properties, e.g., in the form of requirement tags of the domain-specific language from Project A1. We intend to develop an automation tool that takes the IL/PL specification and generates the run-time verification code, which configures the HW probes and monitors and implements the SW probes and monitors. Another interface to the compiler tool chain and run-time environment is investigated to integrate the run-time verification code into the application. This leads to a highly automated flow to produce and forward verdicts to event handlers, such as the run-time requirement enforcers (RREs) from Projects A1 and A4. Simply stated, the IL and PL act as a programming language for the distributed monitoring system. The proposed automation tool generates the configuration and SW code that implement the run-time verification in a compiler-like fashion.
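The generation step described above can be pictured with a minimal Python sketch. It is purely illustrative: the IL/PL syntax is not defined in this text, so the dictionary layout, the field names (`probes`, `properties`, `checker`, `max_cycles`), the `report(...)` call in the generated software check and the function `generate` are all hypothetical stand-ins, not the project's actual format or tool.

```python
# Hypothetical stand-in for an IL/PL specification (all names are assumptions).
spec = {
    # Instrumentation part: which events or values are probed, and where.
    "probes": [
        {"name": "frame_start", "source": "hw"},
        {"name": "frame_end", "source": "hw"},
        {"name": "objects_found", "source": "sw"},
    ],
    # Property part: what is checked on the probed data.
    "properties": [
        {"name": "frame_latency", "checker": "hw",
         "from": "frame_start", "to": "frame_end", "max_cycles": 100000},
        {"name": "object_count", "checker": "sw",
         "probe": "objects_found", "max": 32},
    ],
}


def generate(spec):
    """Split a specification into a HW configuration and SW check snippets."""
    hw_config = []   # e.g. settings for watchdog-timer-like hardware checkers
    sw_checks = []   # e.g. code lines to be woven into the application
    for prop in spec["properties"]:
        if prop["checker"] == "hw":
            hw_config.append({
                "watchdog": prop["name"],
                "start": prop["from"],
                "stop": prop["to"],
                "timeout": prop["max_cycles"],
            })
        else:
            sw_checks.append(
                f'report("{prop["name"]}", {prop["probe"]} <= {prop["max"]})'
            )
    return hw_config, sw_checks


hw_config, sw_checks = generate(spec)
print(hw_config)
print(sw_checks)
```

The sketch only shows the compiler-like split of one specification into a hardware configuration and software checks; how properties are actually assigned to hardware or software is one of the trade-offs the project sets out to investigate.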

Finally, we also want to exploit the predictability of the invasive platform together with the run-time verification capabilities. When properties can be validated statically at design time, the run-time property check is expected to find them valid at all times as well. Still, one can conduct a run-time check on these properties. When such a check detects an unexpected run-time violation, it may indicate abnormal behaviour due to events that were not considered, usually rare ones. This places another layer of safety into the system.

In order to support run-time requirement enforcement, verdicts will be forwarded to the run-time requirement enforcers from Project A1 and Project A4 or application-specific event handlers as illustrated in the Figure.

Figure: Overview of Runtime Monitoring System

Additionally, we plan to provide an interface to the application that allows the user to define event handlers. These event handlers can trigger system reactions and may even go so far as to switch off the system to avoid unsafe operation. Overall, the planned work in Project B4 can improve the trust in an invasive computing platform, as the enforcers or applications obtain reliable data on system behaviour. Additionally, abnormal behaviour is detected, which may indicate rare events such as unexpected inputs from the environment, random hardware failures or a security issue.

Approach

We intend to achieve the goals mentioned above by employing the following methods:

Customisable run-time verification for invasive applications

The goals of the third funding phase extend Project B4 beyond the design of hardware monitors as investigated in the previous two funding phases. Project B4 investigates the design of a configurable run-time verification system that can be customised for different invasive platforms. This hybrid run-time verification system for invasive applications consists of software and hardware parts for the following tasks: Programmable probes generate a trace of system events and status information. Programmable property checkers analyse the trace to generate a verdict on a set of user-defined application properties. Our goal is to reliably generate verdicts on properties defined for latency, throughput, power budgets, reliability and the detection of application abnormalities indicating upcoming hardware failures or security concerns.
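The chain of probe, property checker and event handler can be illustrated with a minimal software-only sketch. It assumes a trace of named events with cycle timestamps and a single watchdog-style latency property; the names `Event`, `Verdict`, `LatencyChecker` and `run_chain` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Event:
    """One probed trace entry: an event name and a timestamp in cycles."""
    name: str
    cycle: int


@dataclass
class Verdict:
    prop: str        # name of the checked property
    violated: bool   # True if the property was violated
    observed: int    # observed latency in cycles


class LatencyChecker:
    """Watchdog-style checker: 'end' must follow 'start' within `bound` cycles."""

    def __init__(self, prop: str, bound: int) -> None:
        self.prop = prop
        self.bound = bound
        self._start: Optional[int] = None

    def feed(self, ev: Event) -> List[Verdict]:
        if ev.name == "start":
            self._start = ev.cycle
            return []
        if ev.name == "end" and self._start is not None:
            latency = ev.cycle - self._start
            self._start = None
            return [Verdict(self.prop, latency > self.bound, latency)]
        return []


def run_chain(trace: Iterable[Event], checker: LatencyChecker,
              handler: Callable[[Verdict], None]) -> None:
    """Feed the probed trace to the checker and forward each verdict."""
    for ev in trace:
        for verdict in checker.feed(ev):
            handler(verdict)


# The event handler here only logs verdicts; it could also trigger a reaction.
log: List[Verdict] = []
trace = [Event("start", 0), Event("end", 80),
         Event("start", 100), Event("end", 250)]
run_chain(trace, LatencyChecker("frame_latency", bound=100), log.append)
for v in log:
    print(v)
```

In this example the first start/end pair takes 80 cycles and satisfies the bound of 100 cycles, while the second takes 150 cycles and yields a violation verdict. In the envisaged system the same roles would be spread over hardware and software parts rather than running in one process.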

New concepts for distributed monitoring systems

We will investigate hardware and software property checkers, trade-offs and optimisation steps for the run-time verification system. We will also investigate model-based electronic design automation (EDA) tools that translate property specifications into hardware configurations and into software code executed at run time to generate and report the verdicts. The complete run-time verification system can be seen as a highly resource-constrained programmable distributed monitoring system that is deeply embedded into the invasive architecture. It requires its own custom hardware, domain-specific programming language and compiler-like code generation. Additionally, the system does not improve the computing power of a given invasive multicore platform. Hence, the overhead in terms of area, memory and power must be kept minimal. All these aspects are investigated in Project B4, building upon a strong hardware probing and monitoring system from the previous funding phase.

Support of run-time requirement enforcers

The verdicts produced by the run-time verification system are forwarded to the event handlers. The primary event handlers considered in Project B4 are the run-time requirement enforcers (RREs) developed in Projects A1 and A4. The RREs serve to control certain application behaviours, such as keeping latency, throughput or power within corridors, by applying appropriate control actions. Most run-time verification approaches generate solely Boolean verdicts indicating a violated or validated property. The RREs will additionally require the run-time verification to generate continuous verdicts so that they can take action before a violation occurs. Next to the RREs, there already exists support in invasive applications to specify application-level event handlers. The run-time verification system should also support an interface to forward the verdicts to these application-level event handlers. This allows users to prespecify system reactions to certain property violations as part of their application.
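The difference between Boolean and continuous verdicts can be sketched as follows. The text does not define how a continuous verdict is computed, so the normalised margin used here, the warning level of 0.2 and the function names are assumptions made only for illustration.

```python
def boolean_verdict(observed: float, bound: float) -> bool:
    """True if the property 'observed <= bound' holds."""
    return observed <= bound


def continuous_verdict(observed: float, bound: float) -> float:
    """Normalised margin to the bound (an assumed definition).

    Positive values mean the property holds, 0.0 means the bound is
    reached exactly, negative values mean the property is violated.
    """
    return (bound - observed) / bound


def enforcer_action(margin: float, warn_level: float = 0.2) -> str:
    """Toy enforcer policy: act once the margin becomes small."""
    if margin < 0.0:
        return "violated"
    if margin < warn_level:
        return "act"
    return "ok"


bound = 100.0                      # e.g. a latency bound in cycles
latencies = [50.0, 85.0, 120.0]    # example observations
booleans = [boolean_verdict(x, bound) for x in latencies]
actions = [enforcer_action(continuous_verdict(x, bound)) for x in latencies]
print(booleans)
print(actions)
```

For the observation of 85 cycles the Boolean verdict still reports a validated property, whereas the margin of 0.15 already falls below the assumed warning level, which gives an enforcer the chance to react before the bound is exceeded.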

With strict RREs, application properties can be continuously enforced at run time. Hence, no verdicts indicating violations of enforced properties should become visible at the user-application level during nominal system operation. Yet, as already outlined, it might be hard to foresee all possible environments and states that the system operates in. Therefore, another goal of our run-time verification system for invasive platforms, beyond generating data for the enforcers, is the identification of unexpected situations and rare events outside the specified system operation. Such indications are provided to the application as unexpected verdict outcomes, which can have different causes: Due to environmental changes during run time, the environment may provide input data beyond the specified bounds that was not foreseen at design time. An unexpected property violation might also indicate a severe hardware failure that prevents the enforcers from fulfilling their task. Additionally, with Project C5, we plan to investigate whether unexpected verdict outcomes might also indicate security issues caused by someone tampering with the system. For all these situations, the user can prespecify emergency system reactions, ranging, e.g., from a reboot to a complete system stop. Such reactions then make it possible to ensure safe system states for invasive applications in domains with tight constraints on application behaviour, such as safety-critical systems.

A comprehensive summary of the major achievements of the first and second funding phases can be found on the Project B4 first-phase and second-phase websites.

Publications

[1] Alexandra Listl, Daniel Mueller-Gritschneder, Sani R. Nassif, and Ulf Schlichtmann. SRAM design exploration with integrated application-aware aging analysis. In Proceedings of Design, Automation and Test in Europe Conference (DATE), March 2019. Accepted for publication.
[2] Alexandra Listl, Daniel Mueller-Gritschneder, Fabian Kluge, and Ulf Schlichtmann. Emulation of an ASIC power, temperature and aging monitor system for FPGA prototyping. In International On-Line Testing Symposium (IOLTS), July 2018.
[3] Li Zhang. Advanced Timing for High-Performance Design and Security of Digital Circuits. Dissertation, Chair of Electronic Design Automation, Department of Electrical and Computer Engineering, Technical University of Munich, Germany, 2018.
[4] Grace Li Zhang, Bing Li, Yiyu Shi, Jiang Hu, and Ulf Schlichtmann. Effitest2: Efficient delay test and prediction for post-silicon clock skew configuration under process variations. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018. Early Access Paper, doi: 10.1109/TCAD.2018.2818713.
[5] Grace Li Zhang, Bing Li, Jinglan Liu, Yiyu Shi, and Ulf Schlichtmann. Design-phase buffer allocation for post-silicon clock binning by iterative learning. In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, volume 37, 2018.
[6] E. Glocker, Q. Chen, U. Schlichtmann, and D. Schmitt-Landsiedel. Emulation of an ASIC power and temperature monitoring system (etpmon) for FPGA prototyping. Microprocessors and Microsystems, 50:90–101, May 2017.
[7] Jinglan Liu, Yukun Ding, Jianlei Yang, Ulf Schlichtmann, and Yiyu Shi. Generative adversarial network based scalable on-chip noise sensor placement. In 30th IEEE International System-on-Chip Conference, SOCC 2017, Munich, Germany, September 5-8, 2017, pages 239–242, 2017.
[8] Elisabeth Glocker. Thermisches Verhalten und emuliertes online Temperatur-Monitorsystem für das FPGA-Prototyping von Multiprozessor-Architekturen. Dissertation, Chair of Technical Electronics, Department of Electrical and Computer Engineering, Technical University of Munich, Germany, 2017.
[9] Shushanik Karapetyan and Ulf Schlichtmann. 20nm FinFET-based SRAM cell: Impact of variability and design choices on performance characteristics. In Int. Conf. Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit Design (SMACD), 2017.
[10] Santiago Pagani, Lars Bauer, Qingqing Chen, Elisabeth Glocker, Frank Hannig, Andreas Herkersdorf, Heba Khdr, Anuj Pathania, Ulf Schlichtmann, Doris Schmitt-Landsiedel, Mark Sagi, Éricles Sousa, Philipp Wagner, Volker Wenzel, Thomas Wild, and Jörg Henkel. Dark silicon management: An integrated and coordinated cross-layer approach. it – Information Technology, 58(6):297–307, September 16, 2016.
[11] U. Schlichtmann. The next frontier in IC design: Determining (and optimizing) robustness and resilience of integrated circuits and systems. In 2016 China Semiconductor Technology International Conference (CSTIC), pages 1–4, March 2016.
[12] Ulf Schlichtmann, Masanori Hashimoto, Iris Hui-Ru Jiang, and Bing Li. Reliability, adaptability and flexibility in timing: Buy a life insurance for your circuits. In IEEE/ACM Asia and South Pacific Design Automation Conference (ASP-DAC), pages 705–711. IEEE/ACM Press, January 2016.
[13] Grace Li Zhang, Bing Li, and Ulf Schlichtmann. Effitest: Efficient delay test and statistical prediction for configuring post-silicon tunable buffers. In Proceedings of the 53rd Annual Design Automation Conference (DAC), pages 60:1–60:6. ACM, 2016.
[14] Bing Li and U. Schlichtmann. Statistical timing analysis and criticality computation for circuits with post-silicon clock tuning elements. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 34(11):1784–1797, November 2015.
[15] E. Glocker, Q. Chen, A.M. Zaidi, U. Schlichtmann, and D. Schmitt-Landsiedel. Emulation of an ASIC power and temperature monitor system for FPGA prototyping. In 10th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), pages 1–8, June 2015.
[16] Éricles R. Sousa, Frank Hannig, Jürgen Teich, Qingqing Chen, and Ulf Schlichtmann. Runtime adaptation of application execution under thermal and power constraints in massively parallel processor arrays. In Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems (SCOPES), pages 121–124. ACM, June 2015.
[17] Elisabeth Glocker, Qingqing Chen, Asheque M. Zaidi, Ulf Schlichtmann, and Doris Schmitt-Landsiedel. Emulated ASIC Power and Temperature Monitor System for FPGA Prototyping of an Invasive MPSoC Computing Architecture. In Proceedings of the First Workshop on Resource Awareness and Adaptivity in Multi-Core Computing (Racing 2014), pages 14–15, May 2014.
[18] Elisabeth Glocker, Qingqing Chen, Asheque M. Zaidi, Ulf Schlichtmann, and Doris Schmitt-Landsiedel. Emulierung eines ASIC-Leistungsverbrauchs- und Temperaturmonitorsystems für FPGA-Prototyping eines ressourcengewahren Computersystems. In 16. Workshop Analogschaltungen, Wien, Österreich, 2014.
[19] E. Glocker, S. Boppu, Q. Chen, U. Schlichtmann, J. Teich, and D. Schmitt-Landsiedel. Temperature modeling and emulation of an ASIC temperature monitor system for Tightly-Coupled Processor Arrays (TCPAs). Advances in Radio Science, 12:103–109, 2014.
[20] Dominik Lorenz, Martin Barke, and Ulf Schlichtmann. Monitoring of aging in integrated circuits by identifying possible critical paths. Journal of Microelectronics Reliability, 54:1075–1082, 2014.
[21] Nasim Pour Aryan, A. Listl, L. Heiss, C. Yilmaz, G. Georgakos, and D. Schmitt-Landsiedel. From an analytic NBTI device model to reliability assessment of complex digital circuits. In International On-Line Testing Symposium (IOLTS), pages 19–24, 2014.
[22] Elisabeth Glocker, Srinivas Boppu, Qingqing Chen, Ulf Schlichtmann, Jürgen Teich, and Doris Schmitt-Landsiedel. Temperature modeling and emulation of an ASIC temperature monitor system for Tightly-Coupled Processor Arrays (TCPAs) on FPGA. In Kleinheubacher Tagung 2013, September 2013.
[23] Martin Barke, Veit B. Kleeberger, Christoph Werner, Doris Schmitt-Landsiedel, and Ulf Schlichtmann. Analysis of Aging Mitigation Techniques for Digital Circuits Considering Recovery Effects. In edaWorkshop, May 2013.
[24] Bing Li, Ning Chen, Yang Xu, and Ulf Schlichtmann. On timing model extraction and hierarchical statistical timing analysis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 32(3):367–380, March 2013.
[25] Elisabeth Glocker and Doris Schmitt-Landsiedel. Modeling of Temperature Scenarios in a Multicore Processor System. Advances in Radio Science (ARS), 11:219–225, 2013.
[26] Martin Wirnshofer. Variation-Aware Adaptive Voltage Scaling for Digital CMOS Circuits, volume 41. Springer Series in Advanced Microelectronics, 2013.
[27] Martin Wirnshofer. Variation-Aware Adaptive Voltage Scaling for Digital CMOS Circuits. Dissertation, Technische Universität München, München, 2013.
[28] Martin Wirnshofer, Nasim Pour Aryan, Leonhard Heiss, Doris Schmitt-Landsiedel, and Georg Georgakos. On-line supply voltage scaling based on in situ delay monitoring to adapt for PVTA variations. Journal of Circuits, Systems and Computers, 21(08), December 2012.
[29] Bing Li, Ning Chen, and Ulf Schlichtmann. Statistical timing analysis for latch-controlled circuits with reduced iterations and graph transformations. In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pages 1670–1683, November 2012.
[30] N. Chen, B. Li, and U. Schlichtmann. Iterative timing analysis based on nonlinear and interdependent flipflop modelling. IET Circuits, Devices & Systems, 6(5):330–337, September 2012.
[31] Dominik Lorenz, Martin Barke, and Ulf Schlichtmann. Efficiently analyzing the impact of aging effects on large integrated circuits. In Journal of Microelectronics Reliability, volume 52, pages 1546–1552, August 2012.
[32] Sani R. Nassif, Veit B. Kleeberger, and Ulf Schlichtmann. Goldilocks failures: not too soft, not too hard. In IEEE International Reliability Physics Symposium (IRPS), April 2012.
[33] Martin Wirnshofer, Leonhard Heiss, A. N. Kakade, Nasim Pour Aryan, Georg Georgakos, and Doris Schmitt-Landsiedel. Adaptive voltage scaling by in-situ delay monitoring for an image processing circuit. In IEEE 15th International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS), pages 205–208, April 2012.
[34] Christoph Knoth, Hela Jedda, and Ulf Schlichtmann. Current source modeling for power and timing analysis at different supply voltages. In Proceedings of Design, Automation and Test in Europe Conference (DATE), pages 923–928, March 2012.
[35] Elisabeth Glocker and Doris Schmitt-Landsiedel. Modeling of Temperature Scenarios in a Multicore Processor System. In Kleinheubacher Tagung 2012, 2012.
[36] Nasim Pour Aryan, Leonhard Heiss, Doris Schmitt-Landsiedel, Georg Georgakos, and Martin Wirnshofer. Comparison of in-situ delay monitors for use in adaptive voltage scaling. Advances in Radio Science (ARS), 10:215–220, 2012.
[37] Shailesh More. Aging Degradation and Countermeasures in Deep-submicrometer Analog and Mixed Signal Integrated Circuits. Dissertation, Technische Universität München, München, 2012.
[38] Christoph Knoth. Accurate Waveform-based Timing Analysis with Systematic Current Source Models. Dissertation, Technische Universität München, München, 2012.
[39] Dominik Lorenz. Aging Analysis of Digital Integrated Circuits. Dissertation, Technische Universität München, München, 2012.
[40] Dominik Lorenz, Martin Barke, and Ulf Schlichtmann. Finding possible critical paths for on-line monitoring of aging in integrated circuits. Technical report, Technische Universität München, December 2011.
[41] Martin Wirnshofer, Leonhard Heiss, Georg Georgakos, and Doris Schmitt-Landsiedel. An energy-efficient supply voltage scheme using in-situ pre-error detection for on-the-fly adaptation to PVT variations. In International Symposium on Integrated Circuits (ISIC), pages 94–97, December 2011.
[42] Ning Chen, Bing Li, and Ulf Schlichtmann. Timing modeling of flipflops considering aging effects. In International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), volume 6951 of Lecture Notes in Computer Science (LNCS), pages 63–72, September 2011.
[43] Christoph Knoth, Carsten Uphoff, Sebastian Kiesel, and Ulf Schlichtmann. SWAT: Simulator for waveform-accurate timing including parameter variations and transistor aging. In International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), volume 6951 of Lecture Notes in Computer Science (LNCS), pages 193–203, September 2011.
[44] Veit B. Kleeberger and Ulf Schlichtmann. Reliability Analysis of Digital Circuits Considering Intrinsic Noise. In Asia Symposium on Quality Electronic Design (ASQED), July 2011.
[45] Veit B. Kleeberger, Martin Barke, Christoph Werner, Doris Schmitt-Landsiedel, and Ulf Schlichtmann. A compact model for NBTI degradation and recovery under use-profile variations and its application to aging analysis of digital integrated circuits. Microelectronics Reliability, 54(6–7):1083–1089, Jun 5, 2011.
[46] Nasim Pour Aryan, Leonhard Heiss, Doris Schmitt-Landsiedel, Georg Georgakos, and Martin Wirnshofer. Comparison of in-situ delay monitors for use in adaptive voltage scaling. In Kleinheubacher Tagung 2011, 2011.
[47] Jürgen Teich, Jörg Henkel, Andreas Herkersdorf, Doris Schmitt-Landsiedel, Wolfgang Schröder-Preikschat, and Gregor Snelting. Invasive computing: An overview. In Michael Hübner and Jürgen Becker, editors, Multiprocessor System-on-Chip – Hardware Design and Tool Integration, pages 241–268. Springer, Berlin, Heidelberg, 2011.
[48] Martin Wirnshofer, Leonard Heiss, Georg Georgakos, and Doris Schmitt-Landsiedel. A variation-aware adaptive voltage scaling technique based on in-situ delay monitoring. In IEEE 14th International Symposium on Design and Diagnostics of Electronic Circuits & Systems, pages 261–266, 2011.
[49] Jürgen Teich. Invasive algorithms and architectures. it - Information Technology, 50(5):300–310, 2008.