

C2: Simulative Design Space Exploration

Principal Investigators:

Dr. F. Hannig

Scientific Researchers:

S. Roloff, M. Witterauf

Abstract

In the first funding phase, Project C2 developed both techniques and the framework InvadeSIM for the timed functional simulation of invasive resource-aware programs on heterogeneous tiled architectures. This gave us important insights into the concepts of invasive computing across different platform layers, i.e., the modelling of different heterogeneous tiled architectures, invasion strategies (agents solving constraint satisfaction problems), and resource-aware application programming. Building on this foundation, the main focus of Project C2 in the second funding phase is the systematic co-exploration across all these platform layers: we search for optimal architecture configurations and invasion strategies that best fit a given mix of resource-aware applications. The research includes (a) design space exploration techniques based on (semi-)simulative and probabilistic methods, (b) the classification of invasive applications into parallel computing patterns, and (c) modelling the dynamic behaviour of invasive applications.

Synopsis

In the first funding phase, Project C2 investigated novel simulation techniques that enable the validation and the exploration of variants of all essential features of invasive computing. It had two major research fields: (a) timed functional simulation of invasive resource-aware programs and (b) performance evaluation of individual architectures together with an integrated simulation methodology to co-simulate different types of invasive architectures. To handle the complexity and diversity of the considered architectures as well as of the different invasive programming, resource management, and invasion strategies, new methods for the modularisation and orthogonalisation of these exploration concerns have been developed. Such simulation facilities made it possible to optimise the concepts of invasive computing across all project areas without requiring full hardware or software implementations. One fundamental development was the timed functional simulation framework InvadeSIM for the simulation of invasive resource-aware programs on heterogeneous tiled architectures. It enables us to gain important insights into the concepts of invasive computing across all platform layers, including architectures, invasion strategies, and applications. Based on this foundation, the main focus of Project C2 in the second funding phase is the automatic co-exploration across all these platform layers, as shown in Figure 1.


Figure 1: Overview of the design space exploration.

More specifically, we intend to investigate important research questions such as: What does an optimal invasive architecture configuration look like for a given set of characteristic applications? And: How can the inherent dynamic behaviour of resource-aware programs, as well as of multiple competing applications, be modelled? In this realm, we want to study variants of invasive architectures and evaluate them for several application scenarios with respect to multiple objectives (e.g., cost, performance, power, predictability). Furthermore, we want to classify and model invasive applications, and sets of them, by invasive parallel patterns.

Approach

The main contribution of the first funding phase, and the basis for our research goals in the second funding phase, is the simulation framework InvadeSIM. The main goal of a timed functional simulation framework is to provide simulation capabilities comprising invasive architecture and run-time simulation for parallel invasive applications written in X10. To fulfil these requirements, the simulation platform InvadeSIM has been developed. It provides (a) a fast simulation approach for hundreds of competing applications on large heterogeneous architectures, (b) the modelling of customised heterogeneous invasive multi-tile architectures, (c) the simulation of the complete set of X10 language constructs, and (d) the simulation of all novel invasive programming constructs such as invade, infect, etc., as provided by the InvadeX10 language extension. All of these goals were successfully accomplished. This required close cooperation with several projects of the TCRC (A1, B5, C1, D1, D3). In the following paragraphs, we explain our achievements and the relevant additional work in more detail.

An overview of the developed timed functional simulation platform InvadeSIM, referred to as the simulator in the following, is depicted in Figure 2. The simulator makes it possible to quickly customise the invasive multi-tile architecture to be evaluated by changing parameters such as the topology, network parameters, or the number of tiles and the processor types in each tile. This is denoted as architecture model in Figure 2. Applications using InvadeSIM are written in X10 and may use the InvadeX10 framework to exploit the invasive command set. This is denoted as application modelling in the figure. The centre of the figure shows the main concepts of the timed functional simulation approach, namely the core components for simulating several parallel applications on a modelled heterogeneous multi-tile architecture. These comprise a novel concept of approximately timed simulation and a discrete event synchronisation mechanism between multiple processor simulations, both of which are explained in the following. The simulation results may be visualised by a graphical user interface or by a trace viewer.


Figure 2: Overview of the resource-aware timed functional simulation platform InvadeSIM.

To provide a fast simulation of invasive parallel program behaviour on heterogeneous multi-tile MPSoCs, we developed a novel processor simulation approach that is able to tackle the complexity as well as the heterogeneity of current and future large-scale multicore platforms. Existing simulation frameworks for heterogeneous MPSoCs, whether cycle-accurate or trace-based, are typically much too slow. Our processor simulation approach, which we call time warping, is a hybrid approach based on performance counters and analytic models. A synchronisation mechanism preserves the causality of simulation events when multiple processors are simulated. The basic ideas of both parts are depicted in the central box of Figure 2. The discrete event simulation approach combines the functional execution of an application i-let with a timing simulation that reflects the computational properties of the target processor onto which the i-let was mapped at run time. This is done by scaling the time elapsed on the simulation host to the time the code would have taken on the simulated processor. As experiments have shown, our simulation is much faster than cycle-accurate simulation. This work was presented at the HiPEAC ACACES Summer School 2012.
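The scaling idea and the causality-preserving ordering described above can be illustrated with a minimal Python sketch. It is not the InvadeSIM implementation: the names `Processor`, `warp_time`, and `run_in_causal_order` are hypothetical, and the analytic model is assumed to be a plain CPI and clock-frequency ratio, whereas the actual simulator combines performance counters with its own analytic models.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    cpi: float           # average cycles per instruction (assumed constant)
    frequency_hz: float  # clock frequency in Hz


def warp_time(host_seconds, host, target):
    """Scale a duration measured on the host to an estimate for the target.

    Assumption: the instruction count is the same on host and target, so
    only CPI and clock frequency differ.
    """
    instructions = host_seconds * host.frequency_hz / host.cpi
    return instructions * target.cpi / target.frequency_hz


def run_in_causal_order(workloads, host):
    """Advance several simulated processors in order of their local time.

    `workloads` maps a processor name to (target, host-measured durations of
    its code segments). The processor with the smallest local simulated time
    always runs next, so segment start times are emitted in nondecreasing
    order. Returns a list of (start_time, name) pairs.
    """
    heap = [(0.0, name, 0) for name in sorted(workloads)]
    heapq.heapify(heap)
    trace = []
    while heap:
        local_time, name, index = heapq.heappop(heap)
        target, segments = workloads[name]
        if index == len(segments):
            continue  # this processor has no work left
        trace.append((local_time, name))
        finished = local_time + warp_time(segments[index], host, target)
        heapq.heappush(heap, (finished, name, index + 1))
    return trace
```

In this sketch, a segment that takes one second on a 2 GHz host with a CPI of 1 is charged four seconds on a 1 GHz target with a CPI of 2, and the scheduler never starts a segment at a simulated time earlier than one it has already emitted.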

For the architectural model, we consider typical invasive heterogeneous tiled architectures consisting of multiple tiles that are connected by a network-on-chip (NoC). Each tile is customisable in the size and type of its local memory, its CiC, and the number and types of its processing elements, such as RISC CPUs, i-Cores, or TCPAs. The CiC schedules the application i-lets that are spawned on a tile to the processors contained in the claim of the application. Furthermore, each processor is characterised by a set of attributes such as cycles per instruction (CPI), clock frequency, and other processor-specific features. An architecture configuration file describes these settings for a given invasive architecture. A highlight of the architecture model is the design and implementation of a flit-accurate simulation model of an invasive network-on-chip (iNoC) in X10, which was developed in close cooperation with Project B5. (In NoCs, network packets consist of small pieces called flow control digits, or flits.) This model enables the simulation of network latencies between communicating i-lets on different tiles and also takes into account the interference between communications that take place at the same time. Furthermore, we have investigated ways to accelerate the network simulation by proposing an analytic calculation of network latencies for guaranteed service communications.
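To make the parameters of the architecture model more concrete, the following Python sketch shows one way such a configuration could be represented in memory. The actual format of the InvadeSIM architecture configuration file is not described here, so every class and field name below (`ProcessorSpec`, `TileSpec`, `ArchitectureSpec`, `memory_kib`, and so on) is hypothetical, and the rectangular mesh topology is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ProcessorSpec:
    kind: str            # e.g. "RISC", "i-Core", or "TCPA"
    cpi: float           # cycles per instruction
    frequency_hz: float  # clock frequency in Hz


@dataclass(frozen=True)
class TileSpec:
    memory_kib: int                        # size of the tile-local memory
    processors: Tuple[ProcessorSpec, ...]  # processing elements on the tile


@dataclass(frozen=True)
class ArchitectureSpec:
    mesh_width: int   # assumed mesh topology: tiles per row
    mesh_height: int  # assumed mesh topology: tiles per column
    tiles: Tuple[TileSpec, ...]

    def __post_init__(self):
        # Reject configurations whose tile list does not fill the mesh.
        expected = self.mesh_width * self.mesh_height
        if len(self.tiles) != expected:
            raise ValueError(f"expected {expected} tiles, got {len(self.tiles)}")

    def processor_count(self, kind: Optional[str] = None) -> int:
        """Count all processors, or only those of the given kind."""
        return sum(
            1
            for tile in self.tiles
            for proc in tile.processors
            if kind is None or proc.kind == kind
        )
```

A design space exploration could then vary fields such as the mesh dimensions or the processor mix per tile and evaluate each resulting `ArchitectureSpec` by simulation.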

As mentioned earlier, the application model for our simulations consists of real applications written in X10, and the full X10 language is supported. In particular, InvadeSIM simulates the spawning, starting, blocking, and locking of i-lets as well as the communication between i-lets across tiles, and it supports all X10 language constructs for parallelism (e.g., async), synchronisation (e.g., finish, when, or clocks), and communication (e.g., at, dist arrays, remote arrays). To this end, we have modified the X10 run-time system so that, during simulation, each of these primitives is redirected to our simulator kernel. For example, the primitives for communication between tiles, which are invoked by an at call, are redirected to our network simulation kernel, which calculates the latency of the corresponding data transfer instead of calling the real X10 communication library. This integration of our simulation methodology into the X10 run-time system was an important step towards fulfilling the requirements of the application projects of the TCRC. Here, Project D3 and Project D1 provided example applications for testing purposes.
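The redirection of communication primitives can be sketched in Python as follows. This is an illustration only, not the modified X10 run-time system: `NetworkSimKernel` and `SimulatedRuntime` are hypothetical names, and the latency model (Manhattan hop count plus a bandwidth term) is a deliberately simple stand-in for the flit-accurate iNoC model.

```python
class NetworkSimKernel:
    """Hypothetical stand-in for a network simulation kernel."""

    def __init__(self, hop_latency_s, bandwidth_bytes_per_s):
        self.hop_latency_s = hop_latency_s
        self.bandwidth_bytes_per_s = bandwidth_bytes_per_s

    def transfer_latency(self, src, dst, size_bytes):
        # Assumed model: per-hop delay on a mesh plus serialisation time.
        hops = abs(src[0] - dst[0]) + abs(src[1] - dst[1])
        return hops * self.hop_latency_s + size_bytes / self.bandwidth_bytes_per_s


class SimulatedRuntime:
    """Runs the body of a remote call locally and charges simulated time."""

    def __init__(self, network):
        self.network = network
        self.now = 0.0  # simulated time in seconds

    def at(self, src, dst, payload, body):
        # Instead of sending the payload through a real communication
        # library, ask the network kernel how long the transfer would take,
        # advance the simulated clock, and execute the body functionally.
        self.now += self.network.transfer_latency(src, dst, len(payload))
        return body(payload)
```

The functional result of the call is unchanged; only the simulated clock reflects the cost of moving the data between tiles.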

Finally, to support all invasion commands and constructs, InvadeX10, as developed by Project A1, was integrated into our simulation framework. Here, the most crucial task was the design and implementation of an agent-based invasive run-time system at X10 level. This required an interface to exchange information between the invasive run-time system and the simulated hardware. Whereas Project C1 primarily developed and evaluated agent-based resource allocation strategies for a very simplified application model of task graphs, Project C2 considers the execution of entire X10 applications and thus can also simulate application-inherent dynamic behaviour. We implemented different resource management strategies in cooperation with Project A1 (resource management based on game theory) and Project C1 (DistRM). We evaluated these resource management strategies in terms of quality (e.g., average speedup of multiple applications) as well as computation and communication overheads.

Besides the achievements mentioned above, an interactive visualisation tool was developed that shows the invasion status of processors during simulation, so that simulation results can be inspected conveniently instead of by reading the raw information of a trace file. Different information about the processors and tiles (e.g., invasion status, infection status, load, temperature, etc.) may be displayed in an architectural view at a certain point of simulation time. A final contribution was the realisation of a task graph library in X10, which will be a starting point for the investigations of Project A1 in the second funding phase on the invasive modelling and mapping of streaming applications. Using this task graph extension of X10, we started to model a communicating task chain of a robot object tracking application, which is composed of three sub-algorithms: Harris corner detection, SIFT feature description, and SIFT feature matching. This work was done with a class of thirteen selected students in a two-week summer school during the Sarntal Academy 2013 and was finally tuned in cooperation with Project D1. Furthermore, we organised a TCRC-internal simulation workshop where the participants were taught how to use the simulation framework and got the opportunity to contribute to its development with knowledge from their specific areas.

In summary, one of the key features of our timed functional simulation platform InvadeSIM is the fast simulation of the interplay between invasive program behaviour and the resulting states of the underlying processing resources, such as their load, temperature, or faultiness, depending on their state of invasion. It delivers important timing information about the invasion states and parallel execution of several applications running on an invasive architecture, taking into account the computational properties of the different processing elements. It therefore provides a backbone for, and paves the way to, investigating different invasive architecture platform configurations during architectural exploration and validating invasive programming concepts before real hardware is available.

A comprehensive summary of the major achievements of the first funding phase can be found on the first-phase website of Project C2.

Publications

[1] Andreas Weichslgartner, Stefan Wildermann, Michael Glaß, and Jürgen Teich. Invasive Computing for Mapping Parallel Programs to Many-Core Architectures. Springer, 2018.
[2] Sascha Roloff, Frank Hannig, and Jürgen Teich. High performance network-on-chip simulation by interval-based timing predictions. In Proceedings of the 15th IEEE/ACM Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia), October 2017. Forthcoming. [ DOI ]
[3] Michael Witterauf, Frank Hannig, and Jürgen Teich. Constructing fast and cycle-accurate simulators for configurable accelerators using C++ templates. In Proceedings of the 28th International Symposium on Rapid System Prototyping (RSP), pages 9–15. ACM, October 2017. [ DOI ]
[4] Jürgen Teich. Invasive computing – editorial. it – Information Technology, 58(6):263–265, November 24, 2016. [ DOI ]
[5] Vahid Lari, Andreas Weichslgartner, Alex Tanase, Michael Witterauf, Faramarz Khosravi, Jürgen Teich, Jürgen Becker, Jan Heißwolf, and Stephanie Friederich. Providing fault tolerance through invasive computing. it – Information Technology, 58(6):309–328, October 19, 2016. [ DOI ]
[6] Stefan Wildermann, Michael Bader, Lars Bauer, Marvin Damschen, Dirk Gabriel, Michael Gerndt, Michael Glaß, Jörg Henkel, Johny Paul, Alexander Pöppl, Sascha Roloff, Tobias Schwarzer, Gregor Snelting, Walter Stechele, Jürgen Teich, Andreas Weichslgartner, and Andreas Zwinkau. Invasive computing for timing-predictable stream processing on MPSoCs. it – Information Technology, 58(6):267–280, September 30, 2016. [ DOI ]
[7] Jürgen Teich, Michael Glaß, Sascha Roloff, Wolfgang Schröder-Preikschat, Gregor Snelting, Andreas Weichslgartner, and Stefan Wildermann. Language and compilation of parallel programs for *-predictable MPSoC execution using invasive computing. In Proceedings of the 10th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), pages 313–320, Lyon, France, September 2016. [ DOI ]
[8] Vahid Lari. Providing fault tolerance through invasive computing. Talk at DTC 2016, The Munich Workshop on Design Technology Coupling, Munich, Germany, June 30, 2016.
[9] Sascha Roloff, Alexander Pöppl, Tobias Schwarzer, Stefan Wildermann, Michael Bader, Michael Glaß, Frank Hannig, and Jürgen Teich. ActorX10: An actor library for X10. In Proceedings of the 6th ACM SIGPLAN X10 Workshop (X10), pages 24–29. ACM, June 14, 2016. [ DOI ]
[10] Alexandru Tanase, Michael Witterauf, Éricles R. Sousa, Vahid Lari, Frank Hannig, and Jürgen Teich. LoopInvader: A compiler for tightly coupled processor arrays. Tool Presentation at the University Booth at Design, Automation and Test in Europe (DATE), Dresden, Germany, March 2016. [ .pdf ]
[11] Sascha Roloff, Frank Hannig, and Jürgen Teich. InvadeSIM: A simulator for heterogeneous multi-processor systems-on-chip. Tool Presentation at the University Booth at Design, Automation and Test in Europe (DATE), Dresden, Germany, March 2016. [ .pdf ]
[12] Vahid Lari. Invasive Tightly Coupled Processor Arrays. Springer Singapore, 2016. [ DOI ]
[13] Vahid Lari. Invasive Tightly Coupled Processor Arrays. Dissertation, Hardware/Software Co-Design, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany, November 18, 2015.
[14] Frank Hannig and Andreas Herkersdorf. Introduction to the special issue on testing, prototyping, and debugging of multi-core architectures. Journal of Systems Architecture, 61(10):600, November 7, 2015. [ DOI ]
[15] Vahid Lari, Jürgen Teich, Alexandru Tanase, Michael Witterauf, Faramarz Khosravi, and Brett H. Meyer. Techniques for on-demand structural redundancy for massively parallel processor arrays. Journal of Systems Architecture (JSA), 61(10):615–627, November 2015. [ DOI ]
[16] Sascha Roloff, Stefan Wildermann, Frank Hannig, and Jürgen Teich. Invasive computing for predictable stream processing: A simulation-based case study. In Proceedings of the 13th IEEE Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia). IEEE, October 2015. [ DOI ]
[17] Alexandru Tanase, Michael Witterauf, Jürgen Teich, Frank Hannig, and Vahid Lari. On-demand fault-tolerant loop processing on massively parallel processor arrays. In Proceedings of the 26th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), pages 194–201. IEEE, July 2015. [ DOI ]
[18] Vahid Lari, Alexandru Tanase, Jürgen Teich, Michael Witterauf, Faramarz Khosravi, Frank Hannig, and Brett H. Meyer. A co-design approach for fault-tolerant loop execution on coarse-grained reconfigurable arrays. In Proceedings of the 2015 NASA/ESA Conference on Adaptive Hardware and Systems (AHS), pages 1–8. IEEE, June 2015. [ DOI ]
[19] Sascha Roloff, David Schafhauser, Frank Hannig, and Jürgen Teich. Execution-driven parallel simulation of PGAS applications on heterogeneous tiled architectures. In Proceedings of the 52nd ACM/EDAC/IEEE Design Automation Conference (DAC), pages 44:1–44:6. ACM, June 2015. [ DOI ]
[20] Vahid Lari, Alexandru Tanase, Frank Hannig, and Jürgen Teich. Massively parallel processor architectures for resource-aware computing. In Proceedings of the First Workshop on Resource Awareness and Adaptivity in Multi-Core Computing (Racing 2014), pages 1–7, May 2014. [ arXiv ]
[21] Aurang Zaib, Prashanth Raju, Thomas Wild, and Andreas Herkersdorf. A layered modeling and simulation approach to investigate resource-aware computing in MPSoCs. In Proceedings of the First Workshop on Resource Awareness and Adaptivity in Multi-Core Computing (Racing 2014), pages 51–56, May 2014. [ arXiv ]
[22] Frank Hannig and Jürgen Teich, editors. Proceedings of the First Workshop on Resource Awareness and Adaptivity in Multi-Core Computing (Racing 2014). May 2014. [ arXiv ]
[23] Sascha Roloff, Frank Hannig, and Jürgen Teich. Towards actor-oriented programming on PGAS-based multicore architectures. In Workshop Proceedings of the 27th International Conference on Architecture of Computing Systems (ARCS). VDE Verlag, February 2014.
[24] Frank Hannig, Vahid Lari, Srinivas Boppu, Alexandru Tanase, and Oliver Reiche. Invasive tightly-coupled processor arrays: A domain-specific architecture/compiler co-design approach. ACM Transactions on Embedded Computing Systems (TECS), 13(4s):133:1–133:29, 2014. [ DOI ]
[25] Jörg Henkel, Vijaykrishnan Narayanan, Sri Parameswaran, and Jürgen Teich. Run-time adaptation for highly-complex multi-core systems. In Proceedings of the IEEE International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), September 2013. [ DOI ]
[26] Sascha Roloff, Andreas Weichslgartner, Jan Heißwolf, Frank Hannig, and Jürgen Teich. NoC simulation in heterogeneous architectures for PGAS programming model. In Proceedings of the 16th International Workshop on Software and Compilers for Embedded Systems (M-SCOPES), pages 77–85. ACM, June 2013. [ DOI ]
[27] Frank Hannig. Resource-aware computing on domain-specific accelerators. In Proceedings of the 10th Workshop on Optimizations for DSP and Embedded Systems (ODES), page 35. ACM, February 24, 2013. Keynote. [ DOI ]
[28] Vahid Lari, Shravan Muddasani, Srinivas Boppu, Frank Hannig, Moritz Schmid, and Jürgen Teich. Hierarchical power management for adaptive tightly-coupled processor arrays. ACM Transactions on Design Automation of Electronic Systems (TODAES), 18(1):2:1–2:25, January 2013. [ DOI ]
[29] Frank Hannig. Invasive tightly-coupled processor arrays. Talk, 1st International Workshop on Domain-Specific Multicore Computing (DSMC) at International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, November 8, 2012.
[30] Frank Hannig. Why do we see more and more domain-specific accelerators in multi-processor systems? Guest Lecture at University of California, Riverside in CS 287 Colloquium in Computer Science, Riverside, CA, USA, November 9, 2012.
[31] Michael Gerndt, Frank Hannig, Andreas Herkersdorf, Andreas Hollmann, Marcel Meyer, Sascha Roloff, Josef Weidendorfer, Thomas Wild, and Aurang Zaib. An integrated simulation framework for invasive computing. In Proceedings of the Forum on Specification and Design Languages (FDL), pages 209–216. IEEE, September 2012.
[32] Vahid Lari, Shravan Muddasani, Srinivas Boppu, Frank Hannig, and Jürgen Teich. Design of low power on-chip processor arrays. In Proceedings of the 23rd IEEE International Conference on Application-specific Systems, Architectures, and Processors (ASAP), pages 165–168. IEEE Computer Society, July 2012. [ DOI ]
[33] Sascha Roloff, Frank Hannig, and Jürgen Teich. Simulation of resource-aware applications on heterogeneous architectures. In Proceedings of the 8th International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES), pages 127–130, July 2012.
[34] Sascha Roloff, Frank Hannig, and Jürgen Teich. Fast architecture evaluation of heterogeneous MPSoCs by host-compiled simulation. In Proceedings of the 15th International Workshop on Software and Compilers for Embedded Systems (SCOPES), pages 52–61. ACM Press, May 2012. [ DOI ]
[35] Sascha Roloff, Frank Hannig, and Jürgen Teich. Approximate time functional simulation of resource-aware programming concepts for heterogeneous MPSoCs. In Proceedings of the 17th Asia and South Pacific Design Automation Conference (ASP-DAC), pages 187–192, January 2012. [ DOI ]
[36] Vahid Lari, Srinivas Boppu, Shravan Muddasani, Frank Hannig, and Jürgen Teich. Hierarchical power management for adaptive tightly-coupled processor arrays. Talk, International Workshop on Adaptive Power Management with Machine Intelligence at International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, November 10, 2011.
[37] Vahid Lari, Andriy Narovlyanskyy, Frank Hannig, and Jürgen Teich. Decentralized dynamic resource management support for massively parallel processor arrays. In Proceedings of the 22nd IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), pages 87–94. IEEE Computer Society, September 2011. [ DOI ]
[38] Jürgen Teich. Programming invasively parallel – an introduction. Pervasive Parallelism Laboratory (PPL) Seminar Talk, Stanford University, CA, USA, July 25, 2011.
[39] Jürgen Teich. Invasive parallel computing – an introduction. Par Lab and AMP Lab Seminar Talk, UC Berkeley, CA, USA, July 22, 2011.
[40] Frank Hannig, Sascha Roloff, Gregor Snelting, Jürgen Teich, and Andreas Zwinkau. Resource-aware programming and simulation of MPSoC architectures through extension of X10. In Proceedings of the 14th International Workshop on Software and Compilers for Embedded Systems (SCOPES), pages 48–55. ACM Press, June 2011. [ DOI ]
[41] Vahid Lari, Frank Hannig, and Jürgen Teich. Distributed resource reservation in massively parallel processor arrays. In Proceedings of the International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 318–321. IEEE Computer Society, May 2011. [ DOI ]
[42] Jürgen Teich, Jörg Henkel, Andreas Herkersdorf, Doris Schmitt-Landsiedel, Wolfgang Schröder-Preikschat, and Gregor Snelting. Invasive computing: An overview. In Michael Hübner and Jürgen Becker, editors, Multiprocessor System-on-Chip – Hardware Design and Tool Integration, pages 241–268. Springer, Berlin, Heidelberg, 2011. [ DOI ]
[43] Frank Hannig. Retargetable mapping of loop programs on coarse-grained reconfigurable arrays. Talk, International Conference on Hardware-Software Codesign and System Synthesis (CODES+ISSS), Scottsdale, AZ, USA, October 26, 2010.
[44] Jürgen Teich. Invasive computing – basic concepts and foreseen benefits. Artist Network of Excellence on Embedded System Design Summer School Europe 2010, Autrans, France, Invited Tutorial, September 7, 2010.
[45] Farhadur Arifin, Richard Membarth, Amouri Abdulazim, Frank Hannig, and Jürgen Teich. FSM-controlled architectures for linear invasion. In Proceedings of the 17th IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC), pages 59–64, October 2009. [ DOI ]
[46] Jürgen Teich. Invasive algorithms and architectures. it – Information Technology, 50(5):300–310, 2008.