Projects


A4: Characterisation and Analysis of Invasive Algorithmic Patterns

Principal Investigators:

Prof. M. Bader, Dr. S. Wildermann

Scientific Researchers:

T. Schwarzer, B. Pourmohensi, A. Pöppl, Dr. J. Falk

Abstract

Project A4 characterises, investigates existing and develops novel invasive algorithmic patterns, which are motivated by applications from project area D. Using design space exploration (DSE), we precompute operating points to achieve desired quality numbers (performance, energy consumption, etc.) of applications.

In the second funding phase, we addressed three major research questions:

  1. How can we derive knowledge about an application’s performance and pass it to the run-time system? Together with Projects A1 and C2, we developed the ActorX10 library. The use of actor-based design for InvadeX10 applications realises a separation of computation and communication and automatically serves as a design entry for a subsequent DSE. The DSE obtains multiple operating points with different trade-offs between resource requirements and achievable quality numbers. This information is passed to the run-time system.
  2. How can we prepare run-time system decisions based on this knowledge, to improve the system performance w.r.t. various performance metrics? Each operating point found by DSE has Pareto-optimal quality numbers and resource claim constraints. We distil this usually huge set of Pareto-optimal operating points to a manageable and still relevant subset of points. These are passed to the run-time system, which can use the provided information for resource management.
  3. How can invasion-aware algorithmic patterns be developed that benefit from invasion on heterogeneous platforms? For this purpose, we developed and examined SWE-X10, a proxy application for space- and time-adaptive tsunami simulation, and a prototypical image detection chain. Both proxy applications are written in InvadeX10 using the ActorX10 library and serve to evaluate novel invasive algorithmic patterns and how to exploit operating points distilled by DSE.

The developed design flow allows the run-time system to execute applications with predictable performance (w.r.t. upper and lower bounds on various quality numbers), where applications are statically optimised under worst-case assumptions and for hardware with known performance behaviour.

In Phase III, we will focus on programming and characterising applications with varying execution phases and changing workloads to run on hardware with uncertain performance behaviour.

Synopsis

Application execution may be exposed to diverse workload scenarios depending on varying inputs, execution phases, and environmental conditions. Rather than application mappings being tailored to the worst-case or average-case scenarios, it is necessary to provide run-time enforcement strategies that enable switching between different algorithmic patterns and resource claims dynamically to cope with such variances. Additionally, on modern hardware, such uncertain performance may already result from small variations in the manufacturing process—for example, compute core instances of the same type will have to run at slightly different clock frequencies to meet a certain power limit or energy budget. Furthermore, hardware faults are expected to occur more often thus making resources temporally or, due to manufacturing variability and ageing, even permanently unavailable. Therefore, it is our goal to provide optimised and robust application execution by dynamically adapting the application behaviour and structure (i.e. the actor model) as well as the set of allocated resources by means of re-invasion and run-time requirement enforcement (RRE) to such variances and fault scenarios. Consequently, the concept of operating points needs to be extended to include such variations and provide an operational corridor for run-time requirement enforcers in the run-time system or in the invasive hardware.

Approach

Project A4’s mission is to explore and establish application characterisation in the invasive computing paradigm and to investigate and evaluate how algorithms and applications may exploit and profit from the resulting predictability features.

A major challenge is that our study of invasive proxy applications in Phase II and the experiences gained within the embedded system and HPC communities reveal that application execution is getting increasingly dynamic:

Preview: Fictional scenario of a tsunami in the Eastern Mediterranean Sea. Scenario used by invasive hardware prototype.


  1. Workload scenarios and execution phases of an application vary strongly, such that a static worst- or average-case characterisation for a single scenario becomes inadequate.
  2. Performance behaviour on modern hardware is highly uncertain, as external influences, differences in manufacturing and measures to enforce power limits or energy budgets affect quality numbers in an unforeseeable way. For example, compute core instances of the same type will have to run at slightly different clock frequencies to meet a certain power limit or energy budget. (Intel Sandy Bridge architecture) compute nodes of the SuperMUC petascale machine. Observing the the average node power of compute nodes in the SuperMUC petascale supercomputer, one can see a Gaussian distribution of the power consumption. If we assume power capping for the compute nodes, we have to expect a similar Gaussian distribution for the achievable clock speeds.
  3. Hardware faults are expected to occur more often thus making resources temporally or, due to manufacturing variability and ageing, even permanently unavailable.

Project A1 investigates the enforcement of non-functional requirements for the execution of one operating point. Such adaptations concentrate on modifications of the claimed resources, and thus always happen within a corridor defined by the respective operating point. Project A4 expands this approach beyond the boundaries of a single operating point by considering algorithmic adaptations of the application as well as modifications of the resource claim. Our goal is to provide optimised and robust application execution in the context of above aspects by characterising and enforcing adaptation of the application behaviour (e.g. algorithmic variants), the application structure (i.e. the actor model), and the set of allocated resources by means of re-invasion and run-time requirement enforcement (RRE) to such variances and fault scenarios.

A comprehensive summary of the major achievements of the second funding phase can be found by accessing Project A4 second phase website.

Publications

[1] Jörg Henkel, Jürgen Teich, Stefan Wildermann, and Hussam Amrouch. Dynamic Resource Management for Heterogeneous Many-Cores. In Proceedings of International Conference On Computer Aided Design 2018, November 2018.
[2] Tobias Schwarzer, Sascha Roloff, Valentina Richthammer, Rami Khaldi, Stefan Wildermann, Michael Glaß, and Jürgen Teich. On the Complexity of Mapping Feasibility in Many-Core Architectures. In Proceedings of Multicore/Many-core Systems-on-Chip (MCSoC-2018), September 2018.
[3] Valentina Richthammer, Tobias Schwarzer, Stefan Wildermann, Jürgen Teich, and Michael Glaß. Architecture Decomposition in System Synthesis of Heterogeneous Many-Core Systems. In 55th ACM/EDAC/IEEE Design Automation Conference (DAC 2018), June 2018.
[4] Tobias Schwarzer, Andreas Weichslgartner, Michael Glaß, Stefan Wildermann, Peter Brand, and Jürgen Teich. Symmetry-eliminating Design Space Exploration for Hybrid Application Mapping on Many-Core Architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 37(2):297–310, February 2018. doi: 10.1109/TCAD.2017.2695894. [ DOI ]
[5] Andreas Weichslgartner, Stefan Wildermann, Michael Glaß, and Jürgen Teich. Invasive Computing for Mapping Parallel Programs to Many-Core Architectures. Springer, January 15, 2018. [ DOI ]
[6] Andreas Weichslgartner, Stefan Wildermann, Deepak Gangadharan, Michael Glaß, and Jürgen Teich. A design-time/run-time application mapping methodology for predictable execution time in mpsocs. ACM Transactions on Embedded Computing Systems (TECS), 2018. To appear.
[7] Alexander Pöppl, Marvin Damschen, Florian Schmaus, Andreas Fried, Manuel Mohr, Matthias Blankertz, Lars Bauer, Jörg Henkel, Wolfgang Schröder-Preikschat, and Michael Bader. Shallow water waves on a deep technology stack: Accelerating a finite volume tsunami model using reconfigurable hardware in invasive computing. In Dora B. Heras, Luc Bougé, Gabriele Mencagli, Emmanuel Jeannot, Rizos Sakellariou, Rosa M. Badia, Jorge G. Barbosa, Laura Ricci, Stephen L. Scott, Stefan Lankes, and Josef Weidendorfer, editors, Euro-Par 2017: Proceedings of the 10th Workshop on UnConventional High Performance Computing (UCHPC 2017), Lecture Notes in Computer Science (LNCS), pages 676–687, Cham, 2018. Springer International Publishing.
[8] Michael Glaß, Jürgen Teich, Martin Lukasiewycz, and Felix Reimann. Hybrid optimization techniques for system-level design space exploration. In Soonhoi Ha and Jürgen Teich, editors, Handbook of Hardware/Software Codesign, volume 1, pages 217–246. Springer, Dordrecht, The Netherlands, September 2017. [ DOI ]
[9] Hananeh Aliee. Reliability Analysis and Optimization of Embedded Systems using Stochastic Logic and Importance Measures. Dissertation, Hardware/Software Co-Design, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany, March 15, 2017.
[10] Behnaz Pourmohseni, Stefan Wildermann, Michael Glaß, and Jürgen Teich. Predictable run-time mapping reconfiguration for real-time applications on many-core systems. In Proceedings of the International Conference on Real-Time Networks and Systems (RTNS). IEEE, 2017. Outstanding paper award. [ DOI ]
[11] Behnaz Pourmohseni, Michael Glaß, and Jürgen Teich. Automatic Operating Point Distillation for Hybrid Mapping Methodologies. In Proceedings of Design, Automation and Test in Europe Conference Exhibition (DATE), pages 1135–1140. IEEE, 2017. [ DOI ]
[12] Jürgen Teich. Invasive computing – editorial. it – Information Technology, 58(6):263–265, November 24, 2016. [ DOI ]
[13] Alexander Pöppl, Michael Bader, Tobias Schwarzer, and Michael Glaß. SWE-X10: Simulating shallow water waves with lazy activation of patches using ActorX10. In Proceedings of the 2nd International Workshop on Extreme Scale Programming Models and Middleware (ESPM2), pages 32–39. IEEE, November 2016. [ .pdf ]
[14] Stefan Wildermann, Michael Bader, Lars Bauer, Marvin Damschen, Dirk Gabriel, Michael Gerndt, Michael Glaß, Jörg Henkel, Johny Paul, Alexander Pöppl, Sascha Roloff, Tobias Schwarzer, Gregor Snelting, Walter Stechele, Jürgen Teich, Andreas Weichslgartner, and Andreas Zwinkau. Invasive computing for timing-predictable stream processing on MPSoCs. it – Information Technology, 58(6):267–280, September 30, 2016. [ DOI ]
[15] Jürgen Teich, Michael Glaß, Sascha Roloff, Wolfgang Schröder-Preikschat, Gregor Snelting, Andreas Weichslgartner, and Stefan Wildermann. Language and compilation of parallel programs for *-predictable MPSoC execution using invasive computing. In Proceedings of the 10th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), pages 313–320, Lyon, France, September 2016. [ DOI ]
[16] Sascha Roloff, Alexander Pöppl, Tobias Schwarzer, Stefan Wildermann, Michael Bader, Michael Glaß, Frank Hannig, and Jürgen Teich. ActorX10: An actor library for X10. In Proceedings of the 6th ACM SIGPLAN X10 Workshop (X10), pages 24–29. ACM, June 14, 2016. [ DOI ]
[17] Andreas Weichslgartner, Stefan Wildermann, Johannes Götzfried, Felix Freiling, Michael Glaß, and Jürgen Teich. Design-time/run-time mapping of security-critical applications in heterogeneous mpsocs. In Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems (SCOPES), pages 153–162. ACM, May 23, 2016. [ DOI ]
[18] Hans Michael Gerndt, Michael Glaß, Sri Parameswaran, and Barry L. Rountree. Dark Silicon: From Embedded to HPC Systems (Dagstuhl Seminar 16052). Dagstuhl Reports, 6(1):224–244, 2016. [ DOI ]
[19] Alexander Pöppl and Alexander Herz. A cache-aware performance prediction framework for gpgpu computations. In Sascha Hunold, Jesper Larsson Träff, and Francesco Versaci, editors, Euro-Par 2015: Parallel Processing Workshops, Lecture Notes in Computer Science, Berlin, Heidelberg, 2016. Springer-Verlag.
[20] Alexander Pöppl and Michael Bader. Swe-x10: An actor-based and locally coordinated solver for the shallow water equations. In Proceedings of the 6th ACM SIGPLAN Workshop on X10, X10 2016, pages 30–31. ACM, 2016. [ DOI ]
Keywords: APGAS, Actor Model, Parallel Algorithms, Shallow Water Equations
[21] Michael Bader. Performance optimzation vs. power - experiences with petascale earthquake simulations on supermuc. Talk at the Workshop on Power-Bounded HPC Performance Optimisation at Schloss Dagstuhl, August 2015.
[22] Alexander Breuer, Alexander Heinecke, Leonhard Rannabauer, and Michael Bader. High-order ader-dg minimizes energy- and time-to-solution of seissol. In High Performance Computing, 30th International Conference, ISC High Performance 2015, Frankfurt, Germany, July 12-16, 2015, Proceedings, volume 9137 of Lecture Notes in Computer Science, pages 340–357. Springer, July 2015.
[23] Michael Bader. Vectorization of riemann solvers for the shallow water equations. Minisymposium Flooding the Cores - Computing Flooding Events with Many-Core and Accelerator Technologies at SIAM Conference on Computational Science and Engineering, March 2015.
[24] Sebastian Graf, Felix Reimann, Michael Glaß, and Jürgen Teich. Towards scalable symbolic routing for multi-objective networked embedded system design and optimization. In Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS 2014), pages 2:1–2:10, October 12, 2014. [ DOI ]
[25] Andreas Weichslgartner, Deepak Gangadharan, Stefan Wildermann, Michael Glaß, and Jürgen Teich. Daarm: Design-time application analysis and run-time mapping for predictable execution in many-core systems. In Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS 2014), pages 1–10, October 2014. [ DOI ]
[26] Tobias Weinzierl, Michael Bader, Kristof Unterweger, and Roland Wittmann. Block fusion on dynamically adaptive spacetree grids for shallow water waves. Parallel Processing Letters, 24(3):1441006, September 2014.
[27] Michael Glaß, Michael Bader, Jürgen Teich, and Stefan Wildermann. Assisting run-time optimization of many-core systems by design-time characterization. HiPEAC Spring Computing Systems Week, Barcelona, Invited Talk, May 13, 2014.
[28] Stefan Wildermann, Michael Glaß, and Jürgen Teich. Multi-objective distributed run-time resource management for many-cores. In Proceedings of Design, Automation and Test in Europe (DATE), pages 1–6, March 2014. [ DOI ]
[29] Michael Bader. On the performance of adaptive mesh-based simulations on modern hpc architectures. Invited presentation at SIAM Conference on Parallel Processing in Scientific Computing - SIAM PP 2014, February 2014. [ .pdf ]
[30] Michael Bader. Exploiting locality properties of space-filling curves in scientific computing. Colloquium talk at TU Eindhoven, November 2013.