D3: Invasion for High-Performance Computing
The research agenda of D3 is threefold: First, we consider numerical core routines widespread in both supercomputing and embedded applications with respect to an invasive enhancement. Second, we investigate advantages of invasive computing on HPC systems by integrating such invasive algorithms into real-life simulation scenarios. Third, concepts are developed to support invasion in standard HPC programming models on state-of-the-art HPC systems.
In the first funding phase, we demonstrated that invasion is a promising paradigm for a wide range of highly relevant numerical algorithms by defining requirements for a set of representative numerical building blocks. Furthermore, invasion allows for more flexible resource usage on HPC architectures: we extended OpenMP, the standard programming model for shared-memory systems, with invasive run-time mechanisms that dynamically redistribute cores (iOMP). This benefits our target scenario, which is based on dynamic adaptive mesh refinement (DAMR) as used in our tsunami simulation. Indeed, it has been shown that executing multiple invasive DAMR simulations in parallel leads to improved utilisation of hardware and, thus, to increased efficiency.
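The idea of redistributing cores among concurrently running DAMR simulations can be illustrated with a minimal sketch. It is not the iOMP interface: the function name `redistribute_cores`, the use of a per-simulation workload figure (for example, the current number of grid cells), and the proportional splitting rule are all assumptions made for illustration.

```python
def redistribute_cores(total_cores, workloads):
    """Split total_cores among applications in proportion to their workloads.

    Hypothetical helper, not part of iOMP. `workloads` maps an application
    name to a positive workload estimate. Every application keeps at least
    one core; the remaining cores are shared out proportionally, with
    rounding leftovers going to the largest fractional remainders.
    """
    if total_cores < len(workloads):
        raise ValueError("need at least one core per application")
    if any(w <= 0 for w in workloads.values()):
        raise ValueError("workloads must be positive")

    spare = total_cores - len(workloads)      # cores beyond the guaranteed one each
    total_work = sum(workloads.values())
    exact = {name: spare * w / total_work for name, w in workloads.items()}

    cores = {name: 1 + int(share) for name, share in exact.items()}
    leftover = total_cores - sum(cores.values())

    # Hand out cores lost to rounding, largest fractional remainder first.
    by_remainder = sorted(exact, key=lambda n: exact[n] - int(exact[n]), reverse=True)
    for name in by_remainder[:leftover]:
        cores[name] += 1
    return cores


# Two simulations share 16 cores; after mesh refinement the workload shifts.
before = redistribute_cores(16, {"sim_a": 6000, "sim_b": 1000})
after = redistribute_cores(16, {"sim_a": 1000, "sim_b": 6000})
```

In this sketch, the simulation whose mesh has grown receives most of the cores at the next redistribution, which is the effect the text describes for multiple invasive DAMR simulations running in parallel.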
All three directions mentioned above (applications, algorithms, and programming models and resource management) will be extended towards large-scale distributed and heterogeneous systems.
First, concerning large-scale invasive applications, we will start from our successful tsunami case study and explore how invasive computing can provide innovative solutions to issues currently not solvable on large-scale systems in a satisfactory way. This includes efficient dynamic load distribution for optimising the application throughput, optimisations of energy efficiency, urgent computing and time-to-solution considerations, and flexible dynamic job scheduling on HPC systems.
Second, concerning invasive resource management, the programming model on HPC systems is typically a combination of MPI and OpenMP with, possibly, a programming interface for accelerators. One obvious new requirement is the explicit distribution of data across the nodes. For that, we will extend MPI for invasive computing and develop a scalable resource management infrastructure. This infrastructure will be based on the iOMP resource manager that will perform a model-based multi-objective optimisation.
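To make the notion of multi-objective optimisation more concrete, the following sketch filters candidate allocations down to the Pareto-optimal ones with respect to two objectives. It is only an illustration under stated assumptions: the `Candidate` record, the two objectives (predicted runtime and predicted energy), and the example numbers are hypothetical and do not describe the models used by the D3 resource manager.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical allocation option with model-predicted costs."""
    nodes: int
    runtime_s: float   # predicted time-to-solution in seconds
    energy_j: float    # predicted energy consumption in joules


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b in both objectives and better in one."""
    no_worse = a.runtime_s <= b.runtime_s and a.energy_j <= b.energy_j
    better = a.runtime_s < b.runtime_s or a.energy_j < b.energy_j
    return no_worse and better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Illustrative numbers only: the 8-node option is slower and costlier
# than the 4-node option, so it is dominated and dropped.
options = [
    Candidate(nodes=1, runtime_s=100.0, energy_j=50.0),
    Candidate(nodes=2, runtime_s=60.0, energy_j=70.0),
    Candidate(nodes=4, runtime_s=40.0, energy_j=120.0),
    Candidate(nodes=8, runtime_s=45.0, energy_j=200.0),
]
front = pareto_front(options)
```

A resource manager could then pick one point from the remaining front according to a policy, for example favouring time-to-solution for urgent computing or energy for throughput-oriented jobs.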
Third, with respect to our algorithmic developments, we will extend our contributions to topics such as claim specification, reconfigurable hardware, TCPA-accelerated computations, and dark silicon. We will do that looking into matrix exponentials, direct solvers, and dynamic tree traversals as core routines relevant for embedded applications, as well as having a closer look at more general multi-level solvers as an algorithm class of utmost importance both for HPC and embedded systems such as MPSoCs (multiprocessor systems-on-a-chip).
The overall research goals of D3 are, first, to contribute to the development of the invasive core language and the invasive X10 framework via selected state-of-the-art numerical algorithms widespread in both supercomputing and embedded applications that show a high potential for benefiting from invasive computing; second, to demonstrate how this paradigm can be transferred to and exploited on HPC systems by integrating invasive algorithms into realistic large-scale simulation scenarios; and, third, to develop concepts to support invasion in standard HPC programming models on state-of-the-art HPC systems. In the first funding phase, as described in the previous section, the three main research threads towards these goals were the development of invasive numerical core algorithms for invasive MPSoCs, the development of our framework for tsunami simulations as the demonstrator HPC application for invasion, and the integration of invasion into OpenMP as the standard programming model for shared-memory systems. While these threads will be continued, the focus of the second funding phase will be on extending our work from shared-memory systems to heterogeneous-hybrid and large-scale systems, which leads to new demands on invasive computing at the application, algorithmic, and programming-model levels.
Invasive applications with HPC: Starting from the successful tsunami case study, we will explore how invasive computing can provide innovative solutions to HPC issues that currently cannot be solved satisfactorily on large-scale systems. This includes efficient dynamic load distribution on large-scale distributed systems for optimising the throughput of applications, optimisations of energy efficiency, urgent computing and time-to-solution studies, and, in general, flexible dynamic job scheduling on HPC systems, which calls for resource-aware algorithms.
Invasive numerical algorithms on invasive MPSoCs: Our experience with, and collection of, invasive numerical core routines will be used to extend our contributions to topics such as claim specifications, reconfigurable hardware at the instruction-set and memory-hierarchy level with iCores, TCPA-accelerated computations, dark silicon, and, in particular, the evaluation of invasive computing on the demonstrator hardware. On the one hand, the focus will be on matrix exponentials, direct solvers, and dynamic tree traversals as core routines that are most relevant for embedded applications and that allow for innovative invasive implementations. On the other hand, we will concentrate on multi-level solvers, extending the prototypical V-cycle scheme from the first funding period, as an algorithm class of utmost importance both for large-scale PDE-based simulation scenarios and for embedded systems (e.g., image processing).
Invasive resource management for HPC: Applications on large-scale parallel systems are typically parallelised by combining MPI with OpenMP and possibly a programming interface for accelerators. In contrast to mere shared-memory systems, the application data are explicitly distributed across the nodes, and memory in a node is a scarce resource. It is our goal to provide an invasive resource management for such applications. New challenges are to allow for invasive programming in MPI, to support applications in their data redistribution strategies, to increase the scalability of the resource manager with respect to the size of the respective applications, and to build up application models in the resource manager that support complex resource constraints, different classes of applications, detailed performance hints, and heterogeneous execution.
A comprehensive summary of the major achievements of the first funding phase can be found on the Project D3 first-phase website.
Hans Michael Gerndt, Michael Glaß, Sri Parameswaran, and Barry L. Rountree. Dark Silicon: From Embedded to HPC Systems (Dagstuhl Seminar 16052). Dagstuhl Reports, 6(1):224–244, 2016.
Weifeng Liu, Michael Gerndt, and Bin Gong. Model-based MPI-IO tuning with Periscope tuning framework. Concurrency and Computation: Practice and Experience. Keywords: parallel I/O, automatic tuning, MPI-IO, performance model, high-performance computing.
Martin Schreiber, Christoph Riesinger, Tobias Neckel, Hans-Joachim Bungartz, and Alexander Breuer. Invasive compute balancing for applications with shared and hybrid parallelization. International Journal of Parallel Programming, September 2014.
Carsten Tradowsky, Martin Schreiber, Malte Vesper, Ivan Domladovec, Maximilian Braun, Hans-Joachim Bungartz, and Jürgen Becker. Towards dynamic cache and bandwidth invasion. In Reconfigurable Computing: Architectures, Tools, and Applications, volume 8405 of Lecture Notes in Computer Science, pages 97–107. Springer International Publishing, April 2014.
Martin Schreiber. Cluster-Based Parallelization of Simulations on Dynamically Adaptive Grids and Dynamic Resource Management. Dissertation, Institut für Informatik, Technische Universität München, January 2014.
Martin Schreiber, Christoph Riesinger, Tobias Neckel, and Hans-Joachim Bungartz. Invasive compute balancing for applications with hybrid parallelization. In Proceedings of the International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD). IEEE, October 2013.
Martin Schreiber, Tobias Weinzierl, and Hans-Joachim Bungartz. SFC-based communication metadata encoding for adaptive mesh refinement. In Michael Bader, editor, Proceedings of the International Conference on Parallel Computing (ParCo), October 2013. In press.
Martin Schreiber, Tobias Weinzierl, and Hans-Joachim Bungartz. Cluster optimization and parallelization of simulations with dynamically adaptive grids. In Euro-Par 2013, August 2013.
Hans-Joachim Bungartz, Christoph Riesinger, Martin Schreiber, Gregor Snelting, and Andreas Zwinkau. Invasive computing in HPC with X10. In X10 Workshop (X10'13), pages 12–19, New York, NY, USA, 2013. ACM.
Michael Gerndt, Andreas Hollmann, Marcel Meyer, Martin Schreiber, and Josef Weidendorfer. Invasive computing with iOMP. In Proceedings of the Forum on Specification and Design Languages (FDL), pages 225–231, September 2012.
Isaías A. Comprés Ureña, Michael Riepen, Michael Konow, and Michael Gerndt. Invasive MPI on Intel's single-chip cloud computer. In Andreas Herkersdorf, Kay Römer, and Uwe Brinkschulte, editors, Proceedings of the 25th International Conference on Architecture of Computing Systems (ARCS), volume 7179 of Lecture Notes in Computer Science, pages 74–85. Springer, February 2012.
Michael Bader, Hans-Joachim Bungartz, and Martin Schreiber. Invasive computing on high performance shared memory systems. In Facing the Multicore-Challenge III, volume 7686 of Lecture Notes in Computer Science, pages 1–12, 2012.
Andreas Hollmann and Michael Gerndt. Invasive computing: An application assisted resource management approach. In Victor Pankratius and Michael Philippsen, editors, Multicore Software Engineering, Performance, and Tools, volume 7303 of Lecture Notes in Computer Science, pages 82–85. Springer Berlin Heidelberg, 2012. Keywords: resource management; resource awareness; NUMA; parallel programming; OpenMP.
Martin Schreiber, Hans-Joachim Bungartz, and Michael Bader. Shared memory parallelization of fully-adaptive simulations using a dynamic tree-split and -join approach. In Proceedings of HiPC 2012, pages 1–10. IEEE, 2012.
Michael Bader, Hans-Joachim Bungartz, Michael Gerndt, Andreas Hollmann, and Josef Weidendorfer. Invasive programming as a concept for HPC. In Proceedings of the 10th IASTED International Conference on Parallel and Distributed Computing and Networks 2011 (PDCN), February 2011.
Hans-Joachim Bungartz, Bernhard Gatzhammer, Michael Lieb, Miriam Mehl, and Tobias Neckel. Towards multi-phase flow simulations in the PDE framework Peano. Computational Mechanics, 48(3):365–376, 2011.
Jürgen Teich, Jörg Henkel, Andreas Herkersdorf, Doris Schmitt-Landsiedel, Wolfgang Schröder-Preikschat, and Gregor Snelting. Invasive computing: An overview. In Michael Hübner and Jürgen Becker, editors, Multiprocessor System-on-Chip – Hardware Design and Tool Integration, pages 241–268. Springer, Berlin, Heidelberg, 2011.
Jürgen Teich. Invasive algorithms and architectures. it - Information Technology, 50(5):300–310, 2008.
Isaías A. Comprés Ureña and Michael Gerndt. Improved RCKMPI's SCCMPB channel: Scaling and dynamic processes support. 4th MARC Symposium.
Andreas Hollmann and Michael Gerndt. iOMP language specification 1.0. Internal Report.
Andreas Hollmann and Michael Gerndt. Invasive computing: An application assisted resource management approach. In MSEPT, pages 82–85.