C3: Compilation and Code Generation for Invasive Programs

Principal Investigators:

Prof. G. Snelting, Prof. J. Teich

Scientific Researchers:

M. Braun, S. Buchwald, Jorge A. Echavarria, Dr. F. Hannig, M. Mohr, É. R. Sousa, A. Tanase

Abstract

Project C3 considers compilation and code generation as well as program transformation and optimisation techniques for non-regular (procedural) code as well as for task-level and regularly structured (i.e. loop-level) code.

In the first funding phase, a compiler for the concrete language defined in Project A1 was developed. The compiler is based on an existing X10 compiler, but has been extended with new transformation phases to support tightly coupled processor arrays (TCPAs) as well as SPARC processors and i-Cores through libFIRM. For TCPAs, the loop compiler LoopInvader was developed, which detects and extracts nested loop programs from a given X10 input program and transforms these loop nests into single assignment code. A breakthrough in symbolic compilation techniques was achieved here: an optimal mapping of loop iterations to processors as well as their latency-optimal scheduling can be determined as a function of an unknown number of available processors (claim size). For RISC cores, a new transformation phase builds the intermediate representation FIRM and then generates code using a newly developed SPARC back end. Additionally, an invasive X10 run-time library has been created to efficiently map X10 language constructs to operating system interfaces.

In the second funding phase, a major focus of research will be the quest for (higher) predictability of non-functional aspects of invasive parallel program execution, i.e. performance, fault tolerance, and security. While predictable performance in terms of latency and throughput shall be proven for loop nests compiled to TCPA targets, we will also investigate compiler support to improve the predictability of invasive software and hardware in general. The major research topics in the second phase include (a) compilation methods exploiting different fault tolerance schemes for loops, (b) multi-level symbolic loop tiling transformations, (c) performance predictability proofs for TCPAs, (d) approaches for automatic program invasification, (e) optimisations tailored to invasive architectures, and (f) information flow control for invasive applications.

Synopsis

This project investigates compilation techniques for invasive computing architectures. Its central role is the development of a compiler framework for code generation as well as program transformations and optimisations for a wide range of heterogeneous invasive architectures, including RISC cores, TCPAs (tightly coupled processor arrays), and i-Core reconfigurable processors.

In Phase II, a major focus of research of this central project on invasive program compilation will be driven by the quest for (higher) predictability of non-functional aspects of invasive parallel program execution, i.e. performance, fault tolerance, and security. While predictable performance in terms of latency and throughput shall be proven for loop nests compiled to TCPA targets, we will also investigate compiler support to improve the predictability of invasive system software and hardware. Moreover, the trusted handling of sensitive application data using information flow control will be a focus for RISC targets. Due to the growing vulnerability of complex MPSoC designs to failures, compilation methods exploiting fault tolerance schemes such as dual modular redundancy (DMR) and triple modular redundancy (TMR) for loop programs shall also be investigated systematically for TCPA targets. This also poses challenges for supporting such schemes on demand through dynamic i-Core reconfiguration and for compilation with a fault tolerance option.

Additionally, our research efforts are guided by the quest for productivity and automation. Automatic program invasification shall reduce programmer burden by automatically identifying and transforming program parts that benefit from resource invasion. Moreover, by devising compiler optimisations tailored to invasive architectures, we will facilitate the development of efficient invasive programs. Concerning TCPA targets, the mapping of loop iterations to an unknown number of processors must balance memory and communication requirements. As such, symbolic multi-level tiling schemes shall be investigated and automated as compiler transformations. Finally, as a requirement of massively parallel loop processing on TCPAs, compilation aids for the control of data transport to and from an array tile need to be investigated.

Overview of compiler framework

Overview of the compiler framework for invasive computing. The front end is based on the X10 compiler, which is open source. The back end for SPARC targets is generated using the FIRM compiler infrastructure. The back end for invasive parallel loop codes to run on tightly coupled processor arrays (TCPAs) is provided by the tool LoopInvader.


Approach

X10 front end

The existing X10 compiler has two back ends: a Java back end and a C++ back end. We added another back end that targets our compiler's intermediate representation FIRM without involving a high-level intermediate language. This is the first X10 back end that does not rely on a full-blown post compiler for code generation. Consequently, we need to handle some high-level constructs, such as generic classes and methods, in the compiler itself instead of leaving them to a post compiler. Additionally, we created the library liboo, which eases the FIRM construction for object-oriented language features such as dynamic dispatch. FIRM is based on static single assignment (SSA) form. The most efficient algorithms for the construction of SSA form require a non-SSA intermediate representation that features a control flow graph (CFG). In contrast, we wanted to create FIRM directly from X10's abstract syntax tree (AST). Thus, we developed a novel SSA construction algorithm that directly constructs an intermediate representation in SSA form. We have shown that this algorithm produces minimal SSA form for programs with reducible control flow. Since the algorithm constructs pruned SSA form by design, we end up with minimal and pruned SSA form.
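The idea of building SSA form on the fly, without first materialising a non-SSA representation, can be illustrated with a small sketch. It is not the libFIRM or liboo API; all names (Block, Phi, read_variable, seal_block, and so on) are hypothetical, and the scheme shown is one common way to realise such a construction: variable reads are resolved lazily by walking to predecessor blocks, phi functions are created only where a read actually needs them (which keeps the result pruned), and phis that merge a single value are removed again.

```python
UNDEF = "undef"  # placeholder for a variable read before any definition

class Block:
    def __init__(self, name):
        self.name = name
        self.preds = []        # predecessor blocks in the control flow
        self.sealed = False    # True once all predecessors are known
        self.defs = {}         # variable -> current SSA value in this block
        self.incomplete = {}   # variable -> phi still waiting for operands

class Phi:
    def __init__(self, block):
        self.block = block
        self.operands = []
        self.users = []        # phis that use this phi as an operand
        self.forward = None    # replacement value once the phi proved trivial

def resolve(value):
    """Follow the forwarding links left behind by removed trivial phis."""
    while isinstance(value, Phi) and value.forward is not None:
        value = value.forward
    return value

def write_variable(var, block, value):
    block.defs[var] = value

def read_variable(var, block):
    if var in block.defs:                       # local definition exists
        return resolve(block.defs[var])
    if not block.sealed:                        # predecessors still unknown
        value = block.incomplete[var] = Phi(block)
    elif len(block.preds) == 1:                 # no merge point, no phi needed
        value = read_variable(var, block.preds[0])
    else:
        phi = Phi(block)
        write_variable(var, block, phi)         # breaks cycles through loops
        value = add_phi_operands(var, phi)
    write_variable(var, block, value)
    return value

def add_phi_operands(var, phi):
    for pred in phi.block.preds:
        operand = read_variable(var, pred)
        phi.operands.append(operand)
        if isinstance(operand, Phi):
            operand.users.append(phi)
    return try_remove_trivial_phi(phi)

def try_remove_trivial_phi(phi):
    same = None
    for operand in map(resolve, phi.operands):
        if operand == same or operand is phi:
            continue                            # duplicate or self-reference
        if same is not None:
            return phi                          # merges two distinct values
        same = operand
    phi.forward = UNDEF if same is None else same
    for user in phi.users:                      # users may have become trivial
        if user is not phi and user.forward is None:
            try_remove_trivial_phi(user)
    return resolve(phi.forward)

def seal_block(block):
    """Call once all predecessors of the block have been added."""
    for var, phi in list(block.incomplete.items()):
        add_phi_operands(var, phi)
    block.sealed = True

def build_loop_example():
    """Models: x = x0; y = y0; while (...) { x = x + y }; use(x, y)."""
    entry, header, body, exit_ = (Block(n) for n in ("entry", "header", "body", "exit"))
    entry.sealed = True
    write_variable("x", entry, "x0")
    write_variable("y", entry, "y0")
    header.preds.append(entry)                  # back edge is not known yet
    body.preds.append(header)
    body.sealed = True
    read_variable("x", body)                    # operands of x + y
    read_variable("y", body)
    write_variable("x", body, "x1")             # x1 = x + y
    header.preds.append(body)                   # now the back edge is known
    seal_block(header)
    exit_.preds.append(header)
    exit_.sealed = True
    return read_variable("x", exit_), read_variable("y", exit_)
```

In this example, x is redefined in the loop body and therefore keeps a phi in the loop header, whereas y is only read in the loop, so its temporary phi is recognised as trivial and replaced by the original definition.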

LoopInvader front end

Loop programs play an important role in parallel computing in general and invasive computing in particular. In order to exploit invasive computing concepts at the level of loop programs and massively parallel architectures such as TCPAs as investigated in Project B2, a compiler tool called LoopInvader (see the above compiler overview picture) has been developed. Its front end extracts potential invasive loop candidates from a given X10 program. Here, an interface was defined based on the X10 abstract syntax tree (AST) to extract X10 loop candidates and convert these automatically into a single assignment form, including the static dependence analysis of array references. This step is necessary to reveal the entire data parallelism hidden in a loop nest for subsequent mapping to massively parallel processor array targets, such as TCPAs. The challenge posed by the principles of invasive computing is that the size of the available processor array region is not known at compile time. The major theoretical results on the theory and transformations required for symbolic loop parallelisation are reported later.

SPARC back end

The general-purpose cores found in our investigated invasive computing platform have a SPARC instruction set, optionally with i-Core extensions. There will be variants with and without floating-point support. To this end, a new back end has been developed using the existing infrastructure in libFIRM. This involved creating code selection strategies and handling the SPARC calling convention, which employs register windows. Register allocation and scheduling are performed with the generic libFIRM infrastructure. We further developed peephole optimisations and special code generation phases to fill delay slots and respect the stack alignment requirements of the application binary interface. To handle software floating-point, a new pass has been created that replaces arithmetic operations with calls to an emulating library. We are in the process of extending our register allocator to support aliased floating-point registers as found in the SPARC architecture. The back end has matured to a point where efficient code is generated. Furthermore, the back end now passes a full run of the SPEC CPU2000 benchmark suite with the large input data sets.

The collaboration with Project B1 allowed us to explore extended instruction sets. Our previous work shows that register allocation for programs in SSA form leads to an optimal assignment of registers. Translating out of SSA form, however, requires parallel copy constructs, which are traditionally implemented by sequences of register-register copy and exchange instructions. Minimising the number of needed parallel copies is an NP-hard problem. This means that in practice, the number of remaining parallel copies depends on the register allocation quality. Hence, fast register allocators, e.g. allocators suitable for just-in-time compilation scenarios, produce a large number of parallel copy operations. To reduce the cost of implementing parallel copies, we developed an instruction set extension comprising additional instructions to permute the register contents within a single cycle. The extended instruction set has been defined in its final form in collaboration with Project B1. We have developed an efficient compilation approach that is able to generate good code for all practically relevant parallel copy constructs using the additional instructions. Multiple register allocators that were already implemented have been adapted to support the generation of permutation instructions. Furthermore, we have modified a SPARC emulator to support the additional instructions in order to get accurate measurements on the number of saved instructions. Together with the FPGA-based hardware implementation provided by Project B1, we were able to conduct benchmarks using real programs.
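To illustrate the traditional implementation mentioned above, the following sketch turns a parallel copy into a sequence of register-register copy and exchange operations. It does not show the permutation instructions developed with Project B1, and it is not taken from libFIRM; the function names and the tuple encoding of operations are assumptions made for this example. A parallel copy is given as a mapping from destination register to source register, where all sources are read before any destination is written.

```python
def sequentialise(parallel_copy):
    """Return a list of ("copy", dst, src) and ("swap", a, b) operations."""
    pending = {d: s for d, s in parallel_copy.items() if d != s}
    ops = []
    # Phase 1: a destination that no pending copy still reads can be
    # overwritten safely, so its copy is emitted right away.
    while True:
        sources = set(pending.values())
        ready = [d for d in pending if d not in sources]
        if not ready:
            break
        for d in ready:
            ops.append(("copy", d, pending.pop(d)))
    # Phase 2: every remaining destination is also a source, so the rest
    # is a permutation consisting of disjoint cycles. A cycle of length k
    # is resolved with k - 1 exchanges.
    while pending:
        d, s = next(iter(pending.items()))
        del pending[d]
        ops.append(("swap", d, s))              # d now holds its final value
        for e, src in list(pending.items()):
            if src == d:                        # old value of d now lives in s
                pending[e] = s
        pending = {e: src for e, src in pending.items() if e != src}
    return ops

def run(ops, registers):
    """Apply the operations to a register file (used to check the result)."""
    regs = dict(registers)
    for kind, a, b in ops:
        if kind == "copy":
            regs[a] = regs[b]
        else:
            regs[a], regs[b] = regs[b], regs[a]
    return regs

# Example: a 3-cycle over r1, r2, r3 plus an extra copy of r1 into r4.
example = {"r1": "r2", "r2": "r3", "r3": "r1", "r4": "r1"}
example_ops = sequentialise(example)
```

For the example, one copy and two exchanges are emitted. Each of these is a separate instruction, which is the cost that instructions permuting several registers at once aim to reduce.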

Code generation for invasive constructs

Besides supporting a range of architectures including SPARC, TCPA, and the i-Core, a compiler targeting invasive computing platforms has to use the iRTSS operating system and leverage the iNoC for DMA transfers. We tackled these requirements by writing a custom back end for the existing X10 compiler front end; we call this extended X10 front end iX10. The programs analysed by this front end are transformed into the FIRM intermediate representation to facilitate generating SPARC code and supporting the i-Core extensions. This transformation also converts parallel programming constructs like async, at, invade, and infect into the lower-level APIs provided by the iRTSS. Since the X10 run-time library partly implements functionality that is already provided by the iRTSS, we adapted the X10 run-time library to use the iRTSS API instead. The compiler correctly transforms X10 programs with all modern language features, such as closures, generic code, parallelisation, and synchronisation through async, at, and finish. We also support serialisation and deserialisation, which is needed for inter-tile communication via at.

Compiler validation and demonstrator activities

All three newly developed components described in the previous sections, i.e. (1) the X10 front end, (2) the SPARC back end, and (3) the mapping of X10 run-time calls to OctoPOS (developed in Project C1) interfaces, were extensively tested both in isolation (where possible) and in conjunction with each other. For testing the compiler parts in isolation, we reused existing infrastructure and took care to only change one parameter at a time. To ensure the valid transformation of X10 to FIRM in our X10 front end, we implemented the X10 run-time library using POSIX interfaces. This makes it possible to transform the X10 AST into FIRM, use libFIRM's existing and well-tested IA32 back end to generate code, and run the generated executables on standard hardware. As input programs, we assembled a test suite currently consisting of 150 X10 programs. Additionally, we used the official X10 compiler test suite provided by IBM, which contains over 1000 test cases. Furthermore, we also tested the complete compiler chain involving all components. To this end, we compiled all X10 tests contained in our test suite to SPARC executables using a SPARC version of the OctoPOS variant of our X10 run-time library as well as the SPARC version of OctoPOS. All executables were successfully run on the CHIPit demonstrator platform as well as on a Xilinx XUPV5 FPGA board. In both cases, we used the invasive hardware design that was current at that point. Additionally, a few selected test cases from other projects, including a parallel matrix multiplication provided by Project D3, were successfully executed on the CHIPit demonstrator platform. We have also adapted the invasive computing paradigm to other tiled architectures, namely Tilera's TILEPro64 and Intel's Single-Chip Cloud Computer, which have 64 and 48 processors, respectively. For porting our resource-aware programming concepts, software libraries have been designed that provide support for the basic invasive language constructs. We studied the overheads of invasion on these architectures and analysed the trade-offs of centralised versus distributed resource management approaches. Moreover, computationally intensive loop kernels stemming from Project D1, i.e. optical flow and Harris corner detection algorithms, were successfully mapped onto TCPAs in collaboration with Project D1.

Symbolic loop parallelisation

We investigated fundamental compiler transformations for the parallel execution of invasive loop programs on processor arrays such as TCPAs. A simplified drawing of a TCPA, developed in Project B2, with 24 processing elements (PEs) is sketched in the centre of the figure below. Here, the highlighted rectangular areas denote three applications running simultaneously on the array.

In this realm, we proposed and formalised for the first time symbolic tiling as an automatic program transformation for symbolic parallelisation of nested loop programs with uniform data dependencies. This symbolic loop parallelisation step is essential for invasive programming on MPSoCs, because the claimed region of processors, whose shape and size determine the form of the tiles during parallelisation, is not known until run time. Tiling is an important compiler transformation that partitions the iteration space of a given loop nest into typically congruent tiles shaped as parallelepipeds or orthotopes. The size and shape of each tile may then be described by a so-called tiling matrix P. Often, the iterations of each tile are assigned to exactly one processor. Accordingly, the tile size is typically chosen statically to reflect the number of available processors in the architecture. Then, all intra-tile iterations need to be executed by one processor sequentially or in a pipelined manner with overlapped execution of consecutive iterations (software pipelining). For illustration, consider the nested loop program on the left hand side presented in the illustration below. As can be seen, tiling increases the depth of the loop nest of our example from a double-nested loop to a quad-nested loop.

Loop parallelisation for tightly coupled processor arrays (TCPAs)

In (a), a 2D invasive TCPA is shown where three concurrently running applications (shown by different colours) have invaded the set of processing elements. In (b), a statically tiled code is shown that matches a 4 × 2 processor array. In (c), symbolically tiled code is shown. This code may be subsequently mapped onto an arbitrarily-sized array configuration later by symbolic scheduling of the computations of the parameterised iteration spaces.

The figure above shows on the right hand side the resulting code of a loop whose iteration space has been statically tiled into 4 × 2 tiles corresponding to a mapping onto a 4 × 2 processor array region within the TCPA (in this example represented by the orange region). After tiling, the inner loops iterate over the iterations contained in a tile and the outer loops iterate over the origins of tiles. However, when the number of available processors in the array is not known at compile time, a new approach needs to be considered. Here, we have studied the idea of symbolic loop tiling: we consider a tiling transformation to be specified by a parametric tiling matrix P = diag(pi). If, for example, a processor array of size n × m has been successfully invaded at run time, tile sizes of p1 = ⌈N/n⌉ and p2 = ⌈M/m⌉ would be needed to map the loop program onto the considered processor array. An example of a symbolically tiled loop nest and the corresponding symbolic tiling matrix P is shown in the above picture on the right hand side.
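The relation between the claim size and the tile sizes can be made concrete with a small sketch. It assumes that N and M denote the extents of a rectangular two-dimensional iteration space, that one tile is assigned to each processor of the invaded n × m region, and that tile origins are enumerated by the outer loops; the function names are hypothetical and the code only enumerates iterations rather than generating TCPA code.

```python
def tile_sizes(N, M, n, m):
    """Tile sizes p1 = ceil(N / n), p2 = ceil(M / m), using integer arithmetic."""
    return -(-N // n), -(-M // m)

def tiled_iterations(N, M, n, m):
    """Enumerate the N x M iteration space as a quad-nested tiled loop.

    Yields pairs (processor, iteration), where processor = (k1, k2) is the
    tile index and iteration = (i1, i2) is the original loop index.
    """
    p1, p2 = tile_sizes(N, M, n, m)
    for k1 in range(n):                  # outer loops: one tile per processor
        for k2 in range(m):
            for j1 in range(p1):         # inner loops: iterations inside a tile
                for j2 in range(p2):
                    i1 = k1 * p1 + j1
                    i2 = k2 * p2 + j2
                    # Tiles at the border may be only partially filled
                    # when N or M is not a multiple of the array size.
                    if i1 < N and i2 < M:
                        yield (k1, k2), (i1, i2)
```

Calling tiled_iterations(10, 6, 4, 2), for instance, uses tiles of size 3 × 3; the same function works unchanged for any other claim size, which mirrors the fact that the tile sizes stay symbolic until the invaded region is known.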

Whereas tiling is used mainly for the assignment of iterations to processors, a breakthrough was achieved in this project by also being able to symbolically schedule nested loop programs in a latency-optimal way while satisfying data dependencies and resource constraints. A loop schedule assigns each loop iteration a start time of execution. A loop schedule is called symbolic if it may contain expressions involving iteration vectors and tile size parameters. Finally, a symbolic schedule vector is called feasible if it satisfies all data dependencies of the tiled algorithm. In the case of symbolic tiling, unfortunately, it is not possible to solve the problem of finding both a latency-optimising inter-tile and intra-tile schedule vector in closed form because of products of tile parameters in the corresponding scheduling constraints. Similar products of parameters and variables appear in the objective function. However, in this project, we proved for the first time that it is possible to solve such a problem by using a four-step analytical methodology. At run time, only a few latency expressions must be evaluated, and the corresponding minimal latency schedule then uniquely determines which of the statically determined processor program configurations need to be loaded and executed on the invaded array.
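The notions of schedule and feasibility can be illustrated for the much simpler, non-symbolic case. The sketch below is not the four-step methodology of this project: it assumes a linear schedule in which iteration i starts at time λ · i, uniform dependency vectors, a delay of one time step per dependency, a rectangular iteration space, and no resource constraints. All names are hypothetical. It checks candidate schedule vectors for feasibility and selects the one with minimal latency, loosely analogous to evaluating a few latency expressions at run time.

```python
from itertools import product

def is_feasible(lam, dependencies, min_delay=1):
    """A schedule vector is feasible if every dependent iteration starts
    at least min_delay time steps after the iteration it depends on."""
    return all(
        sum(l * d for l, d in zip(lam, dep)) >= min_delay
        for dep in dependencies
    )

def latency(lam, extents):
    """Time steps between the first and the last start time, plus one,
    over a rectangular iteration space with the given extents."""
    corners = product(*[(0, e - 1) for e in extents])
    times = [sum(l * c for l, c in zip(lam, corner)) for corner in corners]
    return max(times) - min(times) + 1

def best_schedule(candidates, dependencies, extents):
    """Pick the feasible candidate with minimal latency (None if none fits)."""
    feasible = [lam for lam in candidates if is_feasible(lam, dependencies)]
    if not feasible:
        return None
    return min(feasible, key=lambda lam: latency(lam, extents))

# Example: each iteration depends on its neighbours in both dimensions.
deps = [(1, 0), (0, 1)]
candidates = [(1, 0), (1, 1), (2, 1), (1, 4)]
chosen = best_schedule(candidates, deps, extents=(4, 3))
```

Here the candidate (1, 0) is rejected because it would start dependent iterations along the second dimension at the same time, and (1, 1) is selected among the feasible candidates.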

Our symbolic loop parallelisation theory has been presented at ASAP 2013 (entitled: Symbolic Parallelization of Loop Programs for Massively Parallel Processor Arrays) and has received the best paper award among more than 150 submissions.

Code generation for TCPAs

In order to map loop programs onto TCPAs, appropriate assembly code generation techniques for the processors based on a given tiling and scheduling of iterations are necessary. Here, in close collaboration with Project B2, we tackled the problem of efficient yet compact code generation for massively parallel processor arrays, which typically offer an abundance of computation resources at the cost of scarce memory resources. This is completely opposite to standard multiprocessors, where more than 70% of the MPSoC chip area may be dominated by dispatch units, register files, and one or two levels of instruction and data caches. Therefore, due to the constraint of tiny instruction memories available in each of the hundreds of processors in a TCPA, we proposed novel code generation techniques that are independent of the problem size in (a) the loop bounds as well as in (b) the size of the available processor array. Based on a given tiling and schedule of instructions for the whole loop program, so-called processor classes are distinguished first. All processors belonging to the same class obtain the same binary program to execute. So, the number of different programs to generate may be reduced for many applications to a constant number. Subsequently, instead of generating flat code instruction by instruction for each processor, looping of repetitive code sequences is exploited and represented in a special data structure called program block control graph. Finally, for preserving a given schedule of instructions, zero-overhead looping techniques are proposed such that the repetitive execution of each unique program block does not cause any extra cycles.

A comprehensive summary of the major achievements of the first funding phase can be found on the Project C3 first-phase website.

Publications

[1] Oliver Reiche, M. Akif Özkan, Richard Membarth, Jürgen Teich, and Frank Hannig. Generating FPGA-based image processing accelerators with Hipacc. In Proceedings of the International Conference On Computer Aided Design (ICCAD). IEEE, November 2017. Invited Paper.
[2] Oliver Reiche, Christof Kobylko, Frank Hannig, and Jürgen Teich. Auto-vectorization for image processing DSLs. In Proceedings of the ACM SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded systems (LCTES). ACM, June 2017. [ DOI ]
[3] Alexandru Tanase, Michael Witterauf, Jürgen Teich, and Frank Hannig. Symbolic multi-level loop mapping of loop programs for massively parallel processor arrays. ACM Transactions on Embedded Computing Systems (TECS), May 15, 2017. Accepted.
[4] Manuel Mohr and Carsten Tradowsky. Pegasus: Efficient data transfers for PGAS languages on non-cache-coherent many-cores. In Design, Automation and Test in Europe Conference Exhibition (DATE), pages 1781–1786, March 30, 2017.
[5] Soonhoi Ha and Jürgen Teich, editors. The Handbook of Hardware/Software Codesign. Springer, 2017.
[6] Alexander Pöppl, Marvin Damschen, Florian Schmaus, Andreas Fried, Manuel Mohr, Matthias Blankertz, Lars Bauer, Jörg Henkel, Wolfgang Schröder-Preikschat, and Michael Bader. Shallow water waves on a deep technology stack: Accelerating a finite volume tsunami model using reconfigurable hardware in invasive computing. In Euro-Par 2017: Proceedings of the 10th Workshop on UnConventional High Performance Computing (UCHPC 2017), Lecture Notes in Computer Science (LNCS). Springer, 2017.
[7] Jürgen Teich. Invasive computing – editorial. it – Information Technology, 58(6):263–265, November 24, 2016. [ DOI ]
[8] Vivek Singh Bhadouria, Alexandru Tanase, Moritz Schmid, Frank Hannig, Jürgen Teich, and Dibyendu Ghoshal. A novel image impulse noise removal algorithm optimized for hardware accelerators. Journal of Signal Processing Systems, 89(2):225–242, November 1, 2016. [ DOI ]
[9] Vahid Lari, Andreas Weichslgartner, Alex Tanase, Michael Witterauf, Faramarz Khosravi, Jürgen Teich, Jürgen Becker, Jan Heißwolf, and Stephanie Friederich. Providing fault tolerance through invasive computing. it – Information Technology, 58(6):309–328, October 19, 2016. [ DOI ]
[10] Stefan Wildermann, Michael Bader, Lars Bauer, Marvin Damschen, Dirk Gabriel, Michael Gerndt, Michael Glaß, Jörg Henkel, Johny Paul, Alexander Pöppl, Sascha Roloff, Tobias Schwarzer, Gregor Snelting, Walter Stechele, Jürgen Teich, Andreas Weichslgartner, and Andreas Zwinkau. Invasive computing for timing-predictable stream processing on MPSoCs. it – Information Technology, 58(6):267–280, September 30, 2016. [ DOI ]
[11] Jürgen Teich, Michael Glaß, Sascha Roloff, Wolfgang Schröder-Preikschat, Gregor Snelting, Andreas Weichslgartner, and Stefan Wildermann. Language and compilation of parallel programs for *-predictable MPSoC execution using invasive computing. In Proceedings of the 10th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), pages 313–320, Lyon, France, September 2016. [ DOI ]
[12] Jürgen Teich. Predictability, fault tolerance, and security on demand using invasive computing. Invited Talk, University of Lübeck, Germany, July 29, 2016.
[13] Michael Witterauf, Alexandru Tanase, Frank Hannig, and Jürgen Teich. Modulo scheduling of symbolically tiled loops for tightly coupled processor arrays. In Proceedings of the 27th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), pages 58–66. IEEE, July 2016. [ DOI ]
[14] Jürgen Teich. Invasive Computing - The DFG Transregional Research Center 89. DTC 2016, The Munich Workshop on Design Technology Coupling, Munich, Germany, June 30, 2016.
[15] Jürgen Teich. Predictable MPSoC stream processing using invasive computing. Seminar Talk, Electrical and Computer Engineering, The University of Texas at Austin, USA, June 6, 2016.
[16] Jürgen Teich. Adaptive restriction and isolation for predictable MPSoC stream processing. Invited Talk, DATE 2016 Friday Workshop on Resource Awareness and Application Autotuning in Adaptive and Heterogeneous Computing, Dresden, Germany, March 18, 2016.
[17] Alexandru Tanase, Michael Witterauf, Éricles R. Sousa, Vahid Lari, Frank Hannig, and Jürgen Teich. LoopInvader: A compiler for tightly coupled processor arrays. Tool Presentation at the University Booth at Design, Automation and Test in Europe (DATE), Dresden, Germany, March 2016. [ .pdf ]
[18] Jürgen Teich. Symbolic loop parallelization for adaptive multi-core systems - recent advances and benefits. Keynote, IMPACT 2016, the 6th International Workshop on Polyhedral Compilation Techniques, Prague, Czech Republic, January 19, 2016.
[19] Jürgen Teich. The role of restriction and isolation for increasing the predictability of MPSoC stream processing. Keynote, 8th Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools (RAPIDO 2016), Prague, Czech Republic, January 18, 2016.
[20] Sebastian Buchwald, Denis Lohner, and Sebastian Ullrich. Verified construction of static single assignment form. In Manuel Hermenegildo, editor, 25th International Conference on Compiler Construction, CC 2016, pages 67–76. ACM, 2016. [ DOI ]
[21] Oliver Reiche, Konrad Häublein, Marc Reichenbach, Moritz Schmid, Frank Hannig, Jürgen Teich, and Dietmar Fey. Synthesis and optimization of image processing accelerators using domain knowledge. Journal of Systems Architecture (JSA), December 2015. [ DOI ]
[22] Vahid Lari, Jürgen Teich, Alexandru Tanase, Michael Witterauf, Faramarz Khosravi, and Brett H. Meyer. Techniques for on-demand structural redundancy for massively parallel processor arrays. Journal of Systems Architecture (JSA), 61(10):615–627, November 2015. [ DOI ]
[23] Alexandru Tanase, Michael Witterauf, Jürgen Teich, and Frank Hannig. Symbolic loop parallelization for balancing I/O and memory accesses on processor arrays. In Proceedings of the 13th ACM-IEEE International Conference on Formal Methods and Models for System Design (MEMOCODE), pages 188–197. IEEE, September 2015. [ DOI ]
[24] Sebastian Buchwald, Manuel Mohr, and Ignaz Rutter. Optimal shuffle code with permutation instructions. In Frank Dehne, Jörg-Rüdiger Sack, and Ulrike Stege, editors, Algorithms and Data Structures, volume 9214 of Lecture Notes in Computer Science, pages 528–541. Springer International Publishing, August 2015. [ DOI ]
[25] Moritz Schmid. Rapid Prototyping for Hardware Accelerators in the Medical Imaging Domain. Dissertation, Hardware/Software Co-Design, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany, July 24, 2015.
[26] Moritz Schmid, Oliver Reiche, Frank Hannig, and Jürgen Teich. Loop coarsening in C-based high-level synthesis. In Proceedings of the 26th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), pages 166–173. IEEE, July 2015.
[27] Alexandru Tanase, Michael Witterauf, Jürgen Teich, Frank Hannig, and Vahid Lari. On-demand fault-tolerant loop processing on massively parallel processor arrays. In Proceedings of the 26th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), pages 194–201. IEEE, July 2015. [ DOI ]
[28] Michael Witterauf, Alexandru Tanase, Frank Hannig, and Jürgen Teich. Adaptive fault tolerance in tightly coupled processor arrays with invasive computing. In Proceedings of the 11th International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES), pages 205–208, July 2015.
[29] Manuel Mohr, Sebastian Buchwald, Andreas Zwinkau, Christoph Erhardt, Benjamin Oechslein, Jens Schedel, and Daniel Lohmann. Cutting out the middleman: OS-level support for X10 activities. In Proceedings of the fifth ACM SIGPLAN X10 Workshop, X10 '15, pages 13–18, New York, NY, USA, June 14, 2015. ACM. [ DOI ]
[30] Vahid Lari, Alexandru Tanase, Jürgen Teich, Michael Witterauf, Faramarz Khosravi, Frank Hannig, and Brett H. Meyer. A co-design approach for fault-tolerant loop execution on coarse-grained reconfigurable arrays. In Proceedings of the 2015 NASA/ESA Conference on Adaptive Hardware and Systems (AHS), pages 1–8. IEEE, June 2015. [ DOI ]
[31] Jürgen Teich. Adaptive isolation for predictable MPSoC stream processing. Keynote, SCOPES 2015, 18th International Workshop on Software and Compilers for Embedded Systems, Schloss Rheinfels, St. Goar, Germany, June 2, 2015.
[32] Jürgen Teich. Adaptive isolation for predictable MPSoC stream processing. In Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems (SCOPES 2015), pages 1–2, June 2015. [ DOI ]
[33] Michael Witterauf, Alexandru Tanase, Jürgen Teich, Vahid Lari, Andreas Zwinkau, and Gregor Snelting. Adaptive fault tolerance through invasive computing. In Proceedings of the 2015 NASA/ESA Conference on Adaptive Hardware and Systems (AHS), pages 1–8. IEEE, June 2015. [ DOI ]
[34] Sebastian Buchwald. Optgen: A generator for local optimizations. In Björn Franke, editor, Proceedings of the International Conference on Compiler Construction (CC), volume 9031 of Lecture Notes in Computer Science, pages 171–189. Springer Berlin Heidelberg, April 2015. [ DOI ]
[35] Jürgen Teich, Srinivas Boppu, Frank Hannig, and Vahid Lari. Compact code generation and throughput optimization for coarse-grained reconfigurable arrays. In Wayne Luk and George A. Constantinides, editors, Transforming Reconfigurable Systems: A Festschrift Celebrating the 60th Birthday of Professor Peter Cheung, chapter 10, pages 167–206. Imperial College Press, London, UK, April 2015. [ DOI ]
[36] Jürgen Teich. Invasive computing. Invited Talk, SE 2015, Software Engineering and Management, Special Session Software Engineering in der DFG, Dresden, Germany, March 19, 2015.
[37] Dennis Giffhorn and Gregor Snelting. A new algorithm for low-deterministic security. International Journal of Information Security, 14(3):263–287, 2015. [ DOI ]
[38] Gregor Snelting. Understanding probabilistic software leaks. Science of Computer Programming, 97, Part 1(0):122–126, January 2015. Special Issue on New Ideas and Emerging Results in Understanding Software. [ DOI ]
[39] Jürgen Teich. Reconfigurable computing for MPSoC. Invited Lecture, Winter School Design and Applications of Multi Processor System on Chip, Tunis, Tunisia, November 26, 2014.
[40] Alexandru Tanase, Michael Witterauf, Jürgen Teich, and Frank Hannig. Symbolic inner loop parallelisation for massively parallel processor arrays. In Proceedings of the 12th ACM-IEEE International Conference on Formal Methods and Models for System Design (MEMOCODE), pages 219–228, October 2014. [ DOI ]
[41] Jürgen Teich. Invasive computing – concepts and benefits. Keynote, DASIP 2014, Conference on Design and Architectures for Signal and Image Processing, Madrid, Spain, October 8, 2014.
[42] Éricles Sousa, Deepak Gangadharan, Frank Hannig, and Jürgen Teich. Runtime reconfigurable bus arbitration for concurrent applications on heterogeneous MPSoC architectures. In Proceedings of the EUROMICRO Digital System Design Conference (DSD), pages 74–81. IEEE, August 2014. [ DOI ]
[43] Jürgen Teich. Foundations and benefits of invasive computing. Seminar, McGill University, Montreal, July 29, 2014.
[44] Jürgen Teich, Alexandru Tanase, and Frank Hannig. Symbolic mapping of loop programs onto processor arrays. Journal of Signal Processing Systems, 77(1–2):31–59, July 11, 2014. [ DOI ]
[45] Srinivas Boppu, Frank Hannig, and Jürgen Teich. Compact code generation for tightly-coupled processor arrays. Journal of Signal Processing Systems, 77(1–2):5–29, May 31, 2014. [ DOI ]
[46] Jürgen Teich. Introduction to invasive computing. Workshop on Resource Awareness and Adaptivity in Multi-Core Computing (Racing 2014), Paderborn, Germany, Tutorial Talk, May 29, 2014.
[47] Jürgen Teich. Foundations and benefits of invasive computing. University of Bologna, Italy, Invited Talk in the Seminar Series Trends in Electronics, May 23, 2014.
[48] Vahid Lari, Alexandru Tanase, Frank Hannig, and Jürgen Teich. Massively parallel processor architectures for resource-aware computing. In Proceedings of the First Workshop on Resource Awareness and Adaptivity in Multi-Core Computing (Racing 2014), pages 1–7, May 2014. [ arXiv ]
[49] Frank Hannig and Jürgen Teich, editors. Proceedings of the First Workshop on Resource Awareness and Adaptivity in Multi-Core Computing (Racing 2014). May 2014. [ arXiv ]
[50] Deepak Gangadharan, Alexandru Tanase, Frank Hannig, and Jürgen Teich. Timing analysis of a heterogeneous architecture with massively parallel processor arrays. In DATE Friday Workshop on Performance, Power and Predictability of Many-Core Embedded Systems (3PMCES). ECSI, March 28, 2014. [ http ]
[51] Éricles Sousa, Vahid Lari, Johny Paul, Frank Hannig, Jürgen Teich, and Walter Stechele. Resource-aware computer vision application on heterogeneous multi-tile architecture. Hardware and Software Demo at the University Booth at Design, Automation and Test in Europe (DATE), Dresden, Germany, March 2014.
[52] Frank Hannig, Vahid Lari, Srinivas Boppu, Alexandru Tanase, and Oliver Reiche. Invasive tightly-coupled processor arrays: A domain-specific architecture/compiler co-design approach. ACM Transactions on Embedded Computing Systems (TECS), 13(4s):133:1–133:29, 2014. [ DOI ]
[53] Gregor Snelting, Dennis Giffhorn, Jürgen Graf, Christian Hammer, Martin Hecker, Martin Mohr, and Daniel Wasserrab. Checking probabilistic noninterference using JOANA. it - Information Technology, 2014. Invited article, currently under review. [ .pdf ]
[54] Jürgen Teich. Invasive computing – the quest for many-core efficiency and predictability. Keynote Talk, Sixth Swedish Workshop on Multicore Computing, Halmstad, Sweden, November 25, 2013.
[55] Jürgen Teich. Invasive computing – the quest for many-core efficiency and predictability. Invited Talk, 5th tubs.CITY Symposium, Managing change and autonomy for critical applications, Braunschweig, Germany, October 30, 2013.
[56] Manuel Mohr, Artjom Grudnitsky, Tobias Modschiedler, Lars Bauer, Sebastian Hack, and Jörg Henkel. Hardware acceleration for programs in SSA form. In International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES), Montreal, Canada, October 2013. [ DOI ]
[57] Éricles Sousa, Alexandru Tanase, Frank Hannig, and Jürgen Teich. Accuracy and performance analysis of Harris corner computation on tightly-coupled processor arrays. In Proceedings of the Conference on Design and Architectures for Signal and Image Processing (DASIP), pages 88–95. IEEE, October 2013.
[58] Éricles Sousa, Alexandru Tanase, Frank Hannig, and Jürgen Teich. A prototype of an adaptive computer vision algorithm on an MPSoC architecture. In Proceedings of the Conference on Design and Architectures for Signal and Image Processing (DASIP), pages 361–362. IEEE, October 2013.
[59] Jürgen Teich. The invasive computing paradigm as a solution for highly adaptive and efficient multi-core systems. Talk, Special Session on Run-Time Adaption for Highly-Complex Multi-Core Systems, CODES+ISSS 2013, Montreal, Canada, September 30, 2013.
[60] Alexandru Tanase, Vahid Lari, Frank Hannig, and Jürgen Teich. Exploitation of quality/throughput tradeoffs in image processing through invasive computing. In Proceedings of the International Conference on Parallel Computing (ParCo), pages 53–62, September 2013. [ DOI ]
[61] Srinivas Boppu, Frank Hannig, and Jürgen Teich. Loop program mapping and compact code generation for programmable hardware accelerators. In Proceedings of the 24th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), pages 10–17. IEEE, June 2013. [ DOI ]
[62] Jürgen Teich, Alexandru Tanase, and Frank Hannig. Symbolic parallelization of loop programs for massively parallel processor arrays. In Proceedings of the 24th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), pages 1–9. IEEE, June 2013. Best Paper Award. [ DOI ]
[63] Éricles Sousa, Alexandru Tanase, Vahid Lari, Frank Hannig, Jürgen Teich, Johny Paul, Walter Stechele, Manfred Kröhnert, and Tamim Asfour. Acceleration of optical flow computations on tightly-coupled processor arrays. In Proceedings of the 25th Workshop on Parallel Systems and Algorithms (PARS), volume 30 of Mitteilungen – Gesellschaft für Informatik e. V., Parallel-Algorithmen und Rechnerstrukturen, pages 80–89. Gesellschaft für Informatik e.V., April 2013.
[64] Frank Hannig. Resource-aware computing on domain-specific accelerators. In Proceedings of the 10th Workshop on Optimizations for DSP and Embedded Systems (ODES), page 35. ACM, February 24, 2013. Keynote. [ DOI ]
[65] Jürgen Teich. Safe(r) loop computations on multi-cores. Invited Talk, 2nd Workshop on Design Tools and Architectures for Multi-Core Embedded Computing Platforms (DITAM 2013), Berlin, Germany, January 22, 2013.
[66] Matthias Braun, Sebastian Buchwald, Sebastian Hack, Roland Leißa, Christoph Mallon, and Andreas Zwinkau. Simple and efficient construction of static single assignment form. In Ranjit Jhala and Koen Bosschere, editors, Compiler Construction, volume 7791 of LNCS, pages 102–122. Springer, 2013. [ DOI ]
[67] Hans-Joachim Bungartz, Christoph Riesinger, Martin Schreiber, Gregor Snelting, and Andreas Zwinkau. Invasive computing in HPC with X10. In X10 Workshop (X10'13), X10 '13, pages 12–19, New York, NY, USA, 2013. ACM. [ DOI ]
[68] Frank Hannig. Invasive tightly-coupled processor arrays. Talk, 1st International Workshop on Domain-Specific Multicore Computing (DSMC) at International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, November 8, 2012.
[69] Frank Hannig. Why do we see more and more domain-specific accelerators in multi-processor systems? Guest Lecture at University of California, Riverside in CS 287 Colloquium in Computer Science, Riverside, CA, USA, November 9, 2012.
[70] Jürgen Teich, Andreas Weichslgartner, Benjamin Oechslein, and Wolfgang Schröder-Preikschat. Invasive computing – concepts and overheads. In Proceedings of the Forum on Specification and Design Languages (FDL), pages 193–200, September 2012.
[71] Alexandru Tanase, Frank Hannig, and Jürgen Teich. Symbolic loop parallelization of static control programs. In Proceedings of the 8th International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES), pages 33–36, July 2012.
[72] Alexandru Tanase, Frank Hannig, and Jürgen Teich. Towards symbolic loop parallelization for tightly-coupled processor arrays. Work-In-Progress Presentation at the 49th Design Automation Conference (DAC), San Francisco, USA, June 2012.
[73] Richard Membarth, Frank Hannig, Jürgen Teich, Mario Körner, and Wieland Eckert. Generating device-specific GPU code for local operators in medical imaging. In Proceedings of the 26th IEEE International Parallel & Distributed Processing Symposium (IPDPS), pages 569–581, May 2012. [ DOI ]
[74] Jürgen Teich. Hardware/software co-design: The past, present, and predicting the future. Proceedings of the IEEE, 100(Centennial-Issue):1411–1430, May 2012. [ DOI ]
[75] Matthias Braun, Sebastian Buchwald, Manuel Mohr, and Andreas Zwinkau. An X10 compiler for invasive architectures. Technical Report 9, Karlsruhe Institute of Technology, 2012. [ http ]
[76] Srinivas Boppu, Frank Hannig, Jürgen Teich, and Roberto Perez-Andrade. Towards symbolic run-time reconfiguration in tightly-coupled processor arrays. In Proceedings of the International Conference on Reconfigurable Computing and FPGAs (ReConFig), pages 392–397. IEEE, November 2011. [ DOI ]
[77] Jürgen Teich. Programming invasively parallel – an introduction. Pervasive Parallelism Laboratory (PPL) Seminar Talk, Stanford University, CA, USA, July 25, 2011.
[78] Jürgen Teich. Invasive parallel computing – an introduction. Par Lab and AMP Lab Seminar Talk, UC Berkeley, CA, USA, July 22, 2011.
[79] Georgia Kouveli, Frank Hannig, Jan-Hugo Lupp, and Jürgen Teich. Towards resource-aware programming on Intel's single-chip cloud computer processor. In 3rd Many-core Applications Research Community (MARC) Symposium, volume 7598 of KIT Scientific Reports, pages 111–114. KIT Scientific Publishing, July 2011.
[80] Sebastian Buchwald, Andreas Zwinkau, and Thomas Bersch. SSA-based register allocation with PBQP. In Jens Knoop, editor, Proceedings of the International Conference on Compiler Construction (CC), volume 6601 of LNCS, pages 42–61. Springer, 2011. [ DOI ]
[81] Jürgen Teich, Jörg Henkel, Andreas Herkersdorf, Doris Schmitt-Landsiedel, Wolfgang Schröder-Preikschat, and Gregor Snelting. Invasive computing: An overview. In Michael Hübner and Jürgen Becker, editors, Multiprocessor System-on-Chip – Hardware Design and Tool Integration, pages 241–268. Springer, Berlin, Heidelberg, 2011. [ DOI ]
[82] Frank Hannig. Retargetable mapping of loop programs on coarse-grained reconfigurable arrays. Talk, International Conference on Hardware-Software Codesign and System Synthesis (CODES+ISSS), Scottsdale, AZ, USA, October 26, 2010.
[83] Tom Vander Aa, Praveen Raghavan, Scott Mahlke, Bjorn De Sutter, Aviral Shrivastava, and Frank Hannig. Compilation techniques for CGRAs: Exploring all parallelization approaches. In Proceedings of the International Conference on Hardware-Software Codesign and System Synthesis (CODES+ISSS), pages 185–186. ACM, October 2010. [ DOI ]
[84] Jürgen Teich. Invasive computing – basic concepts and foreseen benefits. Artist Network of Excellence on Embedded System Design Summer School Europe 2010, Autrans, France, Invited Tutorial, September 7, 2010.
[85] Amouri Abdulazim, Farhadur Arifin, Frank Hannig, and Jürgen Teich. FPGA implementation of an invasive computing architecture. In Proceedings of the IEEE International Conference on Field Programmable Technology (FPT), pages 135–142. IEEE, December 2009. [ DOI ]
[86] Christian Hammer and Gregor Snelting. Flow-sensitive, context-sensitive, and object-sensitive information flow control based on program dependence graphs. International Journal of Information Security, 8(6):399–422, December 2009. [ DOI ]
[87] Farhadur Arifin, Richard Membarth, Amouri Abdulazim, Frank Hannig, and Jürgen Teich. FSM-controlled architectures for linear invasion. In Proceedings of the 17th IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC), pages 59–64, October 2009. [ DOI ]
[88] Matthias Braun and Sebastian Hack. Register spilling and live-range splitting for SSA-form programs. In Proceedings of the International Conference on Compiler Construction (CC), pages 174–189. Springer, March 2009. [ DOI ]
[89] Jürgen Teich. Invasive algorithms and architectures. it - Information Technology, 50(5):300–310, 2008.
[90] Sebastian Hack, Daniel Grund, and Gerhard Goos. Register allocation for programs in SSA-form. In Andreas Zeller and Alan Mycroft, editors, Proceedings of the International Conference on Compiler Construction (CC), volume 3923 of Lecture Notes in Computer Science (LNCS), pages 247–262. Springer, March 2006. [ DOI ]