B1: Adaptive Application-Specific Invasive Micro-Architectures

Principal Investigators:

Prof. J. Henkel, Prof. J. Becker, Dr. L. Bauer

Scientific Researchers:

M. Damschen, T. Harbaum

Abstract

Project B1 investigates mechanisms that provide run-time adaptivity both in the micro-architecture (μArch) and through a run-time reconfigurable fabric. In the first two funding phases, we advanced the concepts of state-of-the-art reconfigurable processors towards invasion and exploited their benefits within the invasive computing project. In Phase I, we proposed concepts and methods for invading the reconfigurable fabric and the μArch within the invasive core (i-Core). We investigated run-time adaptivity at the μArch level (e.g. dynamic L1 cache configuration or branch prediction) and provided so-called Special Instructions (SIs; implemented by i-let-specific accelerators) on demand. In Phase II, we increased the performance of the i-Core further (e.g. by generating SIs automatically on the fly), investigated intra-tile multi-core support (the regular cores in the i-Core tile can now also use the reconfigurable fabric), developed dynamic intra-tile cache reallocation, and improved the usability of the i-Core (support for offline SI development and performance models for WCET analysis).

Figure: i-Core consisting of adaptive microarchitecture and reconfigurable fabric.

The figure above provides an overview of the i-Core architecture as it was developed in the first two funding phases. The μArch was given the new capability of letting the application developer use i-let-specific adaptations. The adaptive μArch mechanisms include adaptive branch prediction, adaptation of the pipeline length, and a dynamically parameterisable L1 cache. In addition to the SPARC V8 instruction set, the i-Core introduces SIs that are implemented by run-time reconfigurable accelerators. The accelerators are loaded into reconfigurable containers, i.e. designated regions that support partial reconfiguration without disrupting the rest of the system. The containers are connected to an interconnect infrastructure that establishes communication among the accelerators and with the i-Core μArch, the tile-local memory, and the data cache.
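The on-demand provision of SIs through reconfigurable containers can be illustrated with a toy Python model. This is only a sketch: the class and method names, the accelerator names, the FIFO replacement policy, and the software fallback for SIs whose accelerators are not loaded are all assumptions made for illustration and do not describe the actual i-Core interface.

```python
from dataclasses import dataclass, field


@dataclass
class Fabric:
    """Toy model of a reconfigurable fabric with a fixed number of containers."""
    num_containers: int
    loaded: list = field(default_factory=list)  # accelerator names, oldest first

    def load(self, accelerator: str) -> None:
        """Load an accelerator into a container, evicting the oldest one if full."""
        if accelerator in self.loaded:
            return
        if len(self.loaded) == self.num_containers:
            self.loaded.pop(0)  # hypothetical FIFO replacement policy
        self.loaded.append(accelerator)

    def execute_si(self, si_name: str, required: set) -> str:
        """Report which execution path an SI takes in this model."""
        if required.issubset(self.loaded):
            return f"{si_name}: hardware"
        return f"{si_name}: software fallback"  # assumed behaviour, not from the source


fabric = Fabric(num_containers=2)
fabric.load("sad_unit")  # hypothetical accelerator name
print(fabric.execute_si("SI_SAD", {"sad_unit"}))
print(fabric.execute_si("SI_DCT", {"dct_unit", "transform_unit"}))
```

In this model an SI runs in hardware only when every accelerator it needs currently occupies a container, which is why loading decisions and container count matter for performance.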

Synopsis

Building on the i-Core architecture developed so far, Phase III focuses on the common global and interdisciplinary research questions: run-time requirement enforcement (e.g. analysing and enforcing WCETs and enforcing security requirements) and robust multi-objective optimisations (e.g. optimising for computational performance, memory performance, and configurable accuracy). For the first problem area, the interference of best-effort applications sharing an i-Core tile must be monitored and controlled so that static WCET analysis always remains valid for hard real-time applications. Building upon the results of Project C5, isolation concepts within the i-Core shall be leveraged to accomplish and enforce security requirements such as data integrity and bounded information leakage. For the second problem area, the concept of SIs shall be generalised to leverage approximate accelerators and to support complex control flow. A novel intra-tile memory reorganisation is envisioned in collaboration with Project B5 to provide dedicated access options between network adapter, tile-local memory (TLM), and cores. We will also continue our active collaboration with Project C3 on compiler-based SI generation.

Approach

In Phase III, Project B1 focuses on the common global and interdisciplinary research questions of run-time requirement enforcement (e.g. analysing and enforcing WCETs and enforcing security requirements) and robust multi-objective optimisations (e.g. optimising for computational performance, memory performance, and configurable accuracy). In the following, we explain the goals of the individual work packages (WPs).

Timing analysis and enforcement

We plan to start by implementing the concepts of restartable and resumable SIs in our in-house i-Core simulator to learn more about their different performance/area trade-offs. Depending on the outcome, we will decide which alternative (possibly a combination of both) to focus on, and we will then implement it in hardware and integrate it into our stand-alone prototype. The cache analysis for reconfigurable caches and for SIs that access the L1 cache shall be developed and integrated into OTAWA first. In a parallel effort, we will investigate how to modify or constrain aiT accordingly.
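The kind of trade-off such a simulation would explore can be sketched with a simple cycle-count model in Python. The model rests on assumptions that are not stated in the text: a restartable SI is taken to discard its partial progress when interrupted and to re-execute from the beginning, while a resumable SI is taken to keep its progress at a fixed save/restore cost per interruption. The function names, latencies, and costs are hypothetical.

```python
def restartable_cycles(latency: int, interrupt_offsets: list) -> int:
    """Cycles spent on an SI that restarts from scratch after each interruption.

    Each offset is the number of cycles already executed in one attempt
    when the interruption hits; that work is lost (assumed semantics).
    """
    if any(not 0 <= off < latency for off in interrupt_offsets):
        raise ValueError("an interruption must hit before the SI completes")
    return sum(interrupt_offsets) + latency


def resumable_cycles(latency: int, num_interrupts: int, save_restore_cost: int) -> int:
    """Cycles spent on an SI that saves and restores its state (assumed semantics)."""
    return latency + num_interrupts * save_restore_cost


# Hypothetical SI with a 100-cycle latency, interrupted twice.
print(restartable_cycles(100, [60, 30]))                       # 190
print(resumable_cycles(100, num_interrupts=2, save_restore_cost=8))  # 116
```

Under these assumptions the resumable variant wastes fewer cycles when interruptions hit late in a long SI, but it would need additional state-saving hardware, which is the area side of the trade-off that the cycle model does not capture.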

Intra-tile memory hierarchy reorganisation

In this WP, the current memory architecture will be reorganised. To find the best solution, several approaches must be thoroughly weighed against each other. Both the current traffic on the bus and the accesses of the i-Core to the TLM have to be tracked and analysed. Additionally, the architecture of the i-Core tile will be extended with near-data processing capability.

Control flow within special instructions

As the challenging parts will be on the tool side, we plan to start by carefully extending the data structures and interfaces of our tools for designing SIs and for analysing the worst-case execution time. Once we have determined the best representation for the tools, we will start the actual tool development together with the required hardware modifications of the SI Execution Controller and the predicate buses, and the integration with COREFAB. In parallel, we will modify some prominent existing SIs (that were designed by manually unrolling loops) to use the new features for evaluation.

Approximated accelerators

In a first step, we will choose different applications according to their error resilience and design an approximate accelerator for the GPP and an SI for the i-Core. Based on the first results, a higher-level instance will be designed to initiate and track the usage of the approximate accelerator and the i-Core resources. Afterwards, the main task is to design an adaptive instance that can control the usage of the different hardware components at run time.
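To make the notion of configurable accuracy concrete, the following Python sketch measures the error of a simple approximate adder. The truncation scheme (ignoring the lowest operand bits) is a generic textbook example chosen for illustration; it is an assumption and not the accelerator design planned in the project, and the function names are hypothetical.

```python
from itertools import product


def approx_add(a: int, b: int, dropped_bits: int) -> int:
    """Add two non-negative integers while ignoring their lowest `dropped_bits` bits."""
    mask = ~((1 << dropped_bits) - 1)  # clears the low bits of each operand
    return (a & mask) + (b & mask)


def error_stats(width: int, dropped_bits: int) -> tuple:
    """Return (mean, max) absolute error over all pairs of `width`-bit operands."""
    errors = [
        (a + b) - approx_add(a, b, dropped_bits)
        for a, b in product(range(1 << width), repeat=2)
    ]
    return sum(errors) / len(errors), max(errors)


mean_err, max_err = error_stats(width=8, dropped_bits=2)
print(mean_err, max_err)  # 3.0 6
```

An error-resilient application would tolerate such a bounded deviation in exchange for a smaller or faster data path; a run-time instance could then choose `dropped_bits` per application.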

Information leakage protection

First, we will introduce a statistical model based on the information flow of all channels of the cache subsystem. Next, we will investigate our dynamic intra-tile cache reallocation with regard to its security properties. In parallel, we will work on implementing the different isolation mechanisms and their enforcers for the reconfigurable fabric. Finally, the TLM-MMU will be extended to provide access protection mechanisms.
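One common way to quantify information flow through a channel is Shannon mutual information between a secret and what an observer can see. The Python sketch below computes it for a toy cache channel. Using mutual information as the leakage metric, as well as the example probabilities, are assumptions for illustration; the text does not specify which statistical model will be used.

```python
import math


def mutual_information(joint: list) -> float:
    """Return I(S;O) in bits, where joint[s][o] = P(S=s, O=o)."""
    p_s = [sum(row) for row in joint]        # marginal of the secret
    p_o = [sum(col) for col in zip(*joint)]  # marginal of the observation
    bits = 0.0
    for s, row in enumerate(joint):
        for o, p in enumerate(row):
            if p > 0:
                bits += p * math.log2(p / (p_s[s] * p_o[o]))
    return bits


# Hypothetical shared cache: the observer's hit/miss result matches the
# victim's secret access pattern 90% of the time.
shared = [[0.45, 0.05],
          [0.05, 0.45]]
# Hypothetical isolated cache: the observation is independent of the secret.
isolated = [[0.25, 0.25],
            [0.25, 0.25]]

print(f"shared:   {mutual_information(shared):.3f} bits per observation")
print(f"isolated: {mutual_information(isolated):.3f} bits per observation")
```

In this toy setting the shared configuration leaks roughly half a bit per observation, whereas the isolated one leaks nothing, which is the kind of bound an isolation mechanism would be expected to enforce.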

Hardware support for compiler-based SI generation

We plan to start by investigating the area overhead for different numbers of Repack operations to understand how many different modes we can afford. Then, we will extract the currently used modes and generalise them. The goal is that the used modes are directly supported by Repack, whereas other modes may only be executable indirectly. For the generic accelerators, we will start by collecting suitable patterns from standard benchmarks to identify similarities and combine them into generic data paths.

Joint emulation, prototype, and demonstrator integration

All i-Core architecture innovations of Phase III will be developed and implemented in the WPs mentioned above. In this WP, we will first integrate these features into our stand-alone FPGA prototype for development and testing purposes, and subsequently into the ProDesign proFPGA demonstrator platform of Project Z2. We will use these prototypes for measurements and detailed evaluations. Together with all other projects, we will work on common demonstration scenarios and provide dedicated hardware acceleration for Project A4, Project D1, and Project D3.

A comprehensive summary of the major achievements of the first and second funding phases can be found on the Project B1 first-phase and second-phase websites.

Publications

[1] Jörg Henkel. Power density and circuit aging – system-level means for mitigation. Keynote, IEEE Computer Society Annual Symposium on VLSI, Hong Kong, July 10, 2018.
[2] Alexander Pöppl, Marvin Damschen, Florian Schmaus, Andreas Fried, Manuel Mohr, Matthias Blankertz, Lars Bauer, Jörg Henkel, Wolfgang Schröder-Preikschat, and Michael Bader. Shallow water waves on a deep technology stack: Accelerating a finite volume tsunami model using reconfigurable hardware in invasive computing. In Dora B. Heras, Luc Bougé, Gabriele Mencagli, Emmanuel Jeannot, Rizos Sakellariou, Rosa M. Badia, Jorge G. Barbosa, Laura Ricci, Stephen L. Scott, Stefan Lankes, and Josef Weidendorfer, editors, Euro-Par 2017: Proceedings of the 10th Workshop on UnConventional High Performance Computing (UCHPC 2017), Lecture Notes in Computer Science (LNCS), pages 676–687, Cham, 2018. Springer International Publishing.
[3] Tanja Harbaum, Christoph Schade, Marvin Damschen, Carsten Tradowsky, Lars Bauer, Jörg Henkel, and Jürgen Becker. Auto-SI: An adaptive reconfigurable processor with run-time loop detection and acceleration. In 30th IEEE International System-on-Chip Conference (SOCC), pages 224–229, September 2017.
[4] Jörg Henkel. The triangle of power density, circuit degradation and reliability. Invited Keynote Speech, 30th IEEE International System-On-Chip Conference (SoCC 2017), Munich, Germany, September 7, 2017.
[5] Manuel Mohr and Carsten Tradowsky. Pegasus: Efficient data transfers for PGAS languages on non-cache-coherent many-cores. In Design, Automation and Test in Europe Conference Exhibition (DATE), pages 1781–1786, March 30, 2017.
[6] Artjom Grudnitsky, Lars Bauer, and Jörg Henkel. Efficient partial online-synthesis of special instructions for reconfigurable processors. IEEE Transactions on Very Large Scale Integration Systems (TVLSI), 25(2):594–607, February 2017. [ DOI ]
[7] Marvin Damschen, Lars Bauer, and Jörg Henkel. Timing analysis of tasks on runtime reconfigurable processors. IEEE Transactions on Very Large Scale Integration Systems (TVLSI), 25(1):294–307, January 2017. [ DOI ]
[8] Manuel Mohr and Carsten Tradowsky. Pegasus: Efficient data transfers for PGAS languages on non-cache-coherent many-cores. In Proceedings of Design, Automation and Test in Europe Conference Exhibition (DATE), pages 1781–1786. IEEE, 2017. [ DOI ]
[9] Carsten Tradowsky. Methoden zur applikationsspezifischen Effizienzsteigerung adaptiver Prozessorplattformen. Dissertation, Institut für Technik der Informationsverarbeitung (ITIV), Fakultät für Elektrotechnik und Informationstechnik, Karlsruher Institut für Technologie (KIT), December 20, 2016.
[10] Jürgen Teich. Invasive computing – editorial. it – Information Technology, 58(6):263–265, November 24, 2016. [ DOI ]
[11] Stefan Wildermann, Michael Bader, Lars Bauer, Marvin Damschen, Dirk Gabriel, Michael Gerndt, Michael Glaß, Jörg Henkel, Johny Paul, Alexander Pöppl, Sascha Roloff, Tobias Schwarzer, Gregor Snelting, Walter Stechele, Jürgen Teich, Andreas Weichslgartner, and Andreas Zwinkau. Invasive computing for timing-predictable stream processing on MPSoCs. it – Information Technology, 58(6):267–280, September 30, 2016. [ DOI ]
[12] Fazal Hameed, Lars Bauer, and Jörg Henkel. Architecting on-chip DRAM cache for simultaneous miss rate and latency reduction. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 35(4):651–664, April 2016.
[13] Carsten Tradowsky, Enrique Cordero, Christoph Orsinger, Malte Vesper, and Jürgen Becker. A Dynamic Cache Architecture for Efficient Memory Resource Allocation in Many-Core Systems. Springer International Publishing, Cham, 2016. [ DOI ]
[14] Carsten Tradowsky, Enrique Cordero, Christoph Orsinger, Malte Vesper, and Jürgen Becker. Adaptive Cache Structures. Springer International Publishing, Cham, 2016. [ DOI ]
[15] Carsten Tradowsky, Tanja Harbaum, Leonard Masing, and Jürgen Becker. A novel ADL-based approach to design adaptive application-specific processors. In Best of IEEE Computer Society Annual Symposium on VLSI (ISVLSI). 2016.
[16] Artjom Grudnitsky. A Reconfigurable Processor for Heterogeneous Multi-Core Architectures. Dissertation, Chair for Embedded Systems (CES), Department of Computer Science, Karlsruhe Institute of Technology (KIT), Germany, December 21, 2015.
[17] Johny Paul, Walter Stechele, Benjamin Oechslein, Christoph Erhardt, Jens Schedel, Daniel Lohmann, Wolfgang Schröder-Preikschat, Manfred Kröhnert, Tamim Asfour, Éricles R. Sousa, Vahid Lari, Frank Hannig, Jürgen Teich, Artjom Grudnitsky, Lars Bauer, and Jörg Henkel. Resource-awareness on heterogeneous MPSoCs for image processing. Journal of Systems Architecture, 61(10):668–680, November 6, 2015. [ DOI ]
[18] Lars Bauer, Artjom Grudnitsky, Marvin Damschen, Srinivas Rao Kerekare, and Jörg Henkel. Floating point acceleration for stream processing applications in dynamically reconfigurable processors. In IEEE Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia), October 2015. Invited Paper for the Special Session “Dynamics and Predictability in Stream Processing – A Contradiction?”. [ DOI ]
[19] C. Diniz, M. Shafique, S. Bampi, and J. Henkel. A reconfigurable hardware architecture for fractional pixel interpolation in high efficiency video coding. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 34(2), February 2015.
[20] Fazal Hameed. DRAM aware Last-Level-Cache policies for Multi-core Systems. Dissertation, Chair for Embedded Systems (CES), Department of Computer Science, Karlsruhe Institute of Technology (KIT), Germany, February 6, 2015.
[21] Peter Figuli, Carsten Tradowsky, Jose Martinez, Harry Sidiropoulos, Kostas Siozios, Holger Stenschke, Dimitrios Soudris, and Jürgen Becker. A novel concept for adaptive signal processing on reconfigurable hardware. In Applied Reconfigurable Computing, volume 9040 of Lecture Notes in Computer Science, pages 311–320. Springer International Publishing, 2015.
[22] Artjom Grudnitsky, Lars Bauer, and Jörg Henkel. COREFAB: Concurrent reconfigurable fabric utilization in heterogeneous multi-core systems. In International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES), October 2014. [ DOI ]
[23] Martin Haaß, Lars Bauer, and Jörg Henkel. Automatic custom instruction identification in memory streaming algorithms. In International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES), October 2014. [ DOI ]
[24] Jörg Henkel, Lars Bauer, Artjom Grudnitsky, and Hongyan Zhang. Adaptive embedded computing with i-Core. In ACM SIGBED Review – Special Issue on the 6th Workshop on Adaptive and Reconfigurable Embedded Systems, volume 11, pages 20–21, October 2014. Extended Abstract for Keynote Talk. [ DOI ]
[25] Fazal Hameed, Lars Bauer, and Jörg Henkel. Reducing latency in an SRAM/DRAM cache hierarchy via a novel tag-cache architecture. In IEEE/ACM Design Automation Conference (DAC), June 2014. [ DOI ]
[26] Jörg Henkel. Adaptive embedded computing with i-Core. Keynote Talk, 6th Workshop on Adaptive and Reconfigurable Embedded Systems, CPSWeek (APRES), April 14, 2014.
[27] Carsten Tradowsky, Martin Schreiber, Malte Vesper, Ivan Domladovec, Maximilian Braun, Hans-Joachim Bungartz, and Jürgen Becker. Towards dynamic cache and bandwidth invasion. In Reconfigurable Computing: Architectures, Tools, and Applications, volume 8405 of Lecture Notes in Computer Science, pages 97–107. Springer International Publishing, April 2014. [ DOI ]
[28] Artjom Grudnitsky, Lars Bauer, and Jörg Henkel. MORP: Makespan optimization for processors with an embedded reconfigurable fabric. In Proceedings of the 22nd ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA), pages 127–136, February 2014. [ DOI ]
[29] C. Tradowsky, T. Gädeke, T. Bruckschlögl, W. Stork, K.-D. Müller-Glaser, and J. Becker. Smartlocore: A concept for an adaptive power-aware localization processor. In Parallel, Distributed and Network-Based Processing (PDP), 2014 22nd Euromicro International Conference on, pages 478–481, February 2014. [ DOI ]
[30] Muhammad Shafique, Lars Bauer, and Jörg Henkel. Adaptive energy management for dynamically reconfigurable processors. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 33(1):50–63, January 2014. [ DOI ]
[31] Timo Stripf. Softwareframework für Prozessoren mit variablen Befehlssatzarchitekturen. Dissertation, Institut für Technik der Informationsverarbeitung (ITIV), Fakultät für Elektrotechnik und Informationstechnik, Karlsruher Institut für Technologie (KIT), December 11, 2013.
[32] Peter Figuli, Carsten Tradowsky, Nadine Gaertner, and Jürgen Becker. Visa: A highly efficient slot architecture enabling multi-objective ASIP cores. In International Symposium on System on Chip (SoC), pages 1–8, October 2013. [ DOI ]
[33] Manuel Mohr, Artjom Grudnitsky, Tobias Modschiedler, Lars Bauer, Sebastian Hack, and Jörg Henkel. Hardware acceleration for programs in SSA form. In International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES), Montreal, Canada, October 2013. [ DOI ]
[34] Fazal Hameed, Lars Bauer, and Jörg Henkel. Simultaneously optimizing DRAM cache hit latency and miss rate via novel set mapping policies. In International Conference on Compilers Architecture and Synthesis for Embedded Systems (CASES), September 2013. [ DOI ]
[35] Fazal Hameed, Lars Bauer, and Jörg Henkel. Reducing inter-core cache contention with an adaptive bank mapping policy in DRAM cache. In International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), September 2013. [ DOI ]
[36] Carsten Tradowsky, Tanja Harbaum, Shaver Deyerle, and Jürgen Becker. Limbic: An adaptable architecture description language model for developing an application-specific image processor. In IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pages 34–39, August 2013. [ DOI ]
[37] Lars Braun. Methoden zur Erstellung eines laufzeitadaptiven und zweidimensional rekonfigurierbaren Systems. Dissertation, Institut für Technik der Informationsverarbeitung (ITIV), Fakultät für Elektrotechnik und Informationstechnik, Karlsruher Institut für Technologie (KIT), February 19, 2013.
[38] Carsten Tradowsky, Enrique Cordero, Thorsten Deuser, Michael Hübner, and Jürgen Becker. Determination of on-chip temperature gradients on reconfigurable hardware. In Proceedings of the International Conference on Reconfigurable Computing and FPGAs (ReConFig), pages 1–8, December 2012. [ DOI ]
[39] Michael Hübner, Diana Göhringer, Carsten Tradowsky, Jörg Henkel, and Jürgen Becker. Adaptive processor architecture. In International Conference on Embedded Computer Systems (SAMOS), pages 244–251, July 2012. Invited paper. [ DOI ]
[40] Carsten Tradowsky, Florian Thoma, Michael Hübner, and Jürgen Becker. Lisparc: Using an architecture description language approach for modelling an adaptive processor microarchitecture. In 7th IEEE International Symposium on Industrial Embedded Systems (SIES'12), pages 279–282, June 2012. Best Work-in-Progress (WiP) Paper Award. [ DOI ]
[41] Jörg Henkel. i-Core: Adaptive computing for multi-core architectures. Embedded System Design from MultiMedia to Cloud, Hong Kong, Invited Talk, May 18, 2012.
[42] Lars Bauer, Artjom Grudnitsky, Muhammad Shafique, and Jörg Henkel. PATS: a performance aware task scheduler for runtime reconfigurable processors. In 20th Annual International IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), pages 208–215, May 2012. [ DOI ]
[43] Carsten Tradowsky, Florian Thoma, Michael Hübner, and Jürgen Becker. On dynamic run-time processor pipeline reconfiguration. In IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), pages 419–424, May 2012. [ DOI ]
[44] Artjom Grudnitsky, Lars Bauer, and Jörg Henkel. Partial online-synthesis for mixed-grained reconfigurable architectures. In Proceedings of Design, Automation and Test in Europe Conference (DATE), pages 1555–1560, March 2012. [ DOI ]
[45] Peter Figuli, Michael Hübner, Romuald Girardey, F. Bapp, Thomas Bruckschlögl, Florian Thoma, Jörg Henkel, and Jürgen Becker. A heterogeneous SoC architecture with embedded virtual FPGA cores and runtime core fusion. In NASA/ESA 6th Conference on Adaptive Hardware and Systems (AHS), pages 96–103, 2012. [ DOI ]
[46] Jörg Henkel, Andreas Herkersdorf, Lars Bauer, Thomas Wild, Michael Hübner, Ravi Kumar Pujari, Artjom Grudnitsky, Jan Heisswolf, Aurang Zaib, Benjamin Vogel, Vahid Lari, and Sebastian Kobbe. Invasive manycore architectures. In Proceedings of the 17th Asia and South Pacific Design Automation Conference (ASP-DAC), pages 193–200, January 2012. [ DOI ]
[47] Alexander Klimm. Computing Architectures for Security Applications on Reconfigurable Hardware in Embedded Systems. Dissertation, Institut für Technik der Informationsverarbeitung (ITIV), Fakultät für Elektrotechnik und Informationstechnik, Karlsruher Institut für Technologie (KIT), December 22, 2011.
[48] M. Hübner, C. Tradowsky, D. Göhringer, L. Braun, F. Thoma, J. Henkel, and J. Becker. Dynamic processor reconfiguration. In Proceedings of the International Conference on Reconfigurable Computing and FPGAs (ReConFig), pages 123–128, November 2011. [ DOI ]
[49] Jörg Henkel, Lars Bauer, Michael Hübner, and Artjom Grudnitsky. i-Core: A run-time adaptive processor for embedded multi-core systems. In Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA), July 2011. Invited paper.
[50] Lars Bauer, Muhammad Shafique, and Jörg Henkel. Concepts, architectures, and run-time systems for efficient and adaptive reconfigurable processors. In NASA/ESA 6th Conference on Adaptive Hardware and Systems (AHS), pages 80–87, June 2011. Invited paper; Received the MaXentric Technologies AHS Best Paper Award. [ DOI ]
[51] Michael Hübner, Peter Figuli, Romuald Girardey, Dimitrios Soudris, Kostas Siozios, and Jürgen Becker. A heterogeneous multicore system on chip with run-time reconfigurable virtual FPGA architecture. In Proceedings of the International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 2011.
[52] Jürgen Teich, Jörg Henkel, Andreas Herkersdorf, Doris Schmitt-Landsiedel, Wolfgang Schröder-Preikschat, and Gregor Snelting. Invasive computing: An overview. In Michael Hübner and Jürgen Becker, editors, Multiprocessor System-on-Chip – Hardware Design and Tool Integration, pages 241–268. Springer, Berlin, Heidelberg, 2011. [ DOI ]
[53] Jürgen Teich. Invasive algorithms and architectures. it – Information Technology, 50(5):300–310, 2008.
[54] Diana Göhringer, Jonathan Obie, Michael Hübner, and Jürgen Becker. Impact of task distribution, processor configurations and dynamic clock frequency scaling on the power consumption of FPGA-based multiprocessors. In Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip (ReCoSoC), pages 13–20. KIT Scientific Publishing.
[55] Michael Hübner, Diana Göhringer, J. Noguera, and Jürgen Becker. Fast dynamic and partial reconfiguration data path with low hardware overhead on Xilinx FPGAs. In Proceedings of the International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[56] Carsten Tradowsky, Peter Figuli, Erik Seidenspinner, Felix Held, and Jürgen Becker. A new approach to model-based development for audio signal processing. In 134th International AES Convention.
[57] Michael Hübner and Jürgen Becker, editors. Multiprocessor System-on-Chip: Hardware Design and Tool Integration. Springer.