B2: Invasive Tightly-Coupled Processor Arrays

Principal Investigators:

Prof. J. Teich

Scientific Researchers

A. Becher, M. Brand, Dr. F. Hannig, F. Khosravi, É. R. Sousa

Abstract

Project B2 investigates invasive computing on tightly coupled processor arrays (TCPAs). These have been shown to provide highly energy-efficient and, at the same time, timing-predictable acceleration for many computationally intensive applications that may be expressed by nested loops from diverse areas such as scientific computing and image and signal processing, to name a few.

In the first funding phase, concepts for hardware-controlled invasion through a cycle-wise propagation of invasion control signals between neighbouring processing elements (PEs) have been investigated. Such decentralised parallel invasion strategies may not only reduce the invasion overhead by two orders of magnitude w.r.t. a centralised software-based approach; bounds of O(N) clock cycles on the time needed to invade N processing elements have also been shown to be achievable. For invasion control, two variants, namely finite state machine based (FSM-based) and programmable controllers, have been proposed, and different 1D and 2D invasion strategies were evaluated. Moreover, the self-adaptive nature of invasive computing was also exploited for the purpose of dynamic power management by controlling the wake-up as well as the powering down of regions of processors directly by the invade and retreat signals, respectively. As invades and retreats are initiated by the application, a TCPA may therefore adapt itself to the application requirements in terms of power needs. Finally, a first invasive TCPA prototype for basic visual computing algorithms was demonstrated in cooperation with Project Z2 on the CHIPit prototyping system.

In the second funding phase, a major focus of research will be the quest for (higher) predictability of non-functional aspects of invasive parallel program execution, i.e. performance, fault tolerance, and energy consumption. While predictable performance in terms of latency and throughput shall be proven for loop nests compiled to TCPA targets, we will also investigate micro- and macroarchitectural support to improve the predictability of invasive TCPAs in general. The investigations include techniques for enabling (a) fault tolerance (DMR, TMR) through invasion of redundant array regions of processors as well as novel hardware concepts for reaction to and recovery from faults, (b) orthogonal instruction processing as a new concept for increasing the code density of the VLIW cores of a TCPA, and (c) methods for fine-granular power management of TCPA regions, trading off predictable against energy-efficient computing, also in view of dark silicon.

Synopsis

The goal of this project is to provide concepts and solutions for invasive tightly coupled processor arrays (TCPAs) that may be embedded as high-speed and low-energy tiles within a heterogeneous MPSoC.
TCPAs may be found on many MPSoC platforms, where they provide area- and power-efficient computing structures for fine- to medium-grained highly parallel computations such as those specified by nested loop programs. Domains of particular interest are image and signal processing, linear algebra computations, and many others. Here, they exploit their full advantage of processing data cycle by cycle and delivering results over dedicated regular interconnect links with very low overhead. In our project, each node of a TCPA is a customisable VLIW processor.
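
As an illustration of the kind of nested loop program meant here, a finite impulse response (FIR) filter can be written as a two-level loop nest. The sketch below is plain Python and purely illustrative: the function name `fir` and the zero-padding at the start of the input are assumptions made for this example and are not taken from the project's tool flow.

```python
def fir(a, u):
    """Compute y[i] = sum_j a[j] * u[i - j] with a two-level loop nest.

    Samples before the start of the input (i - j < 0) are treated as zero.
    """
    y = [0] * len(u)
    for i in range(len(u)):        # outer loop: one output sample per iteration
        for j in range(len(a)):    # inner loop: one filter tap per iteration
            if i - j >= 0:
                y[i] += a[j] * u[i - j]
    return y


print(fir([1, 2], [1, 1, 1]))  # [1, 3, 3]
```

Every (i, j) iteration performs one multiply-accumulate with regular data dependencies, which is the property that makes such loop nests candidates for execution on a processor array.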

In phase II, the major focus is on the aspect of *-predictability, i.e. concepts and proofs that invasive TCPAs may serve not only as scalable loop program accelerators, but are also highly predictable in multiple (the *) qualities of interest to a programmer. The first quality of interest to be investigated is performance predictability, in which, based on our results on symbolic loop parallelisation, we may guarantee a statically computable bound on latency and throughput of a program as a function of its input parameters and the dynamic size of the claimed processor array. Another objective of interest, in view of the increased vulnerability of semiconductors to errors due to high integration densities, is fault tolerance. Here, dual (DMR) and triple (TMR) replication of loop computations shall be investigated systematically, with a trade-off either in time (latency) or in the claim size needed to guarantee error detection and error tolerance of single or even multiple bit errors induced by soft errors. Furthermore, efficient methods and support to recover from faults need to be investigated such that predictability in terms of error detectability and/or fault tolerance may be achieved for an invading application.
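
The difference between DMR and TMR can be sketched in a few lines of Python. This is a minimal software model under stated assumptions, not the project's hardware or compiler mechanism: the helper names `dmr_check`, `tmr_vote`, and `mac_loop`, as well as the way a soft error is injected as a single flipped accumulator bit, are hypothetical.

```python
from collections import Counter


def dmr_check(r0, r1):
    """DMR: two replicas can detect a mismatch but cannot tell which one is wrong."""
    return r0 == r1


def tmr_vote(r0, r1, r2):
    """TMR: return the majority value, or None if all three replicas disagree."""
    value, count = Counter([r0, r1, r2]).most_common(1)[0]
    return value if count >= 2 else None


def mac_loop(xs, ws, flip_at=None):
    """Hypothetical replicated loop: a dot product with an optional injected
    single bit flip in the accumulator at iteration `flip_at`."""
    acc = 0
    for i, (x, w) in enumerate(zip(xs, ws)):
        acc += x * w
        if i == flip_at:
            acc ^= 1 << 3  # model a soft error as one flipped bit
    return acc


xs, ws = [1, 2, 3], [4, 5, 6]
good = mac_loop(xs, ws)             # fault-free replica
bad = mac_loop(xs, ws, flip_at=1)   # replica hit by a soft error

print(dmr_check(good, bad))         # False: the error is detected
print(tmr_vote(good, bad, good))    # 32: the error is masked by the majority
```

In this model, DMR costs one extra replica and yields detection only, whereas TMR costs two extra replicas and masks a single faulty replica. Whether the replicas run in additional array regions (larger claim) or one after another (higher latency) is the trade-off described above.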

Code efficiency (density) is another important goal, driven by tight memory constraints within the processing elements of a TCPA and also affecting the time needed to infect a claimed array with proper instruction sequences in parallel. Here, we will propose a new processing element architecture for VLIW processors, called orthogonal instruction processor, that shall make it possible to run loop nests on TCPAs at a much higher code density, with better energy efficiency and, as a result, also a much lower expected configuration (infect) time.

Finally, energy is the third objective under investigation in our project. We will characterise TCPAs in terms of energy efficiency (number of instructions per Joule) to have a measure indicating applications that would benefit from one or more TCPA tiles on the MPSoC in order to avoid, or at least reduce, the threat of dark silicon limiting the number of active cores on a chip. In this realm, architectural concepts of invasive TCPAs also need to be extended towards fine-tunable energy management.

Invasive TCPA tile architecture

Processing elements: Lightweight, programmable PEs with a VLIW architecture that form the tightly coupled processor array

Reconfiguration and Communication Processor: Controls the communication between the different architectural components of the TCPA tile and reconfigures the processor array

Configuration Manager: Configures the processor array for different applications

Global Controller (GC): Controls the execution of loop programs by sending appropriate control signals to the array

Reconfigurable I/O Buffers and address generators (AG): Data buffers surrounding the processor array, responsible for proper data feeds into the array

Invasion Manager (IM): Handles the invasion requests of a TCPA and keeps track of the availability of processor regions for admission of new applications within the array.

Invasion Controller (iCtrl): Each PE consists not only of a VLIW CPU but also of an additional controller that implements multiple invasion strategies, which may capture PEs in either linear or rectangular connected regions.

Network Adapter: Interface between the iNoC and TCPA tile

TCPA tile

Invasion Strategies

Linear (1D) invasion strategy: This strategy tries to invade a linear array of consecutive locally connected processors. Different options include row- or column-wise as well as random and meander-like topologies.

Rectangular (2D) invasion strategy: Here, the goal is to invade a rectangular region of processors. One possible implementation performs a horizontal one-dimensional invasion along the first row of the rectangular region, followed by vertical one-dimensional invasions started by the processors in that first row.

Random linear invasion, meander linear invasion, and rectangular invasion strategy
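
The strategies above can be sketched in software as follows. This is a simplified, sequential Python model, not the hardware protocol: the function names, the representation of free PEs as a set of (row, column) coordinates, and the rule that a rectangular invasion shrinks to the largest rectangle it can obtain are all assumptions made for illustration.

```python
def meander_path(rows, cols):
    """Visit a rows x cols array row by row, reversing direction in every
    other row, so that consecutive PEs on the path are direct neighbours."""
    path = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        path.extend((r, c) for c in cs)
    return path


def invade_linear(free, path, start, n):
    """Claim up to n consecutive free PEs along `path`, beginning at index
    `start`. Invasion stops at the first PE that is already occupied."""
    claimed = []
    for pe in path[start:]:
        if len(claimed) == n or pe not in free:
            break
        claimed.append(pe)
    free.difference_update(claimed)
    return claimed


def invade_rectangular(free, seed, height, width):
    """Horizontal invasion along the first row, then a vertical invasion
    started by every PE of that row. The claim is the common rectangle."""
    r0, c0 = seed
    first_row = []
    for c in range(c0, c0 + width):
        if (r0, c) not in free:
            break
        first_row.append((r0, c))
    depths = []
    for _, c in first_row:
        d = 0
        while d < height and (r0 + d, c) in free:
            d += 1
        depths.append(d)
    h = min(depths, default=0)
    region = [(r0 + r, c) for r in range(h) for _, c in first_row]
    free.difference_update(region)
    return region


path = meander_path(4, 4)
free = set(path)
print(invade_linear(free, path, start=2, n=4))       # [(0, 2), (0, 3), (1, 3), (1, 2)]
print(len(invade_rectangular(free, (2, 0), 2, 3)))   # 6
print(len(invade_rectangular(free, (0, 0), 2, 4)))   # 4: shrinks to 2 x 2
```

The last call requests a 2 x 4 rectangle but obtains only 2 x 2, because part of the first row was already claimed by the linear invasion. How an incomplete invasion is actually resolved is not specified here, so that behaviour belongs to the sketch only.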

Invasion controller designs

In order to be able to perform fast invasions by the processing elements, an invasion controller is integrated into each of the processing elements. Here, two different design options are suggested:

1) FSM-based invasion controllers provide a dedicated hardware implementation of a single invasion strategy. Moreover, it has been shown that a linear array of N PEs may be invaded in time O(N) (i.e. two clock cycles per PE in our implementation). In addition, they impose only very little area and power overhead. However, they lack the flexibility and extensibility to support other, potentially more complex invasion strategies.

2) Programmable invasion controllers, as the name suggests, are tiny processors that implement multiple invasion strategies as (micro-)programs. Their flexibility, however, comes at the cost of an increased invasion time, area, and power.
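
As a small worked example of the linear bound quoted for the FSM-based controller, the sketch below computes the invasion time of a linear array from a fixed per-PE cost. The default of two cycles per PE comes from the text above; the function names, the 50 MHz clock, and the omission of any constant start-up cost are assumptions made for this example.

```python
def invasion_cycles(num_pes, cycles_per_pe=2):
    """Clock cycles to invade a linear array, assuming a fixed cost per PE."""
    return cycles_per_pe * num_pes


def invasion_time_us(num_pes, clock_mhz, cycles_per_pe=2):
    """Invasion time in microseconds at the given clock frequency in MHz."""
    return invasion_cycles(num_pes, cycles_per_pe) / clock_mhz


print(invasion_cycles(16))          # 32
print(invasion_time_us(16, 50.0))   # 0.64
```

A programmable controller could be modelled by passing a larger, measured `cycles_per_pe`; no such figure is given here, so it is left as a parameter.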

Different hardware designs for invasion controllers

A processing element with an invasion controller integrated into it. Two different design options are developed, an FSM-based invasion controller and a programmable one.

Invasive computing as an enabler for power management

Resource-aware computing shows its importance and advantages when targeting manycore architectures consisting of tens to thousands of processing elements. Such a great number of computational resources makes it possible to support very high levels of parallelism but, on the other hand, may also cause high power consumption. In the context of invasive computing, we exploited invasion requests to wake up processors and retreat requests to shut them down in order to save power. As these invasion and retreat requests are initiated by each application, the architecture itself adapts to the application requirements in terms of power needs. During the invasion phase, two different kinds of power domains are considered: processing unit power domains and invasion controller power domains. These domains are controlled hierarchically, based on the system utilisation, which is in turn controlled by the invasion controllers.

Power gating of individual invasion controllers may reduce the power consumption of the MPSoC, but at the cost of a timing overhead caused by power switching delays. We therefore studied the effects of grouping multiple invasion controllers in the same power domain. Such grouping mechanisms may reduce the hardware cost for power gating, while sacrificing the granularity of power gating capabilities. The finer the granularity of the power control, the more power we may save. In contrast, grouping more invasion controllers together will reduce the timing overhead that is needed for power switching during both invasion and retreat phases. The image below shows different proposed example architectures for grouping the invasion controllers. Experimental results show that up to 70% of the total energy consumption of a processor array may be saved for selected applications and different resource utilisations.

Different grouping options for iCtrl power domains

Different designs for grouping invasion controllers into one power domain. a) Invasion controller power domains controlling the power state of a single invasion controller. b) An invasion controller power domain controlling the power state of four invasion controllers belonging to four processing elements.
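
The granularity trade-off can be made concrete with a small counting model. The sketch below is an illustration under assumptions: it groups invasion controllers into square domains of `group` x `group` PEs (the four-controller variant is modelled as 2 x 2), switches a domain on as soon as one of its controllers is needed, and ignores actual power figures and switching delays. The function name `powered_controllers` is hypothetical.

```python
def powered_controllers(claim, group):
    """Return (domains switched on, controllers powered) for a claimed region.

    `claim` is a set of (row, col) PE coordinates. Controllers are grouped
    into square power domains of `group` x `group` PEs, and a domain is
    switched on as soon as at least one of its controllers is needed.
    """
    domains = {(r // group, c // group) for r, c in claim}
    return len(domains), len(domains) * group * group


# A 3 x 3 region claimed in the top-left corner of the array.
claim = {(r, c) for r in range(3) for c in range(3)}

print(powered_controllers(claim, 1))  # (9, 9): one domain per controller
print(powered_controllers(claim, 2))  # (4, 16): four domains, seven idle controllers powered
```

With one controller per domain, exactly the nine required controllers are powered, but nine domains have to be switched. With four controllers per domain, only four domains are switched, at the price of powering seven controllers that the claim does not need.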

A comprehensive summary of the major achievements of the first funding phase can be found on the Project B2 first-phase website.

Publications

[1] Marcel Brand, Frank Hannig, Alexandru Tanase, and Jürgen Teich. Orthogonal instruction processing: An alternative to lightweight VLIW processors. In Proceedings of the IEEE 11th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC). IEEE Computer Society, September 2017.
[2] Marcel Brand, Frank Hannig, Alexandru Tanase, and Jürgen Teich. Efficiency in ILP processing by using orthogonality. In Proceedings of the 28th Annual IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP). IEEE, July 2017.
[3] Alexandru Tanase, Michael Witterauf, Jürgen Teich, and Frank Hannig. Symbolic multi-level loop mapping of loop programs for massively parallel processor arrays. ACM Transactions on Embedded Computing Systems (TECS), May 15, 2017. Accepted.
[4] Heba Khdr, Santiago Pagani, Éricles R. Sousa, Vahid Lari, Anuj Pathania, Frank Hannig, Muhammad Shafique, Jürgen Teich, and Jörg Henkel. Power density-aware resource management for heterogeneous tiled multicores. IEEE Transactions on Computers (TC), 66(3):488–501, March 1, 2017. [ DOI ]
[5] Soonhoi Ha and Jürgen Teich, editors. The Handbook of Hardware/Software Codesign. Springer, 2017.
[6] Jürgen Teich. Invasive computing – editorial. it – Information Technology, 58(6):263–265, November 24, 2016. [ DOI ]
[7] Vahid Lari, Andreas Weichslgartner, Alex Tanase, Michael Witterauf, Faramarz Khosravi, Jürgen Teich, Jürgen Becker, Jan Heißwolf, and Stephanie Friederich. Providing fault tolerance through invasive computing. it – Information Technology, 58(6):309–328, October 19, 2016. [ DOI ]
[8] Santiago Pagani, Lars Bauer, Qingqing Chen, Elisabeth Glocker, Frank Hannig, Andreas Herkersdorf, Heba Khdr, Anuj Pathania, Ulf Schlichtmann, Doris Schmitt-Landsiedel, Mark Sagi, Éricles Sousa, Philipp Wagner, Volker Wenzel, Thomas Wild, and Jörg Henkel. Dark silicon management: An integrated and coordinated cross-layer approach. it – Information Technology, 58(6):297–307, September 16, 2016. [ DOI ]
[9] Jürgen Teich, Michael Glaß, Sascha Roloff, Wolfgang Schröder-Preikschat, Gregor Snelting, Andreas Weichslgartner, and Stefan Wildermann. Language and compilation of parallel programs for *-predictable MPSoC execution using invasive computing. In Proceedings of the 10th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), pages 313–320, Lyon, France, September 2016. [ DOI ]
[10] Jürgen Teich. Predictability, fault tolerance, and security on demand using invasive computing. Invited Talk, University of Lübeck, Germany, July 29, 2016.
[11] Jürgen Teich. Invasive Computing - The DFG Transregional Research Center 89. DTC 2016, The Munich Workshop on Design Technology Coupling, Munich, Germany, June 30, 2016.
[12] Vahid Lari. Providing fault tolerance through invasive computing. Talk at DTC 2016, The Munich Workshop on Design Technology Coupling, Munich, Germany, June 30, 2016.
[13] Jürgen Teich. Predictable MPSoC stream processing using invasive computing. Seminar Talk, Electrical and Computer Engineering, The University of Texas at Austin, USA, June 6, 2016.
[14] Jürgen Teich. Adaptive restriction and isolation for predictable MPSoC stream processing. Invited Talk, DATE 2016 Friday Workshop on Resource Awareness and Application Autotuning in Adaptive and Heterogeneous Computing, Dresden, Germany, March 18, 2016.
[15] Alexandru Tanase, Michael Witterauf, Éricles R. Sousa, Vahid Lari, Frank Hannig, and Jürgen Teich. LoopInvader: A compiler for tightly coupled processor arrays. Tool Presentation at the University Booth at Design, Automation and Test in Europe (DATE), Dresden, Germany, March 2016. [ .pdf ]
[16] Jürgen Teich. Symbolic loop parallelization for adaptive multi-core systems - recent advances and benefits. Keynote, IMPACT 2016, the 6th International Workshop on Polyhedral Compilation Techniques, Prague, Czech Republic, January 19, 2016.
[17] Jürgen Teich. The role of restriction and isolation for increasing the predictability of MPSoC stream processing. Keynote, 8th Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools (RAPIDO 2016), Prague, Czech Republic, January 18, 2016.
[18] Vahid Lari. Invasive Tightly Coupled Processor Arrays. Springer Singapore, 2016. [ DOI ]
[19] Srinivas Boppu. Code Generation for Tightly Coupled Processor Arrays. Dissertation, Hardware/Software Co-Design, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany, December 18, 2015.
[20] Oliver Reiche, Konrad Häublein, Marc Reichenbach, Moritz Schmid, Frank Hannig, Jürgen Teich, and Dietmar Fey. Synthesis and optimization of image processing accelerators using domain knowledge. Journal of Systems Architecture (JSA), December 2015. [ DOI ]
[21] Vahid Lari. Invasive Tightly Coupled Processor Arrays. Dissertation, Hardware/Software Co-Design, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany, November 18, 2015.
[22] Vahid Lari, Jürgen Teich, Alexandru Tanase, Michael Witterauf, Faramarz Khosravi, and Brett H. Meyer. Techniques for on-demand structural redundancy for massively parallel processor arrays. Journal of Systems Architecture (JSA), 61(10):615–627, November 2015. [ DOI ]
[24] Johny Paul, Walter Stechele, Benjamin Oechslein, Christoph Erhardt, Jens Schedel, Daniel Lohmann, Wolfgang Schröder-Preikschat, Manfred Kröhnert, Tamim Asfour, Éricles R. Sousa, Vahid Lari, Frank Hannig, Jürgen Teich, Artjom Grudnitsky, Lars Bauer, and Jörg Henkel. Resource-awareness on heterogeneous MPSoCs for image processing. Journal of Systems Architecture, 61(10):668–680, November 6, 2015. [ DOI ]
[25] Éricles R. Sousa, Frank Hannig, and Jürgen Teich. Reconfigurable buffer structures for coarse-grained reconfigurable arrays. In Proceedings of the International Embedded Systems Symposium (IESS). LNCS, November 2015.
[26] Moritz Schmid. Rapid Prototyping for Hardware Accelerators in the Medical Imaging Domain. Dissertation, Hardware/Software Co-Design, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany, July 24, 2015.
[27] Moritz Schmid, Oliver Reiche, Frank Hannig, and Jürgen Teich. Loop coarsening in C-based high-level synthesis. In Proceedings of the 26th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), pages 166–173. IEEE, July 2015.
[28] Alexandru Tanase, Michael Witterauf, Jürgen Teich, Frank Hannig, and Vahid Lari. On-demand fault-tolerant loop processing on massively parallel processor arrays. In Proceedings of the 26th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), pages 194–201. IEEE, July 2015. [ DOI ]
[29] Vahid Lari, Alexandru Tanase, Jürgen Teich, Michael Witterauf, Faramarz Khosravi, Frank Hannig, and Brett H. Meyer. A co-design approach for fault-tolerant loop execution on coarse-grained reconfigurable arrays. In Proceedings of the 2015 NASA/ESA Conference on Adaptive Hardware and Systems (AHS), pages 1–8. IEEE, June 2015. [ DOI ]
[30] Éricles R. Sousa, Frank Hannig, Jürgen Teich, Qingqing Chen, and Ulf Schlichtmann. Runtime adaptation of application execution under thermal and power constraints in massively parallel processor arrays. In Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems (SCOPES), pages 121–124. ACM, June 2015. [ DOI ]
[31] Jürgen Teich. Adaptive isolation for predictable MPSoC stream processing. Keynote, SCOPES 2015, 18th International Workshop on Software and Compilers for Embedded Systems, Schloss Rheinfels, St. Goar, Germany, June 2, 2015.
[32] Jürgen Teich. Adaptive isolation for predictable MPSoC stream processing. In Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems (SCOPES 2015), pages 1–2, June 2015. [ DOI ]
[33] Michael Witterauf, Alexandru Tanase, Jürgen Teich, Vahid Lari, Andreas Zwinkau, and Gregor Snelting. Adaptive fault tolerance through invasive computing. In Proceedings of the 2015 NASA/ESA Conference on Adaptive Hardware and Systems (AHS), pages 1–8. IEEE, June 2015. [ DOI ]
[34] Jürgen Teich, Srinivas Boppu, Frank Hannig, and Vahid Lari. Compact code generation and throughput optimization for coarse-grained reconfigurable arrays. In Wayne Luk and George A. Constantinides, editors, Transforming Reconfigurable Systems: A Festschrift Celebrating the 60th Birthday of Professor Peter Cheung, chapter 10, pages 167–206. Imperial College Press, London, UK, April 2015. [ DOI ]
[35] Jürgen Teich. Invasive computing. Invited Talk, SE 2015, Software Engineering and Management, Special Session Software Engineering in der DFG, Dresden, Germany, March 19, 2015.
[36] Jürgen Teich. Reconfigurable computing for MPSoC. Invited Lecture, Winter School Design and Applications of Multi Processor System on Chip, Tunis, Tunisia, November 26, 2014.
[37] Deepak Gangadharan, Éricles Sousa, Vahid Lari, Frank Hannig, and Jürgen Teich. Application-driven reconfiguration of shared resources for timing predictability of MPSoC platforms. In Proceedings of Asilomar Conference on Signals, Systems, and Computers (ASILOMAR), pages 398–403. IEEE, November 2014. [ DOI ]
[38] Johny Paul, Walter Stechele, Éricles R. Sousa, Vahid Lari, Frank Hannig, Jürgen Teich, Manfred Kröhnert, and Tamim Asfour. Self-adaptive Harris corner detector on heterogeneous many-core processor. In Proceedings of the Conference on Design and Architectures for Signal and Image Processing (DASIP). IEEE, October 2014. [ DOI ]
[39] Jürgen Teich. Invasive computing – concepts and benefits. Keynote, DASIP 2014, Conference on Design and Architectures for Signal and Image Processing, Madrid, Spain, October 8, 2014.
[40] Éricles Sousa, Deepak Gangadharan, Frank Hannig, and Jürgen Teich. Runtime reconfigurable bus arbitration for concurrent applications on heterogeneous MPSoC architectures. In Proceedings of the EUROMICRO Digital System Design Conference (DSD), pages 74–81. IEEE, August 2014. [ DOI ]
[41] Jürgen Teich. Foundations and benefits of invasive computing. Seminar, McGill University, Montreal, July 29, 2014.
[42] Srinivas Boppu, Frank Hannig, and Jürgen Teich. Compact code generation for tightly-coupled processor arrays. Journal of Signal Processing Systems, 77(1–2):5–29, May 31, 2014. [ DOI ]
[43] Jürgen Teich. Introduction to invasive computing. Workshop on Resource Awareness and Adaptivity in Multi-Core Computing (Racing 2014), Paderborn, Germany, Tutorial Talk, May 29, 2014.
[44] Jürgen Teich. Foundations and benefits of invasive computing. University of Bologna, Italy, Invited Talk in the Seminar Series Trends in Electronics, May 23, 2014.
[45] Vahid Lari, Alexandru Tanase, Frank Hannig, and Jürgen Teich. Massively parallel processor architectures for resource-aware computing. In Proceedings of the First Workshop on Resource Awareness and Adaptivity in Multi-Core Computing (Racing 2014), pages 1–7, May 2014. [ arXiv ]
[46] Frank Hannig and Jürgen Teich, editors. Proceedings of the First Workshop on Resource Awareness and Adaptivity in Multi-Core Computing (Racing 2014). May 2014. [ arXiv ]
[47] Deepak Gangadharan, Alexandru Tanase, Frank Hannig, and Jürgen Teich. Timing analysis of a heterogeneous architecture with massively parallel processor arrays. In DATE Friday Workshop on Performance, Power and Predictability of Many-Core Embedded Systems (3PMCES). ECSI, March 28, 2014. [ http ]
[48] Éricles Sousa, Vahid Lari, Johny Paul, Frank Hannig, Jürgen Teich, and Walter Stechele. Resource-aware computer vision application on heterogeneous multi-tile architecture. Hardware and Software Demo at the University Booth at Design, Automation and Test in Europe (DATE), Dresden, Germany, March 2014.
[49] Frank Hannig, Vahid Lari, Srinivas Boppu, Alexandru Tanase, and Oliver Reiche. Invasive tightly-coupled processor arrays: A domain-specific architecture/compiler co-design approach. ACM Transactions on Embedded Computing Systems (TECS), 13(4s):133:1–133:29, 2014. [ DOI ]
[50] Jürgen Teich. Invasive computing – the quest for many-core efficiency and predictability. Keynote Talk, Sixth Swedish Workshop on Multicore Computing, Halmstad, Sweden, November 25, 2013.
[51] Jürgen Teich. Invasive computing - the quest for many-core efficiency and predictability. Invited Talk, 5th tubs.CITY Symposium, Managing change and autonomy or critical applications, Braunschweig, Germany, October 30, 2013.
[52] Éricles Sousa, Alexandru Tanase, Frank Hannig, and Jürgen Teich. Accuracy and performance analysis of Harris corner computation on tightly-coupled processor arrays. In Proceedings of the Conference on Design and Architectures for Signal and Image Processing (DASIP), pages 88–95. IEEE, October 2013.
[53] Éricles Sousa, Alexandru Tanase, Frank Hannig, and Jürgen Teich. A prototype of an adaptive computer vision algorithm on an MPSoC architecture. In Proceedings of the Conference on Design and Architectures for Signal and Image Processing (DASIP), pages 361–362. IEEE, October 2013.
[54] Jürgen Teich. The invasive computing paradigm as a solution for highly adaptive and efficient multi-core systems. Talk, Special Session on Run-Time Adaption for Highly-Complex Multi-Core Systems, CODES+ISSS 2013, Montreal, Canada, September 30, 2013.
[55] Elisabeth Glocker, Srinivas Boppu, Qingqing Chen, Ulf Schlichtmann, Jürgen Teich, and Doris Schmitt-Landsiedel. Temperature modeling and emulation of an ASIC temperature monitor system for Tightly-Coupled Processor Arrays (TCPAs) on FPGA. In Kleinheubacher Tagung 2013, September 2013.
[56] Alexandru Tanase, Vahid Lari, Frank Hannig, and Jürgen Teich. Exploitation of quality/throughput tradeoffs in image processing through invasive computing. In Proceedings of the International Conference on Parallel Computing (ParCo), pages 53–62, September 2013. [ DOI ]
[57] Srinivas Boppu, Frank Hannig, and Jürgen Teich. Loop program mapping and compact code generation for programmable hardware accelerators. In Proceedings of the 24th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), pages 10–17. IEEE, June 2013. [ DOI ]
[58] Vahid Lari, Srinivas Boppu, Frank Hannig, Jürgen Teich, and Troy Scott. Hybrid prototyping of tightly-coupled processor arrays for MPSoC designs. Designer Track Poster Presentation at the 50th Design Automation Conference (DAC), Austin, TX, USA, June 2013.
[59] Srinivas Boppu, Vahid Lari, Frank Hannig, and Jürgen Teich. Transactor-based prototyping of heterogeneous multiprocessor system-on-chip architectures. In Proceedings of the Synopsys Users Group Conference (SNUG), May 14, 2013.
[60] Frank Hannig, Moritz Schmid, Vahid Lari, Srinivas Boppu, and Jürgen Teich. System integration of tightly-coupled processor arrays using reconfigurable buffer structures. In Proceedings of the ACM International Conference on Computing Frontiers (CF), pages 2:1–2:4. ACM, May 2013. [ DOI ]
[61] Éricles Sousa, Alexandru Tanase, Vahid Lari, Frank Hannig, Jürgen Teich, Johny Paul, Walter Stechele, Manfred Kröhnert, and Tamim Asfour. Acceleration of optical flow computations on tightly-coupled processor arrays. In Proceedings of the 25th Workshop on Parallel Systems and Algorithms (PARS), volume 30 of Mitteilungen – Gesellschaft für Informatik e. V., Parallel-Algorithmen und Rechnerstrukturen, pages 80–89. Gesellschaft für Informatik e.V., April 2013.
[62] Vahid Lari, Srinivas Boppu, Frank Hannig, Shravan Muddasani, Boris Kuzmin, and Jürgen Teich. Resource-aware video processing on tightly-coupled processor arrays. Hardware and Software Demo at the University Booth at Design, Automation and Test in Europe (DATE), Grenoble, France, March 2013. [ .pdf ]
[63] Frank Hannig. Resource-aware computing on domain-specific accelerators. In Proceedings of the 10th Workshop on Optimizations for DSP and Embedded Systems (ODES), page 35. ACM, February 24, 2013. Keynote. [ DOI ]
[64] Jürgen Teich. Safe(r) loop computations on multi-cores. Invited Talk, 2nd Workshop on Design Tools and Architectures for Multi-Core Embedded Computing Platforms (DITAM 2013), Berlin, Germany, January 22, 2013.
[65] Vahid Lari, Shravan Muddasani, Srinivas Boppu, Frank Hannig, Moritz Schmid, and Jürgen Teich. Hierarchical power management for adaptive tightly-coupled processor arrays. ACM Transactions on Design Automation of Electronic Systems (TODAES), 18(1):2:1–2:25, January 2013. [ DOI ]
[66] Frank Hannig. Invasive tightly-coupled processor arrays. Talk, 1st International Workshop on Domain-Specific Multicore Computing (DSMC) at International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, November 8, 2012.
[67] Frank Hannig. Why do we see more and more domain-specific accelerators in multi-processor systems? Guest Lecture at University of California, Riverside in CS 287 Colloquium in Computer Science, Riverside, CA, USA, November 9, 2012.
[68] Shravan Muddasani, Srinivas Boppu, Frank Hannig, Boris Kuzmin, Vahid Lari, and Jürgen Teich. A prototype of an invasive tightly-coupled processor array. In Proceedings of the Conference on Design and Architectures for Signal and Image Processing (DASIP), pages 393–394. IEEE, October 2012.
[69] Vahid Lari, Shravan Muddasani, Srinivas Boppu, Frank Hannig, and Jürgen Teich. Design of low power on-chip processor arrays. In Proceedings of the 23rd IEEE International Conference on Application-specific Systems, Architectures, and Processors (ASAP), pages 165–168. IEEE Computer Society, July 2012. [ DOI ]
[70] Jürgen Teich. Hardware/software co-design: The past, present, and predicting the future. Proceedings of the IEEE, 100(Centennial-Issue):1411–1430, May 2012. [ DOI ]
[71] Jörg Henkel, Andreas Herkersdorf, Lars Bauer, Thomas Wild, Michael Hübner, Ravi Kumar Pujari, Artjom Grudnitsky, Jan Heisswolf, Aurang Zaib, Benjamin Vogel, Vahid Lari, and Sebastian Kobbe. Invasive manycore architectures. In Proceedings of the 17th Asia and South Pacific Design Automation Conference (ASP-DAC), pages 193–200, January 2012. [ DOI ]
[72] Vahid Lari, Srinivas Boppu, Shravan Muddasani, Frank Hannig, and Jürgen Teich. Hierarchical power management for adaptive tightly-coupled processor arrays. Talk, International Workshop on Adaptive Power Management with Machine Intelligence at International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, November 10, 2011.
[73] Srinivas Boppu, Frank Hannig, Jürgen Teich, and Roberto Perez-Andrade. Towards symbolic run-time reconfiguration in tightly-coupled processor arrays. In Proceedings of the International Conference on Reconfigurable Computing and FPGAs (ReConFig), pages 392–397. IEEE, November 2011. [ DOI ]
[74] Vahid Lari, Andriy Narovlyanskyy, Frank Hannig, and Jürgen Teich. Decentralized dynamic resource management support for massively parallel processor arrays. In Proceedings of the 22nd IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), pages 87–94. IEEE Computer Society, September 2011. [ DOI ]
[75] Josef Angermeier, Eugen Sibirko, Rolf Wanka, and Jürgen Teich. Bitonic sorting on dynamically reconfigurable architectures. In Proceedings of the International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 309–312, May 2011.
[76] Vahid Lari, Frank Hannig, and Jürgen Teich. Distributed resource reservation in massively parallel processor arrays. In Proceedings of the International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 318–321. IEEE Computer Society, May 2011. [ DOI ]
[77] Dmitrij Kissler, Daniel Gran, Zoran A. Salcic, Frank Hannig, and Jürgen Teich. Scalable many-domain power gating in coarse-grained reconfigurable processor arrays. IEEE Embedded Systems Letters, 3(2):58–61, 2011. [ DOI ]
[78] Jürgen Teich, Jörg Henkel, Andreas Herkersdorf, Doris Schmitt-Landsiedel, Wolfgang Schröder-Preikschat, and Gregor Snelting. Invasive computing: An overview. In Michael Hübner and Jürgen Becker, editors, Multiprocessor System-on-Chip – Hardware Design and Tool Integration, pages 241–268. Springer, Berlin, Heidelberg, 2011. [ DOI ]
[79] Tom Vander Aa, Praveen Raghavan, Scott Mahlke, Bjorn De Sutter, Aviral Shrivastava, and Frank Hannig. Compilation techniques for CGRAs: Exploring all parallelization approaches. In Proceedings of the International Conference on Hardware-Software Codesign and System Synthesis (CODES+ISSS), pages 185–186. ACM, October 2010. [ DOI ]
[80] Amouri Abdulazim, Farhadur Arifin, Frank Hannig, and Jürgen Teich. FPGA implementation of an invasive computing architecture. In Proceedings of the IEEE International Conference on Field Programmable Technology (FPT), pages 135–142. IEEE, December 2009. [ DOI ]
[81] Farhadur Arifin, Richard Membarth, Amouri Abdulazim, Frank Hannig, and Jürgen Teich. FSM-controlled architectures for linear invasion. In Proceedings of the 17th IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC), pages 59–64, October 2009. [ DOI ]
[82] Jürgen Teich. Invasive algorithms and architectures. it - Information Technology, 50(5):300–310, 2008.