Chair for Computer Science 12 - Hardware-Software-Co-Design

Overview

Our research centers on the systematic, computer-aided design (CAD) of hardware/software systems, ranging from embedded systems to HPC platforms. One principal research direction is domain-specific computing, which tackles the complex programming and design challenges posed by parallel heterogeneous computer architectures. Domain-specific computing strictly separates the concerns of algorithm development and target-architecture implementation, including parallelization and low-level implementation details.

The key idea is to exploit the knowledge inherent in a particular problem area or field of application, i.e., a particular domain, in a targeted manner and thus to master the complexity of heterogeneous systems. Such domain knowledge can be captured by suitable abstractions, augmentations, and notations, e.g., libraries, domain-specific programming languages (DSLs), or combinations of both (e.g., embedded DSLs implemented via template metaprogramming). On this basis, patterns can be used to transform and optimize the input description in a goal-oriented way during compilation and, finally, to generate code for a specific target architecture. Thus, DSLs provide high productivity and typically also high performance.
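The pipeline described above (capture the input in a DSL, rewrite it by patterns, then generate target code) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class names `Var`, `Const`, `Add`, `Mul`, the two rewrite rules, and the helper `emit_c` are hypothetical and do not correspond to any of the frameworks listed on this page.

```python
from dataclasses import dataclass


class Expr:
    """Base class of a tiny, hypothetical embedded DSL for arithmetic."""

    def __add__(self, other):
        return Add(self, wrap(other))

    def __mul__(self, other):
        return Mul(self, wrap(other))


@dataclass(frozen=True)
class Var(Expr):
    name: str


@dataclass(frozen=True)
class Const(Expr):
    value: float


@dataclass(frozen=True)
class Add(Expr):
    lhs: Expr
    rhs: Expr


@dataclass(frozen=True)
class Mul(Expr):
    lhs: Expr
    rhs: Expr


def wrap(value):
    """Turn plain Python numbers into DSL constants."""
    return value if isinstance(value, Expr) else Const(value)


def simplify(e):
    """Apply two rewrite patterns bottom-up: x * 1 -> x and x + 0 -> x."""
    if not isinstance(e, (Add, Mul)):
        return e
    lhs, rhs = simplify(e.lhs), simplify(e.rhs)
    neutral = Const(1) if isinstance(e, Mul) else Const(0)
    if rhs == neutral:
        return lhs
    if lhs == neutral:
        return rhs
    return type(e)(lhs, rhs)


def emit_c(e):
    """Generate a C expression string from the (optimized) expression tree."""
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Const):
        return repr(float(e.value))
    op = "+" if isinstance(e, Add) else "*"
    return f"({emit_c(e.lhs)} {op} {emit_c(e.rhs)})"


x, y = Var("x"), Var("y")
expr = (x * 1 + 0) * (y + 2)      # written once, independent of the target
print(emit_c(expr))               # (((x * 1.0) + 0.0) * (y + 2.0))
print(emit_c(simplify(expr)))     # (x * (y + 2.0))
```

The user only writes `expr`; the rewrite step and the code generator can be exchanged per target without touching that description, which is the separation of concerns the paragraph refers to.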

We develop DSLs and target-platform languages to capture both domain and architecture knowledge, which is used during the different phases of compilation, parallelization, mapping, and code generation for a wide variety of architectures, e.g., multi-core processors, GPUs, MPSoCs, and FPGAs. All these steps usually involve exploring and optimizing over the vast space of design options and trading off multiple objectives, such as performance, cost, energy, or reliability.
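Trading off multiple objectives typically means looking for Pareto-optimal design points, i.e., designs that no other design beats in every objective at once. The following is a minimal sketch of such a filter; the design names and the objective values (latency, energy, cost, all to be minimized) are made-up numbers for illustration only.

```python
def dominates(a, b):
    """True if a is no worse than b in all objectives and better in at least one.

    All objectives are assumed to be minimized.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(designs):
    """Keep only the designs that are not dominated by any other design."""
    return {
        name: objectives
        for name, objectives in designs.items()
        if not any(dominates(other, objectives) for other in designs.values())
    }


# Hypothetical design points: (latency in ms, energy in mJ, relative cost).
designs = {
    "cpu":      (10.0, 5.0, 1.0),
    "cpu_slow": (12.0, 6.0, 1.0),   # worse than "cpu" in latency and energy
    "gpu":      (2.0, 8.0, 3.0),
    "fpga":     (3.0, 2.0, 4.0),
}

print(sorted(pareto_front(designs)))   # ['cpu', 'fpga', 'gpu']
```

This exhaustive comparison is quadratic in the number of designs, so it only suits small sets; real design spaces are far too large for that and call for heuristic exploration.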

The application domains we consider include multigrid methods based on stencil computations, iterative algorithms on unstructured grids, image processing and computer vision tasks (e.g., for medical and automotive applications), and high-performance processing of Big Data.

People

Prof. Dr. Jürgen Teich

PD Dr. Frank Hannig

Dr. Stefan Wildermann

Research topics

ExaStencils — Advanced Stencil-Code Engineering

Project ExaStencils investigates and provides a unique, tool-assisted, domain-specific codesign approach for the important class of stencil codes, which play a central role in high-performance simulation on structured or block-structured grids. Stencils are regular access patterns on (usually multidimensional) data grids. Multigrid methods involve a hierarchy of grids, from very fine to successively coarser. The challenge at exascale is that the coarser grids require less processing power, so communication dominates. From the computational algorithm perspective, domain-specific investigations include the extraction and development of suitable stencils, the analysis of performance-relevant algorithmic tradeoffs (e.g., the number of grid levels), and the analysis and reduction of synchronization requirements guided by a template model of the targeted cluster architecture. Based on this analysis, sophisticated programming and software tool support is developed by capturing the relevant data structures and program segments for stencil computations in a domain-specific language and applying a generator-based product-line technology to automatically generate and optimize stencil codes tailored to each application–platform pair. A central distinguishing feature of ExaStencils is that domain knowledge is exploited in a coordinated manner across all abstraction levels, from the formulation of the application scenario down to the generation of highly optimized stencil code.
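To make the terms "stencil" and "grid hierarchy" concrete, here is a minimal pure-Python sketch. It is not ExaStencils code or its DSL: it assumes a 2D Poisson problem with the standard 5-point stencil, fixed boundary values, and coarsening by a factor of two per level. The function names are chosen for illustration.

```python
def jacobi_sweep(u, f, h):
    """One Jacobi sweep of the 5-point stencil for -Laplace(u) = f.

    u and f are square lists of lists; h is the grid spacing.
    Boundary values of u are left unchanged.
    """
    n = len(u)
    new = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            # Regular access pattern: the four direct neighbours of (i, j).
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1]
                                + h * h * f[i][j])
    return new


def grid_hierarchy(n_finest, levels):
    """Points per dimension on each level, halving the resolution per level."""
    sizes = [n_finest]
    for _ in range(levels - 1):
        sizes.append((sizes[-1] - 1) // 2 + 1)
    return sizes


sizes = grid_hierarchy(65, 4)
print(sizes)                              # [65, 33, 17, 9]
print([(n - 2) ** 2 for n in sizes])      # interior points: [3969, 961, 225, 49]

# Tiny 3x3 example: boundary fixed at 1, one interior unknown, zero right-hand side.
u = [[1.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
f = [[0.0] * 3 for _ in range(3)]
print(jacobi_sweep(u, f, h=0.5)[1][1])    # 1.0
```

The interior point counts show why coarse levels are hard to scale: the work per sweep shrinks by roughly a factor of four per level in 2D, while each sweep still needs a neighbour exchange in a distributed setting.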

Further information: ExaStencils Project Website
Software: ExaStencils Code Generation Framework

HighPerMeshes — Domain-Specific Programming and Target-Platform-Aware Compiler Infrastructure for Algorithms on Unstructured Grids

The goal of HighPerMeshes is to develop a practical domain-specific framework for the efficient, parallel, and scalable implementation of iterative algorithms on unstructured grids. Time-domain simulation software in this category (e.g., TD-FEM, TD-DG, network simulations) has increasingly been used in scientific and industrial domains in recent years and complements or supplements comparable methods on structured grids. With the results of this project, developers can, with moderate effort, extend existing source code written in high-level languages with domain-specific library and language elements. The intelligent compiler infrastructure uses domain knowledge to enable performance-optimized, highly parallel execution on all relevant modern hardware architectures (multicore, manycore, GPU, FPGA), including heterogeneous systems. The project thus offers many HPC developers from science and industry a convenient and sustainable path towards scalable use of the most efficient current and future target architectures.

Further information: HighPerMeshes Project Page

Hipacc — The Heterogeneous Image Processing Acceleration Framework

Hipacc is a DSL embedded in C++ and a compiler framework for the domain of image processing. It captures domain knowledge in a compact and intuitive language and employs source-to-source translation combined with various optimizations to achieve excellent productivity paired with performance portability. The Hipacc approach has been applied and evaluated for a wide variety of parallel accelerator architectures, including manycore processors, such as NVIDIA and AMD GPUs and Intel Xeon Phi, embedded GPUs, Xilinx and Intel FPGAs, as well as vector units.

Software: Hipacc DSL and Compilation Framework

ReProVide — Query Optimisation and Near-Data Processing on Reconfigurable SoCs for Big Data Analysis

The goal of this project is to provide novel hardware and optimization techniques for scalable, high-performance processing of Big Data. We particularly target huge data sets with flexible schemata (row-oriented, column-oriented, document-oriented, irregular, or non-indexed) as well as data streams, such as those found in click-stream enterprise analytics, software logs, and discussion-forum archives, or produced by sensors in IoT and Industrie 4.0 settings. In this realm, the project investigates the potential of hardware-reconfigurable, FPGA-based SoCs for near-data processing, where computations are pushed towards such heterogeneous data sources. Based on FPGA technology, and in particular on its dynamic reconfiguration, we propose a generic architecture called ReProVide for low-cost processing of database queries.
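The benefit of pushing computation towards the data source can be sketched in a few lines. The example below is a software-only illustration under stated assumptions: `DataSource`, its `pushdown` parameter, and the sensor rows are hypothetical and are not the ReProVide architecture or its interface. The `pushdown` callback merely stands in for a filter that would run on the FPGA next to the storage.

```python
class DataSource:
    """Hypothetical storage node that counts how many rows leave the device."""

    def __init__(self, rows):
        self.rows = rows
        self.rows_transferred = 0

    def scan(self, pushdown=None):
        """Yield rows to the host; an optional filter is applied at the source."""
        for row in self.rows:
            if pushdown is None or pushdown(row):
                self.rows_transferred += 1
                yield row


def query_host_side(source, predicate):
    """Baseline: transfer every row, then filter on the host."""
    return [row for row in source.scan() if predicate(row)]


def query_near_data(source, predicate):
    """Near-data processing: filter at the source, transfer only the matches."""
    return list(source.scan(pushdown=predicate))


# Made-up sensor readings for illustration.
rows = [{"sensor": i % 4, "value": i} for i in range(100)]


def is_high(row):
    return row["value"] >= 90


baseline, near = DataSource(rows), DataSource(rows)
assert query_host_side(baseline, is_high) == query_near_data(near, is_high)
print(baseline.rows_transferred, near.rows_transferred)   # 100 10
```

Both variants return the same result, but the near-data variant moves only the matching rows, which is where the savings come from when the predicate is selective.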

Further information: ReProVide Project Website

Selected publications

Schmitt C., Kronawitter S., Hannig F., Teich J., Lengauer C.:
Automating the Development of High-Performance Multigrid Solvers
In: Proceedings of the IEEE 106 (2018), p. 1969-1984
ISSN: 0018-9219
DOI: 10.1109/JPROC.2018.2854229

Unat D., Dubey A., Hoefler T., Shalf J., Abraham M., Bianco M., Chamberlain BL., Cledat R., Edwards HC., Finkel H., Fürlinger K., Hannig F., Jeannot E., Kamil A., Keasler J., Kelly PHJ., Leung VJ., Ltaief H., Maruyama N., Newburn C., Pericàs M.:
Trends in Data Locality Abstractions for HPC Systems
In: IEEE Transactions on Parallel and Distributed Systems (2017)
ISSN: 1045-9219
DOI: 10.1109/TPDS.2017.2703149

Schmitt C., Schmid M., Kuckuk S., Köstler H., Teich J., Hannig F.:
Reconfigurable Hardware Generation of Multigrid Solvers with Conjugate Gradient Coarse-Grid Solution
In: Parallel Processing Letters 28 (2018), Article No.: 1850016
ISSN: 0129-6264
DOI: 10.1142/S0129626418500160

Kenter T., Mahale G., Alhaddad S., Grynko Y., Schmitt C., Afzal A., Hannig F., Förstner J., Plessl C.:
OpenCL-based FPGA Design to Accelerate the Nodal Discontinuous Galerkin Method for Unstructured Meshes
The 26th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM) (Boulder, CO, USA, 29 April 2018 - 1 May 2018)
In: Proceedings of the 26th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM) 2018
DOI: 10.1109/FCCM.2018.00037

Groth S., Schmitt C., Teich J., Hannig F.:
SYCL Code Generation for Multigrid Methods
22nd International Workshop on Software and Compilers for Embedded Systems (SCOPES '19) (Sankt Goar, Germany, 27 May 2019 - 29 May 2019)
In: 22nd International Workshop on Software and Compilers for Embedded Systems (SCOPES '19) 2019
DOI: 10.1145/3323439.3323984

Qiao B., Özkan MA., Teich J., Hannig F.:
The Best of Both Worlds: Combining CUDA Graph with an Image Processing DSL
57th Annual Design Automation Conference (DAC) (San Francisco, CA, 19 July 2020 - 23 July 2020)
In: Proceedings of the 57th Annual Design Automation Conference (DAC) 2020
DOI: 10.1109/DAC18072.2020.9218531

Qiao B., Reiche O., Hannig F., Teich J.:
From Loop Fusion to Kernel Fusion: A Domain-specific Approach to Locality Optimization
2019 International Symposium on Code Generation and Optimization (CGO) (Washington, DC, USA, 16 February 2019 - 20 February 2019)
In: Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) 2019
DOI: 10.1109/CGO.2019.8661176

Reiche O., Özkan MA., Membarth R., Teich J., Hannig F.:
Generating FPGA-based Image Processing Accelerators with Hipacc
International Conference on Computer Aided Design (ICCAD) (Irvine, 13 November 2017 - 16 November 2017)
In: Proceedings of the International Conference on Computer Aided Design (ICCAD) 2017
DOI: 10.1109/ICCAD.2017.8203894

Membarth R., Reiche O., Hannig F., Teich J., Körner M., Eckert W.:
HIPAcc: A Domain-Specific Language and Compiler for Image Processing
In: IEEE Transactions on Parallel and Distributed Systems 27 (2016), p. 210-224
ISSN: 1045-9219
DOI: 10.1109/TPDS.2015.2394802

Becher A., Beena Gopalakrishnan Nair L., Broneske D., Drewes T., Gurumurthy B., Meyer-Wegener K., Pionteck T., Saake G., Teich J., Wildermann S.:
Integration of FPGAs in Database Management Systems: Challenges and Opportunities
In: Datenbank-Spektrum (2018)
ISSN: 1618-2162
DOI: 10.1007/s13222-018-0294-9
