Productive Construction of High-Performance Systolic Arrays on FPGAs

FCCM Tutorial 2021

May 12


Zhiru Zhang (Cornell University)
Jason Cong (UCLA)
Hongbo Rong (Intel Labs)


Recent years have seen a growing number of application-specific systolic arrays (SAs) implemented on modern FPGAs for efficient compute acceleration. Because SAs rely only on near-neighbor connections, they are a great match for FPGAs, where minimizing long interconnects is particularly important for meeting the target clock frequency. However, designing and implementing a high-performance SA for a given algorithm with the traditional RTL-based methodology requires a tremendous amount of human effort. On the other hand, existing high-level synthesis (HLS) tools force programmers to do “micro-coding”, where many optimizations must be carried out through tedious code restructuring and/or insertion of vendor-specific pragmas. In this tutorial, we introduce our recent efforts on developing new programming models and automatic synthesis capabilities that enable FPGA programmers to productively build high-performance SAs.
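To make the near-neighbor property concrete, the sketch below simulates, cycle by cycle, a textbook output-stationary systolic array computing C = A × B. It is an illustrative assumption, not code from AutoSA, T2S, or HeteroCL: the function name systolic_matmul and the skewed input schedule are hypothetical choices. Each processing element (PE) at (i, j) reads only from its left and upper neighbors' registers, accumulates one product per cycle, and forwards its operands right and down; rows of A enter from the left edge delayed by i cycles and columns of B enter from the top edge delayed by j cycles, so PE (i, j) sees A[i][k] and B[k][j] together at cycle i + j + k.

```python
from typing import List

Matrix = List[List[int]]


def systolic_matmul(A: Matrix, B: Matrix) -> Matrix:
    """Hypothetical cycle-level model of an n x m output-stationary systolic array."""
    n, k = len(A), len(A[0])
    if len(B) != k:
        raise ValueError("inner dimensions of A and B must match")
    m = len(B[0])

    acc = [[0] * m for _ in range(n)]    # partial sum kept inside each PE
    a_reg = [[0] * m for _ in range(n)]  # operand each PE forwards to its right neighbor
    b_reg = [[0] * m for _ in range(n)]  # operand each PE forwards to the PE below

    # The last useful product reaches PE (n-1, m-1) at cycle (n-1) + (m-1) + (k-1).
    for t in range(n + m + k - 2):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                # Left edge: row i of A is fed in with a skew of i cycles.
                if j == 0:
                    s = t - i
                    new_a[i][j] = A[i][s] if 0 <= s < k else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]  # only from the left neighbor
                # Top edge: column j of B is fed in with a skew of j cycles.
                if i == 0:
                    s = t - j
                    new_b[i][j] = B[s][j] if 0 <= s < k else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]  # only from the upper neighbor
                acc[i][j] += new_a[i][j] * new_b[i][j]
        # All PEs update their registers simultaneously, as on a shared clock edge.
        a_reg, b_reg = new_a, new_b
    return acc


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

No PE ever reads data from more than one hop away, which is the property that lets such an array scale on an FPGA without long wires; the price is the input skew and a fill/drain latency of roughly n + m cycles on top of the k accumulation steps.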


Each segment includes a technical presentation (30-35 minutes) followed by a short demo and Q&A (5-10 minutes).

Segment 1
Speakers: Jason Cong, Jie Wang (UCLA)
Resources: Slides, Video
Title: AutoSA: A Polyhedral Compiler for High-Performance Systolic Arrays on FPGAs
Abstract: This segment presents AutoSA[1], an end-to-end compilation framework for generating systolic arrays on FPGAs. AutoSA is based on the polyhedral framework and further incorporates a set of optimizations along different dimensions to boost performance. As an example, we also show how AutoSA is used in FlexCNN[2], an end-to-end deep learning acceleration framework.

Segment 2
Speaker: Hongbo Rong (Intel Labs)
Resources: Slides, Video
Title: T2S: Programming Spatial Architectures for Productive Performance
Abstract: This segment presents T2S/SuSy[3][4], a programming framework built upon Halide for productively building high-performance SAs on FPGAs. T2S decouples the algorithm specification from spatial optimizations: the former can concisely express any systolic algorithm, while the latter can describe the essential optimizations for systolic arrays.

Segment 3
Speakers: Yi-Hsiang Lai, Shaojie Xiang (Cornell University)
Resources: Slides, Video
Title: Building High-Performance Systolic Arrays with HeteroCL
Abstract: This segment presents HeteroCL[5], a Python-based DSL with an automated compilation flow that maps the input algorithm to special-purpose accelerators through HLS. HeteroCL integrates AutoSA as a compiler backend for mapping systolic algorithms to efficient SA architectures.