
An FPGA Reading List


Version History
Date Description
Aug 26, 2020 Add those 2 ISCA‘20 papers to Host Virtual Memory Section
Nov 30, 2019 Add a lot of security papers
Oct 22, 2019 Shuffle scheduling section. More focused. Add two more recent fpga-virt papers
Oct 5, 2019 More on scheduling. Add NoC. Add Security.
Oct 4, 2019 Add more papers extracted from AmorphOS
Oct 3, 2019 Initial version from Github

A list of related papers I came across while doing FPGA-related research. If you’d like to contribute, please comment below or create a PR here.



Scheduling is a big topic for FPGAs. Unlike traditional CPU scheduling, there are more aspects to consider, e.g., 1) partial reconfiguration (PR), 2) dynamic self PR, 3) preemptive scheduling, 4) relocation, 5) floorplanning, and so on.

Preemptive Scheduling

  • Preemptive multitasking on FPGAs, 2000
  • Multitasking on FPGA Coprocessors, 2000
  • Context saving and restoring for multitasking in reconfigurable systems, 2005
  • ReconOS Cooperative multithreading in dynamically reconfigurable systems, FPL‘09
  • Block, drop or roll(back): Alternative preemption methods for RH multi-tasking, FCCM‘09
  • Hardware Context-Switch Methodology for Dynamically Partially Reconfigurable Systems, 2010
  • On-chip Context Save and Restore of Hardware Tasks on Partially Reconfigurable FPGAs, 2013
  • HTR: on-chip Hardware Task Relocation For Partially Reconfigurable FPGAs, 2013
  • Preemptive Hardware Multitasking in ReconOS, 2015

Preemptive Reconfiguration

  • Preemption of the Partial Reconfiguration Process to Enable Real-Time Computing, 2018


  • Github 7-series bitmap reverse engineering
  • PARBIT: A Tool to Transform Bitfiles to Implement Partial Reconfiguration of Field Programmable Gate Arrays (FPGAs), 2001
  • BITMAN: A Tool and API for FPGA Bitstream Manipulations, 2017


  • Context saving and restoring for multitasking in reconfigurable systems, 2005
  • REPLICA2Pro: Task Relocation by Bitstream Manipulation in Virtex-II/Pro FPGAs, 2006
  • Relocation and Automatic Floor-planning of FPGA Partial Configuration Bit-Streams, MSR 2008
  • Internal and External Bitstream Relocation for Partial Dynamic Reconfiguration, 2009
  • PRR-PRR Dynamic Relocation, 2009
  • HTR: on-chip Hardware Task Relocation For Partially Reconfigurable FPGAs, 2013
  • AutoReloc, 2016



Network-on-Chip on FPGA.

Memory Hierarchy

Papers that deal with BRAM, registers, on-board DRAM, and host DRAM.

Dynamic Memory Allocation

malloc() and free() for FPGA on-board DRAM.
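
To make the idea concrete, here is a minimal sketch of what a malloc()/free() pair over a flat on-board DRAM address range could look like. It is a plain first-fit allocator with coalescing, written in software for illustration only; it is not taken from any paper in this list, and the class name `DramAllocator`, the 64-byte alignment, and the 1 KiB region size are all assumptions.

```python
class DramAllocator:
    """Hypothetical first-fit allocator over a flat DRAM offset range."""

    def __init__(self, size, align=64):
        self.align = align               # assumed burst/cache-line alignment
        self.free_list = [(0, size)]     # sorted list of (offset, length)
        self.allocated = {}              # offset -> length of live blocks

    def malloc(self, nbytes):
        if nbytes <= 0:
            raise ValueError("nbytes must be positive")
        # Round the request up to a multiple of the alignment.
        need = -(-nbytes // self.align) * self.align
        for i, (off, length) in enumerate(self.free_list):
            if length >= need:
                if length == need:
                    del self.free_list[i]
                else:
                    # Carve the block from the front of the free region.
                    self.free_list[i] = (off + need, length - need)
                self.allocated[off] = need
                return off
        return None  # out of memory

    def free(self, off):
        length = self.allocated.pop(off)  # KeyError on a bad or double free
        self.free_list.append((off, length))
        self.free_list.sort()
        # Coalesce neighbouring free regions.
        merged = []
        for o, l in self.free_list:
            if merged and merged[-1][0] + merged[-1][1] == o:
                merged[-1] = (merged[-1][0], merged[-1][1] + l)
            else:
                merged.append((o, l))
        self.free_list = merged


dram = DramAllocator(1024)
x = dram.malloc(100)   # rounded up to 128 bytes
y = dram.malloc(64)
dram.free(x)
z = dram.malloc(200)   # does not fit in the 128-byte hole left by x
```

A hardware allocator would keep the free list in BRAM or registers and bound the search time, which is where the real designs differ from this sketch.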

Integrate with Host Virtual Memory

Papers that deal with the OS virtual memory system (VMS). Note that all of these papers introduce some form of MMU into the FPGA so that the FPGA can work with the host VMS. This added MMU is similar to a CPU’s MMU and an RDMA NIC’s internal cache. Note also that the VMS itself still runs inside Linux (including page fault handling, swapping, TLB shootdown, and so on), except in one recent ISCA‘20 paper.
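
The division of labor described above can be sketched in software: the FPGA side only caches translations, while the host remains the owner of the mappings and handles faults. The following is a rough model under stated assumptions, not the design of any listed paper. The names `HostPageTable` and `FpgaTlb`, the 4 KiB page size, the LRU policy, and the trivial frame allocation are all hypothetical.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed 4 KiB pages


class HostPageTable:
    """Stand-in for the host OS, which owns mappings and handles faults."""

    def __init__(self):
        self.mappings = {}   # virtual page number -> physical frame number
        self.next_frame = 0
        self.faults = 0

    def walk(self, vpn):
        if vpn not in self.mappings:
            # "Page fault": the host allocates a frame on first touch.
            self.faults += 1
            self.mappings[vpn] = self.next_frame
            self.next_frame += 1
        return self.mappings[vpn]


class FpgaTlb:
    """Small LRU translation cache standing in for the FPGA-side MMU."""

    def __init__(self, host, capacity=4):
        self.host = host
        self.capacity = capacity
        self.entries = OrderedDict()  # vpn -> frame, oldest entry first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)
        else:
            # Miss: fall back to the host, then cache the result.
            self.misses += 1
            self.entries[vpn] = self.host.walk(vpn)
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict the LRU entry
        return self.entries[vpn] * PAGE_SIZE + offset

    def shootdown(self, vpn):
        # The host invalidates a cached translation after changing a mapping.
        self.entries.pop(vpn, None)


host = HostPageTable()
tlb = FpgaTlb(host, capacity=2)
first = tlb.translate(0x1234)   # miss, host faults the page in
second = tlb.translate(0x1238)  # hit on the same page
```

The `shootdown` method is the piece that keeps the cache coherent with the host: whenever Linux unmaps or swaps a page, the FPGA-side entry has to be dropped.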

Integrate with Host OSs


If I were to recommend a starting point for FPGA security, I’d suggest:

  • Recent Attacks and Defenses on FPGA-based Systems, 2019
  • Physical Side-Channel Attacks and Covert Communication on FPGAs: A Survey, 2019
  • FPGA security: Motivations, features, and applications, 2014

The whole list:

  • FPGAhammer: Remote Voltage Fault Attacks on Shared FPGAs, suitable for DFA on AES
  • FPGA-Based Remote Power Side-Channel Attacks
  • Characterization of long wire data leakage in deep submicron FPGAs
  • Protecting against cryptographic Trojans in FPGAs
  • FPGA Side Channel Attacks without Physical Access
  • FPGA security: Motivations, features, and applications
  • FPGA side-channel receivers
  • Security of FPGAs in data centers
  • Secure Function Evaluation Using an FPGA Overlay Architecture
  • Mitigating Electrical-level Attacks towards Secure Multi-Tenant FPGAs in the Cloud
  • The Costs of Confidentiality in Virtualized FPGAs
  • Temporal Thermal Covert Channels in Cloud FPGAs
  • Characterizing Power Distribution Attacks in Multi-User FPGA Environments
  • FASE: FPGA Acceleration of Secure Function Evaluation
  • Securing Cryptographic Circuits by Exploiting Implementation Diversity and Partial Reconfiguration on FPGAs
  • Measuring Long Wire Leakage with Ring Oscillators in Cloud FPGAs
  • Physical Side-Channel Attacks and Covert Communication on FPGAs: A Survey
  • Leaky Wires: Information Leakage and Covert Communication Between FPGA Long Wires
  • Using the Power Side Channel of FPGAs for Communication
  • An Inside Job: Remote Power Analysis Attacks on FPGAs
  • Leakier Wires: Exploiting FPGA Long Wires for Covert- and Side-channel Attacks
  • Voltage drop-based fault attacks on FPGAs using valid bitstreams
  • Moats and Drawbridges: An Isolation Primitive for Reconfigurable Hardware Based Systems
  • Sensing nanosecond-scale voltage attacks and natural transients in FPGAs
  • Holistic Power Side-Channel Leakage Assessment:
  • Hiding Intermittent Information Leakage with Architectural Support for Blinking
  • Examining the consequences of high-level synthesis optimizations on power side-channel
  • Register transfer level information flow tracking for provably secure hardware design
  • A Protection and Pay-per-use Licensing Scheme for On-cloud FPGA Circuit IPs
  • Recent Attacks and Defenses on FPGA-based Systems
  • PFC: Privacy Preserving FPGA Cloud - A Case Study of MapReduce
  • A Pay-per-Use Licensing Scheme for Hardware IP Cores in Recent SRAM-Based FPGAs
  • FPGAs for trusted cloud computing


A summary of the current status of FPGA virtualization. Prior art mainly focuses on: 1) how to virtualize on-chip BRAM (e.g., CoRAM, LEAP Scratchpad); 2) how to work with the host, specifically how to use host DRAM and host virtual memory; 3) how to schedule bitstreams inside an FPGA chip; 4) how to provide certain services that make FPGA programming easier (mostly by working with the host OS).

Languages, Runtime, and Framework

Innovations in the toolchain space.

Xilinx HLS

Xilinx CAD

High-Level Languages and Platforms

Integrate with Frameworks

  • Map-reduce as a Programming Model for Custom Computing Machines, FCCM‘08
    • This paper proposes a model for translating MapReduce code written in C into code that can run on FPGAs and GPUs. Many details are omitted, and they don’t really have the compiler.
    • Single-host framework, everything is in FPGA and GPU.
  • Axel: A Heterogeneous Cluster with FPGAs and GPUs, FPGA‘10
    • A distributed MapReduce Framework, targets clusters with CPU, GPU, and FPGA. Mainly the idea of scheduling FPGA/GPU jobs.
    • Distributed Framework.
  • FPMR: MapReduce Framework on FPGA, FPGA‘10
    • A MapReduce framework on a single host’s FPGA. You need to write Verilog/HLS for the processing logic that hooks into their framework. The framework mainly includes a data transfer controller and a simple scheduler that enables certain blocks at certain times.
    • Single-host framework, everything is in FPGA.
  • Melia: A MapReduce Framework on OpenCL-Based FPGAs, IEEE‘16
    • Another framework, written in OpenCL, and users can use OpenCL to program as well. Similar to previous work, it’s more about the framework design, not specific algorithms on FPGA.
    • Single-host framework, everything is in FPGA. But they have a discussion on running on multiple FPGAs.
    • Four MapReduce FPGA papers are listed here, and I believe there are more. The marriage between MapReduce and FPGAs is not hard to understand: an FPGA can be viewed as another core with different capabilities. The real question is, given the FPGA’s reprogramming time and limited on-board memory, how to design a good scheduling algorithm and good data moving/caching mechanisms. These papers give some hints on this.
  • UCLA: When Apache Spark Meets FPGAs: A Case Study for Next-Generation DNA Sequencing Acceleration, HotCloud‘16
  • UCLA: Programming and Runtime Support to Blaze FPGA Accelerator Deployment at Datacenter Scale, SoCC‘16
    • A system that hooks FPGA with Spark.
    • There is a line of work that hooks FPGAs into big data processing frameworks (Spark), so that the FPGA implementation and the scale-out software can be separated. Spark can schedule FPGA jobs to different machines and take care of scale-out, failure handling, etc. But I personally think this line of work is really just an extension of the ReconOS/FUSE/BORPH line of work. The main reason is that both lines of work try to integrate jobs that run on the CPU with jobs that run on the FPGA, so that the CPU and FPGA have an easier way to talk or, put another way, a better division of labor. Whether it’s single-machine (like ReconOS, Melia) or distributed (like Blaze, Axel), they are essentially the same.
  • UCLA: Heterogeneous Datacenters: Options and Opportunities, DAC‘16
    • Follow-up work to Blaze. Nice comparison of big and wimpy cores.

Cloud Infrastructure



Programmable Network

Database and SQL


Machine Learning

  • TABLA: A Unified Template-based Framework for Accelerating Statistical Machine Learning, HPCA‘16
  • Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, FPGA‘15
  • From High-Level Deep Neural Models to FPGAs, ISCA‘16
  • Deep Learning on FPGAs: Past, Present, and Future, arXiv‘16
  • Accelerating binarized neural networks: Comparison of FPGA, CPU, GPU, and ASIC, FPT‘16
  • FINN: A Framework for Fast, Scalable Binarized Neural Network Inference, FPGA‘17
  • In-Datacenter Performance Analysis of a Tensor Processing Unit, ISCA‘17
  • Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs, FPGA‘17
  • A Configurable Cloud-Scale DNN Processor for Real-Time AI, ISCA‘18
    • Microsoft Project Brainware. Built on Catapult.
  • A Network-Centric Hardware/Algorithm Co-Design to Accelerate Distributed Training of Deep Neural Networks, MICRO‘18
  • DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs, ICCAD‘18
  • FA3C: FPGA-Accelerated Deep Reinforcement Learning, ASPLOS‘19
  • Cognitive SSD: A Deep Learning Engine for In-Storage Data Retrieval, ATC‘19


  • A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing, ISCA‘15
  • Energy Efficient Architecture for Graph Analytics Accelerators, ISCA‘16
  • Boosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search, FPGA‘17
  • FPGA-Accelerated Transactional Execution of Graph Workloads, FPGA‘17
  • An FPGA Framework for Edge-Centric Graph Processing, CF‘18

Key-Value Store

  • Achieving 10Gbps line-rate key-value stores with FPGAs, HotCloud‘13
  • Thin Servers with Smart Pipes: Designing SoC Accelerators for Memcached, ISCA‘13
  • An FPGA Memcached Appliance, FPGA‘13
  • Scaling out to a Single-Node 80Gbps Memcached Server with 40Terabytes of Memory, HotStorage‘15
  • KV-Direct: High-Performance In-Memory Key-Value Store with Programmable NIC, SOSP‘17
  • Ultra-Low-Latency and Flexible In-Memory Key-Value Store System Design on CPU-FPGA, FPT‘18



  • Consensus in a Box: Inexpensive Coordination in Hardware, NSDI‘16

Video Processing

  • Quantifying the Benefits of Dynamic Partial Reconfiguration for Embedded Vision Applications, FPL‘19
  • Time-Shared Execution of Realtime Computer Vision Pipelines by Dynamic Partial Reconfiguration, FPL‘18

FPGA Internal

FPGA20: Highlighting Significant Contributions from 20 Years of the International Symposium on Field-Programmable Gate Arrays (1992–2011)


Partial Reconfiguration

Logical Optimization and Technology Mapping

Place and Route


Last update: August 26, 2020

