Modern datacenters suffer from a “datacenter tax” when supporting high-performance, feature-rich devices such as GPUs, SSDs, NPUs, and NICs. This tax consumes a large amount of CPU bandwidth, so datacenters keep adding servers to provide sufficient CPU bandwidth for their many devices. However, this scale-out is costly, and to contain that cost operators must limit both the number of devices deployed per server and the performance of those devices.

To alleviate this CPU overhead, the Data Processing Unit (DPU) and its equivalents (i.e., IPU, SmartNIC, xPU) have been proposed and actively adopted by industry. A DPU, like its equivalents, is an independent device deployed in each server that (1) takes over the device-critical system software stack from the host CPU and (2) accelerates the offloaded stack by exploiting its custom-designed architecture.

In this tutorial, we will first introduce modern DPU solutions along with their opportunities and challenges. Next, we will introduce Alice DPU, an FPGA-based DPU platform available from MangoBoost, and illustrate how our DPU solutions can boost the performance of real-world datacenter applications by minimizing the datacenter tax. Finally, we will discuss research directions showing how computer architects can take advantage of DPUs for their own research.


Related Publications

  1. F4T: A Fast and Flexible FPGA-based Full-stack TCP Acceleration Framework [ISCA 2023]
  2. SmartFVM: A Fast, Flexible, and Scalable Hardware-based Virtualization for Commodity Storage Devices [TOS 2022]
  3. A Fast and Flexible Hardware-based Virtualization Mechanism for Computational Storage Devices [ATC 2021]
  4. FVM: FPGA-assisted Virtual Device Emulation for Fast, Scalable, and Flexible Storage Virtualization [OSDI 2020]
  5. TrainBox: An Extreme-Scale Neural Network Training Server by Systematically Balancing Operations [MICRO 2020]
  6. FIDR: A Scalable Storage System for Fine-Grain Inline Data Reduction with Efficient Memory Handling [MICRO 2019]
  7. CIDR: A Cost-Effective In-line Data Reduction System for Terabit-per-Second Scale SSD Arrays [HPCA 2019]
  8. DCS-ctrl: A Fast and Flexible Device-Control Mechanism for Device-Centric Server Architecture [ISCA 2018]
  9. DCS: A Fast and Scalable Device-Centric Server Architecture [MICRO 2015]

Our tutorial will be held in Pier-2 on Saturday, October 28, 2023, prior to the 56th IEEE/ACM International Symposium on Microarchitecture (MICRO).