
Analyzing Near-Network Hardware Acceleration with Co-Processing on DPUs

Dimitrios Giouroukis
Dwi P. A. Nugroho
Varun Pandey
Steffen Zeuch
Volker Markl

January 07, 2026

Data Processing Units (DPUs) are PCIe network cards (SmartNICs) equipped with specialized hardware accelerators for data processing. DPUs offer the opportunity to process data near the hardware network stack (near-network). By enabling near-network computation, DPUs reduce CPU load and improve end-to-end performance, making them increasingly attractive for trends such as compute-storage disaggregation and real-time data ingestion. However, existing research on DPU-based processing often overlooks hardware acceleration or relies on static offloading to the ARM subsystem, leaving open questions about how best to split work with (or co-process alongside) the host CPU. In this paper, we analyze near-network hardware acceleration with co-processing on DPUs, revealing that DPU performance varies significantly depending on input data types as well as task- and query-imposed configurations. Through our micro-benchmark experiments, we explore partial offloads and co-processing strategies that demonstrate the trade-off between higher throughput and reconfiguration overhead on DPUs. Our findings offer practical insights for data systems practitioners seeking to leverage near-network accelerators in data processing pipelines.