The Art of Losing to Win: Using Lossy Image Compression to Improve Data Loading in Deep Learning Pipelines

Lennart Behme
Saravanan Thirumuruganathan
Alireza Rezaei Mahdiraji
Jorge-Arnulfo Quiané-Ruiz
Volker Markl

April 03, 2023

Training deep learning (DL) models often takes a significant amount of time and is therefore typically performed on expensive GPUs. However, data loading has recently been identified as one of the main performance bottlenecks in DL, resulting in GPU under-utilization. Looking forward, the combination of larger datasets and faster GPUs will exacerbate the problem. The data management community has started to address this by proposing data loading optimization techniques, including lossy image compression. While lossy compression is a conceptually promising approach for mitigating data loading bottlenecks in DL, its effect on model throughput and accuracy is not yet well understood. In this paper, we present an extensive experimental analysis of lossy image compression as a means to improve the performance of neural network training. We find that lossy compression can improve both the throughput and the accuracy of DL pipelines if resources such as time or storage capacity are limited. Furthermore, compression quality and codec are important hyperparameters when training deep neural networks.