Abstract: To raise awareness of the environmental impact of deep learning (DL), many
studies estimate the energy use of DL systems. However, estimates of energy
consumption during DL training often rest on unverified assumptions. This work
addresses that gap
by investigating how model architecture and training environment affect energy
consumption. We train a variety of computer vision models and collect energy
consumption and accuracy metrics to analyze their trade-offs across
configurations. Our results show that selecting the right model-training
environment combination can reduce training energy consumption by up to 80.68%
with less than 2% loss in $F_1$ score. We find a significant interaction effect
between model and training environment: energy efficiency improves when GPU
computational power scales with model complexity. Moreover, we demonstrate that
common estimation practices, such as using FLOPs or GPU TDP, fail to capture
these dynamics and can lead to substantial errors. To address these
shortcomings, we propose the Stable Training Epoch Projection (STEP) and the
Pre-training Regression-based Estimation (PRE) methods. Across evaluations, our
methods outperform existing tools by a factor of two or more in estimation
accuracy.
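
To make the contrast concrete, below is a minimal Python sketch of the two estimation styles the abstract contrasts: a naive estimate that multiplies GPU TDP by training time, and a STEP-style projection that extrapolates total energy from measured per-epoch energy once consumption has stabilized. All function names, parameters, and numbers here are hypothetical illustrations, not the paper's actual implementation.

```python
def tdp_estimate_joules(gpu_tdp_watts: float, training_seconds: float) -> float:
    """Naive practice criticized in the abstract: assume the GPU draws its
    full thermal design power (TDP) for the entire training run."""
    return gpu_tdp_watts * training_seconds


def step_projection_joules(epoch_energies: list[float],
                           total_epochs: int,
                           warmup_epochs: int = 2) -> float:
    """STEP-style idea (as the abstract describes it): measure energy for a
    few epochs, discard unstable warm-up epochs, and project the mean
    stable-epoch energy over the remaining epochs. Hypothetical sketch."""
    measured = sum(epoch_energies)                      # energy already observed
    stable = epoch_energies[warmup_epochs:]             # drop warm-up epochs
    mean_stable = sum(stable) / len(stable)             # mean stable-epoch energy
    remaining = total_epochs - len(epoch_energies)      # epochs left to run
    return measured + mean_stable * remaining


# Hypothetical numbers: a 300 W-TDP GPU, 50 epochs of roughly 90 s each,
# with per-epoch energy measurements in joules for the first five epochs.
naive = tdp_estimate_joules(gpu_tdp_watts=300, training_seconds=50 * 90)
projected = step_projection_joules(
    epoch_energies=[21_500, 19_800, 18_900, 18_950, 18_870],
    total_epochs=50,
)
print(f"TDP estimate: {naive / 3.6e6:.2f} kWh, "
      f"STEP-style projection: {projected / 3.6e6:.2f} kWh")
```

The TDP estimate ignores that real GPU draw varies with the model-environment pairing, which is exactly the dynamic the abstract says such practices fail to capture; the projection instead grounds the estimate in measured behavior of the specific configuration.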