📄 Abstract
Vision foundation models achieve remarkable performance but are only
available in a limited set of pre-determined sizes, forcing sub-optimal
deployment choices under real-world constraints. We introduce SnapViT:
Single-shot network approximation for pruned Vision Transformers, a new
post-pretraining structured pruning method that enables elastic inference
across a continuum of compute budgets. Our approach efficiently combines
gradient information with cross-network structure correlations, approximated
via an evolutionary algorithm; it requires no labeled data, generalizes to
models without a classification head, and needs no retraining. Experiments on
DINO, SigLIPv2, DeiT, and AugReg models demonstrate superior performance over
state-of-the-art methods across various sparsities, requiring less than five
minutes on a single A100 GPU to generate elastic models adjustable to
any computational budget. Our key contributions include an efficient pruning
strategy for pretrained Vision Transformers, a novel evolutionary approximation
of Hessian off-diagonal structures, and a self-supervised importance scoring
mechanism that maintains strong performance without requiring retraining or
labels. Code and pruned models are available at: https://elastic.ashita.nl/
Authors (5)
Walter Simoncini
Michael Dorkenwald
Tijmen Blankevoort
Cees G. M. Snoek
Yuki M. Asano
Submitted
October 20, 2025
Key Contributions
Introduces SnapViT, a retraining-free structured pruning method for Vision Transformers that enables elastic inference across a range of compute budgets. It efficiently combines gradient information with cross-network structure correlations via an evolutionary algorithm, achieving superior performance over state-of-the-art methods with minimal computational cost.
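The contribution above combines gradient-based importance with an evolutionary approximation of cross-structure (Hessian off-diagonal) interactions. The paper's actual algorithm is not reproduced here; the following is a toy NumPy sketch of that general recipe, where the gradients, fitness function, and population sizes are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-structure gradients (e.g. one scalar per attention head).
# In practice these would come from backprop on a self-supervised loss.
grads = rng.normal(size=8)

# Gradient-only (diagonal) importance: squared gradient per structure.
diag_importance = grads ** 2

# Toy evolutionary search for a symmetric interaction matrix C that models
# correlations between structures. The fitness here is a stand-in target
# built from gradient outer products, purely for demonstration.
target = np.outer(grads, grads)

def fitness(C):
    return -np.sum((C - target) ** 2)

population = [rng.normal(size=(8, 8)) for _ in range(20)]
for _ in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:5]
    children = []
    for p in parents:
        for _ in range(3):
            child = p + rng.normal(scale=0.1, size=p.shape)
            children.append((child + child.T) / 2)  # keep symmetric
    population = parents + children

best = max(population, key=fitness)

# Combined importance: diagonal term plus aggregated interaction strength.
combined = diag_importance + np.abs(best).sum(axis=1)
order = np.argsort(combined)  # prune lowest-importance structures first
```

Note that no labels appear anywhere above: the scores derive entirely from gradients of a (hypothetical) self-supervised objective, which is what makes a label-free, retraining-free scheme like this plausible.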
Business Value
Allows for flexible deployment of powerful vision models on devices with varying computational capabilities, reducing costs and expanding accessibility.