📄 Abstract
Data-free quantization (DFQ) enables model quantization without accessing
real data, addressing concerns regarding data security and privacy. With the
growing adoption of Vision Transformers (ViTs), DFQ for ViTs has garnered
significant attention. However, existing DFQ methods exhibit two limitations:
(1) semantic distortion, where the semantics of synthetic images deviate
substantially from those of real images, and (2) semantic inadequacy, where
synthetic images contain extensive regions with limited content and
oversimplified textures, leading to suboptimal quantization performance. To
address these limitations, we propose SARDFQ, a novel Semantics Alignment and
Reinforcement Data-Free Quantization method for ViTs. To address semantic
distortion, SARDFQ incorporates Attention Priors Alignment (APA), which
optimizes synthetic images to follow randomly generated structure attention
priors. To mitigate semantic inadequacy, SARDFQ introduces Multi-Semantic
Reinforcement (MSR), leveraging localized patch optimization to enhance
semantic richness across synthetic images. Furthermore, SARDFQ employs
Soft-Label Learning (SL), wherein multiple semantic targets are adapted to
facilitate the learning of multi-semantic images augmented by MSR. Extensive
experiments demonstrate the effectiveness of SARDFQ, significantly surpassing
existing methods. For example, SARDFQ improves top-1 accuracy on ImageNet by
15.52% for W4A4 ViT-B. The code is at https://github.com/zysxmu/SARDFQ.
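The abstract describes APA only verbally. Below is a minimal, illustrative PyTorch sketch of the general idea, not the authors' released implementation (see the linked repo for that). The prior generator `random_structure_prior`, the KL alignment loss, and the attention stand-in in the usage loop are all hypothetical choices; a real setup would extract CLS-to-patch attention from the full-precision ViT.

```python
import torch
import torch.nn.functional as F

def random_structure_prior(num_patches: int, grid: int = 14) -> torch.Tensor:
    """Hypothetical prior: a smoothed random blob over the patch grid,
    normalized into a distribution for the attention map to follow."""
    noise = torch.rand(1, 1, grid, grid)
    blob = F.avg_pool2d(noise, kernel_size=5, stride=1, padding=2)  # smooth the noise
    prior = blob.flatten(1)  # (1, grid * grid)
    return prior / prior.sum(dim=-1, keepdim=True)

def apa_loss(cls_attention: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
    """Align the model's CLS-to-patch attention with the random structural
    prior. KL divergence is one plausible choice, assumed here."""
    return F.kl_div(cls_attention.log(), prior, reduction="batchmean")

# Toy usage: optimize a synthetic image so an attention-like map matches the prior.
image = torch.randn(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([image], lr=0.05)
prior = random_structure_prior(196)
for _ in range(10):
    # Stand-in for a ViT forward pass returning CLS-to-patch attention (1, 196).
    feats = F.adaptive_avg_pool2d(image, 14).flatten(2).mean(1)
    attn = F.softmax(feats, dim=-1)
    loss = apa_loss(attn, prior)
    opt.zero_grad(); loss.backward(); opt.step()
```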
Authors (8)
Yunshan Zhong
Yuyao Zhou
Yuxin Zhang
Wanchen Sui
Shen Li
Yong Li
+2 more
Submitted
December 21, 2024
Key Contributions
Proposes SARDFQ, a novel data-free quantization method for Vision Transformers that addresses semantic distortion and semantic inadequacy in synthetic images. It uses Attention Priors Alignment (APA) to optimize synthetic images toward randomly generated structural attention priors, Multi-Semantic Reinforcement (MSR) to enrich semantic content via localized patch optimization, and Soft-Label Learning (SL) to supervise the resulting multi-semantic images; a toy sketch of MSR and SL follows below.
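For intuition about how MSR and SL fit together, here is a hedged toy sketch (again, not the released code): a random local patch of a synthetic image is optimized toward an additional class target, and the supervising label is softened to mix both semantics. The function name `msr_step`, the patch size, the mixing weight `alpha`, the optimization schedule, and the toy classifier are all illustrative assumptions; the paper and repo define the actual procedure.

```python
import torch
import torch.nn.functional as F

def msr_step(image, model, base_cls, extra_cls, patch=64, alpha=0.3, lr=0.05):
    """One illustrative MSR update: optimize only a random local patch of
    `image` toward `extra_cls`, then build a soft label mixing both classes
    (the SL component)."""
    _, _, h, w = image.shape
    y = torch.randint(0, h - patch + 1, (1,)).item()
    x = torch.randint(0, w - patch + 1, (1,)).item()

    # Optimize just the local region toward the extra class.
    region = image[:, :, y:y+patch, x:x+patch].clone().requires_grad_(True)
    opt = torch.optim.Adam([region], lr=lr)
    for _ in range(5):
        canvas = image.clone()
        canvas[:, :, y:y+patch, x:x+patch] = region
        loss = F.cross_entropy(model(canvas), extra_cls)
        opt.zero_grad(); loss.backward(); opt.step()

    # Paste the optimized patch back into the synthetic image.
    image = image.clone()
    image[:, :, y:y+patch, x:x+patch] = region.detach()

    # Soft label: dominant mass on base_cls, alpha mass on extra_cls.
    num_classes = model(image).shape[-1]
    soft = torch.zeros(1, num_classes)
    soft[0, base_cls] = 1.0 - alpha
    soft[0, extra_cls] = alpha
    return image, soft

# Toy stand-in classifier so the sketch runs end-to-end.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 224 * 224, 1000))
img = torch.randn(1, 3, 224, 224)
img, soft_label = msr_step(img, model, base_cls=1, extra_cls=torch.tensor([7]))
```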
Business Value
Enables the efficient deployment of large Vision Transformer models on resource-constrained devices (e.g., edge devices, mobile phones) without compromising performance, while also addressing data privacy concerns.