
Semantic Alignment and Reinforcement for Data-Free Quantization of Vision Transformers

Abstract

Data-free quantization (DFQ) enables model quantization without accessing real data, addressing concerns regarding data security and privacy. With the growing adoption of Vision Transformers (ViTs), DFQ for ViTs has garnered significant attention. However, existing DFQ methods exhibit two limitations: (1) semantic distortion, where the semantics of synthetic images deviate substantially from those of real images, and (2) semantic inadequacy, where synthetic images contain extensive regions with limited content and oversimplified textures, leading to suboptimal quantization performance. To address these limitations, we propose SARDFQ, a novel Semantics Alignment and Reinforcement Data-Free Quantization method for ViTs. To address semantic distortion, SARDFQ incorporates Attention Priors Alignment (APA), which optimizes synthetic images to follow randomly generated structure attention priors. To mitigate semantic inadequacy, SARDFQ introduces Multi-Semantic Reinforcement (MSR), leveraging localized patch optimization to enhance semantic richness across synthetic images. Furthermore, SARDFQ employs Soft-Label Learning (SL), wherein multiple semantic targets are adapted to facilitate the learning of multi-semantic images augmented by MSR. Extensive experiments demonstrate the effectiveness of SARDFQ, significantly surpassing existing methods. For example, SARDFQ improves top-1 accuracy on ImageNet by 15.52% for W4A4 ViT-B. The code is at https://github.com/zysxmu/SARDFQ.
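To make the APA idea concrete, here is a minimal, illustrative PyTorch sketch, not the authors' implementation: it optimizes a synthetic image so that an attention map follows a randomly generated structure prior. The prior construction, the `toy_attention_map` stand-in (a real setup would hook a ViT's attention), and the MSE distance are all assumptions for illustration.

```python
# Illustrative sketch only, not the authors' code. Shows the general shape of
# Attention Priors Alignment (APA): optimize a synthetic image so the model's
# attention map follows a randomly generated structure prior.
import torch
import torch.nn.functional as F

def make_random_structure_prior(h: int = 14, w: int = 14) -> torch.Tensor:
    """Coarse random noise upsampled into a smooth map; one plausible
    stand-in for the paper's randomly generated structure attention priors."""
    coarse = torch.rand(1, 1, h // 2, w // 2)
    return F.interpolate(coarse, size=(h, w), mode="bilinear",
                         align_corners=False).squeeze()

def toy_attention_map(img: torch.Tensor) -> torch.Tensor:
    """Placeholder for a real ViT attention hook (e.g., CLS-token attention
    averaged over heads); pooling keeps this sketch runnable end to end."""
    return F.adaptive_avg_pool2d(img.mean(dim=1, keepdim=True), (14, 14)).squeeze()

def apa_loss(attn: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
    """Align the attention map with the prior; MSE is one simple distance."""
    return F.mse_loss(attn, prior)

img = torch.randn(1, 3, 224, 224, requires_grad=True)  # synthetic image
prior = make_random_structure_prior()                  # 14x14 ViT-B/16 patch grid
opt = torch.optim.Adam([img], lr=0.05)
for _ in range(100):
    opt.zero_grad()
    loss = apa_loss(toy_attention_map(img), prior)
    loss.backward()
    opt.step()
```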
Authors (8)
Yunshan Zhong
Yuyao Zhou
Yuxin Zhang
Wanchen Sui
Shen Li
Yong Li
+2 more
Submitted: December 21, 2024
arXiv Category: cs.CV

Key Contributions

Proposes SARDFQ, a data-free quantization method for Vision Transformers that addresses semantic distortion and semantic inadequacy in synthetic calibration images. It uses Attention Priors Alignment (APA) to optimize synthetic images against randomly generated structure attention priors, Multi-Semantic Reinforcement (MSR) to enrich semantic content via localized patch optimization, and Soft-Label Learning (SL) to supervise the resulting multi-semantic images, as sketched below.
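As a rough illustration of how MSR and SL could interact, the hypothetical sketch below assigns each local patch of a synthetic image its own class target and builds a soft label mixing those classes; the sampling scheme and uniform mixing are assumptions, not the paper's exact procedure.

```python
# Illustrative sketch only, not the authors' code. Mimics Multi-Semantic
# Reinforcement (MSR) with Soft-Label Learning (SL): each local patch gets its
# own class target, and the whole image is supervised with a soft label.
import torch

def msr_soft_label(num_patches: int, num_classes: int):
    """Sample one class per local patch and build a soft label spreading
    probability mass over the sampled classes (uniform mix as one choice)."""
    patch_classes = torch.randint(0, num_classes, (num_patches,))
    soft = torch.bincount(patch_classes, minlength=num_classes).float() / num_patches
    return patch_classes, soft

patch_classes, soft = msr_soft_label(num_patches=4, num_classes=1000)
# Each of the 4 patches would be optimized toward its own class target
# (localized patch optimization), while the full image is trained against
# `soft`, e.g., via KL divergence to the model's predicted distribution.
```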

Business Value

Enables efficient deployment of large Vision Transformer models on resource-constrained devices (e.g., edge devices, mobile phones) with reduced accuracy loss, while avoiding access to real training data and thereby addressing data security and privacy concerns.