
I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization

Abstract

Despite the scalable performance of vision transformers (ViTs), their dense computational costs (training and inference) undermine their position in industrial applications. Post-training quantization (PTQ), which tunes ViTs with a tiny dataset and runs them in a low-bit format, addresses the cost issue well but unfortunately suffers larger performance drops in lower-bit cases. In this paper, we introduce I&S-ViT, a novel method that regulates the PTQ of ViTs in an inclusive and stable fashion. I&S-ViT first identifies two issues in the PTQ of ViTs: (1) quantization inefficiency of the prevalent log2 quantizer for post-Softmax activations; (2) a rugged and magnified loss landscape under the coarse-grained quantization granularity used for post-LayerNorm activations. I&S-ViT then addresses these issues by introducing: (1) a novel shift-uniform-log2 quantizer (SULQ) that incorporates a shift mechanism followed by uniform quantization to achieve both an inclusive domain representation and an accurate distribution approximation; (2) a three-stage smooth optimization strategy (SOS) that amalgamates the strengths of channel-wise and layer-wise quantization to enable stable learning. Comprehensive evaluations across diverse vision tasks validate I&S-ViT's superiority over existing ViT PTQ methods, particularly in low-bit scenarios. For instance, I&S-ViT elevates the performance of 3-bit ViT-B by an impressive 50.68%.
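
To make the quantizer concrete, below is a minimal PyTorch sketch of the shift-then-uniform-quantize-in-log2-domain idea the abstract describes. The function name `sulq`, the `eta` shift parameter, and the min/max calibration of the log-domain range are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def sulq(x: torch.Tensor, eta: float = 1e-2, bits: int = 4) -> torch.Tensor:
    """Sketch of a shift-uniform-log2 quantizer (SULQ) for
    post-Softmax activations x in [0, 1]."""
    levels = 2 ** bits - 1
    # Shift mechanism: move the input away from zero so the log2
    # transform covers the full [0, 1] domain inclusively.
    x_shift = x + eta
    # Post-Softmax values concentrate near zero; the negated log2
    # transform spreads them out before quantization.
    y = -torch.log2(x_shift)
    # Uniform quantization in the log2 domain (min/max calibration
    # here is an assumption for illustration).
    y_min, y_max = y.min(), y.max()
    scale = torch.clamp((y_max - y_min) / levels, min=1e-8)
    y_q = torch.round((y - y_min) / scale).clamp(0, levels)
    # Dequantize: back through the log2 transform, then undo the shift.
    return torch.exp2(-(y_q * scale + y_min)) - eta

# Example: quantize a row of attention probabilities to 4 bits.
probs = torch.softmax(torch.randn(8), dim=-1)
print(probs, sulq(probs), sep="\n")
```

Because the uniform grid lives in a shifted log2 domain, the quantizer covers the full input domain while still concentrating levels on the small values where post-Softmax mass lies, matching the "inclusive domain representation" and "accurate distribution approximation" goals stated above.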

Key Contributions

I&S-ViT introduces a novel method for inclusive and stable post-training quantization (PTQ) of Vision Transformers (ViTs). It addresses quantization inefficiency in post-Softmax activations with a shift-uniform-log2 quantizer (SULQ), and mitigates the rugged loss landscape of post-LayerNorm activations with a three-stage smooth optimization strategy (SOS) that combines channel-wise and layer-wise quantization, significantly reducing performance drops in low-bit scenarios.
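
The channel-wise versus layer-wise trade-off that SOS navigates can be illustrated with scale computation alone. The following PyTorch sketch uses hypothetical helper names: per-channel scales track each channel's range (smoother optimization), while a single per-layer scale is hardware-friendly but coarse.

```python
import torch

def channelwise_scales(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    # One scale per channel (last dim): each channel's range is
    # tracked separately, which smooths the quantization loss landscape.
    max_abs = x.abs().amax(dim=tuple(range(x.dim() - 1)))
    return max_abs / (2 ** (bits - 1) - 1)

def layerwise_scale(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    # A single scale for the whole tensor: efficient at inference,
    # but coarse for post-LayerNorm activations whose channel ranges
    # vary widely, which is what magnifies the loss landscape.
    return x.abs().amax() / (2 ** (bits - 1) - 1)

x = torch.randn(2, 16, 8) * torch.logspace(-2, 1, 8)  # uneven channel ranges
print(channelwise_scales(x))  # 8 per-channel scales
print(layerwise_scale(x))     # one shared scale, set by the widest channel
```

As the abstract describes it, SOS combines the stability of the channel-wise regime with the efficiency of layer-wise quantization; the exact three-stage schedule is detailed in the paper.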

Business Value

Enables the deployment of powerful Vision Transformer models on resource-constrained devices like mobile phones and edge hardware, reducing inference costs and latency for AI applications.