Abstract
Post-training quantization (PTQ) for vision transformers (ViTs) has garnered significant attention due to its efficiency in compressing models. However, existing methods typically overlook the relationship between a well-trained neural network and the quantized model, leading to considerable quantization error in PTQ. Moreover, it is unclear how to efficiently train a model-agnostic neural network tailored to a low-bit model of a predefined precision. In this paper, we first discover that a flat full-precision neural network is crucial for low-bit quantization. To achieve this, we propose a framework that proactively pre-conditions the model by measuring and disentangling the error sources. Specifically, both the Activation Quantization Error (AQE) and the Weight Quantization Error (WQE) are statistically modeled as independent Gaussian noises. We study several noise-injection optimization methods to obtain a flat minimum. Experimental results attest to the effectiveness of our approach. These results open novel pathways for obtaining low-bit PTQ models.
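The abstract's core idea can be illustrated with a short sketch. The PyTorch snippet below, which is an assumption-laden illustration rather than the paper's actual method, injects independent zero-mean Gaussian perturbations standing in for WQE (on weights) and AQE (on activations) during full-precision training; the layer name `NoisyLinear` and the noise scales `sigma_w` and `sigma_a` are hypothetical placeholders, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Linear layer that, during training only, injects Gaussian noise
    as a statistical proxy for WQE (on weights) and AQE (on activations).
    Hypothetical sketch; sigma values are illustrative, not from the paper."""

    def __init__(self, in_features, out_features, sigma_w=0.01, sigma_a=0.01):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.sigma_w = sigma_w  # assumed weight-noise scale (WQE proxy)
        self.sigma_a = sigma_a  # assumed activation-noise scale (AQE proxy)

    def forward(self, x):
        if self.training:
            # Model WQE as zero-mean Gaussian noise on the weights.
            w = self.linear.weight + self.sigma_w * torch.randn_like(self.linear.weight)
            out = F.linear(x, w, self.linear.bias)
            # Model AQE as independent zero-mean Gaussian noise on the activations.
            return out + self.sigma_a * torch.randn_like(out)
        # Inference path: clean, full-precision computation.
        return self.linear(x)
```

Under such perturbations, the training loss at a sharp minimum fluctuates strongly, so minimizing the expected noisy loss implicitly favors flat regions of the loss landscape, which is the property the abstract identifies as crucial for low-bit PTQ.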
Authors (3)
Peng Xia
Junbiao Pang
Tianyang Cai
Submitted
November 3, 2025
Key Contributions
Proposes a framework to efficiently train a 'flat' full-precision neural network that is crucial for low-bit quantization. It achieves this by measuring and disentangling quantization error sources (AQE and WQE) and using noise injection optimization, significantly reducing quantization error in PTQ.
Business Value
Enables deployment of powerful deep learning models on resource-constrained devices (e.g., edge AI, mobile), reducing hardware costs and power consumption.