📄 Abstract
Deploying large language models (LLMs) often faces challenges from
substantial memory and computational costs. Quantization offers a solution, yet
preserving performance in the sub-1-bit regime remains particularly difficult.
This paper introduces LittleBit, a novel method for extreme LLM compression. It
targets levels as low as 0.1 bits per weight (BPW), achieving nearly 31$\times$
memory reduction, e.g., compressing Llama2-13B to under 0.9 GB. LittleBit represents
weights in a low-rank form using latent matrix factorization, subsequently
binarizing these factors. To counteract information loss from this extreme
precision, it integrates a multi-scale compensation mechanism that combines
row-wise and column-wise scales with an additional latent-dimension scale
that learns per-rank importance. Two key contributions enable effective
training: Dual
Sign-Value-Independent Decomposition (Dual-SVID) for quantization-aware
training (QAT) initialization, and integrated Residual Compensation to mitigate
errors. Extensive experiments confirm LittleBit's superiority in sub-1-bit
quantization: e.g., its 0.1 BPW performance on Llama2-7B surpasses the leading
method's 0.7 BPW. LittleBit establishes a new, viable size-performance
trade-off--unlocking a potential 11.6$\times$ speedup over FP16 at the kernel
level--and makes powerful LLMs practical for resource-constrained environments.
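
To make the weight representation concrete, below is a minimal PyTorch sketch of the idea the abstract describes: a low-rank factorization whose factors are binarized and then recombined with row, column, and per-rank (latent) scales. Every name here (U, V, s_row, s_col, s_rank, rank) is an illustrative assumption rather than the paper's actual implementation, and the paper's Residual Compensation term is omitted.

```python
# Sketch only: binarized low-rank weight reconstruction with multi-scale
# compensation, loosely following the abstract's description of LittleBit.
import torch

def reconstruct_weight(U, V, s_row, s_col, s_rank):
    """Approximate an (out_dim x in_dim) weight from binarized factors.

    U: (out_dim, rank), V: (in_dim, rank) -- only their signs are used/stored.
    s_row: (out_dim,), s_col: (in_dim,), s_rank: (rank,) -- learned scales.
    """
    Bu = torch.sign(U)                # binary {-1, +1} left factor
    Bv = torch.sign(V)                # binary {-1, +1} right factor
    core = (Bu * s_rank) @ Bv.t()     # per-rank importance on the latent dim
    return s_row[:, None] * core * s_col[None, :]

# Toy usage with an assumed small rank; real rank choices are not given here.
out_dim, in_dim, rank = 4096, 4096, 64
W_hat = reconstruct_weight(
    torch.randn(out_dim, rank), torch.randn(in_dim, rank),
    torch.ones(out_dim), torch.ones(in_dim), torch.ones(rank),
)
print(W_hat.shape)  # torch.Size([4096, 4096])
```

Under such a layout, per-layer storage is roughly rank x (out_dim + in_dim) binary bits plus three floating-point scale vectors, which is what makes sub-1-bit effective BPW structurally possible; the paper's exact accounting may differ.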
Authors (4)
Banseok Lee
Dongkyu Kim
Youngcheon You
Youngmin Kim
Key Contributions
Introduces LittleBit, a novel method for extreme LLM compression targeting sub-1-bit quantization (e.g., 0.1 BPW) via latent matrix factorization and binarization. It employs multi-scale compensation and specialized QAT techniques (Dual-SVID, Residual Compensation) to mitigate performance loss.
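
As a rough, hypothetical illustration of how a factorized-and-binarized layout reaches fractional BPW, the snippet below counts storage for one layer under an assumed layout of two binary rank-r factors plus FP16 row/column/per-rank scales. The rank value, scale precision, and omission of Residual Compensation are assumptions for illustration, not figures from the paper.

```python
# Hypothetical effective-BPW accounting for a single out_dim x in_dim layer.
def effective_bpw(out_dim: int, in_dim: int, rank: int, scale_bits: int = 16) -> float:
    binary_bits = rank * (out_dim + in_dim)                     # two sign factors
    scale_bits_total = scale_bits * (out_dim + in_dim + rank)   # compensation scales
    return (binary_bits + scale_bits_total) / (out_dim * in_dim)

print(f"{effective_bpw(4096, 4096, rank=192):.3f} BPW")  # ~0.102 under these assumptions
```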
Business Value
Enables the deployment of powerful LLMs on devices with limited memory and computational power (e.g., mobile phones, edge devices), democratizing access to advanced AI capabilities and reducing operational costs.