Bi-Mamba: Towards Accurate 1-Bit State Space Models

📄 Abstract

The typical Selective State-Space Model (SSM) used in Mamba addresses several limitations of Transformers, such as quadratic computational complexity with respect to sequence length and significant memory requirements during inference due to the key-value (KV) cache. However, the increasing size of Mamba models continues to pose challenges for training and deployment due to their substantial computational demands. In this work, we introduce Bi-Mamba, a scalable and powerful 1-bit Mamba architecture designed to enable more efficient large language models (LLMs), with model sizes of 780M, 1.3B, and 2.7B parameters. Bi-Mamba models are trained from scratch on a standard LLM-scale dataset using an autoregressive distillation loss. Extensive experiments on language modeling benchmarks demonstrate that Bi-Mamba achieves performance comparable to its full-precision (FP16 or BF16) counterparts, while outperforming post-training binarization (PTB) Mamba and binarization-aware training (BAT) Transformer baselines. Moreover, Bi-Mamba drastically reduces memory usage and computational cost compared to the original Mamba. Our work pioneers a new line of linear-complexity LLMs under low-bit representation and paves the way for specialized hardware optimized for efficient 1-bit Mamba-based models. Code and pre-trained weights are available at https://github.com/Tangshengku/Bi-Mamba.
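
The abstract contrasts post-training binarization with binarization-aware training, where 1-bit weights are used in the forward pass while full-precision latent weights receive gradients. Below is a minimal sketch of that general technique, assuming a sign binarizer with a per-output-channel scale and a straight-through estimator; the class name `BinarizedLinear` and these specific choices are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarizedLinear(nn.Module):
    """Linear layer with 1-bit weights for binarization-aware training.

    Forward pass uses weights binarized to {-alpha, +alpha} per output
    channel; a straight-through estimator (STE) routes gradients to the
    latent full-precision weights.
    """

    def __init__(self, in_features: int, out_features: int, bias: bool = False):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Per-output-channel scale: mean absolute value of each weight row.
        scale = w.abs().mean(dim=1, keepdim=True)
        # Binarize to {-scale, +scale}; torch.where avoids sign(0) == 0.
        w_bin = torch.where(w >= 0, scale, -scale)
        # STE: forward uses binarized weights, backward flows through w.
        w_ste = w + (w_bin - w).detach()
        return F.linear(x, w_ste, self.bias)
```

In a Mamba-style model, such a layer would stand in for the full-precision linear projections during training; at deployment, the binary weights plus per-channel scales can be packed down to roughly one bit per parameter, which is where the memory savings come from.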
Authors (5): Shengkun Tang, Liqun Ma, Haonan Li, Mingjie Sun, Zhiqiang Shen
Submitted: November 18, 2024
arXiv Category: cs.CL

Key Contributions

Introduces Bi-Mamba, a scalable 1-bit Mamba architecture designed for efficient large language models. Bi-Mamba models achieve performance comparable to full-precision counterparts while significantly reducing computational demands and memory requirements, enabling more efficient LLM training and deployment.
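
The paper trains Bi-Mamba from scratch with an autoregressive distillation loss, i.e., the 1-bit student is pushed to match a full-precision teacher's next-token distribution at every position. A minimal sketch of such a token-level objective follows; the function name, temperature handling, and any weighting against a standard language-modeling loss are assumptions rather than the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """Token-level KL between teacher and student next-token distributions.

    Both logit tensors have shape (batch, seq_len, vocab).
    """
    t = temperature
    # Flatten (batch, seq, vocab) -> (batch * seq, vocab) so that
    # reduction="batchmean" averages over all token positions.
    s = F.log_softmax(student_logits / t, dim=-1).flatten(0, 1)
    p = F.softmax(teacher_logits / t, dim=-1).flatten(0, 1)
    # t**2 rescales gradients to match the unsoftened loss scale.
    return F.kl_div(s, p, reduction="batchmean") * (t ** 2)
```

In practice, the teacher logits would come from the full-precision model under torch.no_grad(), with only the binarized student receiving gradient updates.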

Business Value

Makes powerful LLMs more accessible and cost-effective to train and deploy by drastically reducing their computational footprint and memory usage.