📄 Abstract
To deploy LLMs on resource-constrained platforms such as mobile robots and
smartphones, non-transformer LLMs have achieved major breakthroughs. Recently,
a novel RNN-based LLM family, Receptance Weighted Key Value (RWKV), has shown
strong computational efficiency; nevertheless, RWKV models still have high
parameter counts, which limits their deployment. In this paper, we propose a
suite of compression techniques, ranging from model architecture optimizations
to post-training compression, tailored to the RWKV architecture. Combined, our
techniques reduce the memory footprint of RWKV models by 3.4x -- 5x with only
negligible degradation in accuracy; compared to transformer LLMs with similar
accuracy, our models require a 4x smaller memory footprint.
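The abstract does not detail the specific compression techniques on this page. As a purely illustrative sketch of what post-training compression of a weight matrix can look like in general (not the authors' method), the snippet below applies a low-rank SVD factorization to a single hypothetical projection matrix; the matrix shape, rank choice, and function names are assumptions for illustration only.

```python
# Illustrative sketch only: generic post-training low-rank compression of one
# weight matrix. This is NOT the paper's RWKV-specific technique; the shapes,
# rank, and names below are hypothetical.
import numpy as np

def lowrank_compress(weight: np.ndarray, rank: int):
    """Factor `weight` (d_out x d_in) into U @ V with inner dimension `rank`."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    U = u[:, :rank] * s[:rank]   # (d_out, rank), singular values folded into U
    V = vt[:rank, :]             # (rank, d_in)
    return U, V

def compression_ratio(weight: np.ndarray, rank: int) -> float:
    """Original parameter count divided by the factored parameter count."""
    d_out, d_in = weight.shape
    return (d_out * d_in) / (rank * (d_out + d_in))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((2048, 2048)).astype(np.float32)  # hypothetical projection
    U, V = lowrank_compress(W, rank=256)
    err = np.linalg.norm(W - U @ V) / np.linalg.norm(W)
    print(f"relative reconstruction error: {err:.3f}")
    print(f"parameter compression ratio: {compression_ratio(W, 256):.1f}x")
```

In this generic setting, the memory saving comes from storing the two smaller factors instead of the full matrix; the actual RWKV-tailored optimizations and their 3.4x -- 5x reduction are described in the paper itself.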
Authors (3)
Wonkyo Choe
Yangfeng Ji
Felix Xiaozhu Lin
Submitted
December 14, 2024
Key Contributions
This paper introduces a suite of compression techniques specifically tailored to the RWKV architecture, significantly reducing its memory footprint (3.4x -- 5x) with negligible accuracy degradation. This enables the deployment of powerful LLMs on resource-constrained devices such as smartphones and mobile robots, outperforming transformer LLMs in memory efficiency at similar accuracy.
Business Value
Enables the integration of advanced AI capabilities, like natural language understanding and generation, into a wider range of consumer electronics and embedded systems, driving innovation in mobile applications and robotics.