
RWKV-edge: Deeply Compressed RWKV for Resource-Constrained Devices

Abstract

To deploy LLMs on resource-constrained platforms such as mobile robots and smartphones, non-transformer LLMs have achieved major breakthroughs. Recently, a novel RNN-based LLM family, Receptance Weighted Key Value (RWKV), has shown strong computational efficiency; nevertheless, RWKV models still have high parameter counts, which limit their deployment. In this paper, we propose a suite of compression techniques, ranging from model architecture optimizations to post-training compression, tailored to the RWKV architecture. Combined, our techniques reduce the memory footprint of RWKV models by 3.4x -- 5x with only negligible degradation in accuracy; compared to transformer LLMs with similar accuracy, our models require a 4x smaller memory footprint.
Authors (3)
Wonkyo Choe
Yangfeng Ji
Felix Xiaozhu Lin
Submitted
December 14, 2024
arXiv Category
cs.LG
arXiv PDF

Key Contributions

This paper introduces a suite of compression techniques tailored specifically to the RWKV architecture, reducing its memory footprint by 3.4x -- 5x with negligible accuracy degradation. This enables the deployment of capable LLMs on resource-constrained devices such as smartphones and mobile robots, and outperforms transformer LLMs in memory efficiency at similar accuracy.
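
This page does not spell out the individual techniques, but as a rough illustration of how post-training compression trims parameter counts, below is a minimal low-rank factorization sketch in PyTorch. The matrix size, target rank, and function name are illustrative assumptions, not the authors' actual method (which also spans architecture-level optimizations).

```python
import torch

def low_rank_compress(weight: torch.Tensor, rank: int):
    """Factor a weight matrix W (d_out x d_in) into two thin matrices U_r @ V_r.

    Storage drops from d_out * d_in values to rank * (d_out + d_in) values,
    which is the basic arithmetic behind low-rank post-training compression.
    """
    # Truncated SVD: keep only the top-`rank` singular directions.
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]   # (d_out, rank), singular values folded in
    V_r = Vh[:rank, :]             # (rank, d_in)
    return U_r, V_r

# Hypothetical 2048 x 2048 projection matrix (sizes chosen for illustration only).
d = 2048
W = torch.randn(d, d)
rank = 256  # rank * (d + d) = 1,048,576 values vs. d * d = 4,194,304 originally

U_r, V_r = low_rank_compress(W, rank)
ratio = W.numel() / (U_r.numel() + V_r.numel())
print(f"parameter reduction: {ratio:.1f}x")  # ~4.0x for these illustrative sizes
```

In practice, the rank each layer can tolerate without accuracy loss governs the overall reduction, which is where figures like the 3.4x -- 5x reported above come from.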

Business Value

Enables the integration of advanced AI capabilities, like natural language understanding and generation, into a wider range of consumer electronics and embedded systems, driving innovation in mobile applications and robotics.