
Continual Learning via Sparse Memory Finetuning

📄 Abstract

Modern language models are powerful, but typically static after deployment. A major obstacle to building models that continually learn over time is catastrophic forgetting, where updating on new data erases previously acquired capabilities. Motivated by the intuition that mitigating forgetting is challenging because trainable parameters are shared across all tasks, we investigate whether sparse parameter updates can enable learning without catastrophic forgetting. We introduce sparse memory finetuning, leveraging memory layer models (Berges et al., 2024), which are sparsely updated by design. By updating only the memory slots that are highly activated by a new piece of knowledge relative to usage on pretraining data, we reduce interference between new knowledge and the model's existing capabilities. We evaluate learning and forgetting compared to full finetuning and parameter-efficient finetuning with LoRA on two question answering tasks. We find that sparse memory finetuning learns new knowledge while exhibiting substantially less forgetting: while NaturalQuestions F1 drops by 89% after full finetuning on new facts and 71% with LoRA, sparse memory finetuning yields only an 11% drop with the same level of new knowledge acquisition. Our results suggest sparsity in memory layers offers a promising path toward continual learning in large language models.
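The selection rule described in the abstract (update only the slots the new data activates far more often than pretraining data does) can be written as a simple ratio score. The TF-IDF-style form below is a sketch under that reading; the paper's exact scoring function is not reproduced here, so treat f_new, f_pre, epsilon, and t as illustrative.

```latex
% Sketch only: an assumed TF-IDF-style instantiation of "highly activated on the
% new data relative to usage on pretraining data", not necessarily the paper's
% exact score. f_new(i) and f_pre(i) count how often memory slot i is activated
% on the new document and on a background pretraining sample, respectively.
\[
  s_i = \frac{f_{\mathrm{new}}(i)}{f_{\mathrm{pre}}(i) + \epsilon},
  \qquad
  \mathcal{U} = \{\, i : s_i \text{ is among the top } t \text{ scores} \,\}.
\]
% Gradient updates are then applied only to the value vectors of slots in U,
% leaving the remaining slots (and, in this sketch, the dense backbone) untouched.
```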
Authors (7)
Jessy Lin
Luke Zettlemoyer
Gargi Ghosh
Wen-Tau Yih
Aram Markosyan
Vincent-Pierre Berges
+1 more
Submitted
October 16, 2025
arXiv Category
cs.CL
arXiv PDF

Key Contributions

This paper introduces Sparse Memory Finetuning, a method to mitigate catastrophic forgetting in language models by updating only the memory slots most strongly activated by new data relative to their usage on pretraining data. Restricting updates to these slots reduces interference between new and existing knowledge, enabling models to learn continually without a significant loss of prior capabilities.
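A minimal PyTorch sketch of how such a sparse update could be wired up. Everything here is illustrative: MemoryLayer is a toy stand-in for the product-key memory layers of Berges et al. (2024), and select_slots / mask_memory_grads are hypothetical helpers, not the paper's implementation.

```python
# Illustrative sketch (not the paper's code): restrict finetuning to the memory
# slots that the new data activates disproportionately often.
import torch
import torch.nn as nn

class MemoryLayer(nn.Module):
    """Toy stand-in for a sparse memory layer: each token reads k value slots."""
    def __init__(self, dim: int, num_slots: int, k: int = 4):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_slots, dim))
        self.values = nn.Embedding(num_slots, dim)  # the sparse, trainable memory
        self.k = k

    def forward(self, x: torch.Tensor):
        # x: (batch, seq, dim) -> top-k slot lookup per token
        scores = x @ self.keys.T                          # (batch, seq, num_slots)
        top_scores, top_idx = scores.topk(self.k, dim=-1)
        gate = torch.softmax(top_scores, dim=-1).unsqueeze(-1)
        out = (gate * self.values(top_idx)).sum(dim=-2)
        return out, top_idx                               # also return activated slot indices

def select_slots(new_counts, background_counts, num_to_update):
    """TF-IDF-style ranking: slots used often on new data but rarely in general."""
    score = new_counts / (background_counts + 1.0)
    return score.topk(num_to_update).indices

def mask_memory_grads(values: nn.Embedding, allowed: torch.Tensor):
    """Zero gradients for every memory slot except the selected ones."""
    mask = torch.zeros(values.num_embeddings, 1, device=values.weight.device)
    mask[allowed] = 1.0
    return values.weight.register_hook(lambda g: g * mask)  # removable handle

# Usage sketch: count slot activations on new vs. background data, then finetune.
layer = MemoryLayer(dim=64, num_slots=1024)
x_new = torch.randn(2, 16, 64)   # stand-in for tokens of a new document
x_bg = torch.randn(8, 16, 64)    # stand-in for a pretraining-like sample

_, idx_new = layer(x_new)
_, idx_bg = layer(x_bg)
new_counts = torch.bincount(idx_new.flatten(), minlength=1024).float()
bg_counts = torch.bincount(idx_bg.flatten(), minlength=1024).float()

allowed = select_slots(new_counts, bg_counts, num_to_update=32)
handle = mask_memory_grads(layer.values, allowed)  # gradients now reach only 32 slots
```

The gradient-hook mask is just one way to enforce the sparsity; an optimizer step that only writes to the selected rows of the memory table would serve the same purpose.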

Business Value

Enables the development of AI systems that can adapt and learn over time without requiring complete retraining, leading to more dynamic and responsive applications in areas like personalized assistants or evolving knowledge bases.