
Continual Learning via Sparse Memory Finetuning

📄 Abstract

Modern language models are powerful, but typically static after deployment. A major obstacle to building models that continually learn over time is catastrophic forgetting, where updating on new data erases previously acquired capabilities. Motivated by the intuition that mitigating forgetting is challenging because trainable parameters are shared across all tasks, we investigate whether sparse parameter updates can enable learning without catastrophic forgetting. We introduce sparse memory finetuning, leveraging memory layer models (Berges et al., 2024), which are sparsely updated by design. By updating only the memory slots that are highly activated by a new piece of knowledge relative to usage on pretraining data, we reduce interference between new knowledge and the model's existing capabilities. We evaluate learning and forgetting compared to full finetuning and parameter-efficient finetuning with LoRA on two question answering tasks. We find that sparse memory finetuning learns new knowledge while exhibiting substantially less forgetting: while NaturalQuestions F1 drops by 89% after full finetuning on new facts and 71% with LoRA, sparse memory finetuning yields only an 11% drop with the same level of new knowledge acquisition. Our results suggest sparsity in memory layers offers a promising path toward continual learning in large language models.
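The selection rule described in the abstract (update only the slots the new data activates far more often than pretraining data does) can be written as a simple ratio score. The TF-IDF-style form below is a sketch under that reading; the paper's exact scoring function is not reproduced here, so treat f_new, f_pre, epsilon, and t as illustrative.

```latex
% Sketch only: an assumed TF-IDF-style instantiation of "highly activated on the
% new data relative to usage on pretraining data", not necessarily the paper's
% exact score. f_new(i) and f_pre(i) count how often memory slot i is activated
% on the new document and on a background pretraining sample, respectively.
\[
  s_i = \frac{f_{\mathrm{new}}(i)}{f_{\mathrm{pre}}(i) + \epsilon},
  \qquad
  \mathcal{U} = \{\, i : s_i \text{ is among the top } t \text{ scores} \,\}.
\]
% Gradient updates are then applied only to the value vectors of slots in U,
% leaving the remaining slots (and, in this sketch, the dense backbone) untouched.
```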
Authors (7)
Jessy Lin
Luke Zettlemoyer
Gargi Ghosh
Wen-Tau Yih
Aram Markosyan
Vincent-Pierre Berges
+1 more
Submitted
October 16, 2025
arXiv Category
cs.CL
arXiv PDF

Key Contributions

This paper introduces Sparse Memory Finetuning, a method to mitigate catastrophic forgetting in language models by updating only the memory slots most strongly activated by new data relative to their usage on pretraining data. Restricting updates to these slots reduces interference between new and existing knowledge, enabling models to learn continually without a significant loss of prior capabilities.
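A minimal PyTorch sketch of how such a sparse update could be wired up. Everything here is illustrative: MemoryLayer is a toy stand-in for the product-key memory layers of Berges et al. (2024), and select_slots / mask_memory_grads are hypothetical helpers, not the paper's implementation.

```python
# Illustrative sketch (not the paper's code): restrict finetuning to the memory
# slots that the new data activates disproportionately often.
import torch
import torch.nn as nn

class MemoryLayer(nn.Module):
    """Toy stand-in for a sparse memory layer: each token reads k value slots."""
    def __init__(self, dim: int, num_slots: int, k: int = 4):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_slots, dim))
        self.values = nn.Embedding(num_slots, dim)  # the sparse, trainable memory
        self.k = k

    def forward(self, x: torch.Tensor):
        # x: (batch, seq, dim) -> top-k slot lookup per token
        scores = x @ self.keys.T                          # (batch, seq, num_slots)
        top_scores, top_idx = scores.topk(self.k, dim=-1)
        gate = torch.softmax(top_scores, dim=-1).unsqueeze(-1)
        out = (gate * self.values(top_idx)).sum(dim=-2)
        return out, top_idx                               # also return activated slot indices

def select_slots(new_counts, background_counts, num_to_update):
    """TF-IDF-style ranking: slots used often on new data but rarely in general."""
    score = new_counts / (background_counts + 1.0)
    return score.topk(num_to_update).indices

def mask_memory_grads(values: nn.Embedding, allowed: torch.Tensor):
    """Zero gradients for every memory slot except the selected ones."""
    mask = torch.zeros(values.num_embeddings, 1, device=values.weight.device)
    mask[allowed] = 1.0
    return values.weight.register_hook(lambda g: g * mask)  # removable handle

# Usage sketch: count slot activations on new vs. background data, then finetune.
layer = MemoryLayer(dim=64, num_slots=1024)
x_new = torch.randn(2, 16, 64)   # stand-in for tokens of a new document
x_bg = torch.randn(8, 16, 64)    # stand-in for a pretraining-like sample

_, idx_new = layer(x_new)
_, idx_bg = layer(x_bg)
new_counts = torch.bincount(idx_new.flatten(), minlength=1024).float()
bg_counts = torch.bincount(idx_bg.flatten(), minlength=1024).float()

allowed = select_slots(new_counts, bg_counts, num_to_update=32)
handle = mask_memory_grads(layer.values, allowed)  # gradients now reach only 32 slots
```

The gradient-hook mask is just one way to enforce the sparsity; an optimizer step that only writes to the selected rows of the memory table would serve the same purpose.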

Business Value

Enables the development of AI systems that can adapt and learn over time without requiring complete retraining, leading to more dynamic and responsive applications in areas like personalized assistants or evolving knowledge bases.