MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation

Abstract

Adapting large-scale foundation models in multi-task scenarios often suffers from task conflict and oblivion. To mitigate such issues, we propose a novel "model MoE-ization" strategy that leads to a conflict- and oblivion-resistant multi-task adaptation method. Given a weight matrix of a pre-trained model, our method applies SVD to it and introduces a learnable router to adjust its singular values based on tasks and samples. Accordingly, the weight matrix becomes a Mixture of Orthogonal Rank-one Experts (MoORE), in which each expert corresponds to the outer product of a left singular vector and the corresponding right one. We can further improve the model capacity by imposing a learnable orthogonal transform on the right singular vectors. Unlike low-rank adaptation (LoRA) and its MoE-driven variants, MoORE guarantees the experts' orthogonality and maintains the column space of the original weight matrix. These two properties make the adapted model resistant to conflicts among the new tasks and to the oblivion of its original tasks, respectively. Experiments on various datasets demonstrate that MoORE consistently outperforms existing multi-task adaptation methods, showing its superiority in terms of conflict- and oblivion-resistance. The code of the experiments is available at https://github.com/DaShenZi721/MoORE.
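
The abstract packs the whole mechanism into a few sentences, so a concrete sketch may help. The PyTorch module below is a minimal reading of it, not the authors' reference implementation (see their repository for that): the two-layer router, the sigmoid gating of singular values, and the `router_hidden` size are all assumptions introduced for illustration.

```python
import torch
import torch.nn as nn


class MoORELayer(nn.Module):
    """Minimal sketch of a Mixture of Orthogonal Rank-one Experts layer.

    One plausible reading of the abstract: SVD the frozen pre-trained
    weight once, treat each rank-one term u_i v_i^T as an expert, let a
    learnable router rescale the singular values per input sample, and
    mix the right singular vectors with a learnable orthogonal map.
    The router architecture and the sigmoid gating are assumptions.
    """

    def __init__(self, weight: torch.Tensor, router_hidden: int = 64):
        super().__init__()
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        r, in_dim = Vh.shape
        # Frozen SVD factors of the pre-trained weight.
        self.register_buffer("U", U)    # (out_dim, r)
        self.register_buffer("S", S)    # (r,)
        self.register_buffer("Vh", Vh)  # (r, in_dim)
        # Learnable router: maps a sample to r singular-value gates
        # (hypothetical two-layer MLP; the paper's router may differ).
        self.router = nn.Sequential(
            nn.Linear(in_dim, router_hidden),
            nn.ReLU(),
            nn.Linear(router_hidden, r),
        )
        # Learnable orthogonal transform acting on the right singular
        # vectors, via PyTorch's orthogonal reparametrization.
        self.right_mix = nn.utils.parametrizations.orthogonal(
            nn.Linear(r, r, bias=False)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim)
        gates = torch.sigmoid(self.router(x))  # (batch, r), in (0, 1)
        h = x @ self.Vh.T                      # coordinates in the right singular basis
        h = self.right_mix(h)                  # orthogonal, norm-preserving mixing
        return (self.S * gates * h) @ self.U.T # output stays in span(U)
```

Because the output is always a linear combination of the columns of U, the adapted layer cannot leave the column space of the original weight, which is the property the abstract ties to oblivion-resistance.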
Authors (5)
Shen Yuan
Yin Zheng
Taifeng Wang
Binbin Liu
Hongteng Xu
Submitted
June 17, 2025
arXiv Category
cs.LG

Key Contributions

Introduces MoORE, an SVD-based "model MoE-ization" strategy for conflict- and oblivion-resistant multi-task adaptation. It decomposes each weight matrix into orthogonal rank-one experts whose contributions a learnable router rescales per task and sample; unlike LoRA and its MoE-driven variants, this guarantees expert orthogonality and preserves the original column space. The orthogonality property is cheap to verify numerically, as the check below shows.
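
Under the Frobenius inner product, the inner product of two rank-one experts factorizes as ⟨u_i v_iᵀ, u_j v_jᵀ⟩ = (u_i · u_j)(v_i · v_j), which an SVD makes 1 if i = j and 0 otherwise. A quick sanity check with arbitrary toy sizes:

```python
import torch

# Numerical check of the experts' mutual orthogonality (toy sizes).
W = torch.randn(8, 6)
U, S, Vh = torch.linalg.svd(W, full_matrices=False)
experts = [torch.outer(U[:, i], Vh[i]) for i in range(S.numel())]
# Gram matrix of Frobenius inner products between all expert pairs.
gram = torch.stack([torch.stack([(Ei * Ej).sum() for Ej in experts])
                    for Ei in experts])
print(torch.allclose(gram, torch.eye(len(experts)), atol=1e-5))  # True
```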

Business Value

Enables more efficient and effective adaptation of large pre-trained models to multiple downstream tasks, reducing the need for separate task-specific models and mitigating performance degradation on the model's original tasks.