
SyMerge: From Non-Interference to Synergistic Merging via Single-Layer Adaptation

📄 Abstract

Model merging offers an efficient alternative to multi-task learning by combining independently fine-tuned models, but most prior approaches focus mainly on avoiding task interference. We argue instead that the real potential of merging lies in achieving synergy, where tasks enhance one another. Our intuition comes from a pilot study showing that when a classifier trained on one task is paired with the encoder of another, the resulting cross-task performance strongly predicts merge quality. Moreover, adapting even a single task-specific layer can substantially improve this compatibility, suggesting a simple yet powerful lever for synergy. Building on this insight, we introduce SyMerge, a lightweight framework that jointly optimizes one task-specific layer and merging coefficients. To ensure stability without labels, SyMerge employs a robust self-labeling strategy guided by expert model predictions, avoiding the pitfalls of entropy-based adaptation. This minimalist yet principled design achieves state-of-the-art results across vision, dense prediction, and NLP benchmarks, while also producing adapted layers that transfer effectively to other merging methods. Our code is available at https://aim-skku.github.io/SyMerge/
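The pilot study described above pairs the classifier head of one task with the encoder of another and uses the resulting cross-task performance as a proxy for merge quality. A minimal NumPy sketch of this probe is shown below; the models, dimensions, and data are all hypothetical stand-ins (random linear maps), not the paper's actual architectures:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "fine-tuned" models: each task gets an encoder (a linear map)
# and a classifier head. Random weights stand in for real trained models.
def make_task(dim=16, classes=4):
    return {"enc": rng.normal(size=(dim, dim)),
            "head": rng.normal(size=(dim, classes))}

task_a, task_b = make_task(), make_task()

def cross_task_logits(x, encoder_task, head_task):
    """Encode inputs with one task's encoder, classify with another task's head.
    In the pilot study, accuracy of this pairing predicts merge quality."""
    features = x @ encoder_task["enc"]
    return features @ head_task["head"]

x = rng.normal(size=(8, 16))                    # batch of hypothetical inputs
logits = cross_task_logits(x, task_b, task_a)   # head of task A on encoder of task B
print(logits.shape)
```

Scoring these cross-task logits against held-out labels (omitted here) would give the compatibility signal the paper uses to motivate adapting a single layer.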

Key Contributions

SyMerge introduces a novel approach to model merging that focuses on achieving synergy between tasks rather than just avoiding interference. By adapting a single task-specific layer and optimizing merging coefficients, it enables tasks to enhance each other, leading to improved performance. The framework uses a robust self-labeling strategy for stability without requiring labels, making it a practical and efficient solution for combining independently fine-tuned models.
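To make the joint optimization concrete, the sketch below implements the general idea in NumPy under heavy simplification: two encoders are merged with a single coefficient `alpha`, pseudo-labels come from a hypothetical "expert" model's predictions, and only one layer (`head`) is trained against them, with `alpha` nudged by finite differences. All names, scales, and update rules here are illustrative assumptions, not the paper's actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
dim, classes, n = 16, 4, 64

# Two hypothetical fine-tuned encoders and one expert classifier head (task A).
enc_a, enc_b = rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim))
expert_head = rng.normal(size=(dim, classes))

x = rng.normal(size=(n, dim))                   # unlabeled inputs

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def xent(feats, head, labels):
    p = softmax(feats @ head)
    return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()

# Self-labels: argmax predictions of the task-A expert, used instead of
# ground-truth labels (the self-labeling idea, simplified).
pseudo = softmax((x @ enc_a) @ expert_head).argmax(axis=1)

alpha = 0.5                                     # merging coefficient (arbitrary init)
head = rng.normal(size=(dim, classes)) * 0.01   # the single adapted layer

def merged(a):
    return a * enc_a + (1 - a) * enc_b

loss0 = xent(x @ merged(alpha), head, pseudo)

for _ in range(200):
    feats = x @ merged(alpha)
    probs = softmax(feats @ head)
    grad_logits = probs.copy()
    grad_logits[np.arange(n), pseudo] -= 1.0    # d(cross-entropy)/d(logits)
    head -= 0.01 * feats.T @ grad_logits / n    # update only the adapted layer
    eps = 1e-3                                  # finite-difference step for alpha
    d = (xent(x @ merged(alpha + eps), head, pseudo)
         - xent(x @ merged(alpha - eps), head, pseudo)) / (2 * eps)
    alpha = float(np.clip(alpha - 0.01 * d, 0.0, 1.0))

loss_final = xent(x @ merged(alpha), head, pseudo)
print(f"loss: {loss0:.3f} -> {loss_final:.3f}, alpha = {alpha:.2f}")
```

The key design point the sketch mirrors is that the bulk of the merged model stays frozen: only one layer and the merging coefficients move, which keeps the label-free adaptation cheap and stable.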

Business Value

Enables more efficient deployment of AI models by combining specialized models into a single, more capable one, reducing computational costs and memory footprint. This is valuable for applications requiring diverse capabilities without retraining large models from scratch.