📄 Abstract
Model merging offers an efficient alternative to multi-task learning by
combining independently fine-tuned models, but most prior approaches focus
mainly on avoiding task interference. We argue instead that the real potential
of merging lies in achieving synergy, where tasks enhance one another. Our
intuition comes from a pilot study showing that when a classifier trained on
one task is paired with the encoder of another, the resulting cross-task
performance strongly predicts merge quality. Moreover, adapting even a single
task-specific layer can substantially improve this compatibility, suggesting a
simple yet powerful lever for synergy. Building on this insight, we introduce
SyMerge, a lightweight framework that jointly optimizes one task-specific layer
and merging coefficients. To ensure stability without labels, SyMerge employs a
robust self-labeling strategy guided by expert model predictions, avoiding the
pitfalls of entropy-based adaptation. This minimalist yet principled design
achieves state-of-the-art results across vision, dense prediction, and NLP
benchmarks, while also producing adapted layers that transfer effectively to
other merging methods. Our code is available at
https://aim-skku.github.io/SyMerge/
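To make the coefficient-optimization idea concrete, the sketch below shows a generic task-arithmetic-style merge with learnable coefficients, where gradients flow into the coefficients through a functional forward pass. The tiny MLPs, the shapes, and the merge rule theta_0 + sum_i lambda_i * (theta_i - theta_0) are illustrative assumptions for this sketch, not SyMerge's exact parameterization (which is defined in the full paper).

```python
import torch
from torch import nn
from torch.func import functional_call  # PyTorch >= 2.0

# Toy stand-ins for a pretrained backbone and two fine-tuned experts.
# (Hypothetical sizes; the real setting would use large fine-tuned encoders.)
def make_model():
    return nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))

torch.manual_seed(0)
base = make_model()
experts = [make_model(), make_model()]

# Learnable merging coefficients, one per expert (initialized uniformly).
coeffs = nn.Parameter(torch.full((len(experts),), 1.0 / len(experts)))

def merged_params(base, experts, coeffs):
    """theta_merged = theta_0 + sum_i lambda_i * (theta_i - theta_0)."""
    base_sd = dict(base.named_parameters())
    expert_sds = [dict(e.named_parameters()) for e in experts]
    return {
        name: p + sum(c * (sd[name] - p) for c, sd in zip(coeffs, expert_sds))
        for name, p in base_sd.items()
    }

x = torch.randn(4, 16)
# Differentiable forward through the merged weights: gradients reach coeffs.
out = functional_call(base, merged_params(base, experts, coeffs), (x,))
out.sum().backward()
print(coeffs.grad)  # non-None: the merge is end-to-end trainable
```

Because the merged weights are an ordinary differentiable function of the coefficients, the same optimizer step can update both the coefficients and the parameters of an adapted task-specific layer.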
Key Contributions
SyMerge introduces a novel approach to model merging that focuses on achieving synergy between tasks rather than just avoiding interference. By adapting a single task-specific layer and optimizing merging coefficients, it enables tasks to enhance each other, leading to improved performance. The framework uses a robust self-labeling strategy for stability without requiring labels, making it a practical and efficient solution for combining independently fine-tuned models.
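The self-labeling component can be sketched in the same spirit: the merged model is trained against pseudo-labels taken from the corresponding expert model's predictions rather than against its own entropy. The confidence threshold and filtering rule below are illustrative assumptions for this sketch; the paper specifies the exact strategy.

```python
import torch
import torch.nn.functional as F

def self_label_loss(merged_logits, expert_logits, conf_threshold=0.9):
    """Cross-entropy of the merged model's outputs against pseudo-labels
    taken from the task expert's predictions, keeping only confident ones."""
    with torch.no_grad():
        conf, pseudo = expert_logits.softmax(dim=-1).max(dim=-1)
        keep = conf >= conf_threshold  # discard low-confidence pseudo-labels
    if not keep.any():
        # No confident expert predictions in this batch: contribute zero loss
        # while keeping the graph connected for the optimizer step.
        return merged_logits.sum() * 0.0
    return F.cross_entropy(merged_logits[keep], pseudo[keep])

# Toy usage: 8 samples, 5 classes.
merged_logits = torch.randn(8, 5, requires_grad=True)
expert_logits = torch.randn(8, 5) * 3.0  # sharper, so some rows pass the gate
loss = self_label_loss(merged_logits, expert_logits)
loss.backward()
```

Anchoring the loss to expert predictions rather than to the merged model's own entropy avoids the degenerate solutions that entropy minimization can collapse into, which is the stability argument made in the abstract.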
Business Value
Enables more efficient deployment of AI models by combining specialized models into a single, more capable one, reducing computational costs and memory footprint. This is valuable for applications requiring diverse capabilities without retraining large models from scratch.