
MOSAIC: Masked Objective with Selective Adaptation for In-domain Contrastive Learning

📄 Abstract

We introduce MOSAIC (Masked Objective with Selective Adaptation for In-domain Contrastive Learning), a multi-stage framework for domain adaptation of sentence embedding models that incorporates joint domain-specific masked supervision. Our approach addresses the challenges of adapting large-scale general-domain sentence embedding models to specialized domains. By jointly optimizing masked language modeling (MLM) and contrastive objectives within a unified training pipeline, our method enables effective learning of domain-relevant representations while preserving the robust semantic discrimination properties of the original model. We empirically validate our approach on both high-resource and low-resource domains, achieving improvements of up to 13.4% in NDCG@10 (Normalized Discounted Cumulative Gain) over strong general-domain baselines. Comprehensive ablation studies further demonstrate the effectiveness of each component, highlighting the importance of balanced joint supervision and staged adaptation.
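
To make the joint objective concrete, here is a minimal, hypothetical sketch (not the authors' released code) of how an MLM loss and an in-batch contrastive (InfoNCE) loss might be combined over a shared encoder; the backbone name, masking rate, pooling strategy, and loss weighting are all illustrative assumptions.

```python
# Sketch only: jointly optimizing MLM and in-batch contrastive (InfoNCE) losses
# over a shared transformer encoder. Backbone, weights, and pooling are assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")   # placeholder backbone
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def joint_loss(query_batch, positive_batch, mlm_weight=0.5, temperature=0.05):
    """Combine MLM supervision on masked domain text with a contrastive objective
    over (query, positive) sentence pairs; the 0.5 weight is illustrative."""
    # --- MLM term: mask 15% of tokens in the domain text and predict them ---
    enc = tokenizer(query_batch, padding=True, truncation=True, return_tensors="pt")
    labels = enc["input_ids"].clone()
    mask = torch.rand(labels.shape) < 0.15
    mask &= labels != tokenizer.pad_token_id
    enc["input_ids"][mask] = tokenizer.mask_token_id
    labels[~mask] = -100                                  # ignore unmasked positions
    mlm_out = model(**enc, labels=labels)

    # --- Contrastive term: mean-pooled embeddings, in-batch negatives ---
    def embed(texts):
        t = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        hidden = model(**t, output_hidden_states=True).hidden_states[-1]
        attn = t["attention_mask"].unsqueeze(-1)
        return (hidden * attn).sum(1) / attn.sum(1)       # mean pooling

    q = F.normalize(embed(query_batch), dim=-1)
    p = F.normalize(embed(positive_batch), dim=-1)
    logits = q @ p.T / temperature                        # cosine-similarity matrix
    targets = torch.arange(len(q))                        # diagonal pairs are positives
    contrastive = F.cross_entropy(logits, targets)

    return mlm_weight * mlm_out.loss + (1 - mlm_weight) * contrastive
```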
Authors (2)
Vera Pavlova
Mohammed Makhlouf
Submitted
October 19, 2025
arXiv Category
cs.CL
arXiv PDF

Key Contributions

Introduces MOSAIC, a multi-stage framework for domain adaptation of sentence embedding models that combines domain-specific masked language modeling (MLM) with contrastive objectives in a joint training pipeline. It effectively adapts large general-domain models to specialized domains, improving retrieval performance.
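
As a rough illustration of the multi-stage aspect, the sketch below lays out a two-stage schedule that shifts weight from MLM supervision toward the contrastive objective; the stage names, corpora, weights, and epoch counts are assumptions, not the paper's exact recipe.

```python
# Illustrative staged-adaptation schedule; all values here are assumptions.
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    data: str          # which corpus the stage trains on
    mlm_weight: float  # share of the joint loss given to MLM supervision
    epochs: int

# Stage 1 adapts the encoder to domain text with a strong MLM signal;
# Stage 2 shifts the balance toward the contrastive retrieval objective.
schedule = [
    Stage("domain-adaptive pretraining", "unlabeled in-domain corpus", 0.7, 3),
    Stage("contrastive fine-tuning", "in-domain query-passage pairs", 0.2, 2),
]

for stage in schedule:
    print(f"{stage.name}: {stage.epochs} epochs on {stage.data} "
          f"(MLM weight {stage.mlm_weight})")
```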

Business Value

Enables more accurate and relevant search results and semantic understanding within specialized domains (e.g., legal, medical, scientific), improving information access and decision-making.