📄 Abstract
Dense Retrieval Models (DRMs) are a prominent development in Information
Retrieval (IR). A key challenge with these neural Transformer-based models is
that they often struggle to generalize beyond the specific tasks and domains
they were trained on. To address this challenge, prior research in IR
incorporated the Mixture-of-Experts (MoE) framework within each Transformer
layer of a DRM, which, though effective, substantially increased the model's
parameter count. In this paper, we propose a more efficient design, which
introduces a single MoE block (SB-MoE) after the final Transformer layer. To
assess the retrieval effectiveness of SB-MoE, we perform an empirical
evaluation across three IR tasks. Our experiments involve two evaluation
setups, aiming to assess both in-domain effectiveness and the model's zero-shot
generalizability. In the first setup, we fine-tune SB-MoE with four different
underlying DRMs on seven IR benchmarks and evaluate them on their respective
test sets. In the second setup, we fine-tune SB-MoE on MSMARCO and perform
zero-shot evaluation on thirteen BEIR datasets. Additionally, we analyze the
model's dependency on its hyperparameters (i.e., the number of employed and
activated experts) and investigate how varying them affects SB-MoE's
performance. The results show that SB-MoE
is particularly effective for DRMs with lightweight base models, such as
TinyBERT and BERT-Small, consistently outperforming standard fine-tuning
across benchmarks. For DRMs with more parameters, such as BERT-Base and
Contriever, our model requires a larger number of training samples to achieve
improved retrieval performance. Our code is available online at:
https://github.com/FaySokli/SB-MoE.
Authors (4)
Effrosyni Sokli
Pranav Kasela
Georgios Peikos
Gabriella Pasi
Submitted
October 17, 2025
Key Contributions
Proposes a more efficient Mixture-of-Experts (MoE) design, SB-MoE, which places a single MoE block after the final Transformer layer of Dense Retrieval Models (DRMs). This aims to improve generalization without the substantial parameter increase of traditional layer-wise MoE.
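To make the idea concrete, below is a minimal sketch of what "a single MoE block after the final Transformer layer" could look like in PyTorch. The expert architecture, gating scheme, residual connection, hidden size, and the defaults for the number of employed and activated (top-k) experts are illustrative assumptions, not the authors' exact implementation; see the linked repository for the actual code.

```python
# Hypothetical sketch of the SB-MoE idea: one Mixture-of-Experts block applied
# to the pooled output of a dense retriever's final Transformer layer.
# Expert design, gating, and dimensions below are assumptions for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SingleMoEBlock(nn.Module):
    """A single MoE block with top-k routing over small feed-forward experts."""

    def __init__(self, hidden_dim: int, num_experts: int = 6, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a lightweight feed-forward network over the embedding.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_dim, hidden_dim),
                nn.GELU(),
                nn.Linear(hidden_dim, hidden_dim),
            )
            for _ in range(num_experts)
        ])
        # Gating network scores all experts for each input representation.
        self.gate = nn.Linear(hidden_dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, hidden_dim) pooled representation from the final layer.
        gate_logits = self.gate(x)                            # (batch, num_experts)
        top_vals, top_idx = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(top_vals, dim=-1)                 # (batch, top_k)

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            idx = top_idx[:, slot]                            # selected expert per example
            w = weights[:, slot].unsqueeze(-1)                # (batch, 1)
            # Route each example through its selected expert, mix by gate weight.
            expert_out = torch.stack(
                [self.experts[e](x[i]) for i, e in enumerate(idx.tolist())]
            )
            out = out + w * expert_out
        # Residual connection keeps the base DRM representation intact.
        return x + out


if __name__ == "__main__":
    # Example: refine a pooled embedding from a 384-dim encoder
    # (roughly BERT-Small-sized) with the single MoE block.
    block = SingleMoEBlock(hidden_dim=384, num_experts=6, top_k=2)
    pooled = torch.randn(8, 384)          # stand-in for final-layer pooled outputs
    refined = block(pooled)
    print(refined.shape)                  # torch.Size([8, 384])
```

Because the block sits only after the final layer and operates on the pooled representation, its parameter overhead is a handful of small feed-forward experts plus a gating layer, rather than an MoE module inside every Transformer layer.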
Business Value
Enables more robust and adaptable search and information retrieval systems, leading to better user experiences and more efficient knowledge discovery.