📄 Abstract
Dense Retrieval Models (DRMs) are a prominent development in Information
Retrieval (IR). A key challenge with these neural Transformer-based models is
that they often struggle to generalize beyond the specific tasks and domains
they were trained on. To address this challenge, prior research in IR
incorporated the Mixture-of-Experts (MoE) framework within each Transformer
layer of a DRM, which, though effective, substantially increased the model's
parameter count. In this paper, we propose a more efficient design, which
introduces a single MoE block (SB-MoE) after the final Transformer layer. To
assess the retrieval effectiveness of SB-MoE, we perform an empirical
evaluation across three IR tasks. Our experiments involve two evaluation
setups, aiming to assess both in-domain effectiveness and the model's zero-shot
generalizability. In the first setup, we fine-tune SB-MoE with four different
underlying DRMs on seven IR benchmarks and evaluate them on their respective
test sets. In the second setup, we fine-tune SB-MoE on MSMARCO and perform
zero-shot evaluation on thirteen BEIR datasets. Additionally, we analyze the
model's dependency on its hyperparameters (i.e., the number of employed and
activated experts) and investigate how varying them affects SB-MoE's
performance. The results show that SB-MoE
is particularly effective for DRMs with lightweight base models, such as
TinyBERT and BERT-Small, consistently outperforming standard fine-tuning
across benchmarks. For DRMs with more parameters, such as BERT-Base and
Contriever, our model requires a larger number of training samples to achieve
improved retrieval performance. Our code is available online at:
https://github.com/FaySokli/SB-MoE.
Authors (4)
Effrosyni Sokli
Pranav Kasela
Georgios Peikos
Gabriella Pasi
Submitted
October 17, 2025
Key Contributions
Proposes a more efficient Mixture-of-Experts (MoE) design, SB-MoE, which places a single MoE block after the final Transformer layer of Dense Retrieval Models (DRMs). This aims to improve generalization without the substantial parameter increase of traditional layer-wise MoE.
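To make the idea concrete, below is a minimal sketch of what "a single MoE block after the final Transformer layer" could look like in PyTorch. The expert architecture, gating scheme, residual connection, hidden size, and the defaults for the number of employed and activated (top-k) experts are illustrative assumptions, not the authors' exact implementation; see the linked repository for the actual code.

```python
# Hypothetical sketch of the SB-MoE idea: one Mixture-of-Experts block applied
# to the pooled output of a dense retriever's final Transformer layer.
# Expert design, gating, and dimensions below are assumptions for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SingleMoEBlock(nn.Module):
    """A single MoE block with top-k routing over small feed-forward experts."""

    def __init__(self, hidden_dim: int, num_experts: int = 6, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a lightweight feed-forward network over the embedding.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_dim, hidden_dim),
                nn.GELU(),
                nn.Linear(hidden_dim, hidden_dim),
            )
            for _ in range(num_experts)
        ])
        # Gating network scores all experts for each input representation.
        self.gate = nn.Linear(hidden_dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, hidden_dim) pooled representation from the final layer.
        gate_logits = self.gate(x)                            # (batch, num_experts)
        top_vals, top_idx = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(top_vals, dim=-1)                 # (batch, top_k)

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            idx = top_idx[:, slot]                            # selected expert per example
            w = weights[:, slot].unsqueeze(-1)                # (batch, 1)
            # Route each example through its selected expert, mix by gate weight.
            expert_out = torch.stack(
                [self.experts[e](x[i]) for i, e in enumerate(idx.tolist())]
            )
            out = out + w * expert_out
        # Residual connection keeps the base DRM representation intact.
        return x + out


if __name__ == "__main__":
    # Example: refine a pooled embedding from a 384-dim encoder
    # (roughly BERT-Small-sized) with the single MoE block.
    block = SingleMoEBlock(hidden_dim=384, num_experts=6, top_k=2)
    pooled = torch.randn(8, 384)          # stand-in for final-layer pooled outputs
    refined = block(pooled)
    print(refined.shape)                  # torch.Size([8, 384])
```

Because the block sits only after the final layer and operates on the pooled representation, its parameter overhead is a handful of small feed-forward experts plus a gating layer, rather than an MoE module inside every Transformer layer.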
Business Value
Enables more robust and adaptable search and information retrieval systems, leading to better user experiences and more efficient knowledge discovery.