📄 Abstract
Selective retrieval improves the accuracy and efficiency of
retrieval-augmented generation (RAG) by reducing distractions from low-quality
retrievals. However, existing approaches underutilize the inherent knowledge of
large language models (LLMs), leading to suboptimal retrieval decisions and
degraded generation performance. To bridge this gap, we propose Self-Routing
RAG (SR-RAG), a novel framework that binds selective retrieval with knowledge
verbalization. SR-RAG enables an LLM to dynamically decide whether to retrieve
external knowledge or verbalize its own parametric knowledge. To this end, we
design a multi-task objective that jointly optimizes an LLM for knowledge
source selection, knowledge verbalization, and response generation. SR-RAG
further incorporates a nearest neighbor search mechanism at inference time to
improve the accuracy of knowledge source decisions under domain shifts.
Fine-tuning three LLMs with SR-RAG significantly improves response accuracy
while reducing inference latency. Compared to the strongest selective
retrieval baseline, SR-RAG reduces the number of retrievals by 29% while
improving performance by 5.1%.
Key Contributions
SR-RAG enhances Retrieval-Augmented Generation (RAG) by enabling LLMs to dynamically choose between retrieving external knowledge or verbalizing their own parametric knowledge. It achieves this through a multi-task objective that jointly optimizes knowledge source selection, verbalization, and response generation, and uses nearest neighbor search for improved accuracy under domain shifts.
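The inference-time routing idea can be sketched as a nearest-neighbor vote: cache embeddings of past queries together with the knowledge source decision that worked for them, then route a new query by majority vote among its most similar cached neighbors. This is a minimal illustrative sketch, not the paper's implementation; the `route` function, the toy 2-dimensional embeddings, and the cache format are all hypothetical.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def route(query_emb, memory, k=3):
    """Return 'retrieve' or 'verbalize' by majority vote of the k
    cached decisions whose query embeddings are most similar."""
    ranked = sorted(memory, key=lambda item: cosine(query_emb, item[0]),
                    reverse=True)
    votes = [decision for _, decision in ranked[:k]]
    return max(set(votes), key=votes.count)

# Toy cache: (query embedding, knowledge source decision) pairs.
memory = [
    ([1.0, 0.0], "retrieve"),
    ([0.9, 0.1], "retrieve"),
    ([0.0, 1.0], "verbalize"),
    ([0.1, 0.9], "verbalize"),
    ([0.2, 0.8], "verbalize"),
]

print(route([0.05, 0.95], memory))  # → verbalize
print(route([0.95, 0.05], memory))  # → retrieve
```

Because the vote depends only on neighborhood structure rather than a fixed classifier, the decision boundary adapts as new domain examples are added to the cache, which is the intuition behind using nearest neighbor search to stay accurate under domain shifts.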
Business Value
Improves the reliability and accuracy of LLM-powered applications that rely on external knowledge, such as advanced chatbots, knowledge assistants, and research tools, while skipping unnecessary retrievals to cut inference latency and cost.