📄 Abstract
Selective retrieval improves the accuracy and efficiency of
retrieval-augmented generation (RAG) by reducing distractions from low-quality
retrievals. However, existing approaches underutilize the inherent knowledge of
large language models (LLMs), leading to suboptimal retrieval decisions and
degraded generation performance. To bridge this gap, we propose Self-Routing
RAG (SR-RAG), a novel framework that binds selective retrieval with knowledge
verbalization. SR-RAG enables an LLM to dynamically decide whether to retrieve
external knowledge or verbalize its own parametric knowledge. To this end, we
design a multi-task objective that jointly optimizes an LLM for knowledge
source selection, knowledge verbalization, and response generation. SR-RAG
further incorporates a nearest neighbor search mechanism at inference time to
improve the accuracy of knowledge source decisions under domain shifts.
Fine-tuning three LLMs with SR-RAG significantly improves response accuracy
while reducing inference latency. Compared to the strongest selective
retrieval baseline, SR-RAG reduces the number of retrievals by 29% while
improving performance by 5.1%.
Key Contributions
SR-RAG enhances Retrieval-Augmented Generation (RAG) by enabling LLMs to dynamically choose between retrieving external knowledge or verbalizing their own parametric knowledge. It achieves this through a multi-task objective that jointly optimizes knowledge source selection, verbalization, and response generation, and uses nearest neighbor search for improved accuracy under domain shifts.
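The inference-time routing idea can be sketched as a nearest-neighbor vote: cache embeddings of past queries together with the knowledge source decision that worked for them, then route a new query by majority vote among its most similar cached neighbors. This is a minimal illustrative sketch, not the paper's implementation; the `route` function, the toy 2-dimensional embeddings, and the cache format are all hypothetical.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def route(query_emb, memory, k=3):
    """Return 'retrieve' or 'verbalize' by majority vote of the k
    cached decisions whose query embeddings are most similar."""
    ranked = sorted(memory, key=lambda item: cosine(query_emb, item[0]),
                    reverse=True)
    votes = [decision for _, decision in ranked[:k]]
    return max(set(votes), key=votes.count)

# Toy cache: (query embedding, knowledge source decision) pairs.
memory = [
    ([1.0, 0.0], "retrieve"),
    ([0.9, 0.1], "retrieve"),
    ([0.0, 1.0], "verbalize"),
    ([0.1, 0.9], "verbalize"),
    ([0.2, 0.8], "verbalize"),
]

print(route([0.05, 0.95], memory))  # → verbalize
print(route([0.95, 0.05], memory))  # → retrieve
```

Because the vote depends only on neighborhood structure rather than a fixed classifier, the decision boundary adapts as new domain examples are added to the cache, which is the intuition behind using nearest neighbor search to stay accurate under domain shifts.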
Business Value
Improves the reliability and accuracy of LLM-powered applications that rely on external knowledge, such as advanced chatbots, knowledge assistants, and research tools, while skipping unnecessary retrievals to cut inference latency and cost.