Abstract
Current RAG retrievers are designed primarily for human readers, emphasizing
complete, readable, and coherent paragraphs. However, LLMs benefit more from
precise, compact, and well-structured input, which enhances reasoning quality
and efficiency. Existing methods often rely on reranking or summarization to
identify key sentences, but may suffer from semantic breaks and unfaithfulness.
Thus, efficiently extracting and organizing answer-relevant clues from
large-scale documents while reducing LLM reasoning costs remains a challenge
for RAG. Inspired by Occam's razor, we frame LLM-centric retrieval as a MinMax
optimization: maximizing the extraction of potential clues and reranking them
into a well-organized order, while minimizing reasoning costs by truncating to the
smallest sufficient clue set. In this paper, we propose CompSelect, a Compact
clue Selection mechanism for LLM-centric RAG, consisting of a clue extractor, a
reranker, and a truncator. (1) The clue extractor first uses answer-containing
sentences as fine-tuning targets, aiming to extract sufficient potential clues;
(2) The reranker is trained to prioritize effective clues based on real LLM
feedback; (3) The truncator uses the truncated text containing the minimum
sufficient clues for answering the question as fine-tuning targets, thereby
enabling efficient RAG reasoning. Experiments on three QA datasets show that
CompSelect improves QA performance by approximately 11% and reduces Total
Latency and Online Latency by approximately 17% and 67% compared to various
baseline methods on both LLaMA3 and Qwen3. Further analysis confirms its
robustness to unreliable retrieval and generalization across different
scenarios, offering a scalable and cost-efficient solution for web-scale RAG
applications.
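To make the extract–rerank–truncate flow concrete, here is a minimal sketch of such a pipeline. The scoring heuristics below (lexical overlap, a word-count budget) are toy placeholders standing in for the paper's fine-tuned clue extractor, feedback-trained reranker, and truncator; all function names, thresholds, and the budget value are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of a CompSelect-style extract -> rerank -> truncate pipeline.
# Placeholder heuristics stand in for the fine-tuned components described in
# the abstract; names and thresholds are illustrative assumptions only.

from typing import List


def extract_clues(question: str, sentences: List[str], top_k: int = 8) -> List[str]:
    """Keep sentences most likely to contain answer clues (placeholder: lexical overlap)."""
    q_terms = set(question.lower().split())
    scored = [(len(q_terms & set(s.lower().split())), s) for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored[:top_k] if score > 0]


def rerank_clues(question: str, clues: List[str]) -> List[str]:
    """Order clues by estimated usefulness to the LLM (placeholder: more overlap, then shorter first)."""
    q_terms = set(question.lower().split())
    return sorted(clues, key=lambda s: (-len(q_terms & set(s.lower().split())), len(s)))


def truncate_clues(clues: List[str], token_budget: int = 64) -> List[str]:
    """Keep the smallest prefix of ranked clues that fits the reasoning budget."""
    kept, used = [], 0
    for clue in clues:
        cost = len(clue.split())
        if used + cost > token_budget:
            break
        kept.append(clue)
        used += cost
    return kept


if __name__ == "__main__":
    question = "Who proposed Occam's razor?"
    passages = [
        "Occam's razor is attributed to William of Ockham, a 14th-century friar.",
        "The principle favors the simplest sufficient explanation.",
        "Unrelated sentence about retrieval latency.",
    ]
    clues = truncate_clues(rerank_clues(question, extract_clues(question, passages)))
    print("\n".join(clues))
```

The design point the sketch mirrors is the MinMax framing: the extractor errs toward recall (maximizing candidate clues), while the reranker and truncator push the final prompt toward the smallest ordered subset that still supports the answer, which is where the reported latency savings would come from.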
Key Contributions
Proposes CompSelect, a novel mechanism for LLM-centric RAG that frames retrieval as a MinMax optimization problem. It efficiently extracts, reranks, and truncates answer-relevant clues from large documents to minimize LLM reasoning costs while maximizing reasoning quality, addressing the limitations of traditional RAG retrievers designed for human readers.
Business Value
Significantly reduces the computational cost of using RAG with LLMs, making advanced AI applications more affordable and faster, enabling wider adoption in areas like customer support, knowledge management, and content generation.