Redirecting to original paper in 30 seconds...
Click below to go immediately or wait for automatic redirect
📄 Abstract
Abstract: As retrieval-augmented generation (RAG) tackles complex tasks, increasingly
expanded contexts offer richer information, but at the cost of higher latency
and increased cognitive load on the model. To mitigate this bottleneck,
especially for intricate multi-hop questions, we introduce BRIEF-Pro. It is a
universal, lightweight compressor that distills relevant evidence for a given
query from retrieved documents into a concise summary for seamless integration
into in-context RAG. Using seed data consisting of relatively short contexts
(fewer than 1k words), BRIEF-Pro is trained to perform abstractive compression
of extended contexts exceeding 10k words across a wide range of scenarios.
Furthermore, BRIEF-Pro offers flexible user control over summary length by
allowing users to specify the desired number of sentences. Experiments on four
open-domain multi-hop question-answering datasets show that BRIEF-Pro generates
more concise and relevant summaries, enhancing performance across small, large,
and proprietary language models. With the 70B reader model, 32x compression by
BRIEF-Pro improves QA performance by 4.67% on average over LongLLMLingua's 9x,
while requiring only 23% of its computational overhead.
Key Contributions
Introduces BRIEF-Pro, a universal, lightweight compressor for RAG systems that distills extended contexts into concise summaries for multi-hop reasoning. Trained using a 'short-to-long synthesis' approach, it reduces latency and enhances performance on complex Q&A tasks, offering user control over summary length.
Business Value
Significantly improves the speed and accuracy of AI systems performing complex reasoning tasks, making them more practical for real-time applications and reducing computational costs. This is valuable for advanced search, analysis, and decision-support tools.