📄 Abstract
Traditional Information Retrieval (IR) metrics, such as nDCG, MAP, and MRR,
assume that human users sequentially examine documents with diminishing
attention to lower ranks. This assumption breaks down in Retrieval Augmented
Generation (RAG) systems, where search results are consumed by Large Language
Models (LLMs), which, unlike humans, process all retrieved documents as a whole
rather than sequentially. Additionally, traditional IR metrics do not account
for related but irrelevant documents that actively degrade generation quality,
rather than merely being ignored. Due to these two major misalignments, namely
human vs. machine position discount and human relevance vs. machine utility,
classical IR metrics do not accurately predict RAG performance. We introduce a
utility-based annotation schema that quantifies both the positive contribution
of relevant passages and the negative impact of distracting ones. Building on
this foundation, we propose UDCG (Utility and Distraction-aware Cumulative
Gain), a metric using an LLM-oriented positional discount to directly optimize
the correlation with the end-to-end answer accuracy. Experiments on five
datasets and six LLMs demonstrate that UDCG improves correlation by up to 36%
compared to traditional metrics. Our work provides a critical step toward
aligning IR evaluation with LLM consumers and enables more reliable assessment
of RAG components.
Authors (5)
Giovanni Trappolini
Florin Cuconasu
Simone Filice
Yoelle Maarek
Fabrizio Silvestri
Submitted
October 24, 2025
Key Contributions
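This paper argues that traditional IR metrics (nDCG, MAP, MRR) are misaligned with Retrieval Augmented Generation (RAG) systems for two reasons: LLMs process all retrieved documents as a whole rather than examining them sequentially with diminishing attention, as humans do, and related but irrelevant documents actively degrade generation quality instead of merely being ignored. It proposes a utility-based annotation schema and a new metric, UDCG (Utility and Distraction-aware Cumulative Gain), that account for both the positive contribution of relevant passages and the negative impact of distracting ones.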
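
To make the two misalignments concrete, here is a minimal Python sketch. It is not the paper's UDCG formula, which this summary does not give. The first discount is the standard DCG logarithmic discount. The signed utility values, the flat discount, and the names `discounted_gain`, `log_discount`, and `flat_discount` are assumptions chosen purely for illustration.

```python
import math
from typing import Callable, Sequence


def log_discount(rank: int) -> float:
    """Standard DCG discount for a 1-based rank: 1 / log2(rank + 1)."""
    return 1.0 / math.log2(rank + 1)


def flat_discount(rank: int) -> float:
    """Hypothetical position-insensitive discount (an assumption, not the
    paper's LLM-oriented discount): every rank counts equally."""
    return 1.0


def discounted_gain(
    scores: Sequence[float],
    discount: Callable[[int], float] = log_discount,
) -> float:
    """Sum of per-passage scores weighted by a positional discount.

    With non-negative relevance labels and log_discount this is classical DCG.
    With signed utilities (negative for distracting passages) it becomes an
    illustrative utility-aware score.
    """
    return sum(s * discount(rank) for rank, s in enumerate(scores, start=1))


if __name__ == "__main__":
    # Binary relevance labels cannot tell a distractor from a harmless miss:
    # both rankings below receive the same labels and the same DCG.
    labels_with_distractor = [1, 0, 0]
    labels_with_harmless_miss = [1, 0, 0]
    print(discounted_gain(labels_with_distractor))
    print(discounted_gain(labels_with_harmless_miss))

    # Hypothetical signed utilities: the distractor at rank 2 is penalized.
    print(discounted_gain([1.0, -0.5, 0.0], flat_discount))
    print(discounted_gain([1.0, 0.0, 0.0], flat_discount))
```

In this sketch the two rankings are indistinguishable under classical DCG, while the signed-utility variant scores the ranking containing a distractor lower. The actual utility annotations and positional discount used by UDCG are defined in the paper itself.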
Business Value
Enables more accurate evaluation of the retrieval components of RAG systems, which supports better search and generation quality and more reliable AI-powered information access tools.