📄 Abstract
Natural Language Processing and Generation systems have recently shown the
potential to complement and streamline the costly and time-consuming job of
professional fact-checkers. In this work, we lift several constraints of
current state-of-the-art pipelines for automated fact-checking based on the
Retrieval-Augmented Generation (RAG) paradigm. Our goal is to benchmark, under
more realistic scenarios, RAG-based methods for the generation of verdicts -
i.e., short texts discussing the veracity of a claim - evaluating them on
stylistically complex claims and heterogeneous, yet reliable, knowledge bases.
Our findings show a complex landscape, where, for example, LLM-based retrievers
outperform other retrieval techniques, though they still struggle with
heterogeneous knowledge bases; larger models excel in verdict faithfulness,
while smaller models provide better context adherence, with human evaluations
favouring zero-shot and one-shot approaches for informativeness, and fine-tuned
models for emotional alignment.
Authors (4)
Daniel Russo
Stefano Menini
Jacopo Staiano
Marco Guerini
Submitted
December 19, 2024
Key Contributions
This work benchmarks RAG-based fact-checking pipelines under more realistic conditions, using stylistically complex claims and heterogeneous knowledge bases. It reveals nuanced findings: LLM-based retrievers outperform others but struggle with heterogeneous KBs; larger models improve verdict faithfulness while smaller ones enhance context adherence. Human evaluations favor zero-shot/one-shot approaches for informativeness, highlighting the complexity of optimizing RAG for fact-checking.
Business Value
Provides critical insights for developing more reliable and accurate automated fact-checking systems, helping to combat the spread of misinformation and improve the trustworthiness of information online.