📄 Abstract
Chat assistants increasingly integrate web search functionality, enabling
them to retrieve and cite external sources. While this promises more reliable
answers, it also raises the risk of amplifying misinformation from
low-credibility sources. In this paper, we introduce a novel methodology for
evaluating assistants' web search behavior, focusing on source credibility and
the groundedness of responses with respect to cited sources. Using 100 claims
across five misinformation-prone topics, we assess GPT-4o, GPT-5, Perplexity,
and Qwen Chat. Our findings reveal differences among the assistants: Perplexity
achieves the highest source credibility, whereas GPT-4o cites non-credible
sources at an elevated rate on sensitive topics. This work
provides the first systematic comparison of commonly used chat assistants for
fact-checking behavior, offering a foundation for evaluating AI systems in
high-stakes information environments.
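To make the two abstract-level metrics concrete, here is a minimal sketch of scoring source credibility and groundedness for a single assistant response. This is an illustration under stated assumptions, not the paper's pipeline: the `CREDIBILITY` table, the `Response` structure, `fetch_source`, and the substring-based `entails` check are all hypothetical stand-ins for the authors' actual ratings source, prompts, and entailment model, none of which are specified in the abstract.

```python
# Illustrative sketch only: CREDIBILITY, Response, fetch_source, and the
# substring-based entails() are assumptions, not the paper's actual method.
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical domain-credibility ratings (the paper's real ratings source
# is not given in the abstract).
CREDIBILITY = {"reuters.com": 0.95, "who.int": 0.90, "example-blog.net": 0.20}

@dataclass
class Response:
    text: str              # the assistant's answer to a claim
    cited_urls: list[str]  # sources the assistant cited

def credibility_score(resp: Response) -> float:
    """Mean credibility of cited domains; unknown domains count as neutral 0.5."""
    if not resp.cited_urls:
        return 0.0
    domains = [urlparse(u).netloc.removeprefix("www.") for u in resp.cited_urls]
    return sum(CREDIBILITY.get(d, 0.5) for d in domains) / len(domains)

def entails(source_text: str, sentence: str) -> bool:
    # Placeholder entailment check; a real pipeline would use an NLI model.
    return sentence.lower() in source_text.lower()

def groundedness_score(resp: Response, fetch_source) -> float:
    """Fraction of answer sentences supported by at least one cited source."""
    sentences = [s.strip() for s in resp.text.split(".") if s.strip()]
    sources = [fetch_source(u) for u in resp.cited_urls]
    supported = sum(any(entails(src, s) for src in sources) for s in sentences)
    return supported / len(sentences) if sentences else 0.0
```

For example, a response citing only `who.int` would score 0.90 on credibility under the toy ratings above, and its groundedness would be the share of its sentences found (by the placeholder check) in the fetched source text.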
Key Contributions
Introduces a novel methodology for evaluating chat assistants' web search behavior, focusing on source credibility and response groundedness. It provides the first systematic comparison of popular assistants (GPT-4o, GPT-5, Perplexity, Qwen Chat) on fact-checking behavior, revealing differences in how they handle credible versus non-credible sources, especially on sensitive topics.
Business Value
Helps users and developers understand the reliability of AI assistants in providing factual information, crucial for applications where accuracy is paramount. Enables the development of more trustworthy AI assistants that minimize the spread of misinformation.