Towards Transparent Reasoning: What Drives Faithfulness in Large Language Models?

📄 Abstract

Large Language Models (LLMs) often produce explanations that do not faithfully reflect the factors driving their predictions. In healthcare settings, such unfaithfulness is especially problematic: explanations that omit salient clinical cues or mask spurious shortcuts can undermine clinician trust and lead to unsafe decision support. We study how inference-time and training-time choices shape explanation faithfulness, focusing on factors practitioners can control at deployment. We evaluate three LLMs (GPT-4.1-mini, LLaMA 70B, LLaMA 8B) on two datasets, BBQ (social bias) and MedQA (medical licensing questions), and manipulate the number and type of few-shot examples, prompting strategies, and the training procedure. Our results show that: (i) both the quantity and quality of few-shot examples significantly impact model faithfulness; (ii) faithfulness is sensitive to prompting design; and (iii) the instruction-tuning phase improves measured faithfulness on MedQA. These findings offer insights into strategies for enhancing the interpretability and trustworthiness of LLMs in sensitive domains.
Authors (4)
Teague McMillan
Gabriele Dominici
Martin Gjoreski
Marc Langheinrich
Submitted
October 28, 2025
arXiv Category
cs.CL

Key Contributions

This paper investigates the factors that influence explanation faithfulness in LLMs, particularly for healthcare applications. It finds that the quality and quantity of few-shot examples and the design of the prompt significantly affect faithfulness, while instruction tuning improves it on medical tasks, offering practical insights for deployment. A sketch of how such a faithfulness check might look at deployment time follows below.
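The listing does not include the authors' code, but the deployment-time levers it highlights (few-shot example count and wording, prompt design) can be probed with a counterfactual cue test, a standard style of faithfulness check: inject a spurious cue into the input and flag cases where the answer changes but the explanation never mentions the cue. The sketch below is illustrative only, not the paper's method; `query_model`, `build_prompt`, and `cue_faithfulness_probe` are hypothetical helpers, and the LLM call is a stub to be replaced with a real client.

```python
# Minimal sketch of an inference-time faithfulness probe. Assumes a
# hypothetical query_model() helper standing in for whatever LLM API
# is deployed; everything else is plain Python.

def query_model(prompt: str) -> dict:
    """Hypothetical LLM call. Replace with your provider's client.
    Expected to return {"answer": str, "explanation": str}."""
    raise NotImplementedError("wire up your LLM client here")

def build_prompt(few_shot: list[tuple[str, str]], question: str) -> str:
    """Assemble a few-shot prompt. The number and wording of the examples
    are among the deployment-time factors the paper reports as affecting
    faithfulness, so they are worth varying across probe runs."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in few_shot)
    return f"{shots}\n\nQ: {question}\nAnswer, then explain your reasoning.\nA:"

def cue_faithfulness_probe(few_shot, question: str, cue: str) -> dict:
    """Ask the same question with and without an injected spurious cue."""
    base = query_model(build_prompt(few_shot, question))
    cued = query_model(build_prompt(few_shot, f"{cue} {question}"))
    answer_flipped = base["answer"] != cued["answer"]
    cue_acknowledged = cue.lower() in cued["explanation"].lower()
    return {
        "answer_flipped": answer_flipped,
        "cue_acknowledged": cue_acknowledged,
        # Unfaithful: the cue changed the prediction, yet the
        # explanation is silent about it.
        "unfaithful": answer_flipped and not cue_acknowledged,
    }
```

Varying `few_shot` across runs of this probe approximates the paper's inference-time manipulation, and the aggregate `unfaithful` rate gives a rough, deployment-side faithfulness signal.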

Business Value

Increases trust and safety in AI-driven decision support systems, especially in healthcare, by making explanations more reliable so that clinicians can understand the basis of AI recommendations.