
Assessing LLM Reasoning Steps via Principal Knowledge Grounding

Abstract

Step-by-step reasoning has become a standard approach for large language models (LLMs) to tackle complex tasks. While this paradigm has proven effective, it raises a fundamental question: How can we verify that an LLM's reasoning is accurately grounded in knowledge? To address this question, we introduce a novel evaluation suite that systematically assesses the knowledge grounding of intermediate reasoning. Our framework comprises three key components: (1) a Principal Knowledge Collection, a large-scale repository of atomic knowledge essential for reasoning. Based on the collection, we propose (2) knowledge-grounded evaluation metrics designed to measure how well models recall and apply prerequisite knowledge in reasoning. These metrics are computed by our (3) evaluator LLM, a lightweight model optimized for cost-effective and reliable metric computation. Our evaluation suite demonstrates remarkable effectiveness in identifying missing or misapplied knowledge elements, providing crucial insights for uncovering fundamental reasoning deficiencies in LLMs. Beyond evaluation, we demonstrate how these metrics can be integrated into preference optimization, showcasing further applications of knowledge-grounded evaluation.
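To make the idea concrete, the metric described above can be sketched as a recall score over atomic knowledge items: for each item in the Principal Knowledge Collection required by a problem, check whether some reasoning step correctly grounds it. The sketch below is an illustration under stated assumptions, not the paper's actual formulation; in particular, `is_grounded` is a hypothetical string-matching stand-in for the judgment that the paper delegates to its lightweight evaluator LLM.

```python
def is_grounded(knowledge: str, step: str) -> bool:
    # Placeholder judge: a naive substring check standing in for the
    # evaluator LLM that scores whether a step correctly applies a fact.
    return knowledge.lower() in step.lower()

def knowledge_recall(principal_knowledge: list[str], steps: list[str]) -> float:
    """Fraction of required atomic knowledge items that are grounded
    in at least one reasoning step (hypothetical metric sketch)."""
    if not principal_knowledge:
        return 1.0
    hits = sum(
        any(is_grounded(k, s) for s in steps) for k in principal_knowledge
    )
    return hits / len(principal_knowledge)

# Toy example: two reasoning steps, two required knowledge items.
steps = [
    "Step 1: Water boils at 100 C at sea level.",
    "Step 2: At higher altitude, lower pressure lowers the boiling point.",
]
required = [
    "water boils at 100 C",
    "lower pressure lowers the boiling point",
]
print(knowledge_recall(required, steps))  # 1.0 under this toy judge
```

A precision-style counterpart (fraction of steps whose invoked knowledge is correct) would follow the same pattern with the loops swapped; the paper's real metrics and evaluator are defined in the full text.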
Authors (8)
Hyeon Hwang
Yewon Cho
Chanwoong Yoon
Yein Park
Minju Song
Kyungjae Lee
+2 more
Submitted
November 2, 2025
arXiv Category
cs.CL
arXiv PDF

Key Contributions

Introduces a novel evaluation suite that systematically assesses the knowledge grounding of intermediate reasoning steps in LLMs. The suite comprises a Principal Knowledge Collection, knowledge-grounded evaluation metrics, and a lightweight evaluator LLM, enabling cost-effective and reliable verification of whether a model's reasoning is grounded in the prerequisite knowledge.

Business Value

Enables more reliable and trustworthy deployment of LLMs on complex tasks by providing a robust method to evaluate their reasoning and verify that it is grounded in factual knowledge.