ReMind: Understanding Deductive Code Reasoning in LLMs

📄 Abstract

Large Language Models (LLMs) have achieved remarkable progress in code-related tasks. Despite this progress, empirical evidence shows that they still struggle with deductive code reasoning: the ability to reason about the program execution process. While prior studies have recognized this limitation, its underlying causes remain largely underexplored. In this paper, we begin with a comprehensive empirical study that reveals three key challenges undermining deductive code reasoning: (1) an intrinsic gap between generation and reasoning abilities, (2) a consistent bias towards code sources, and (3) weak zero-shot generalization on complex benchmarks. In light of these challenges, we propose ReMind, a multi-agent framework composed of a Mutator, an Executor, and an Inspector. The Mutator generates code variants to mitigate bias towards code sources, the Executor traces variable states step by step to expose inconsistencies, and the Inspector identifies problematic reasoning steps and provides control-flow refinement to bridge the intrinsic reasoning gap. Through their coordinated collaboration, ReMind systematically identifies and refines reasoning flaws, achieving strong performance and robust zero-shot generalization. Extensive experiments on two benchmarks with five LLMs demonstrate ReMind's advantages over baseline approaches in deductive code reasoning.
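
To make the task concrete, below is a minimal, hypothetical example of a deductive code reasoning problem: the model must predict a function's output by tracing execution step by step, the kind of variable-state trace the Executor is described as producing. The function, inputs, and trace are illustrative assumptions, not taken from the paper's benchmarks.

```python
# A made-up deductive code reasoning task (assumed, CRUXEval-style):
# predict the output of f([3, 4, 7, 10]) by simulating execution,
# not by pattern-matching on code style.

def f(xs):
    total = 0
    for x in xs:
        if x % 2 == 0:
            total += x
    return total

# The step-by-step variable-state trace a model should produce:
#   total = 0
#   x = 3  -> odd,  total stays 0
#   x = 4  -> even, total = 4
#   x = 7  -> odd,  total stays 4
#   x = 10 -> even, total = 14
assert f([3, 4, 7, 10]) == 14
```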
Authors (3)
Jun Gao
Yun Peng
Xiaoxue Ren
Submitted
November 1, 2025
arXiv Category
cs.PL

Key Contributions

Introduces ReMind, a multi-agent framework designed to improve deductive code reasoning in LLMs by addressing three key challenges: the gap between generation and reasoning abilities, bias towards code sources, and weak zero-shot generalization. The framework uses a Mutator to generate code variants, an Executor to trace variable states during execution, and an Inspector to pinpoint faulty reasoning steps and refine control flow, strengthening LLMs' ability to understand program execution. A hypothetical sketch of this loop follows.
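
The Python sketch below illustrates how the three agents might coordinate. It is an assumption based only on the abstract: the paper does not publish this code, and query_llm, the prompt wording, and the refinement loop are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch of a ReMind-style agent loop; query_llm, the prompts,
# and the loop structure are illustrative assumptions, not the paper's code.

def query_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (an API client in practice)."""
    raise NotImplementedError

def mutator(code: str) -> list[str]:
    # Generate a behavior-preserving variant (e.g., renamed identifiers)
    # to reduce the model's bias towards familiar code sources.
    return [query_llm(f"Rewrite this code, preserving its behavior:\n{code}")]

def executor(code: str, inputs: str, hint: str = "") -> str:
    # Trace variable states line by line so inconsistencies become visible.
    return query_llm(
        f"Execute step by step, listing every variable state:\n{code}\n"
        f"Input: {inputs}\n{hint}"
    )

def inspector(code: str, trace: str) -> str | None:
    # Flag the first problematic reasoning step and suggest a control-flow
    # refinement; return None if the trace looks consistent.
    verdict = query_llm(f"Check this trace against the control flow:\n{code}\n{trace}")
    return None if verdict.strip() == "OK" else verdict

def remind(code: str, inputs: str, max_rounds: int = 3) -> str:
    trace = ""
    for variant in [code] + mutator(code):
        trace = executor(variant, inputs)
        for _ in range(max_rounds):
            feedback = inspector(variant, trace)
            if feedback is None:
                return trace  # consistent trace: accept as the final answer
            trace = executor(variant, inputs, hint=f"Fix this step: {feedback}")
    return trace  # best effort after exhausting variants and rounds
```

The key design point this sketch tries to capture is the division of labor: mutation attacks source bias, tracing exposes inconsistencies, and inspection turns them into targeted refinement rather than a blind retry.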

Business Value

Enhances the reliability and accuracy of LLMs in code-related tasks, potentially leading to better code generation, debugging tools, and automated program analysis, improving developer productivity.