Abstract
Large Language Models (LLMs) have achieved remarkable progress in
code-related tasks. Despite their advancement, empirical evidence reveals that
they still struggle with deductive code reasoning, the ability to reason
about the program execution process. While prior studies have recognized this
limitation, the underlying causes remain largely underexplored. In this paper,
we begin by presenting a comprehensive empirical study that reveals three key
challenges undermining deductive code reasoning: (1) an intrinsic gap between
generation and reasoning abilities, (2) a consistent bias towards code sources,
and (3) weak zero-shot generalization on complex benchmarks. In light of these
challenges, we propose ReMind, a multi-agent framework composed of
Mutator, Executor, and Inspector. The
Mutator generates code variants to mitigate bias towards code sources,
the Executor traces variable states step-by-step to expose
inconsistency, and the Inspector identifies problematic reasoning
steps and provides control-flow refinement to bridge the intrinsic reasoning
gap. Through their coordinated collaboration, ReMind systematically
identifies and refines reasoning flaws, achieving outstanding performance and
enabling robust zero-shot generalization. Extensive experiments on two
benchmarks with five LLMs demonstrate the superior advantages of
ReMind compared to baseline approaches in deductive code reasoning.
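
To make "tracing variable states step-by-step" concrete, here is a minimal sketch of what a ground-truth execution trace looks like. It is not the paper's Executor, which the abstract describes as an agent that reasons about execution. It only uses Python's standard `sys.settrace` hook to record the local variables before each line runs. The names `trace_states` and `running_sum` are hypothetical and chosen for illustration.

```python
import sys


def trace_states(func, *args):
    """Run func(*args); record (line offset, locals snapshot) before each line executes."""
    states = []
    code = func.__code__

    def tracer(frame, event, arg):
        # Only follow the frame of the function under inspection.
        if frame.f_code is not code:
            return None
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            states.append((offset, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        # Always restore whatever tracer was active before.
        sys.settrace(previous)
    return result, states


def running_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total


if __name__ == "__main__":
    result, states = trace_states(running_sum, 3)
    for offset, snapshot in states:
        print(offset, snapshot)
    print("result:", result)
```

A trace like this could serve as a reference against which a model's predicted intermediate states are compared, so that the first diverging step marks where the reasoning went wrong.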
Authors (3)
Jun Gao
Yun Peng
Xiaoxue Ren
Submitted
November 1, 2025
Key Contributions
Introduces ReMind, a multi-agent framework designed to improve deductive code reasoning in LLMs by addressing key challenges: the generation-reasoning gap, bias towards code sources, and weak zero-shot generalization. The framework uses a Mutator to generate variants, an Executor to trace execution, and an Inspector to analyze states, enhancing LLMs' ability to understand program execution.
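
The summary does not say how the Mutator produces variants. One plausible reading, assumed here purely for illustration, is a semantics-preserving rewrite such as renaming identifiers, so that the surface form of the code changes while its behavior does not. The sketch below uses Python's standard `ast` module (`ast.unparse` requires Python 3.9 or later). `RenameVariables` and `mutate` are hypothetical names, and the transformer is deliberately naive: it does not handle keyword arguments at call sites, attributes, or name collisions.

```python
import ast


class RenameVariables(ast.NodeTransformer):
    """Rename variables and function parameters according to a fixed mapping."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Covers both reads and writes of plain variable names.
        if node.id in self.mapping:
            node.id = self.mapping[node.id]
        return node

    def visit_arg(self, node):
        # Covers parameter names in function signatures.
        if node.arg in self.mapping:
            node.arg = self.mapping[node.arg]
        return node


def mutate(source, mapping):
    """Return a variant of `source` with identifiers renamed per `mapping`."""
    tree = ast.parse(source)
    tree = RenameVariables(mapping).visit(tree)
    return ast.unparse(tree)


ORIGINAL = (
    "def f(n):\n"
    "    total = 0\n"
    "    for i in range(n):\n"
    "        total += i\n"
    "    return total\n"
)

if __name__ == "__main__":
    print(mutate(ORIGINAL, {"total": "acc", "n": "limit"}))
```

Because the variant computes the same result as the original, a model whose predicted output differs between the two is reacting to surface features of the code and not to its execution semantics.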
Business Value
Enhances the reliability and accuracy of LLMs in code-related tasks, potentially leading to better code generation, debugging tools, and automated program analysis, improving developer productivity.