
ReMind: Understanding Deductive Code Reasoning in LLMs


Abstract

Large Language Models (LLMs) have achieved remarkable progress in code-related tasks. Despite their advancement, empirical evidence reveals that they still struggle with deductive code reasoning, the ability to reason about the program execution process. While prior studies have recognized this limitation, the underlying causes remain largely underexplored. In this paper, we begin by presenting a comprehensive empirical study that reveals three key challenges undermining deductive code reasoning: (1) an intrinsic gap between generation and reasoning abilities, (2) a consistent bias towards code sources, and (3) weak zero-shot generalization on complex benchmarks. In light of these challenges, we propose ReMind, a multi-agent framework composed of Mutator, Executor, and Inspector. The Mutator generates code variants to mitigate bias towards code sources, the Executor traces variable states step-by-step to expose inconsistency, and the Inspector identifies problematic reasoning steps and provides control-flow refinement to bridge the intrinsic reasoning gap. Through their coordinated collaboration, ReMind systematically identifies and refines reasoning flaws, achieving outstanding performance and enabling robust zero-shot generalization. Extensive experiments on two benchmarks with five LLMs demonstrate the superior advantages of ReMind compared to baseline approaches in deductive code reasoning.
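To make "tracing variable states step-by-step" concrete, here is a minimal Python sketch that records the local variables of a small function before each line runs. It is an illustration only: in the paper the Executor is an LLM agent, and the abstract does not say how its traces are produced or checked. The helper `trace_states` and the sample function `sum_of_squares` are hypothetical names introduced here; a trace like this could serve as a reference against which a model's predicted states are compared.

```python
import sys
from typing import Any, Callable


def trace_states(func: Callable[..., Any], *args: Any):
    """Run func(*args); record (line offset, locals) just before each line executes.

    Hypothetical helper for illustration, not part of ReMind.
    """
    code = func.__code__
    steps: list[tuple[int, dict[str, Any]]] = []

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # ignore frames of any other function
        if event == "line":
            # Copy the locals so later mutations do not alter earlier snapshots.
            offset = frame.f_lineno - code.co_firstlineno
            steps.append((offset, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the previous tracer
    return result, steps


def sum_of_squares(n: int) -> int:
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    result, steps = trace_states(sum_of_squares, 3)
    print("result:", result)
    for offset, local_vars in steps:
        print(f"line +{offset}: {local_vars}")
```

Each recorded step pairs a line offset within the function with a snapshot of its local variables, so a predicted trace can be compared against it step by step to locate the first point of divergence.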

Authors (3)

Jun Gao

Yun Peng

Xiaoxue Ren

Submitted

November 1, 2025

arXiv Category

cs.PL


Key Contributions

Introduces ReMind, a multi-agent framework for improving deductive code reasoning in LLMs. It targets three challenges identified in the paper's empirical study: the gap between generation and reasoning abilities, bias towards code sources, and weak zero-shot generalization. The Mutator generates code variants, the Executor traces variable states step by step, and the Inspector identifies problematic reasoning steps and provides control-flow refinement.
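One simple way to picture a "code variant" is a behavior-preserving rewrite such as consistent variable renaming. The sketch below does this with Python's `ast` module (it needs Python 3.9+ for `ast.unparse`). This is an assumption for illustration: the abstract does not describe which mutations the Mutator applies, and `RenameVariables`, `make_variant`, and `run` are hypothetical names, not ReMind's API.

```python
import ast


class RenameVariables(ast.NodeTransformer):
    """Rename identifiers via a mapping; function names are left untouched."""

    def __init__(self, mapping: dict[str, str]) -> None:
        self.mapping = mapping

    def visit_Name(self, node: ast.Name) -> ast.Name:
        # Covers both reads and writes of a variable.
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        # Covers function parameters.
        self.generic_visit(node)
        node.arg = self.mapping.get(node.arg, node.arg)
        return node


def make_variant(source: str, mapping: dict[str, str]) -> str:
    """Return source with variables renamed.

    The caller must choose new names that do not clash with builtins or
    existing identifiers; keyword-argument call sites are not rewritten.
    """
    tree = RenameVariables(mapping).visit(ast.parse(source))
    return ast.unparse(tree)


def run(source: str, func_name: str, *args):
    """Execute source in a fresh namespace and call the named function."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace[func_name](*args)


ORIGINAL = """
def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total
"""

if __name__ == "__main__":
    variant = make_variant(ORIGINAL, {"n": "v0", "total": "v1", "i": "v2"})
    print(variant)
    # Both versions should compute the same value for the same input.
    print(run(ORIGINAL, "sum_of_squares", 4), run(variant, "sum_of_squares", 4))
```

Because the variant computes the same result as the original, a model whose predicted output changes between the two is reacting to surface form and not to program semantics, which is the kind of inconsistency a variant-based check is meant to surface.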

Business Value

Enhances the reliability and accuracy of LLMs in code-related tasks, potentially leading to better code generation, debugging tools, and automated program analysis, improving developer productivity.

Paper Metadata

Innovation Type

Multi-Agent Framework for Reasoning

Deployment Feasibility

Moderate. Requires integrating multiple agents and potentially specialized execution environments.

Limitations Addressed

LLMs struggle with deductive code reasoning due to an intrinsic gap between generation and reasoning, bias towards code sources, and weak zero-shot generalization on complex benchmarks.

Technical Tags

Deductive Code Reasoning; Large Language Models (LLMs); Multi-Agent Framework; Code Generation; Program Execution; Zero-Shot Generalization; Code Variants; ReMind; Bias Mitigation

Research Topics

AI Reasoning Capabilities; LLMs for Code; Program Analysis; AI Agent Systems; Machine Learning Interpretability

Methods & Architectures

Multi-agent framework (Mutator, Executor, Inspector); Code variant generation; Step-by-step execution tracing; Empirical study; Large Language Models (LLMs); Multi-Agent Systems

Applications & Tasks

Software Development; Code Analysis; AI Research; Improving deductive code reasoning in LLMs; Addressing bias towards code sources; Enhancing zero-shot generalization; Deductive code reasoning; Program execution tracing; Code variant generation

Datasets & Benchmarks

Benchmarks

Two complex benchmarks, evaluated with five LLMs

Deductive reasoning accuracy; Zero-shot generalization performance

Related Fields

Artificial Intelligence; Software Engineering; Programming Languages; Machine Learning; Formal Methods

Keywords

LLMs; Code Reasoning; Deductive Reasoning; Program Execution; Multi-Agent Systems; Software Analysis; AI Framework; Zero-Shot Learning; Code Generation; ReMind

Academic Context

AI Reasoning Capabilities; LLMs for Code; Program Analysis; AI Agent Systems; Machine Learning Interpretability

Commercial Potential

Potential Products

AI-powered code analysis tools; Advanced debugging assistants; Automated program verification systems

Target Industries

Software Development; Technology; IT Services

Use Case Examples

Debugging complex code by tracing execution flow; Verifying program correctness through deductive reasoning; Improving LLM code generation accuracy

Competitive Edge

Addresses the specific challenge of deductive code reasoning in LLMs with a novel multi-agent approach, going beyond standard code generation benchmarks.

Resource Requirements

Compute Needs

Significant compute for running LLMs, code execution, and multi-agent coordination.

Data Requirements

Code datasets, benchmarks for deductive reasoning.

Deployment Constraints

Complexity of the multi-agent system, potential overhead from execution tracing.

Scalability

Scalability depends on the efficiency of the Executor and Inspector agents.

Production Readiness

Maturity Level

Research
