Abstract
While Large Language Models (LLMs) excel at code generation by learning from
vast code corpora, a fundamental semantic gap remains between their training on
textual patterns and the goal of functional correctness, which is governed by
formal execution semantics. Reinforcement Learning with Verifiable Rewards
(RLVR) approaches attempt to bridge this gap using outcome rewards from
executing test cases. However, relying solely on binary pass/fail signals is
inefficient for establishing a well-aligned connection between the textual
representation of code and its execution semantics, especially for subtle
logical errors within the code. In this paper, we propose CodeRL+, a novel
approach that integrates execution semantics alignment into the RLVR training
pipeline for code generation. CodeRL+ enables the model to infer variable-level
execution trajectories, providing a direct learning signal of execution
semantics. It constructs execution semantics alignment directly from
existing on-policy rollouts and integrates seamlessly with various RL
algorithms. Extensive experiments demonstrate that CodeRL+ outperforms
post-training baselines (including RLVR and distillation), achieving a 4.6%
average relative improvement in pass@1. CodeRL+ also generalizes effectively to
other coding tasks, yielding 15.5% and 4.4% higher accuracy on code-reasoning
and test-output-generation benchmarks, respectively, and shows strong
applicability across diverse RL algorithms and LLMs. Furthermore, probe
analyses provide compelling evidence that CodeRL+ strengthens the alignment
between code's textual representations and its underlying execution semantics.
Authors (13)
Xue Jiang
Yihong Dong
Mengyang Liu
Hongyi Deng
Tian Wang
Yongding Tao
+7 more
Submitted
October 21, 2025
Key Contributions
CodeRL+ enhances code generation by integrating execution semantics alignment into the RLVR pipeline. It enables LLMs to infer variable-level execution trajectories, providing a direct learning signal that bridges the semantic gap between textual code patterns and functional correctness. This approach is more effective than relying solely on binary test case outcomes for identifying and correcting subtle logical errors.
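The paper's abstract does not specify how a variable-level execution trajectory is represented, but the underlying idea, recording each executed line together with a snapshot of local variable values so a model can be trained to predict them, can be sketched in Python. The helper name `collect_trace` and the trace format below are illustrative assumptions, not the authors' implementation:

```python
import sys

def collect_trace(func, *args):
    """Record an illustrative variable-level execution trajectory: for each
    executed line inside `func`, capture (relative line number, snapshot of
    local variables). Hypothetical format; not the paper's actual method."""
    trajectory = []

    def tracer(frame, event, arg):
        # Only record line events for the target function's frames.
        if event == "line" and frame.f_code is func.__code__:
            rel_line = frame.f_lineno - func.__code__.co_firstlineno
            trajectory.append((rel_line, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, trajectory

def running_max(xs):
    best = xs[0]
    for x in xs[1:]:
        if x > best:
            best = x
    return best

result, trace = collect_trace(running_max, [3, 1, 4, 1, 5])
print(result)  # final return value of the traced function
```

In a setup like this, the `(line, locals)` pairs in `trace` would serve as the supervision target: asking the model to infer intermediate values such as `best` at each step yields a denser learning signal than a single pass/fail outcome, which is the kind of alignment the contribution describes.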
Business Value
Leads to more reliable and functionally correct code generation, reducing debugging time and improving the quality of software produced by AI, which can significantly boost developer productivity.