

Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning


Abstract: Mathematical reasoning is a central challenge for large language models (LLMs), requiring not only correct answers but also faithful reasoning processes. Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising approach for enhancing such capabilities; however, its ability to foster genuine reasoning remains unclear. We investigate RLVR on two combinatorial problems with fully verifiable solutions, Activity Scheduling and the Longest Increasing Subsequence, using carefully curated datasets with unique optima. Across multiple reward designs, we find that RLVR improves evaluation metrics but often by reinforcing superficial heuristics rather than acquiring new reasoning strategies. These findings highlight the limits of RLVR generalization, emphasizing the importance of benchmarks that disentangle genuine mathematical reasoning from shortcut exploitation and provide faithful measures of progress. Code available at https://github.com/xashru/rlvr-seq-generalization.
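To make "fully verifiable" and "unique optima" concrete for the Longest Increasing Subsequence task, here is a minimal sketch of a binary verifiable reward plus a uniqueness check. It is not the authors' reward design or curation code (see their repository for that). It assumes the model answers with a list of indices into the input sequence, that "increasing" means strictly increasing, and that optima are counted as distinct index sets; the function names `lis_length`, `count_lis`, and `lis_reward` are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def lis_length(seq: Sequence[int]) -> int:
    """Length of the longest strictly increasing subsequence, O(n log n)."""
    tails: List[int] = []  # tails[k]: smallest tail value of an increasing run of length k + 1
    for x in seq:
        k = bisect_left(tails, x)  # bisect_left keeps the subsequence strictly increasing
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)


def count_lis(seq: Sequence[int]) -> int:
    """Number of optimal subsequences, counted as distinct index sets, O(n^2)."""
    n = len(seq)
    length = [1] * n  # length[i]: longest increasing run ending at index i
    ways = [1] * n    # ways[i]: how many runs of that length end at index i
    for i in range(n):
        for j in range(i):
            if seq[j] < seq[i]:
                if length[j] + 1 > length[i]:
                    length[i], ways[i] = length[j] + 1, ways[j]
                elif length[j] + 1 == length[i]:
                    ways[i] += ways[j]
    best = max(length, default=0)
    return sum(w for ln, w in zip(length, ways) if ln == best)


def lis_reward(seq: Sequence[int], indices: List[int]) -> float:
    """Hypothetical binary reward: 1.0 only for a strictly increasing subsequence of maximum length."""
    if any(not 0 <= i < len(seq) for i in indices):
        return 0.0  # out-of-range index
    if any(a >= b for a, b in zip(indices, indices[1:])):
        return 0.0  # indices must appear in order, without repeats
    values = [seq[i] for i in indices]
    if any(a >= b for a, b in zip(values, values[1:])):
        return 0.0  # selected values are not strictly increasing
    return 1.0 if len(indices) == lis_length(seq) else 0.0
```

For example, `count_lis([10, 9, 2, 5, 3, 7, 101, 18])` is 4, so under this assumption that instance has several optima, whereas `[1, 5, 2, 3, 4]` has exactly one, and `lis_reward([1, 5, 2, 3, 4], [0, 2, 3, 4])` returns 1.0. A curation step that keeps only instances with `count_lis(seq) == 1` would give every problem a single reference answer; whether the paper filters this way is not stated here.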
Authors: Md Tanvirul Alam, Nidhi Rastogi
Submitted: October 30, 2025
arXiv category: cs.LG

Key Contributions

This study probes the generalization limits of RLVR for mathematical reasoning in LLMs using two combinatorial problems with fully verifiable solutions, Activity Scheduling and the Longest Increasing Subsequence. It finds that RLVR often improves evaluation metrics by reinforcing superficial heuristics rather than by teaching new reasoning strategies, which points to the need for benchmarks that separate genuine reasoning from shortcut exploitation.
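The gap between "improves metrics" and "acquires a reasoning strategy" can be illustrated on Activity Scheduling, where plausible-looking greedy rules often produce optimal schedules but are not correct in general. The sketch below is purely illustrative: the rules `earliest_start` and `shortest_first` are hypothetical stand-ins for shortcuts and are not the specific heuristics reported in the paper. It assumes intervals are half-open `(start, end)` pairs, so activities sharing only an endpoint are compatible, and uses the classic earliest-finish-time greedy rule, which is provably optimal, to compute the reference optimum for a binary reward.

```python
from typing import Callable, List, Sequence, Tuple

Interval = Tuple[int, int]  # (start, end); intervals sharing only an endpoint are compatible


def compatible(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals [start, end) do not overlap."""
    return a[1] <= b[0] or b[1] <= a[0]


def greedy_schedule(intervals: Sequence[Interval], key: Callable[[Interval], int]) -> List[int]:
    """Scan intervals in `key` order, keeping each one compatible with all kept so far."""
    chosen: List[int] = []
    for i in sorted(range(len(intervals)), key=lambda i: key(intervals[i])):
        if all(compatible(intervals[i], intervals[j]) for j in chosen):
            chosen.append(i)
    return sorted(chosen)


def earliest_finish(iv: Interval) -> int:
    """Provably optimal greedy rule: prefer the activity that ends first."""
    return iv[1]


def earliest_start(iv: Interval) -> int:
    """Hypothetical shortcut: prefer the activity that starts first."""
    return iv[0]


def shortest_first(iv: Interval) -> int:
    """Hypothetical shortcut: prefer the shortest activity."""
    return iv[1] - iv[0]


def schedule_reward(intervals: Sequence[Interval], chosen: List[int]) -> float:
    """Hypothetical binary reward: 1.0 only for a feasible schedule of optimal size."""
    if len(set(chosen)) != len(chosen) or any(not 0 <= i < len(intervals) for i in chosen):
        return 0.0  # repeated or out-of-range indices
    feasible = all(
        compatible(intervals[a], intervals[b])
        for k, a in enumerate(chosen)
        for b in chosen[k + 1:]
    )
    optimum = len(greedy_schedule(intervals, earliest_finish))
    return 1.0 if feasible and len(chosen) == optimum else 0.0
```

On `[(0, 10), (1, 2), (3, 4)]` the `earliest_start` rule keeps only the long activity and earns 0.0, while on `[(0, 5), (4, 7), (6, 11)]` the `shortest_first` rule keeps only `(4, 7)` and earns 0.0; `earliest_finish` earns 1.0 on both. Each shortcut scores full reward on the other instance, which is why an evaluation set that happens to favor a shortcut can report progress that does not reflect the correct strategy.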

Business Value

Ensuring that AI models truly reason rather than rely on superficial patterns is critical for building reliable and trustworthy AI systems, especially in high-stakes domains like mathematics and science.
