
Actor-Free Continuous Control via Structurally Maximizable Q-Functions

📄 Abstract

Value-based algorithms are a cornerstone of off-policy reinforcement learning due to their simplicity and training stability. However, their use has traditionally been restricted to discrete action spaces, as they rely on estimating Q-values for individual state-action pairs. In continuous action spaces, evaluating the Q-value over the entire action space becomes computationally infeasible. To address this, actor-critic methods are typically employed, where a critic is trained on off-policy data to estimate Q-values, and an actor is trained to maximize the critic's output. Despite their popularity, these methods often suffer from instability during training. In this work, we propose a purely value-based framework for continuous control that revisits structural maximization of Q-functions, introducing a set of key architectural and algorithmic choices to enable efficient and stable learning. We evaluate the proposed actor-free Q-learning approach on a range of standard simulation tasks, demonstrating performance and sample efficiency on par with state-of-the-art baselines, without the cost of learning a separate actor. Particularly, in environments with constrained action spaces, where the value functions are typically non-smooth, our method with structural maximization outperforms traditional actor-critic methods with gradient-based maximization. We have released our code at https://github.com/USC-Lira/Q3C.
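
The core idea in the abstract is making max over actions of Q(s, a) computable by construction rather than via a learned actor. The sketch below is a minimal, hypothetical illustration using a quadratic (NAF-style) advantage term, one classic way to obtain a closed-form maximizer; the class name `StructurallyMaximizableQ` and this particular parameterization are assumptions for exposition, not the Q3C architecture from the paper.

```python
# Illustrative sketch only: a Q-function whose maximum over actions is
# available in closed form, via a quadratic (NAF-style) advantage.
# Not the paper's actual Q3C architecture.
import torch
import torch.nn as nn


class StructurallyMaximizableQ(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.action_dim = action_dim
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value_head = nn.Linear(hidden, 1)           # V(s) = max_a Q(s, a)
        self.mu_head = nn.Linear(hidden, action_dim)     # mu(s) = argmax_a Q(s, a)
        self.l_head = nn.Linear(hidden, action_dim * action_dim)  # Cholesky factor of P(s)

    def forward(self, state, action):
        h = self.trunk(state)
        v = self.value_head(h)                            # (B, 1)
        mu = torch.tanh(self.mu_head(h))                  # actions assumed to lie in [-1, 1]
        l = self.l_head(h).view(-1, self.action_dim, self.action_dim)
        l = torch.tril(l)                                  # lower-triangular factor
        p = l @ l.transpose(-1, -2)                        # P(s) is positive semi-definite
        diff = (action - mu).unsqueeze(-1)                 # (B, A, 1)
        advantage = -0.5 * (diff.transpose(-1, -2) @ p @ diff).squeeze(-1)
        return v + advantage                               # Q(s, a) <= V(s), equal at a = mu(s)

    def max_q(self, state):
        # The max over actions and its maximizer come out in closed form: no actor needed.
        h = self.trunk(state)
        return self.value_head(h), torch.tanh(self.mu_head(h))
```

With such a parameterization, evaluating the Bellman target requires only a forward pass through `max_q`, sidestepping the exhaustive action-space search that makes naive Q-learning infeasible in continuous control.
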
Authors (4)
Yigit Korkmaz
Urvi Bhuwania
Ayush Jain
Erdem Bıyık
Submitted
October 21, 2025
arXiv Category
cs.LG

Key Contributions

The paper proposes an actor-free, purely value-based framework for continuous control. By revisiting structural maximization of Q-functions and introducing targeted architectural and algorithmic choices, it matches the performance and sample efficiency of state-of-the-art actor-critic baselines without training a separate actor, and it outperforms gradient-based actor maximization in environments with constrained action spaces, where value functions are typically non-smooth. A sketch of how such an actor-free update can look in code follows below.
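
The following is a minimal sketch of a generic actor-free Q-learning update, assuming the hypothetical `StructurallyMaximizableQ` module from the earlier snippet: the TD target uses the closed-form maximum over actions directly, so no actor network or gradient-based action maximization is involved. Function names, batch layout, and hyperparameters are placeholders, not the paper's training code.

```python
# Sketch of an actor-free TD update; assumes q_net and q_target are
# StructurallyMaximizableQ instances and batch tensors share batch dim B.
import torch
import torch.nn.functional as F


def td_update(q_net, q_target, batch, optimizer, gamma=0.99):
    state, action, reward, next_state, done = batch        # reward, done: shape (B,)
    with torch.no_grad():
        next_v, _ = q_target.max_q(next_state)             # max_a Q_target(s', a), no actor
        target = reward + gamma * (1.0 - done) * next_v.squeeze(-1)
    q_sa = q_net(state, action).squeeze(-1)                 # Q(s, a) for the taken actions
    loss = F.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The design point this illustrates is the one emphasized in the abstract: the maximization step that actor-critic methods delegate to a separately trained actor is instead built into the Q-function's structure, removing one source of training instability and the cost of the extra network.
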

Business Value

Enables more stable and efficient training of AI agents for complex control tasks in robotics and autonomous systems, potentially reducing development time and improving performance.