Abstract
Value-based algorithms are a cornerstone of off-policy reinforcement learning
due to their simplicity and training stability. However, their use has
traditionally been restricted to discrete action spaces, as they rely on
estimating Q-values for individual state-action pairs. In continuous action
spaces, evaluating the Q-value over the entire action space becomes
computationally infeasible. To address this, actor-critic methods are typically
employed, where a critic is trained on off-policy data to estimate Q-values,
and an actor is trained to maximize the critic's output. Despite their
popularity, these methods often suffer from instability during training. In
this work, we propose a purely value-based framework for continuous control
that revisits structural maximization of Q-functions, introducing a set of key
architectural and algorithmic choices to enable efficient and stable learning.
We evaluate the proposed actor-free Q-learning approach on a range of standard
simulation tasks, demonstrating performance and sample efficiency on par with
state-of-the-art baselines, without the cost of learning a separate actor.
In particular, in environments with constrained action spaces, where value
functions are typically non-smooth, our method with structural maximization
outperforms traditional actor-critic methods that rely on gradient-based maximization.
We have released our code at https://github.com/USC-Lira/Q3C.
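To make the contrast concrete, below is a minimal sketch of the actor-critic paradigm the abstract describes as the usual workaround for continuous actions: a critic is regressed toward bootstrapped Q-targets from off-policy data, and a separate actor is trained by gradient ascent on the critic's output. The network sizes, names, and toy batch are illustrative assumptions for the sketch, not the paper's implementation or baselines.

```python
# Sketch of a DDPG-style actor-critic update (the baseline paradigm the
# abstract contrasts with). Shapes and hyperparameters are placeholders.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2

critic = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
actor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, ACTION_DIM), nn.Tanh())
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

def update(batch, gamma=0.99):
    s, a, r, s_next, done = batch
    # Critic: regress Q(s, a) toward the one-step bootstrapped target.
    with torch.no_grad():
        a_next = actor(s_next)
        q_next = critic(torch.cat([s_next, a_next], dim=-1))
        target = r + gamma * (1.0 - done) * q_next
    q = critic(torch.cat([s, a], dim=-1))
    critic_loss = nn.functional.mse_loss(q, target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: gradient-based maximization of the critic's output -- the extra
    # learned component (and source of instability) that an actor-free,
    # purely value-based method avoids.
    actor_loss = -critic(torch.cat([s, actor(s)], dim=-1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

# Toy batch showing the expected shapes.
batch = (torch.randn(32, STATE_DIM), torch.rand(32, ACTION_DIM) * 2 - 1,
         torch.randn(32, 1), torch.randn(32, STATE_DIM), torch.zeros(32, 1))
update(batch)
```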
Authors (4)
Yigit Korkmaz
Urvi Bhuwania
Ayush Jain
Erdem Bıyık
Submitted
October 21, 2025
Key Contributions
This paper proposes a novel actor-free, purely value-based framework for continuous control reinforcement learning. By revisiting structural maximization of Q-functions and introducing specific architectural and algorithmic choices, it achieves efficient learning while avoiding the instability often associated with actor-critic methods in continuous action spaces.
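For intuition on what "structural maximization" of a Q-function means, the sketch below shows one classic instance: a NAF-style quadratic advantage, where Q(s, a) = V(s) - (a - mu(s))^T P(s) (a - mu(s)) with P(s) positive semi-definite, so the maximizing action mu(s) is read directly off the architecture with no actor and no inner optimization. This is only an illustration of the general idea under that assumption; the abstract does not describe Q3C's actual architecture, which may differ.

```python
# Illustrative structurally maximizable Q-function (NAF-style quadratic
# advantage), NOT the paper's Q3C architecture.
import torch
import torch.nn as nn

class StructuralQ(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.v = nn.Linear(hidden, 1)                         # state value V(s)
        self.mu = nn.Linear(hidden, action_dim)               # closed-form argmax action mu(s)
        self.l = nn.Linear(hidden, action_dim * action_dim)   # Cholesky factor entries for P(s)
        self.action_dim = action_dim

    def forward(self, s, a):
        h = self.trunk(s)
        mu = torch.tanh(self.mu(h))
        L = self.l(h).view(-1, self.action_dim, self.action_dim).tril()
        P = L @ L.transpose(-1, -2)                            # positive semi-definite
        d = (a - mu).unsqueeze(-1)
        adv = -0.5 * (d.transpose(-1, -2) @ P @ d).squeeze(-1) # <= 0, equals 0 at a = mu(s)
        return self.v(h) + adv                                 # Q(s, a)

    def greedy_action(self, s):
        # Structural maximization: the maximizer over actions is available in
        # closed form, so no actor network or gradient-based search is needed.
        return torch.tanh(self.mu(self.trunk(s)))

q = StructuralQ(state_dim=8, action_dim=2)
s = torch.randn(4, 8)
a_star = q.greedy_action(s)
print(q(s, a_star).shape)  # torch.Size([4, 1]): Q evaluated at its maximizer
```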
Business Value
Enables more stable and efficient training of AI agents for complex control tasks in robotics and autonomous systems, potentially reducing development time and improving performance.