AIPapers.ai - AI Research Papers Daily

Today's Reinforcement Learning Research Top Papers

Wednesday, November 5, 2025

📊 Read Full Intelligence Reports:

Natural-gas storage modelling by deep reinforcement learning

Introduces GasRL, a simulator coupling a calibrated natural gas market with deep reinforcement learning-trained storage policies. Achieves superior performance with Soft Actor-Critic (SAC), optimizing stockpile management to affect equilibrium prices and market dynamics.

Reinforcement learning based data assimilation for unknown state model

Proposes a reinforcement learning-based approach for data assimilation with unknown system dynamics. Leverages RL to construct a surrogate state transition model, overcoming reliance on pre-computed, noise-free training datasets.

Optimizing Multi-Lane Intersection Performance in Mixed Autonomy Environments

Presents a novel traffic signal control framework combining Graph Attention Networks with Soft Actor-Critic RL. Models dynamic graph-structured traffic flow to optimize coordination between human-driven and autonomous vehicles.

Interpretable end-to-end Neurosymbolic Reinforcement Learning agents

Instantiates the SCoBots framework for interpretable neurosymbolic RL agents. Decomposes RL tasks into interpretable representations, addressing deep RL's shortcut learning and generalization issues from raw pixel states.

Path-Coordinated Continual Learning with Neural Tangent Kernel-Justified Plasticity: A Theoretical Framework with Near State-of-the-Art Performance

Proposes a path-coordinated continual learning framework combining Neural Tangent Kernel theory, statistical validation, and multiple path quality metrics. Addresses catastrophic forgetting by justifying plasticity bounds for improved performance.

SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning

Introduces SAIL-RL, a reinforcement learning post-training framework to enhance multimodal LLM reasoning. Teaches models when and how to think using dual-reward RL tuning, addressing outcome-only supervision and uniform thinking strategies.

Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning

Presents Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning for multi-task cooperative objectives. Uses automata to decompose tasks for agents, improving sample efficiency and enabling multi-task learning.

Large-scale automatic carbon ion treatment planning for head and neck cancers via parallel multi-agent reinforcement learning

Develops a large-scale automatic treatment planning system for carbon ion therapy using parallel multi-agent reinforcement learning. Optimizes numerous treatment planning parameters to improve dose conformity and organ-at-risk (OAR) sparing.

arxiv_cv

A Kullback-Leibler divergence method for input-system-state identification

Abstract: The capability of a novel Kullback-Leibler divergence method is examined herein within the Kalman filter framework to select the input-parameter-state estimation execution with the most plausible results. This identification suffers from th...

#Control Theory#State Estimation#Machine Learning#Information Theory#System Identification

17 hours ago

70%

arxiv_cl

The Collaboration Gap

Abstract: The trajectory of AI development suggests that we will increasingly rely on agent-based systems composed of independently developed agents with different information, privileges, and tools. The success of these systems will critically depen...

#Multi-Agent Systems#AI Collaboration#Reinforcement Learning#Agent Coordination#Benchmarking

17 hours ago

92%

arxiv_cl

I Want to Break Free! Persuasion and Anti-Social Behavior of LLMs in Multi-Agent Settings with Social Hierarchy

Abstract: As LLM-based agents become increasingly autonomous and will more freely interact with each other, studying the interplay among them becomes crucial to anticipate emergent phenomena and potential risks. In this work, we provide an in-depth a...

#AI Safety#Multi-Agent Systems#Human-AI Interaction#AI Ethics#Large Language Models

17 hours ago

90%

arxiv_ml

AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench

Abstract: AI research agents are demonstrating great potential to accelerate scientific progress by automating the design, implementation, and training of machine learning models. We focus on methods for improving agents' performance on MLE-bench, a ...

#Automated Machine Learning (AutoML)#AI Agents#Search Algorithms#Reinforcement Learning#Machine Learning Benchmarking

17 hours ago

80%

arxiv_ml

Optimizing Multi-Lane Intersection Performance in Mixed Autonomy Environments

Abstract: One of the main challenges in managing traffic at multilane intersections is ensuring smooth coordination between human-driven vehicles (HDVs) and connected autonomous vehicles (CAVs). This paper presents a novel traffic signal control fram...

#Intelligent Transportation Systems#Reinforcement Learning#Graph Neural Networks#Autonomous Driving#Traffic Management

17 hours ago

90%

arxiv_ml

Closing the Intent-to-Behavior Gap via Fulfillment Priority Logic

Abstract: Practitioners designing reinforcement learning policies face a fundamental challenge: translating intended behavioral objectives into representative reward functions. This challenge stems from behavioral intent requiring simultaneous achiev...

#Multi-Objective Reinforcement Learning#Reward Engineering#Robotics Control#AI Alignment#Decision Making Under Uncertainty

17 hours ago

80%

arxiv_ml

Near Optimal Convergence to Coarse Correlated Equilibrium in General-Sum Markov Games

Abstract: No-regret learning dynamics play a central role in game theory, enabling decentralized convergence to equilibrium for concepts such as Coarse Correlated Equilibrium (CCE) or Correlated Equilibrium (CE). In this work, we improve the converge...

#Game Theory#Multi-Agent Reinforcement Learning#Online Learning#Convergence Analysis#Algorithmic Game Theory

17 hours ago

90%

arxiv_ml

Assessing win strength in MLB win prediction models

Abstract: In Major League Baseball, strategy and planning are major factors in determining the outcome of a game. Previous studies have aided this by building machine learning models for predicting the winning team of any given game. We extend this w...

#Sports Analytics#Machine Learning#Predictive Modeling#Statistical Analysis#Behavioral Economics

17 hours ago

75%

arxiv_ml

Imagine Beyond! Distributionally Robust Auto-Encoding for State Space Coverage in Online Reinforcement Learning

Abstract: Goal-Conditioned Reinforcement Learning (GCRL) enables agents to autonomously acquire diverse behaviors, but faces major challenges in visual environments due to high-dimensional, semantically sparse observations. In the online setting, whe...

#Reinforcement Learning#Representation Learning#Exploration Strategies#Goal-Conditioned Learning#Online Learning

17 hours ago

90%

arxiv_ml

Bayesian Optimization by Kernel Regression and Density-based Exploration

Abstract: Bayesian optimization is highly effective for optimizing expensive-to-evaluate black-box functions, but it faces significant computational challenges due to the high computational complexity of Gaussian processes, which results in a total t...

#Optimization#Machine Learning#Bayesian Methods#Algorithm Design

17 hours ago

85%

arxiv_ml

Accelerated Frank-Wolfe Algorithms: Complementarity Conditions and Sparsity

Abstract: We develop new accelerated first-order algorithms in the Frank-Wolfe (FW) family for minimizing smooth convex functions over compact convex sets, with a focus on two prominent constraint classes: (1) polytopes and (2) matrix domains given b...

#Optimization Algorithms#Convex Optimization#Machine Learning Theory#Algorithm Design#Sparse Optimization

17 hours ago

70%

arxiv_ml

Reset & Distill: A Recipe for Overcoming Negative Transfer in Continual Reinforcement Learning

Abstract: We argue that the negative transfer problem occurring when the new task to learn arrives is an important problem that should not be overlooked when developing effective Continual Reinforcement Learning (CRL) algorithms. Through comprehensive...

#Reinforcement Learning#Continual Learning#Machine Learning#Robotics#Artificial Intelligence

17 hours ago

91%

arxiv_ml

Detection Augmented Bandit Procedures for Piecewise Stationary MABs: A Modular Approach

Abstract: Conventional Multi-Armed Bandit (MAB) algorithms are designed for stationary environments, where the reward distributions associated with the arms do not change with time. In many applications, however, the environment is more accurately mo...

#Reinforcement Learning#Online Learning#Change Detection#Algorithm Analysis

17 hours ago

90%

arxiv_ml

Two-Player Zero-Sum Games with Bandit Feedback

Abstract: We study a two-player zero-sum game in which the row player aims to maximize their payoff against an adversarial column player, under an unknown payoff matrix estimated through bandit feedback. We propose three algorithms based on the Explo...

#Game Theory#Online Learning#Reinforcement Learning#Algorithmic Game Theory#Decision Theory

17 hours ago

85%

arxiv_ml

Option-aware Temporally Abstracted Value for Offline Goal-Conditioned Reinforcement Learning

Abstract: Offline goal-conditioned reinforcement learning (GCRL) offers a practical learning paradigm in which goal-reaching policies are trained from abundant state-action trajectory datasets without additional environment interaction. However, offl...

#Offline Reinforcement Learning#Hierarchical Reinforcement Learning#Goal-Conditioned Reinforcement Learning#Long-Horizon Planning#Value Function Approximation

17 hours ago

95%

arxiv_ml

LEASE: Offline Preference-based Reinforcement Learning with High Sample Efficiency

Abstract: Offline preference-based reinforcement learning (PbRL) provides an effective way to overcome the challenges of designing reward and the high costs of online interaction. However, since labeling preference needs real-time human feedback, acq...

#Offline Reinforcement Learning#Preference Learning#Sample Efficiency#Reward Modeling#Human-in-the-Loop Learning

17 hours ago

95%

arxiv_ml

Overcoming Non-stationary Dynamics with Evidential Proximal Policy Optimization

Abstract: Continuous control of non-stationary environments is a major challenge for deep reinforcement learning algorithms. The time-dependency of the state transition dynamics aggravates the notorious stability problems of model-free deep actor-cri...

#Reinforcement Learning#Non-Stationary Environments#Control Theory#Uncertainty Quantification#Exploration Strategies

17 hours ago

85%

arxiv_ml

RL-Aided Cognitive ISAC: Robust Detection and Sensing-Communication Trade-offs

Abstract: This paper proposes a reinforcement learning (RL)-aided cognitive framework for massive MIMO-based integrated sensing and communication (ISAC) systems employing a uniform planar array (UPA). The focus is on enhancing radar sensing performan...

#Integrated Sensing and Communication (ISAC)#Reinforcement Learning in Communications#Cognitive Radio#Signal Processing#Wireless Systems Optimization

17 hours ago

80%

arxiv_ml

Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning

Abstract: We study the problem of learning multi-task, multi-agent policies for cooperative, temporal objectives, under centralized training, decentralized execution. In this setting, using automata to represent tasks enables the decomposition of com...

#Multi-Agent Systems#Reinforcement Learning#Cooperative AI#Task Planning#Decentralized Control

17 hours ago

95%

arxiv_ml

Limit Theorems for Stochastic Gradient Descent in High-Dimensional Single-Layer Networks

Abstract: This paper studies the high-dimensional scaling limits of online stochastic gradient descent (SGD) for single-layer networks. Building on the seminal work of Saad and Solla, which analyzed the deterministic (ballistic) scaling limits of SGD...

#Machine Learning Theory#Optimization Algorithms#Deep Learning Analysis#Statistical Mechanics#Stochastic Processes

17 hours ago

70%

arxiv_ml

Many-vs-Many Missile Guidance via Virtual Targets

Abstract: This paper presents a novel approach to many-vs-many missile guidance using virtual targets (VTs) generated by a Normalizing Flows-based trajectory predictor. Rather than assigning n interceptors directly to m physical targets through conve...

#Guidance, Navigation, and Control (GNC)#Operations Research#Machine Learning#Defense Technology#Multi-Agent Systems

17 hours ago

80%

arxiv_ml

Gradient-Variation Online Adaptivity for Accelerated Optimization with Hölder Smoothness

Abstract: Smoothness is known to be crucial for acceleration in offline optimization, and for gradient-variation regret minimization in online learning. Interestingly, these two problems are actually closely connected -- accelerated optimization can ...

#Online Learning Theory#Optimization Algorithms#Regret Minimization#Adaptive Methods#Convex Analysis

17 hours ago

70%

arxiv_ai

Learning Complementary Policies for Human-AI Teams

Abstract: This paper tackles the critical challenge of human-AI complementarity in decision-making. Departing from the traditional focus on algorithmic performance in favor of performance of the human-AI team, and moving past the framing of collabora...

#Human-AI Team Complementarity#Learning Policies for Collaborative Decision-Making#Robustness to Model Misspecifications#Exploiting Divergent Human and AI Behaviors

17 hours ago

90%

arxiv_ai

H-NeiFi: Non-Invasive and Consensus-Efficient Multi-Agent Opinion Guidance

Abstract: The openness of social media enables the free exchange of opinions, but it also presents challenges in guiding opinion evolution towards global consensus. Existing methods often directly modify user views or enforce cross-group connections....

#Multi-Agent Systems#Opinion Dynamics#Social Network Analysis#Reinforcement Learning#Consensus Building

17 hours ago

90%

arxiv_ml

Learning Interactive World Model for Object-Centric Reinforcement Learning

Abstract: Agents that understand objects and their interactions can learn policies that are more robust and transferable. However, most object-centric RL methods factor state by individual objects while leaving interactions implicit. We introduce the...

#Reinforcement Learning#World Models#Object-Centric Representation#Robotics#Generalization

17 hours ago

90%

arxiv_ml

A Spatially Informed Gaussian Process UCB Method for Decentralized Coverage Control

Abstract: We present a novel decentralized algorithm for coverage control in unknown spatial environments modeled by Gaussian Processes (GPs). To trade off between exploration and exploitation, each agent autonomously determines its trajectory by min...

#Decentralized Control#Robotics#Spatial Coverage#Gaussian Processes#Multi-Agent Systems

17 hours ago

85%

arxiv_ml

From Solo to Symphony: Orchestrating Multi-Agent Collaboration with Single-Agent Demos

Abstract: Training a team of agents from scratch in multi-agent reinforcement learning (MARL) is highly inefficient, much like asking beginners to play a symphony together without first practicing solo. Existing methods, such as offline or transferab...

#Multi-Agent Systems#Reinforcement Learning#Transfer Learning#Cooperative AI#Machine Learning Efficiency

17 hours ago

95%

arxiv_ml

RoME: Domain-Robust Mixture-of-Experts for MILP Solution Prediction across Domains

Abstract: Mixed-Integer Linear Programming (MILP) is a fundamental and powerful framework for modeling complex optimization problems across diverse domains. Recently, learning-based methods have shown great promise in accelerating MILP solvers by pre...

#Optimization#Machine Learning for Operations Research#Domain Adaptation#Mixture-of-Experts#Robustness

17 hours ago

80%

arxiv_ml

An End-to-End Learning Approach for Solving Capacitated Location-Routing Problems

Abstract: The capacitated location-routing problems (CLRPs) are classical problems in combinatorial optimization, which require simultaneously making location and routing decisions. In CLRPs, the complex constraints and the intricate relationships be...

#Combinatorial Optimization#Deep Reinforcement Learning#Operations Research#Logistics and Supply Chain Management#AI for Optimization

17 hours ago

75%

arxiv_ai

Interpretable end-to-end Neurosymbolic Reinforcement Learning agents

Abstract: Deep reinforcement learning (RL) agents rely on shortcut learning, preventing them from generalizing to slightly different environments. To address this problem, symbolic methods that use object-centric states have been developed. However,...

#Interpretable AI#Reinforcement Learning#Neurosymbolic AI#Generalization in RL#Representation Learning

17 hours ago

85%
