arxiv_ml • Research Paper • DRL researchers, AI safety engineers, Software testers for AI systems, Robotics engineers

The Pursuit of Diversity: Multi-Objective Testing of Deep Reinforcement Learning Agents


📄 Abstract

Testing deep reinforcement learning (DRL) agents in safety-critical domains requires discovering diverse failure scenarios. Existing tools such as INDAGO rely on single-objective optimization focused solely on maximizing failure counts, but this does not ensure discovered scenarios are diverse or reveal distinct error types. We introduce INDAGO-Nexus, a multi-objective search approach that jointly optimizes for failure likelihood and test scenario diversity using multi-objective evolutionary algorithms with multiple diversity metrics and Pareto front selection strategies. We evaluated INDAGO-Nexus on three DRL agents: humanoid walker, self-driving car, and parking agent. On average, INDAGO-Nexus discovers up to 83% and 40% more unique failures (test effectiveness) than INDAGO in the SDC and Parking scenarios, respectively, while reducing time-to-failure by up to 67% across all agents.

Authors (3)

Antony Bartlett

Cynthia Liem

Annibale Panichella

Submitted

October 16, 2025

arXiv Category

cs.LG


Key Contributions

INDAGO-Nexus introduces a multi-objective search approach for testing DRL agents, jointly optimizing for failure likelihood and test scenario diversity using evolutionary algorithms. In the reported evaluation, it finds up to 83% and 40% more unique failures than the single-objective INDAGO in the self-driving car and parking scenarios, respectively, and reduces time-to-failure by up to 67% across all agents.
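To make the idea of Pareto front selection concrete, here is a minimal sketch of non-dominated filtering over two maximized objectives. It is an illustration only, not the INDAGO-Nexus implementation: the names `Candidate`, `dominates`, and `pareto_front`, and the sample scores, are hypothetical, and the paper's actual evolutionary algorithm and selection strategies are not described in this summary.

```python
from typing import List, Tuple

# Hypothetical representation: (failure_likelihood, diversity), both maximized.
Candidate = Tuple[float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up scores for five test scenarios (illustrative values only).
scenarios = [(0.9, 0.1), (0.5, 0.5), (0.2, 0.8), (0.4, 0.4), (0.1, 0.1)]
front = pareto_front(scenarios)
print(front)
```

A single-objective search would rank these scenarios by the first score alone; the non-dominated set also retains scenarios that trade some failure likelihood for higher diversity.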

Business Value

Enhances the safety and reliability of AI systems, particularly in critical domains like autonomous driving and robotics, by providing more comprehensive testing and uncovering subtle failure modes.

Paper Metadata

Innovation Type

Algorithmic/Methodological

Deployment Feasibility

High for testing and validation phases of DRL development.

Limitations Addressed

Existing DRL testing tools such as INDAGO focus solely on maximizing failure counts, which does not ensure that the discovered failures are diverse or correspond to distinct error types.
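One way to see why a raw failure count can overstate coverage is to collapse near-identical failing scenarios before counting. The sketch below assumes each scenario is a numeric parameter vector and treats two failures as the same if they lie within a distance threshold. Both the Euclidean distance and the threshold are assumptions made for illustration; the summary does not say which diversity metrics or uniqueness criterion the paper uses.

```python
import math
from typing import List, Sequence


def nearest_distance(scenario: Sequence[float], archive: List[Sequence[float]]) -> float:
    """Euclidean distance to the closest archived scenario (inf if the archive is empty)."""
    if not archive:
        return math.inf
    return min(math.dist(scenario, other) for other in archive)


def count_unique_failures(failures: List[Sequence[float]], threshold: float) -> int:
    """Greedily count failures that are farther than `threshold` from all kept ones."""
    unique: List[Sequence[float]] = []
    for failure in failures:
        if nearest_distance(failure, unique) > threshold:
            unique.append(failure)
    return len(unique)


# Hypothetical failing scenarios: the first two are near-duplicates.
failures = [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0)]
print(len(failures), count_unique_failures(failures, threshold=0.1))
```

The same nearest-neighbour distance could serve as a diversity score for a new candidate during search, which is the kind of signal a second objective would reward.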

Performance Gains

Up to 83% more unique failures than INDAGO in self-driving car (SDC) scenarios, up to 40% more in Parking scenarios, and up to 67% reduction in time-to-failure across agents.
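For readers comparing their own runs, figures of this kind are usually relative changes against the baseline tool. The helper names and the input numbers below are hypothetical and are not taken from the paper; they only show the arithmetic behind "X% more" and "X% reduction".

```python
def percent_more(new: float, baseline: float) -> float:
    """Relative increase of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


def percent_reduction(new: float, baseline: float) -> float:
    """Relative decrease of `new` below `baseline`, in percent."""
    return (baseline - new) / baseline * 100.0


# Made-up example values, not results from the paper.
print(percent_more(new=30, baseline=20))       # unique failures found
print(percent_reduction(new=10, baseline=40))  # seconds until first failure
```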

Technical Tags

Deep Reinforcement Learning (DRL) • Multi-objective optimization • Evolutionary algorithms • Test case generation • Failure scenario discovery • Diversity metrics • Pareto front selection • Humanoid walker • Self-driving car • Parking agent

Research Topics

Reinforcement Learning Testing • AI Safety • Optimization Algorithms • Software Testing • Machine Learning Evaluation

Methods & Architectures

Multi-objective evolutionary algorithms • Pareto front selection • Diversity metrics

Applications & Tasks

Autonomous Driving • Robotics • Safety-Critical Systems • Discovering Diverse Failure Scenarios • Testing DRL Agents • Maximizing Failure Counts • Generating diverse test cases for DRL agents • Identifying failure modes • Improving DRL agent robustness

Datasets & Benchmarks

Benchmarks

Humanoid walker • Self-driving car • Parking agent

Failure likelihood • Test scenario diversity • Unique failures • Time-to-failure

Related Fields

AI Safety • Software Engineering • Optimization • Evolutionary Computation

Keywords

DRL • testing • multi-objective optimization • evolutionary algorithms • failure scenarios • diversity • AI safety • autonomous driving • robotics • humanoid • Pareto front

Academic Context

#Reinforcement Learning Testing • #AI Safety • #Optimization Algorithms • #Software Testing • #Machine Learning Evaluation

Commercial Potential

Potential Products

DRL testing platforms • AI safety validation tools

Target Industries

Automotive • Robotics • Aerospace • AI Development

Use Case Examples

Testing autonomous vehicle control systems • Validating robotic manipulation agents • Ensuring safety of DRL-based industrial automation

Competitive Edge

Offers a more effective testing methodology for DRL agents by prioritizing diversity alongside failure rates, leading to more robust and safer AI systems.

Market Opportunity

Growing demand for AI safety and validation tools.

Revenue Models

Licensing of testing software • Consulting services for AI safety

Resource Requirements

Compute Needs

Moderate to high, depending on the complexity of the DRL agents and the search space.

Data Requirements

DRL agents trained on relevant tasks (e.g., simulation environments).

Deployment Constraints

Requires access to the DRL agent's environment and control interface for testing.

Scalability

Scales with the number of objectives and the complexity of the search space.

Regulatory Considerations

Safety standards for autonomous systems • DRL agent certification

Production Readiness

Maturity Level

Research

Time to Market

2-5 years

Patent Potential

Moderate, for the multi-objective testing framework and diversity metrics.
