
Misalignment Bounty: Crowdsourcing AI Agent Misbehavior

Abstract

Advanced AI systems sometimes act in ways that differ from human intent. To gather clear, reproducible examples, we ran the Misalignment Bounty: a crowdsourced project that collected cases of agents pursuing unintended or unsafe goals. The bounty received 295 submissions, of which nine were awarded. This report explains the program's motivation and evaluation criteria, and walks through the nine winning submissions step by step.
Authors (6)
Rustem Turtayev
Natalia Fedorova
Oleg Serikov
Sergey Koldyba
Lev Avagyan
Dmitrii Volkov
Submitted: October 22, 2025
arXiv Category: cs.AI

Key Contributions

The Misalignment Bounty crowdsourced 295 submissions of AI agent misbehavior, nine of which were awarded for demonstrating agents pursuing unintended or unsafe goals. The program provides clear, reproducible examples for understanding and mitigating AI alignment failures.

Business Value

Provides empirical data for companies developing AI systems, enabling them to identify and fix alignment issues proactively, reducing deployment risk and building more trustworthy AI.