Abstract
Detecting mental health crisis situations such as suicide ideation, rape,
domestic violence, child abuse, and sexual harassment is a critical yet
underexplored challenge for language models. When such situations arise during
user–model interactions, models must reliably flag them, as failure to do so
can have serious consequences. In this work, we introduce CRADLE BENCH, a
benchmark for multi-faceted crisis detection. Unlike previous efforts that
focus on a limited set of crisis types, our benchmark covers seven types
defined in line with clinical standards and is the first to incorporate
temporal labels. Our benchmark provides 600 clinician-annotated evaluation
examples and 420 development examples, together with a training corpus of
around 4K examples automatically labeled using a majority-vote ensemble of
multiple language models, which significantly outperforms single-model
annotation. We further fine-tune six crisis detection models on subsets defined
by consensus and unanimous ensemble agreement, providing complementary models
trained under different agreement criteria.
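The majority-vote ensemble labeling and the consensus/unanimous agreement subsets mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name, label strings, and three-annotator setup are assumptions for the example.

```python
from collections import Counter

def majority_vote_label(annotations):
    """Aggregate one example's crisis labels from several LM annotators.

    `annotations` is a list of labels, one per model. Returns the most
    common label together with flags for strict-majority consensus and
    unanimous agreement, mirroring the two training subsets described
    in the abstract.
    """
    counts = Counter(annotations)
    label, votes = counts.most_common(1)[0]
    return {
        "label": label,
        "consensus": votes > len(annotations) / 2,  # strict majority of annotators
        "unanimous": votes == len(annotations),     # all annotators agree
    }

# Hypothetical labels from three model annotators for one conversation
result = majority_vote_label(["suicide_ideation", "suicide_ideation", "none"])
```

Examples where only a strict majority agrees would land in the consensus subset, while examples with full agreement also qualify for the unanimous subset.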
Authors (5)
Grace Byun
Rebecca Lipschutz
Sean T. Minton
Abigail Lott
Jinho D. Choi
Submitted
October 27, 2025
Key Contributions
CRADLE BENCH is introduced as a comprehensive benchmark for multi-faceted mental health crisis and safety risk detection, covering seven clinically defined crisis types with temporal labels. It includes clinician-annotated evaluation data and a large training corpus automatically labeled by a majority-vote ensemble of language models, which significantly outperforms single-model annotation.
Business Value
Enables the development of safer AI systems that can reliably identify and respond to users in distress, crucial for platforms dealing with sensitive user interactions.