Abstract
As AI systems progress, we rely more on them to make decisions with us and for us. To ensure that such decisions are aligned with human values, it is imperative that we understand not only what decisions they make but also how they come to those decisions. Reasoning language models, which provide both final responses and (partially transparent) intermediate thinking traces, present a timely opportunity to study AI procedural reasoning. Unlike math and code problems, which often have objectively correct answers, moral dilemmas are an excellent testbed for process-focused evaluation because they allow for multiple defensible conclusions. To enable such evaluation, we present MoReBench: 1,000 moral scenarios, each paired with a set of rubric criteria that experts consider essential to include (or avoid) when reasoning about the scenarios. MoReBench contains over 23,000 criteria, including identifying moral considerations, weighing trade-offs, and giving actionable recommendations, covering both cases of AI advising humans on moral decisions and cases of AI making moral decisions autonomously. Separately, we curate MoReBench-Theory: 150 examples to test whether AI can reason under five major frameworks in normative ethics. Our results show that scaling laws and existing benchmarks on math, code, and scientific reasoning tasks fail to predict models' abilities to perform moral reasoning. Models also show partiality towards specific moral frameworks (e.g., Benthamite Act Utilitarianism and Kantian Deontology), which might be side effects of popular training paradigms. Together, these benchmarks advance process-focused reasoning evaluation towards safer and more transparent AI.
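To make the rubric-based, process-focused setup concrete, here is a minimal Python sketch of how such scoring could work: a scenario bundles expert criteria to include or avoid, and a judge function checks each criterion against a model's reasoning trace. The class names, fields, and toy keyword judge are hypothetical illustrations under assumed semantics, not MoReBench's actual schema or grading pipeline.

```python
# Minimal sketch of rubric-based, process-focused scoring in the spirit of
# MoReBench: each scenario carries expert-written criteria that a model's
# reasoning trace should include (or avoid). All names below are illustrative
# assumptions, not the paper's actual data schema or grading pipeline.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Criterion:
    description: str      # e.g. "identifies the conflicting obligations"
    should_include: bool  # True = trace should satisfy it; False = should avoid it

@dataclass
class MoralScenario:
    prompt: str
    criteria: list[Criterion]

def rubric_score(trace: str, scenario: MoralScenario,
                 judge: Callable[[str, Criterion], bool]) -> float:
    """Fraction of criteria handled correctly by a reasoning trace.

    `judge(trace, criterion)` stands in for the grader (in practice an LLM
    or human annotator) and returns True if the trace meets the criterion.
    """
    correct = 0
    for c in scenario.criteria:
        met = judge(trace, c)
        # "Include" criteria count when met; "avoid" criteria count when not met.
        correct += int(met == c.should_include)
    return correct / len(scenario.criteria)

def keyword_judge(trace: str, criterion: Criterion) -> bool:
    # Toy stand-in: substring match. A real grader needs semantic judgment.
    return criterion.description.lower() in trace.lower()

if __name__ == "__main__":
    scenario = MoralScenario(
        prompt="A friend asks you to lie to protect them. What should they say?",
        criteria=[
            Criterion("weighs honesty against loyalty", should_include=True),
            Criterion("dismisses one party's interests outright", should_include=False),
        ],
    )
    trace = "This weighs honesty against loyalty: deception harms trust, ..."
    print(rubric_score(trace, scenario, keyword_judge))  # -> 1.0
```

The key design point the sketch illustrates is that the score is computed over the reasoning trace rather than the final answer, which is what distinguishes process-focused evaluation from outcome-only accuracy metrics.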
Authors (18)
Yu Ying Chiu
Michael S. Lee
Rachel Calcott
Brandon Handoko
Paul de Font-Reaulx
Paula Rodriguez
+12 more
Submitted
October 18, 2025
Key Contributions
Introduces MoReBench, a novel benchmark comprising 1,000 moral scenarios with detailed rubric criteria, designed to evaluate procedural and pluralistic moral reasoning in language models beyond just their final outcomes. This benchmark enables a deeper understanding of how LLMs arrive at decisions in complex ethical dilemmas, moving beyond simple accuracy metrics.
Business Value
Crucial for developing trustworthy AI systems that can be deployed in sensitive applications requiring ethical decision-making. It provides a standardized way to assess and improve the ethical reasoning capabilities of AI, fostering user trust and responsible AI deployment.