Abstract
As AI systems become more advanced, ensuring their alignment with a diverse
range of individuals and societal values becomes increasingly critical. But how
can we capture fundamental human values and assess the degree to which AI
systems align with them? We introduce ValueCompass, a framework of fundamental
values, grounded in psychological theory and a systematic review, to identify
and evaluate human-AI alignment. We apply ValueCompass to measure the value
alignment of humans and large language models (LLMs) across four real-world
scenarios: collaborative writing, education, public sectors, and healthcare.
Our findings reveal concerning misalignments between humans and LLMs; for
example, humans frequently endorse values such as "National Security" that
LLMs largely reject. We also observe that values differ across scenarios,
highlighting the need for context-aware AI alignment strategies. This work
provides valuable insights into the design space of human-AI alignment, laying
the foundations for developing AI systems that responsibly reflect societal
values and ethics.
Key Contributions
Introduces ValueCompass, a framework of fundamental values grounded in psychological theory to measure human-AI alignment. It reveals concerning misalignments between humans and LLMs across various real-world scenarios (writing, education, public sector, healthcare), highlighting the need for context-aware alignment strategies.
Business Value
Crucial for building trustworthy AI systems that align with user values, enhancing user adoption, safety, and ethical compliance across various industries.