📄 Abstract
As large language models (LLMs) become increasingly integrated into
applications serving users across diverse cultures, communities and
demographics, it is critical to align LLMs with pluralistic human values beyond
average principles (e.g., HHH: helpful, honest, and harmless). In psychological and social value theories such
as Schwartz's Value Theory, pluralistic values are represented by multiple
value dimensions paired with various priorities. However, existing methods
encounter two challenges when aligning with such fine-grained value objectives:
1) they often treat multiple values as independent and equally important,
ignoring their interdependence and relative priorities (value complexity); 2)
they struggle to precisely control nuanced value priorities, especially those
underrepresented ones (value steerability). To handle these challenges, we
propose COUPLE, a COUnterfactual reasoning framework for PLuralistic valuE
alignment. It introduces a structural causal model (SCM) to capture the complex
interdependencies and prioritization among values, as well as the causal
relationship between high-level value dimensions and behaviors. Moreover, it
applies counterfactual reasoning to generate outputs aligned with any desired
value objectives. Benefiting from explicit causal modeling, COUPLE also
provides better interpretability. We evaluate COUPLE on two datasets with
different value systems and demonstrate that it outperforms other baselines
across diverse types of value objectives.
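To make the abstract's core mechanism concrete, the following is a minimal, hypothetical sketch (not the paper's actual model) of a linear SCM over two value dimensions and a behavior score, answering a counterfactual query via the standard abduction, action, and prediction steps. The structural equations, weights, and variable names here are illustrative assumptions only.

```python
# Toy SCM (illustrative only, not COUPLE's implementation):
# value A is exogenous, value B depends on A (interdependence),
# and behavior depends on both, with weights playing the role of
# value priorities.

def scm(u_a, u_b, a=None):
    """Evaluate the structural equations.

    u_a, u_b: exogenous noise terms (fixed during abduction).
    a: if given, performs the intervention do(A := a).
    """
    A = u_a if a is None else a          # action: do-intervention overrides A
    B = 0.5 * A + u_b                    # B partially inherits A's value
    behavior = 0.7 * A + 0.3 * B         # weights encode value priorities
    return A, B, behavior

# Abduction: exogenous noise recovered from the observed world
# (here simply known directly for the toy example).
u_a, u_b = 1.0, 0.2
_, _, observed = scm(u_a, u_b)

# Prediction: "what would the behavior have been had value A been 2.0?"
_, _, counterfactual = scm(u_a, u_b, a=2.0)
print(observed, counterfactual)
```

Because the noise terms are held fixed while only the intervened value changes, the gap between `observed` and `counterfactual` isolates the causal effect of re-prioritizing value A on the behavior, which is the kind of fine-grained steerability the abstract describes.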
Authors (5)
Hanze Guo
Jing Yao
Xiao Zhou
Xiaoyuan Yi
Xing Xie
Submitted
October 21, 2025
Key Contributions
Proposes COUPLE, a counterfactual reasoning framework for steerable pluralistic value alignment of LLMs. It addresses the challenges of value complexity and steerability by introducing structural causal models to represent interdependencies and priorities among values, enabling finer-grained control over LLM behavior to align with diverse human values.
Business Value
Ensures AI systems, particularly LLMs, behave in ways that are ethically aligned with a broader spectrum of human values, reducing risks of bias and promoting fairness. This is crucial for widespread adoption in diverse user bases.