This paper characterizes selective refusal bias in LLM safety guardrails, demonstrating that an LLM may refuse a harmful request when it targets some demographic groups yet comply when the same request targets others. The study quantifies this bias across various demographic attributes and highlights the need for more equitable and robust safety measures to prevent unintended discrimination.
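As a rough illustration of how such a disparity could be quantified (not the paper's actual methodology), the minimal sketch below computes per-group refusal rates from hypothetically labeled model responses and summarizes the bias as the gap between the highest and lowest rate; the group names and data are placeholders.

```python
from collections import defaultdict

def refusal_rates(records):
    """Compute per-group refusal rates from (group, refused) pairs.

    `records` is an iterable of (demographic_group, refused) tuples,
    where `refused` is True if the model declined the prompt.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [refusal count, total]
    for group, refused in records:
        counts[group][0] += int(refused)
        counts[group][1] += 1
    return {group: refusals / total for group, (refusals, total) in counts.items()}

# Hypothetical example: the same harmful request templated over two groups.
records = [
    ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", False), ("group_b", False), ("group_b", True),
]
rates = refusal_rates(records)
# One simple summary of selective refusal bias: the max-min gap in rates.
gap = max(rates.values()) - min(rates.values())
print(rates, f"refusal-rate gap: {gap:.2f}")
```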
Ensuring fairness and equity in AI systems is crucial for building trust and for avoiding reputational damage and legal liability. Addressing such bias in LLM safety guardrails is therefore essential for responsible AI deployment.