
CURATRON: Complete and Robust Preference Data for Rigorous Alignment of Large Language Models

Abstract

This paper addresses the challenges of aligning large language models (LLMs) with human values via preference learning (PL), focusing on incomplete and corrupted data in preference datasets. We propose a novel method for robustly and completely recalibrating values within these datasets to enhance LLMs' resilience against these issues. In particular, we devise a guaranteed polynomial-time ranking algorithm that robustifies several existing models, such as the classic Bradley-Terry-Luce (BTL) model (Bradley and Terry, 1952) and certain generalizations of it. To the best of our knowledge, our present work is the first to propose an algorithm that provably recovers an $\epsilon$-optimal ranking with high probability while allowing as many as $O(n)$ perturbed pairwise comparison results per model response. Furthermore, we show robust recovery results in the partially observed setting. Our experiments confirm that our algorithms handle adversarial noise and unobserved comparisons well in both general and LLM preference dataset settings. This work contributes to the development and scaling of more reliable and ethically aligned AI models by equipping the dataset curation pipeline with the ability to handle missing and maliciously manipulated inputs.
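As background (not stated in the abstract itself): the classic BTL model assigns each response $i$ a latent score $w_i > 0$ and models the probability that response $i$ is preferred over response $j$ as $P(i \succ j) = \frac{w_i}{w_i + w_j}$. Ranking then reduces to estimating the scores $w$ from observed pairwise comparisons, which is the estimation problem the paper's algorithm robustifies against perturbed and missing comparisons.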
Authors (3)
Son The Nguyen
Niranjan Uma Naresh
Theja Tulabandhula
Submitted
March 5, 2024
arXiv Category
cs.AI
arXiv PDF

Key Contributions

This paper introduces a novel method for robustly and completely recalibrating preference datasets for LLM alignment. It proposes a guaranteed polynomial-time ranking algorithm that tolerates up to $O(n)$ perturbed pairwise comparison results per model response and robustly recovers rankings in partially observed settings, addressing key limitations of existing methods on noisy and incomplete data.
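To make the underlying estimation problem concrete, the sketch below fits a plain BTL model to a small pairwise-comparison matrix using the standard minorization-maximization update (Hunter, 2004). This is only an illustration of the classic model the paper builds on; it is not the paper's robust CURATRON algorithm, and the toy win counts are made up for the example.

```python
import numpy as np

def btl_mm(wins, iters=200, tol=1e-8):
    """Estimate Bradley-Terry-Luce scores from a pairwise win-count matrix.

    wins[i, j] = number of times response i was preferred over response j.
    Returns scores normalized to sum to 1; the ranking is the descending
    order of the scores.
    """
    n = wins.shape[0]
    comparisons = wins + wins.T        # n_ij: total comparisons per pair
    total_wins = wins.sum(axis=1)      # W_i: total wins of response i
    w = np.ones(n)
    for _ in range(iters):
        # MM update: w_i <- W_i / sum_{j != i} n_ij / (w_i + w_j)
        denom = comparisons / (w[:, None] + w[None, :])
        np.fill_diagonal(denom, 0.0)
        w_new = total_wins / denom.sum(axis=1)
        w_new /= w_new.sum()           # normalize for identifiability
        if np.max(np.abs(w_new - w)) < tol:
            w = w_new
            break
        w = w_new
    return w

# Toy example: 3 model responses; response 0 wins most comparisons.
wins = np.array([[0, 8, 9],
                 [2, 0, 7],
                 [1, 3, 0]])
scores = btl_mm(wins)
ranking = np.argsort(-scores)          # best response first
```

The paper's contribution addresses what this vanilla estimator lacks: with adversarially flipped comparisons or a partially observed `wins` matrix, the plain MM fit can rank arbitrarily badly, whereas CURATRON's recalibration is designed to recover an $\epsilon$-optimal ranking under such corruption.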

Business Value

Enhances the reliability and safety of LLMs by ensuring they align with human values, which is crucial for their adoption in sensitive applications and for building user trust.