
Self-Refining Language Model Anonymizers via Adversarial Distillation

📄 Abstract

Large language models (LLMs) are increasingly used in sensitive domains, where their ability to infer personal data from seemingly benign text introduces emerging privacy risks. While recent LLM-based anonymization methods help mitigate such risks, they often rely on proprietary models (e.g., GPT-4), raising concerns about cost and the potential exposure of sensitive data to untrusted external systems. To address this, we introduce SElf-refining Anonymization with Language model (SEAL), a novel distillation framework for training small language models (SLMs) to perform effective anonymization without relying on external models at inference time. SEAL leverages adversarial interactions between an LLM anonymizer and an inference model to collect trajectories of anonymized texts and inferred attributes, which are then used to distill anonymization and critique capabilities into SLMs through supervised fine-tuning and preference learning. The resulting models learn both to anonymize text and to evaluate their outputs, enabling iterative improvement of anonymization quality via self-refinement. Experiments on SynthPAI, a dataset of synthetic personal profiles and text comments, demonstrate that SLMs trained with SEAL achieve substantial improvements in anonymization capabilities. Notably, 8B models attain a privacy-utility trade-off comparable to that of the GPT-4 anonymizer and, with self-refinement, even surpass it in terms of privacy protection. These results highlight the effectiveness of our adversarial distillation framework for training SLMs as efficient anonymizers.
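The adversarial data-collection step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `anonymize` and `infer_attributes` are hypothetical stand-ins for calls to the LLM anonymizer and the adversarial inference model, and the string-level redaction is a toy substitute for model outputs.

```python
def anonymize(text: str) -> str:
    # Placeholder: a real system would query the LLM anonymizer here.
    return text.replace("Alice", "[NAME]").replace("Paris", "[CITY]")

def infer_attributes(text: str) -> dict:
    # Placeholder adversary: tries to recover personal attributes
    # from the (possibly anonymized) text.
    return {
        "name": "Alice" if "Alice" in text else None,
        "city": "Paris" if "Paris" in text else None,
    }

def collect_trajectories(texts, rounds=2):
    """For each input, iteratively anonymize and record what the
    adversary can still infer at every step. The resulting
    trajectories would supply training pairs for distillation."""
    trajectories = []
    for text in texts:
        steps, current = [], text
        for _ in range(rounds):
            current = anonymize(current)
            leaked = infer_attributes(current)
            steps.append({"text": current, "leaked": leaked})
        trajectories.append(steps)
    return trajectories

traj = collect_trajectories(["Alice moved to Paris last year."])
```

In the paper's framework, pairs of anonymized texts at different quality levels (judged by what the adversary infers) would then drive supervised fine-tuning and preference learning on the SLM.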
Authors (3)
Kyuyoung Kim
Hyunjun Jeon
Jinwoo Shin
Submitted
June 2, 2025
arXiv Category
cs.CL
arXiv PDF

Key Contributions

SEAL is a novel distillation framework that trains small language models (SLMs) for effective anonymization without relying on external LLMs at inference. It uses adversarial interactions to distill anonymization and critique capabilities into SLMs, addressing cost and privacy concerns associated with proprietary models.
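The self-refinement behavior the distilled SLM learns, anonymizing its own output, critiquing it, and revising until no leaks remain, can be sketched as a simple loop. The `critique` and `refine` functions below are hypothetical placeholders for the SLM's critique and anonymization calls, assumed only for illustration.

```python
def critique(text: str) -> list:
    # Placeholder critic: flags spans that still look like
    # personal data (a trained SLM would generate this critique).
    return [w for w in ("Alice", "Paris", "29") if w in text]

def refine(text: str, issues: list) -> str:
    # Placeholder refiner: redacts each span the critic flagged.
    for span in issues:
        text = text.replace(span, "[REDACTED]")
    return text

def self_refine(text: str, max_iters: int = 3) -> str:
    """Critique-then-refine loop: stop when the critic finds no
    remaining leaks or the iteration budget is exhausted."""
    for _ in range(max_iters):
        issues = critique(text)
        if not issues:
            break
        text = refine(text, issues)
    return text

result = self_refine("Alice, 29, lives in Paris.")
```

Because both roles run on the same small model, this loop needs no external API at inference time, which is the cost and privacy advantage the abstract highlights.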

Business Value

Enables organizations to leverage LLM capabilities for sensitive data processing while ensuring robust privacy protection, reducing compliance risks and operational costs.