
Self-Refining Language Model Anonymizers via Adversarial Distillation

📄 Abstract

Large language models (LLMs) are increasingly used in sensitive domains, where their ability to infer personal data from seemingly benign text introduces emerging privacy risks. While recent LLM-based anonymization methods help mitigate such risks, they often rely on proprietary models (e.g., GPT-4), raising concerns about cost and the potential exposure of sensitive data to untrusted external systems. To address this, we introduce SElf-refining Anonymization with Language model (SEAL), a novel distillation framework for training small language models (SLMs) to perform effective anonymization without relying on external models at inference time. SEAL leverages adversarial interactions between an LLM anonymizer and an inference model to collect trajectories of anonymized texts and inferred attributes, which are then used to distill anonymization and critique capabilities into SLMs through supervised fine-tuning and preference learning. The resulting models learn both to anonymize text and to evaluate their outputs, enabling iterative improvement of anonymization quality via self-refinement. Experiments on SynthPAI, a dataset of synthetic personal profiles and text comments, demonstrate that SLMs trained with SEAL achieve substantial improvements in anonymization capabilities. Notably, 8B models attain a privacy-utility trade-off comparable to that of the GPT-4 anonymizer and, with self-refinement, even surpass it in terms of privacy protection. These results highlight the effectiveness of our adversarial distillation framework for training SLMs as efficient anonymizers.
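The adversarial data-collection step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `anonymize` and `infer_attributes` are hypothetical stand-ins for calls to the LLM anonymizer and the adversarial inference model, and the string-level redaction is a toy substitute for model outputs.

```python
def anonymize(text: str) -> str:
    # Placeholder: a real system would query the LLM anonymizer here.
    return text.replace("Alice", "[NAME]").replace("Paris", "[CITY]")

def infer_attributes(text: str) -> dict:
    # Placeholder adversary: tries to recover personal attributes
    # from the (possibly anonymized) text.
    return {
        "name": "Alice" if "Alice" in text else None,
        "city": "Paris" if "Paris" in text else None,
    }

def collect_trajectories(texts, rounds=2):
    """For each input, iteratively anonymize and record what the
    adversary can still infer at every step. The resulting
    trajectories would supply training pairs for distillation."""
    trajectories = []
    for text in texts:
        steps, current = [], text
        for _ in range(rounds):
            current = anonymize(current)
            leaked = infer_attributes(current)
            steps.append({"text": current, "leaked": leaked})
        trajectories.append(steps)
    return trajectories

traj = collect_trajectories(["Alice moved to Paris last year."])
```

In the paper's framework, pairs of anonymized texts at different quality levels (judged by what the adversary infers) would then drive supervised fine-tuning and preference learning on the SLM.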
Authors (3)
Kyuyoung Kim
Hyunjun Jeon
Jinwoo Shin
Submitted
June 2, 2025
arXiv Category
cs.CL
arXiv PDF

Key Contributions

SEAL is a novel distillation framework that trains small language models (SLMs) for effective anonymization without relying on external LLMs at inference. It uses adversarial interactions to distill anonymization and critique capabilities into SLMs, addressing cost and privacy concerns associated with proprietary models.
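The self-refinement behavior the distilled SLM learns, anonymizing its own output, critiquing it, and revising until no leaks remain, can be sketched as a simple loop. The `critique` and `refine` functions below are hypothetical placeholders for the SLM's critique and anonymization calls, assumed only for illustration.

```python
def critique(text: str) -> list:
    # Placeholder critic: flags spans that still look like
    # personal data (a trained SLM would generate this critique).
    return [w for w in ("Alice", "Paris", "29") if w in text]

def refine(text: str, issues: list) -> str:
    # Placeholder refiner: redacts each span the critic flagged.
    for span in issues:
        text = text.replace(span, "[REDACTED]")
    return text

def self_refine(text: str, max_iters: int = 3) -> str:
    """Critique-then-refine loop: stop when the critic finds no
    remaining leaks or the iteration budget is exhausted."""
    for _ in range(max_iters):
        issues = critique(text)
        if not issues:
            break
        text = refine(text, issues)
    return text

result = self_refine("Alice, 29, lives in Paris.")
```

Because both roles run on the same small model, this loop needs no external API at inference time, which is the cost and privacy advantage the abstract highlights.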

Business Value

Enables organizations to leverage LLM capabilities for sensitive data processing while ensuring robust privacy protection, reducing compliance risks and operational costs.