Abstract
With the increasing use of conversational AI systems, there is growing
concern over privacy leaks, especially when users share sensitive personal data
in interactions with Large Language Models (LLMs). Conversations shared with
these models may contain Personally Identifiable Information (PII), which, if
exposed, could lead to security breaches or identity theft. To address this
challenge, we present the Local Optimizations for Pseudonymization with
Semantic Integrity Directed Entity Detection (LOPSIDED) framework, a
semantically-aware privacy agent designed to safeguard sensitive PII data when
using remote LLMs. Unlike prior work, which often degrades response quality, our
approach dynamically replaces sensitive PII entities in user prompts with
semantically consistent pseudonyms, preserving the contextual integrity of
conversations. Once the model generates its response, the pseudonyms are
automatically depseudonymized, ensuring the user receives an accurate,
privacy-preserving output. We evaluate our approach using real-world
conversations sourced from ShareGPT, which we further augment and annotate to
assess whether named entities are contextually relevant to the model's
response. Our results show that LOPSIDED reduces semantic utility errors by a
factor of 5 compared to baseline techniques, all while enhancing privacy.
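The abstract describes a round trip in which PII entities in the user prompt are swapped for semantically consistent pseudonyms before the prompt reaches the remote LLM, and the pseudonyms in the model's response are mapped back afterwards. The sketch below illustrates only that round trip under simplified assumptions: entity detection and semantic pseudonym selection, which are the core of LOPSIDED, are stubbed out with a hand-supplied mapping, and all names (`pseudonymize`, `depseudonymize`, `private_query`) are illustrative rather than taken from the paper.

```python
# Minimal sketch of the pseudonymize -> query -> depseudonymize round trip.
# Entity detection and pseudonym generation are placeholders; LOPSIDED's actual
# entity detector and semantic pseudonym selection are not reproduced here.

import re
from typing import Callable


def pseudonymize(prompt: str, entities: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Replace each detected PII entity with its chosen pseudonym.

    `entities` maps original PII strings to semantically consistent pseudonyms
    (e.g., a real person's name to another plausible name). Returns the
    rewritten prompt and the reverse mapping needed to restore the response.
    """
    rewritten = prompt
    reverse_map: dict[str, str] = {}
    for original, pseudonym in entities.items():
        rewritten = re.sub(re.escape(original), pseudonym, rewritten)
        reverse_map[pseudonym] = original
    return rewritten, reverse_map


def depseudonymize(response: str, reverse_map: dict[str, str]) -> str:
    """Map pseudonyms in the model's response back to the original entities."""
    restored = response
    for pseudonym, original in reverse_map.items():
        restored = re.sub(re.escape(pseudonym), original, restored)
    return restored


def private_query(prompt: str, entities: dict[str, str],
                  remote_llm: Callable[[str], str]) -> str:
    """End-to-end round trip: the remote LLM only ever sees pseudonyms."""
    safe_prompt, reverse_map = pseudonymize(prompt, entities)
    response = remote_llm(safe_prompt)
    return depseudonymize(response, reverse_map)


if __name__ == "__main__":
    # Hypothetical example with a stubbed remote model that echoes its input.
    entities = {"Alice Johnson": "Mary Carter", "Pittsburgh": "Denver"}
    prompt = "Write a short bio for Alice Johnson, a nurse living in Pittsburgh."
    echo_llm = lambda p: f"Sure! {p}"  # stand-in for the remote LLM call
    print(private_query(prompt, entities, echo_llm))
```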
Authors (2)
Jayden Serenari
Stephen Lee
Submitted
October 30, 2025
Key Contributions
Introduces the LOPSIDED framework, a semantically-aware privacy agent that safeguards PII in LLM conversations by replacing it with semantically consistent pseudonyms. The framework preserves contextual integrity and automatically depseudonymizes model responses, avoiding the response-quality degradation seen in prior work.
Business Value
Enables the secure deployment of conversational AI in sensitive domains (e.g., healthcare, finance) by protecting user privacy, fostering trust and compliance with regulations like GDPR.