📄 Abstract
Despite the impressive generative capabilities of text-to-image (T2I)
diffusion models, they remain vulnerable to generating inappropriate content,
especially when confronted with implicit sexual prompts. Unlike explicit
harmful prompts, these subtle cues, often disguised as seemingly benign terms,
can unexpectedly trigger sexual content due to underlying model biases, raising
significant ethical concerns. However, existing detection methods are primarily
designed to identify explicit sexual content and therefore struggle to detect
these implicit cues. Fine-tuning approaches, while effective to some extent,
risk degrading the model's generative quality, creating an undesirable
trade-off. To address this, we propose NDM, the first noise-driven detection
and mitigation framework, which detects and mitigates implicit malicious
intent in T2I generation while preserving the model's original generative
capabilities. Specifically, we introduce two key innovations: first, we
leverage the separability of early-stage predicted noise to develop a
noise-based detection method that identifies malicious content with high
accuracy and efficiency; second, we propose a noise-enhanced adaptive negative
guidance mechanism that optimizes the initial noise by suppressing attention
in the prominent region, thereby enhancing the effectiveness of adaptive
negative guidance for sexual-content mitigation. Experimentally, we validate
NDM on both natural and adversarial datasets, demonstrating superior
performance over existing SOTA methods, including SLD, UCE, and RECE. Code and
resources are available at https://github.com/lorraine021/NDM.
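To make the detection idea concrete, below is a minimal sketch of classifying a prompt from its early-step predicted noise, assuming a diffusers-style Stable Diffusion pipeline. The single early timestep, the flattened-noise feature, the linear probe, and the 0.5 threshold are illustrative assumptions for exposition, not NDM's exact design.

```python
# Sketch: flag implicit malicious intent from early-step predicted noise.
# Assumes a diffusers-style Stable Diffusion pipeline and 512x512 generation
# (64x64 latents); the pooling, probe, and threshold are illustrative choices.
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to(device)

@torch.no_grad()
def early_noise_feature(prompt: str, t_early: int = 981, seed: int = 0) -> torch.Tensor:
    """Predicted noise at one early (high-noise) timestep, flattened as a feature."""
    tokens = pipe.tokenizer(prompt, padding="max_length",
                            max_length=pipe.tokenizer.model_max_length,
                            truncation=True, return_tensors="pt").to(device)
    text_emb = pipe.text_encoder(tokens.input_ids)[0]
    gen = torch.Generator(device=device).manual_seed(seed)
    latents = torch.randn((1, pipe.unet.config.in_channels, 64, 64),
                          generator=gen, device=device, dtype=text_emb.dtype)
    t = torch.tensor([t_early], device=device)
    eps = pipe.unet(latents, t, encoder_hidden_states=text_emb).sample
    return eps.flatten(1).float()  # shape (1, C*H*W)

# Hypothetical lightweight probe, trained offline on benign vs. implicit prompts.
probe = torch.nn.Linear(4 * 64 * 64, 1).to(device)

def is_malicious(prompt: str, threshold: float = 0.5) -> bool:
    score = torch.sigmoid(probe(early_noise_feature(prompt))).item()
    return score > threshold
```

On the mitigation side, negative guidance can be sketched as classifier-free guidance with an extra branch conditioned on an unsafe concept; how NDM adapts the negative weight and suppresses attention in the prominent region of the initial noise is not reproduced here. The weights `w` and `w_neg` below are illustrative.

```python
# Per-step noise combination for guidance away from an unsafe concept.
def guided_noise(eps_uncond, eps_cond, eps_neg, w: float = 7.5, w_neg: float = 7.5):
    return (eps_uncond
            + w * (eps_cond - eps_uncond)       # standard CFG toward the prompt
            - w_neg * (eps_neg - eps_uncond))   # push away from the unsafe concept
```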
Authors (7)
Yitong Sun
Yao Huang
Ruochen Zhang
Huanran Chen
Shouwei Ruan
Ranjie Duan
+1 more
Submitted
October 17, 2025
Key Contributions
Introduces NDM, the first noise-driven detection and mitigation framework for implicit sexual intentions in text-to-image generation. It addresses the challenge of subtle prompts that trigger inappropriate content while, unlike fine-tuning approaches, preserving the model's original generative capabilities.
Business Value
Enhances the safety and ethical deployment of powerful text-to-image generation models, reducing risks associated with generating harmful or inappropriate content and building user trust.