Abstract
Concept erasure aims to selectively unlearn undesirable content in diffusion models (DMs) to reduce the risk of sensitive content generation. As a recent paradigm in concept erasure, most existing methods employ adversarial training to identify and suppress target concepts, thereby reducing the likelihood of sensitive outputs. However, these methods often neglect the specificity of adversarial training in DMs, resulting in only partial mitigation. In this work, we investigate and quantify this specificity from the perspective of the concept space, i.e., can adversarial samples truly fit the target concept space? We observe that existing methods neglect the role of conceptual semantics when generating adversarial samples, resulting in an ineffective fit of the concept space. This oversight leads to two issues: 1) when adversarial samples are few, they fail to comprehensively cover the target concept; 2) conversely, when they are numerous, they disrupt other concept spaces. Motivated by these findings, we introduce S-GRACE (Semantics-Guided Robust Adversarial Concept Erasure), which leverages semantic guidance within the concept space to generate adversarial samples and perform erasure training. Experiments with seven state-of-the-art methods and three adversarial prompt generation strategies across various DM unlearning scenarios demonstrate that S-GRACE improves erasure performance by 26%, better preserves non-target concepts, and reduces training time by 90%. Our code is available at https://github.com/Qhong-522/S-GRACE.
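The abstract describes a general loop: generate adversarial samples under semantic guidance so they stay within the target concept space, then run erasure training that suppresses the target concept while preserving others. The toy sketch below illustrates that loop in PyTorch. It is not the S-GRACE implementation; the embedding dimensions, loss terms, and function names (e.g. semantic_guided_adversarial_embedding, erase_step) are illustrative assumptions, with simple linear layers and cosine similarities standing in for a diffusion model's text-conditioned components.

```python
# Illustrative sketch only -- NOT the authors' S-GRACE method.
# A toy "model" maps prompt embeddings to concept space; we search for
# adversarial embeddings that stay semantically close to the target concept,
# then fine-tune the model so those embeddings no longer elicit the concept.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

dim = 64                                                   # toy embedding size (assumption)
target_concept = F.normalize(torch.randn(dim), dim=0)      # concept to erase
other_concepts = F.normalize(torch.randn(8, dim), dim=1)   # concepts to preserve

# Stand-in for the diffusion model's text-conditioned branch.
model = torch.nn.Linear(dim, dim)
erase_opt = torch.optim.SGD(model.parameters(), lr=1e-2)

def semantic_guided_adversarial_embedding(steps=50, lr=0.1, sem_weight=1.0):
    """Search for an embedding the model still maps toward the target concept,
    with a semantic term that keeps the embedding itself inside the target
    concept's neighborhood (the 'semantic guidance' idea)."""
    emb = torch.randn(dim, requires_grad=True)
    opt = torch.optim.Adam([emb], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        out = model(emb)
        # Attack term: drive the model output toward the target concept.
        attack = -F.cosine_similarity(out, target_concept, dim=0)
        # Semantic term: keep the embedding near the target concept space.
        semantic = -F.cosine_similarity(emb, target_concept, dim=0)
        (attack + sem_weight * semantic).backward()
        opt.step()
    return emb.detach()

def erase_step(adv_emb, preserve_weight=1.0):
    """One erasure-training step: suppress the target concept for the
    adversarial embedding while preserving responses to other concepts."""
    erase_opt.zero_grad()
    erase_loss = F.cosine_similarity(model(adv_emb), target_concept, dim=0)
    preserve_loss = F.mse_loss(model(other_concepts), other_concepts)
    (erase_loss + preserve_weight * preserve_loss).backward()
    erase_opt.step()

for round_idx in range(5):
    adv = semantic_guided_adversarial_embedding()
    erase_step(adv)
    sim = F.cosine_similarity(model(adv), target_concept, dim=0).item()
    print(f"round {round_idx}: target-concept similarity {sim:.3f}")
```

Each round alternates between finding a semantically guided adversarial embedding and updating the model against it, so the similarity printed per round should trend downward as the (toy) target concept is suppressed.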
Authors (3)
Qinghong Yin
Yu Tian
Yue Zhang
Submitted
October 31, 2025
Key Contributions
This paper re-evaluates robust adversarial concept erasure in diffusion models, showing that existing methods often fail because they neglect conceptual semantics when generating adversarial samples. It quantifies this specificity, revealing issues such as incomplete coverage of the target concept or disruption of other concept spaces, and introduces S-GRACE, a semantics-guided approach to adversarial sample generation and erasure training that addresses these shortcomings.
Business Value
Crucial for developing safer and more controllable generative AI systems, reducing the risk of generating harmful or undesirable content, and building user trust in AI technologies.