This paper introduces MUDMAN (Meta-Unlearning with Disruption Masking and Normalization), a novel method for robust and irreversible LLM unlearning. MUDMAN combines Disruption Masking (applying a weight update only where the unlearning gradient and the retaining gradient agree in sign, so that unlearning does not disrupt retained capabilities) with normalization of the unlearning gradients, outperforming prior methods such as TAR by 40% at preventing the recovery of dangerous capabilities. A sketch of the masked update is shown below.
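A minimal PyTorch sketch of the disruption-masked update described above. The function name, the per-tensor normalization granularity, and the definition of the unlearning objective are illustrative assumptions rather than the paper's exact recipe; the full method also wraps such updates in the meta-learning procedure implied by its name, which this sketch omits.

```python
import torch
import torch.nn as nn

def disruption_masked_step(model: nn.Module,
                           unlearn_loss: torch.Tensor,
                           retain_loss: torch.Tensor,
                           lr: float = 1e-4) -> None:
    """Apply one masked unlearning update.

    A weight coordinate is changed only where the unlearning gradient
    and the retaining gradient have the same sign, so the unlearning
    step is unlikely to disrupt retained capabilities.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient of the unlearning objective (assumed to be defined so
    # that descending it removes the targeted capability, e.g. a
    # negated forget-set loss).
    g_unlearn = torch.autograd.grad(unlearn_loss, params, retain_graph=True)
    # Gradient of the ordinary retain loss on general data.
    g_retain = torch.autograd.grad(retain_loss, params)

    with torch.no_grad():
        for p, gu, gr in zip(params, g_unlearn, g_retain):
            # Normalize the unlearning gradient so step sizes are
            # comparable across layers (normalization granularity is an
            # assumption here).
            gu = gu / (gu.norm() + 1e-8)
            # Disruption mask: keep only coordinates where both
            # gradients point the same way.
            mask = torch.sign(gu) == torch.sign(gr)
            p -= lr * gu * mask
```

Masking by sign agreement, rather than projecting gradients, keeps the update sparse and guarantees that every applied coordinate also locally decreases the retain loss.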
MUDMAN enhances the safety and trustworthiness of LLMs by providing a reliable way to remove sensitive or harmful information, which is crucial for responsible AI deployment across sectors.