Today's AI Safety & Ethics Research Top Papers

Wednesday, November 5, 2025
Proposes an automated framework that discovers, retrieves, and evolves jailbreak strategies for LLMs by extracting information from failed attacks. Demonstrates strategies that evade current defenses and continue to self-evolve, informing LLM security research.
Identifies the 'Alignment-Reality Gap' in LLMs, where deployed models drift out of alignment with evolving norms, and proposes a framework to update LLMs efficiently without costly re-annotation, aiming for more reliable long-term use.
Introduces ValueCompass, a framework grounded in psychological theory for measuring contextual value alignment between humans and LLMs. Enables systematic assessment of AI alignment with diverse individual and societal values.
Analyzes persuasion and anti-social behavior of LLM agents in hierarchical multi-agent settings. Investigates emergent phenomena and potential risks through simulated interactions, offering insights into AI agent behavior.
Introduces CytoNet, a foundation model encoding high-resolution cerebral cortex images into expressive features using self-supervised learning. Enables comprehensive brain analyses by capturing cellular architecture and spatial proximity.
Explains adversarial fragility in neural networks by identifying feature compression as the root cause. Provides a matrix-theoretic explanation showing how robustness degrades with input compression.
Presents MammoClean, a framework for standardizing mammography datasets and quantifying biases. Addresses heterogeneity in data quality and metadata to improve generalizability and clinical deployment of AI models.
Proposes LiveSecBench, a dynamic benchmark for Chinese-context LLM safety evaluation. Covers legality, ethics, factuality, privacy, adversarial robustness, and reasoning safety rooted in Chinese frameworks.