Abstract: As LLM-based agents become increasingly autonomous and interact more
freely with each other, studying the interplay among them becomes crucial to
anticipate emergent phenomena and potential risks. In this work, we provide an
in-depth analysis of the interactions among agents within a simulated
hierarchical social environment, drawing inspiration from the Stanford Prison
Experiment. Leveraging 2,400 conversations across six LLMs (i.e., Llama3,
Orca2, Command-r, Mixtral, Mistral2, and gpt4.1) and 240 experimental
scenarios, we analyze persuasion and anti-social behavior between a guard and a
prisoner agent with differing objectives. We first document model-specific
conversational failures in this multi-agent power dynamic context, thereby
narrowing our analytic sample to 1,600 conversations. Among models
demonstrating successful interaction, we find that goal setting significantly
influences persuasiveness but not anti-social behavior. Moreover, agent
personas, especially the guard's, substantially impact both successful
persuasion by the prisoner and the manifestation of anti-social actions.
Notably, we observe the emergence of anti-social conduct even in the absence of
explicit negative personality prompts. These results have important
implications for the development of interactive LLM agents and the ongoing
discussion of their societal impact.
Key Contributions
This work provides an in-depth analysis of persuasion and anti-social behavior among LLM agents in a simulated hierarchical social environment, inspired by the Stanford Prison Experiment. By analyzing 2,400 conversations across six LLMs, it reveals model-specific failures and finds that goal setting influences persuasiveness but not anti-social behavior, offering crucial insights into the emergent dynamics and potential risks of autonomous LLM interactions.
Business Value
Crucial for developing safer and more predictable AI systems, especially in multi-agent scenarios, by understanding potential negative emergent behaviors and designing mitigation strategies.