Abstract
Language is fundamental to human cooperation, facilitating not only the
exchange of information but also the coordination of actions through shared
interpretations of situational contexts. This study explores whether the
Generative Agent-Based Model (GABM) Concordia can effectively model Theory of
Mind (ToM) within simulated real-world environments. Specifically, we assess
whether this framework successfully simulates ToM abilities and whether GPT-4
can perform tasks by making genuine inferences from social context, rather than
relying on linguistic memorization. Our findings reveal a critical limitation:
GPT-4 frequently fails to select actions based on belief attribution,
suggesting that apparent ToM-like abilities observed in previous studies may
stem from shallow statistical associations rather than true reasoning.
Additionally, the model struggles to generate coherent causal effects from
agent actions, exposing difficulties in processing complex social interactions.
These results challenge current claims about emergent ToM-like capabilities
in LLMs and highlight the need for more rigorous, action-based evaluation
frameworks.
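To make the notion of an "action-based" evaluation concrete, the sketch below shows one way a false-belief probe can be scored on the action an agent selects rather than on a verbal answer about beliefs. It is an illustrative example only, not code from the paper or from Concordia; query_model is a hypothetical stand-in for whatever LLM backend (such as GPT-4) is being evaluated.

```python
# Illustrative sketch (not from the paper): an action-based false-belief probe.
# The model must *choose an action* whose correctness depends on attributing a
# false belief to another agent, rather than answering a question about beliefs.

def query_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; replace with a real chat client."""
    return "0"  # mock reply so the sketch runs end to end


def build_false_belief_scenario() -> dict:
    # Sally-Anne-style setup, expressed as a situated choice of action.
    context = (
        "Alice puts her keys in the kitchen drawer and leaves the house. "
        "While she is away, Bob moves the keys to the hallway shelf. "
        "Alice has not seen or been told about the move. Alice returns and "
        "wants her keys."
    )
    actions = [
        "Wait by the kitchen drawer to meet Alice.",   # requires belief attribution
        "Wait by the hallway shelf to meet Alice.",    # tracks world state only
        "Tell Bob to go get the keys.",
    ]
    return {"context": context, "actions": actions, "correct": 0}


def run_probe() -> bool:
    scenario = build_false_belief_scenario()
    options = "\n".join(f"{i}. {a}" for i, a in enumerate(scenario["actions"]))
    prompt = (
        f"{scenario['context']}\n\n"
        "You are a helpful bystander. Choose where to wait so that you are "
        "standing where Alice will look first. Reply with the option number only.\n"
        f"{options}"
    )
    reply = query_model(prompt).strip()
    chosen = int(reply.split()[0].rstrip("."))
    # The probe scores the selected *action*, not a verbal description of
    # Alice's belief, which is the distinction the abstract draws.
    return chosen == scenario["correct"]


if __name__ == "__main__":
    print("belief-consistent action chosen:", run_probe())
```

A probe of this shape only counts as passed when the chosen action is consistent with the other agent's (false) belief, which is harder to satisfy through linguistic memorization than a multiple-choice question about what the agent "thinks".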
Key Contributions
This study critically evaluates the Theory of Mind (ToM) simulation capabilities of LLMs using the Concordia GABM. It shows that GPT-4 often fails to make genuine inferences based on belief attribution, suggesting that apparent ToM-like abilities may stem from shallow statistical associations rather than genuine reasoning. It also highlights the model's difficulty in processing complex social interactions and generating coherent causal effects from agent actions.
Business Value
Provides crucial insights into the limitations of current LLMs in understanding and simulating human social cognition, guiding the development of safer and more sophisticated AI systems for human interaction.