
Think Twice Before You Judge: Mixture of Dual Reasoning Experts for Multimodal Sarcasm Detection

📄 Abstract

Multimodal sarcasm detection has attracted growing interest due to the rise of multimedia posts on social media. Understanding sarcastic image-text posts often requires external contextual knowledge, such as cultural references or commonsense reasoning. However, existing models struggle to capture the deeper rationale behind sarcasm, relying mainly on shallow cues like image captions or object-attribute pairs from images. To address this, we propose MiDRE (Mixture of Dual Reasoning Experts), which integrates an internal reasoning expert for detecting incongruities within the image-text pair and an external reasoning expert that utilizes structured rationales generated via Chain-of-Thought prompting to a Large Vision-Language Model. An adaptive gating mechanism dynamically weighs the two experts, selecting the most relevant reasoning path. Unlike prior methods that treat external knowledge as static input, MiDRE selectively adapts to when such knowledge is beneficial, mitigating the risks of hallucinated or irrelevant signals from large models. Experiments on two benchmark datasets show that MiDRE achieves superior performance over baselines. Various qualitative analyses highlight the crucial role of external rationales, revealing that even when they are occasionally noisy, they provide valuable cues that guide the model toward a better understanding of sarcasm.
Authors (3)
Soumyadeep Jana
Abhrajyoti Kundu
Sanasam Ranbir Singh
Submitted
July 6, 2025
arXiv Category
cs.CL

Key Contributions

Proposes MiDRE, an approach to multimodal sarcasm detection that combines two reasoning experts: an internal expert that detects incongruities within the image-text pair, and an external expert that draws on structured rationales generated by Chain-of-Thought prompting of a Large Vision-Language Model (LVLM). An adaptive gating mechanism dynamically weighs the two experts, so external knowledge is used when it helps and down-weighted when it is hallucinated or irrelevant.
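
The idea of adaptively weighing two experts can be illustrated with a minimal sketch. This is not the authors' implementation: the listing does not specify the gate's form, so the scalar sigmoid gate over concatenated features, the function name gated_fusion, and the parameters gate_weights and gate_bias are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(
    h_int: Sequence[float],
    h_ext: Sequence[float],
    gate_weights: Sequence[float],
    gate_bias: float = 0.0,
) -> Tuple[List[float], float]:
    """Blend two expert feature vectors with one learned scalar gate.

    Hypothetical formulation (an assumption, not taken from the paper):
        g     = sigmoid(w . [h_int; h_ext] + b)
        fused = g * h_int + (1 - g) * h_ext
    """
    if len(h_int) != len(h_ext):
        raise ValueError("expert feature vectors must have the same length")
    if len(gate_weights) != len(h_int) + len(h_ext):
        raise ValueError("gate_weights must match the concatenated feature length")

    # The gate looks at both experts' features before deciding how to mix them.
    concat = list(h_int) + list(h_ext)
    score = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
    g = sigmoid(score)

    # g near 1 trusts the internal expert; g near 0 trusts the external one.
    fused = [g * a + (1.0 - g) * b for a, b in zip(h_int, h_ext)]
    return fused, g


if __name__ == "__main__":
    internal = [1.0, 0.0]   # toy features from the internal reasoning expert
    external = [0.0, 1.0]   # toy features from the external rationale expert
    fused, g = gated_fusion(internal, external, gate_weights=[0.0] * 4)
    print(g, fused)
```

With all-zero gate weights and zero bias the gate is neutral and the two experts are averaged. A large positive bias pushes the gate toward the internal expert, which is one way a model could suppress a noisy or hallucinated external rationale.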

Business Value

Could enable more accurate analysis of user sentiment and intent on social media, with potential applications in content moderation, brand monitoring, and targeted advertising.