📄 Abstract
Multimodal sarcasm detection has attracted growing interest due to the rise
of multimedia posts on social media. Understanding sarcastic image-text posts
often requires external contextual knowledge, such as cultural references or
commonsense reasoning. However, existing models struggle to capture the deeper
rationale behind sarcasm, relying mainly on shallow cues like image captions or
object-attribute pairs from images. To address this, we propose MiDRE
(Mixture of Dual Reasoning Experts), which
integrates an internal reasoning expert for detecting incongruities within the
image-text pair and an external reasoning expert that utilizes structured
rationales generated via Chain-of-Thought prompting to a Large Vision-Language
Model. An adaptive gating mechanism dynamically weighs the two experts,
selecting the most relevant reasoning path. Unlike prior methods that treat
external knowledge as static input, MiDRE selectively adapts to when such
knowledge is beneficial, mitigating the risks of hallucinated or irrelevant
signals from large models. Experiments on two benchmark datasets show that
MiDRE achieves superior performance over baselines. Various qualitative
analyses highlight the crucial role of external rationales, revealing that even
when they are occasionally noisy, they provide valuable cues that guide the
model toward a better understanding of sarcasm.
Authors (3)
Soumyadeep Jana
Abhrajyoti Kundu
Sanasam Ranbir Singh
Key Contributions
MiDRE is a multimodal sarcasm detection approach that combines two reasoning experts: an internal expert that detects incongruities within the image-text pair, and an external expert that uses structured rationales generated by Chain-of-Thought prompting of a Large Vision-Language Model (LVLM). An adaptive gating mechanism dynamically weighs the two experts, so the model relies on external knowledge only when that knowledge is beneficial.
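To make the gating idea concrete, here is a minimal sketch of how two expert outputs might be mixed by a learned scalar gate. It is an illustration only: the abstract does not specify the gate's form, so the sigmoid-over-concatenated-features design, the function names (`gate_weight`, `mix_experts`), and the parameters (`weights`, `bias`) are all assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gate_weight(
    internal_feat: Sequence[float],
    external_feat: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Hypothetical scalar gate: a linear score over both experts' features,
    squashed to (0, 1). Values near 1 favor the internal expert."""
    concat = list(internal_feat) + list(external_feat)
    if len(concat) != len(weights):
        raise ValueError("weights must match the concatenated feature length")
    score = sum(w * x for w, x in zip(weights, concat)) + bias
    return sigmoid(score)


def mix_experts(
    internal_feat: Sequence[float],
    external_feat: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> Tuple[float, List[float]]:
    """Return the gate value and the gated convex combination of the two
    expert representations (assumed here to share one dimensionality)."""
    if len(internal_feat) != len(external_feat):
        raise ValueError("expert features must have the same length")
    g = gate_weight(internal_feat, external_feat, weights, bias)
    fused = [g * a + (1.0 - g) * b for a, b in zip(internal_feat, external_feat)]
    return g, fused


if __name__ == "__main__":
    # Toy features standing in for the two experts' outputs.
    g, fused = mix_experts([1.0, 3.0], [3.0, 1.0], [0.0, 0.0, 0.0, 0.0], 0.0)
    print(g, fused)
```

In this sketch, a gate value close to 0 means the fused representation follows the external (rationale-based) expert, while a value close to 1 means it follows the internal expert. In a trained model the gate parameters would be learned jointly with the experts, which is one way a model could down-weight rationales that are noisy or irrelevant.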
Business Value
More accurate sarcasm detection could improve analysis of user sentiment and intent on social media, with potential applications in content moderation, brand monitoring, and targeted advertising.