Research Paper. Relevant to: NLP Researchers, Computer Vision Researchers, Social Media Analysts, AI Engineers

Think Twice Before You Judge: Mixture of Dual Reasoning Experts for Multimodal Sarcasm Detection


📄 Abstract

Multimodal sarcasm detection has attracted growing interest due to the rise of multimedia posts on social media. Understanding sarcastic image-text posts often requires external contextual knowledge, such as cultural references or commonsense reasoning. However, existing models struggle to capture the deeper rationale behind sarcasm, relying mainly on shallow cues like image captions or object-attribute pairs from images. To address this, we propose MiDRE (Mixture of Dual Reasoning Experts), which integrates an internal reasoning expert for detecting incongruities within the image-text pair and an external reasoning expert that utilizes structured rationales generated via Chain-of-Thought prompting to a Large Vision-Language Model. An adaptive gating mechanism dynamically weighs the two experts, selecting the most relevant reasoning path. Unlike prior methods that treat external knowledge as static input, MiDRE selectively adapts to when such knowledge is beneficial, mitigating the risks of hallucinated or irrelevant signals from large models. Experiments on two benchmark datasets show that MiDRE achieves superior performance over baselines. Various qualitative analyses highlight the crucial role of external rationales, revealing that even when they are occasionally noisy, they provide valuable cues that guide the model toward a better understanding of sarcasm.
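The abstract does not spell out the exact form of the adaptive gate. As a minimal sketch, assuming a learned scalar sigmoid gate over the concatenated outputs of the two experts (a common mixture-of-experts pattern, not necessarily MiDRE's actual design), the fusion step could look like this. The names `gated_fusion`, `w`, and `b` are hypothetical; `w` and `b` stand in for trained gate parameters.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a real score to [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow for large negative x
    return z / (1.0 + z)


def gated_fusion(
    h_int: Sequence[float],
    h_ext: Sequence[float],
    w: Sequence[float],
    b: float,
) -> Tuple[float, List[float]]:
    """Fuse internal- and external-expert vectors with a scalar sigmoid gate.

    Assumed form (illustrative, not taken from the paper):
        g     = sigmoid(w . [h_int; h_ext] + b)
        fused = g * h_int + (1 - g) * h_ext
    """
    if len(h_int) != len(h_ext):
        raise ValueError("expert outputs must have the same dimension")
    if len(w) != len(h_int) + len(h_ext):
        raise ValueError("gate weights must match the concatenated dimension")
    concat = list(h_int) + list(h_ext)
    score = sum(wi * zi for wi, zi in zip(w, concat)) + b
    g = sigmoid(score)  # weight given to the internal expert
    fused = [g * a + (1.0 - g) * e for a, e in zip(h_int, h_ext)]
    return g, fused


# Hypothetical toy vectors; real expert outputs would be learned embeddings.
gate, fused = gated_fusion([1.0, 0.0], [0.0, 1.0], [0.0, 0.0, 0.0, 0.0], 0.0)
print(gate, fused)  # prints: 0.5 [0.5, 0.5]
```

With zero weights the gate sits at 0.5 and the two experts are simply averaged. A large positive gate score pushes `g` toward 1, so the fused vector is dominated by the internal expert; this is one way a gate can learn to discount external rationales when they are unhelpful.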

Authors (3)

Soumyadeep Jana

Abhrajyoti Kundu

Sanasam Ranbir Singh

Submitted

July 6, 2025

arXiv Category

cs.CL


Key Contributions

Proposes MiDRE, a novel approach to multimodal sarcasm detection that combines an internal reasoning expert, which detects incongruities within the image-text pair, with an external reasoning expert that draws on structured rationales generated by Chain-of-Thought prompting of an LVLM. An adaptive gating mechanism dynamically weighs the two experts, so the model can exploit both image-text incongruities and external knowledge while relying on the latter only when it is beneficial.
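The summary does not give MiDRE's actual prompt. The sketch below assumes a hypothetical five-step template and shows one way to ask an LVLM for a structured Chain-of-Thought rationale and split its answer into steps. `COT_STEPS`, `build_cot_prompt`, and `parse_rationale` are illustrative names, not from the paper, and the LVLM call itself is omitted because no specific model or API is named.

```python
from typing import Dict

# Hypothetical reasoning steps; the paper's real prompt wording is not given here.
COT_STEPS = (
    "Describe what the image shows.",
    "State the literal meaning of the post text.",
    "Note any cultural references or commonsense knowledge needed.",
    "Explain whether the text and the image contradict each other.",
    "Conclude whether the post is sarcastic and why.",
)


def build_cot_prompt(post_text: str) -> str:
    """Build a step-by-step text prompt to send, with the image, to an LVLM."""
    lines = [
        "You are given an image and the text of a social media post.",
        f'Post text: "{post_text}"',
        "Reason step by step, answering each step on its own line:",
    ]
    for i, step in enumerate(COT_STEPS, start=1):
        lines.append(f"Step {i}: {step}")
    return "\n".join(lines)


def parse_rationale(response: str) -> Dict[int, str]:
    """Map each 'Step k: ...' line of an LVLM response to its step number."""
    steps: Dict[int, str] = {}
    for raw in response.splitlines():
        line = raw.strip()
        if line.startswith("Step ") and ":" in line:
            head, _, body = line.partition(":")
            number = head[len("Step "):].strip()
            if number.isdigit():
                steps[int(number)] = body.strip()
    return steps  # lines not in 'Step k:' form are ignored


print(build_cot_prompt("Great, another Monday!"))
```

In a pipeline like this, the parsed steps (or the raw rationale text) would then be encoded and passed to the external reasoning expert; how MiDRE encodes its rationales is not described in this summary.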

Business Value

Enables more accurate analysis of user sentiment and intent on social media, improving content moderation, brand monitoring, and targeted advertising.

Paper Metadata

Innovation Type

Architectural Innovation

Deployment Feasibility

Moderate; requires integration with LVLMs and careful tuning of the gating mechanism.

Limitations Addressed

Existing models struggle to capture the deeper rationale behind sarcasm: they rely on shallow cues such as image captions or object-attribute pairs and fail to integrate external contextual knowledge effectively.

Performance Gains

Outperforms baselines on two benchmark datasets by capturing the deeper rationale behind sarcasm and leveraging external knowledge more effectively.

Technical Tags

multimodal sarcasm detection, image-text understanding, reasoning, Chain-of-Thought prompting, Large Vision-Language Models (LVLMs), internal reasoning, external reasoning, adaptive gating, commonsense reasoning, social media analysis

Research Topics

Multimodal Learning, Natural Language Processing, Computer Vision, Artificial Intelligence, Social Media Analysis

Methods & Architectures

Mixture of Dual Reasoning Experts (MiDRE), Internal reasoning expert, External reasoning expert, Chain-of-Thought prompting, Adaptive gating mechanism, Large Vision-Language Models (LVLMs), Mixture of Experts (MoE)

Applications & Tasks

Social Media Analysis, Content Moderation, Sentiment Analysis, Sarcasm Detection, Multimodal Understanding, Reasoning, Multimodal Sarcasm Detection

Related Fields

Computer Vision, Natural Language Processing, Artificial Intelligence, Social Media Analytics

Keywords

Multimodal Sarcasm Detection, Sarcasm, Image-Text, Reasoning, LVLM, Chain-of-Thought, Mixture of Experts, Adaptive Gating, Social Media, Sentiment Analysis, MiDRE, Commonsense Reasoning, Multimodal AI, Deep Learning

Academic Context

#Multimodal Learning #Natural Language Processing #Computer Vision #Artificial Intelligence #Social Media Analysis

Commercial Potential

Potential Products

Advanced social media monitoring tools, content analysis platforms, AI-powered moderation systems

Target Industries

Social Media, Marketing, Advertising, Public Relations

Use Case Examples

Detecting sarcastic comments on social media posts, analyzing the nuances of user feedback, improving the accuracy of sentiment analysis

Competitive Edge

Offers a more sophisticated approach to multimodal sarcasm detection by integrating dual reasoning pathways and adaptive selection, surpassing methods that rely on shallow cues.

Market Opportunity

Growing, driven by the need for advanced social media analytics and content understanding.

Revenue Models

SaaS for social media analytics, licensing the technology to platforms, or integrating it into existing AI solutions.

Resource Requirements

Compute Needs

Significant compute required for LVLM inference and training.

Data Requirements

Labeled multimodal (image-text) datasets for sarcasm detection.

Deployment Constraints

Requires access to powerful LVLMs and careful calibration of the gating mechanism.

Scalability

Scalability depends on the underlying LVLM and the efficiency of the gating mechanism.

Regulatory Considerations

Ethical considerations in analyzing user-generated content.

Production Readiness

Maturity Level

Research

Time to Market

Medium to Long, due to reliance on complex LVLMs.
