Abstract
Current Retrieval-Augmented Generation (RAG) systems primarily operate on
unimodal textual data, limiting their effectiveness on unstructured multimodal
documents. Such documents often combine text, images, tables, equations, and
graphs, each contributing unique information. In this work, we present a
Modality-Aware Hybrid retrieval Architecture (MAHA), designed specifically for
multimodal question answering with reasoning through a modality-aware knowledge
graph. MAHA integrates dense vector retrieval with structured graph traversal,
where the knowledge graph encodes cross-modal semantics and relationships. This
design enables both semantically rich and context-aware retrieval across
diverse modalities. Evaluations on multiple benchmark datasets demonstrate that
MAHA substantially outperforms baseline methods, achieving a ROUGE-L score of
0.486 while providing complete modality coverage. These results highlight MAHA's
ability to combine embeddings with explicit document structure, enabling
effective multimodal retrieval. Our work establishes a scalable and
interpretable retrieval framework that advances RAG systems by enabling
modality-aware reasoning over unstructured multimodal data.
Authors
Rashmi R
Vidyadhar Upadhya
Submitted
October 16, 2025
Key Contributions
This paper introduces MAHA, a Modality-Aware Hybrid retrieval Architecture for multimodal RAG on unstructured data. MAHA leverages a modality-aware knowledge graph and combines dense vector retrieval with graph traversal to achieve semantically rich and context-aware retrieval across text, images, tables, and graphs, significantly outperforming unimodal baselines.
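The summary describes MAHA's retrieval mechanism only at a high level, so the sketch below is an illustrative reconstruction rather than the paper's implementation. It fuses dense embedding similarity with evidence propagated along knowledge-graph edges via a simple linear combination; the `Node` structure, `hybrid_retrieve` function, and the `alpha` fusion weight are all assumptions introduced here for illustration.

```python
"""Illustrative sketch of modality-aware hybrid retrieval.

Assumption: scores are fused as
    score = alpha * dense_similarity + (1 - alpha) * neighbor_evidence,
which is one plausible reading of "dense vector retrieval combined with
structured graph traversal"; the paper may use a different fusion rule.
"""

from dataclasses import dataclass, field
import math


@dataclass
class Node:
    node_id: str
    modality: str            # e.g. "text", "image", "table", "equation"
    embedding: list[float]   # dense vector from a (multi)modal encoder
    neighbors: list[str] = field(default_factory=list)  # cross-modal KG edges


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_retrieve(query_emb: list[float], graph: dict[str, Node],
                    k: int = 5, alpha: float = 0.7) -> list[str]:
    """Rank nodes by dense similarity, then boost nodes whose KG
    neighbors also scored well, so a table or figure linked to a highly
    relevant paragraph is pulled into the retrieved context."""
    dense = {nid: cosine(query_emb, n.embedding) for nid, n in graph.items()}
    scores = {}
    for nid, node in graph.items():
        neighbor_evidence = max(
            (dense[m] for m in node.neighbors if m in dense), default=0.0)
        scores[nid] = alpha * dense[nid] + (1 - alpha) * neighbor_evidence
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Toy graph: a text paragraph linked to a table by a cross-modal edge.
    g = {
        "p1": Node("p1", "text", [1.0, 0.0], neighbors=["t1"]),
        "t1": Node("t1", "table", [0.2, 0.9], neighbors=["p1"]),
    }
    print(hybrid_retrieve([1.0, 0.1], g, k=2))  # table surfaces via its edge
```

The intuition matches the stated contribution: embeddings supply semantic matching, while explicit cross-modal edges let a relevant passage pull its linked table, image, or equation into context even when that item alone would score poorly against the query.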
Business Value
Enables more comprehensive and accurate information retrieval and generation from complex, unstructured documents (e.g., reports, manuals, web pages), improving knowledge management and decision support.