
Combining Distantly Supervised Models with In Context Learning for Monolingual and Cross-Lingual Relation Extraction

Abstract

Distantly Supervised Relation Extraction (DSRE) remains a long-standing challenge in NLP, where models must learn from noisy bag-level annotations while making sentence-level predictions. While existing state-of-the-art (SoTA) DSRE models rely on task-specific training, their integration with in-context learning (ICL) using large language models (LLMs) remains underexplored. A key challenge is that the LLM may not learn relation semantics correctly due to the noisy annotations. In response, we propose HYDRE -- HYbrid Distantly Supervised Relation Extraction framework. It first uses a trained DSRE model to identify the top-k candidate relations for a given test sentence, then applies a novel dynamic exemplar retrieval strategy that extracts reliable, sentence-level exemplars from the training data; these exemplars are provided in the LLM prompt, from which the LLM outputs the final relation(s). We further extend HYDRE to cross-lingual settings for RE in low-resource languages. Using available English DSRE training data, we evaluate all methods on English as well as a newly curated benchmark covering four diverse low-resource Indic languages -- Oriya, Santali, Manipuri, and Tulu. HYDRE achieves gains of up to 20 F1 points on English and, on average, 17 F1 points on the Indic languages over prior SoTA DSRE models. Detailed ablations demonstrate HYDRE's efficacy compared to other prompting strategies.
Authors (4): Vipul Rathore, Malik Hammad Faisal, Parag Singla, Mausam
Submitted: October 21, 2025
arXiv Category: cs.CL

Key Contributions

This paper introduces HYDRE, a novel framework that combines trained Distantly Supervised Relation Extraction (DSRE) models with in-context learning (ICL) using large language models (LLMs). It addresses the challenge of noisy annotations in DSRE with a dynamic exemplar retrieval strategy that supplies reliable sentence-level examples to the LLM, helping it learn relation semantics correctly. The framework also extends to cross-lingual settings for relation extraction in low-resource languages.
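
The abstract and contributions describe a three-stage pipeline: a trained DSRE model proposes the top-k candidate relations for a test sentence, a dynamic exemplar retrieval step gathers reliable sentence-level exemplars for those candidates from the training data, and the candidates plus exemplars are assembled into an ICL prompt from which the LLM produces the final relation(s). The minimal Python sketch below illustrates that flow in outline only; every function name, the stubbed DSRE output, and the retrieval heuristic are hypothetical placeholders, since the page gives no implementation details.

```python
# Minimal sketch of the HYDRE pipeline as described in the abstract.
# All names here are hypothetical placeholders, not the authors' API.

from dataclasses import dataclass


@dataclass
class Exemplar:
    sentence: str
    relation: str


def top_k_relations(sentence: str, k: int = 3) -> list[str]:
    """Stage 1 (stub): a trained DSRE model would score all relation
    labels for the test sentence and return the k best candidates."""
    # A real system would run the trained DSRE model here; we return
    # a fixed placeholder list instead.
    return ["founded_by", "employee_of", "member_of"][:k]


def retrieve_exemplars(relation: str, n: int = 2) -> list[Exemplar]:
    """Stage 2 (stub): dynamic exemplar retrieval would pick reliable
    sentence-level training examples for each candidate relation."""
    # A real retriever might rank training sentences by similarity and
    # keep only those the DSRE model labels confidently.
    return [Exemplar("Steve Jobs founded Apple.", relation)][:n]


def build_prompt(sentence: str, candidates: list[str]) -> str:
    """Stage 3: assemble an ICL prompt with the candidate relations and
    their retrieved exemplars, then ask the LLM for the final relation(s)."""
    lines = [
        "Choose the relation(s) expressed in the sentence.",
        f"Candidate relations: {', '.join(candidates)}",
        "",
    ]
    for rel in candidates:
        for ex in retrieve_exemplars(rel):
            lines.append(f'Example ({rel}): "{ex.sentence}" -> {ex.relation}')
    lines += ["", f'Sentence: "{sentence}"', "Relation(s):"]
    return "\n".join(lines)


if __name__ == "__main__":
    test_sentence = "Larry Page co-founded Google in 1998."
    prompt = build_prompt(test_sentence, top_k_relations(test_sentence))
    print(prompt)  # In a full pipeline, this prompt would be sent to an LLM.
```

The key design point the sketch captures is that the LLM never sees raw bag-level (and hence noisy) annotations: the DSRE model narrows the label space first, and only vetted sentence-level exemplars reach the prompt.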

Business Value

Improved accuracy in extracting relations from text can enhance knowledge graph construction, automate information retrieval, and power more capable search engines, supporting better data analysis and decision-making across industries.