
JointCQ: Improving Factual Hallucination Detection with Joint Claim and Query Generation

Abstract

Current large language models (LLMs) often suffer from hallucination issues, i.e., generating content that appears factual but is actually unreliable. A typical hallucination detection pipeline involves response decomposition (i.e., claim extraction), query generation, evidence collection (i.e., search or retrieval), and claim verification. However, existing methods exhibit limitations in the first two stages, such as context loss during claim extraction and low specificity in query generation, resulting in degraded performance across the hallucination detection pipeline. In this work, we introduce JointCQ (https://github.com/pku0xff/JointCQ), a joint claim-and-query generation framework designed to construct an effective and efficient claim-query generator. Our framework leverages elaborately designed evaluation criteria to filter synthesized training data, and finetunes a language model for joint claim extraction and query generation, providing reliable and informative inputs for downstream search and verification. Experimental results demonstrate that our method outperforms previous methods on multiple open-domain QA hallucination detection benchmarks, advancing the goal of more trustworthy and transparent language model systems.
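The four-stage pipeline described above can be sketched in outline form. The code below is an illustrative stand-in, not the JointCQ implementation: the function bodies (sentence splitting for extraction, an echo query, a trivial retrieval and verification step) are placeholder assumptions, whereas JointCQ uses a finetuned language model for the joint claim-and-query stage.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str   # atomic factual claim extracted from the response
    query: str  # search query generated jointly with the claim

def extract_claims_and_queries(response: str) -> list[Claim]:
    # Placeholder: one claim per sentence, query echoes the claim.
    # JointCQ instead finetunes an LM to emit claim/query pairs jointly,
    # preserving context and improving query specificity.
    sentences = [s.strip() for s in response.split(".") if s.strip()]
    return [Claim(text=s, query=s + "?") for s in sentences]

def collect_evidence(query: str) -> list[str]:
    # Placeholder for a search/retrieval backend.
    return [f"evidence for: {query}"]

def verify_claim(claim: Claim, evidence: list[str]) -> bool:
    # Placeholder verifier: treat any retrieved evidence as support.
    return len(evidence) > 0

def detect_hallucinations(response: str) -> dict[str, bool]:
    """Map each extracted claim to a supported/unsupported verdict."""
    verdicts = {}
    for claim in extract_claims_and_queries(response):
        evidence = collect_evidence(claim.query)
        verdicts[claim.text] = verify_claim(claim, evidence)
    return verdicts

verdicts = detect_hallucinations("Paris is in France. The moon is rock")
```

The paper's contribution targets the first two stages: when claims and queries are generated jointly, the downstream retrieval and verification stages receive less ambiguous inputs.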
Authors (5)
Fan Xu
Huixuan Zhang
Zhenliang Zhang
Jiahao Wang
Xiaojun Wan
Submitted
October 22, 2025
arXiv Category
cs.CL

Key Contributions

Proposes JointCQ, a novel framework for joint claim and query generation to improve factual hallucination detection in LLMs. It addresses limitations in existing methods by preventing context loss during claim extraction and enhancing query specificity, leading to more effective hallucination detection pipelines.

Business Value

Enhances the trustworthiness and reliability of AI-generated content, crucial for applications like news generation, customer service bots, and information retrieval systems, thereby reducing risks associated with misinformation.
