Beyond Isolated Dots: Benchmarking Structured Table Construction as Deep Knowledge Extraction

Abstract

With the emergence of large language models (LLMs), there is an expectation that LLMs can effectively extract explicit information from complex real-world documents (e.g., papers, reports). However, most LLMs generate paragraph-style answers that are chaotic, disorganized, and untraceable. To bridge this gap, we introduce the Arranged and Organized Extraction Benchmark (AOE), a new bilingual benchmark with data and documents of varying lengths designed to systematically evaluate the ability of LLMs to comprehend fragmented documents and reconstruct isolated information into one organized table. Unlike conventional text-to-table tasks, which rely on fixed schema and narrow task domains, AOE includes 11 carefully crafted tasks across three diverse domains, requiring models to generate context-specific schema tailored to varied input queries. In the experiment, we evaluated both open-source and closed-source state-of-the-art LLMs. The results show that even the most advanced models struggled significantly. The benchmark is available at https://anonymous.4open.science/r/AOE-Benchmark/.
Authors (12): Tianyun Zhong, Guozhao Mo, Yanjiang Liu, Yihan Chen, Lingdi Kong, Xuanang Chen, +6 more
Submitted: July 22, 2025
arXiv Category: cs.CL

Key Contributions

This paper introduces the Arranged and Organized Extraction Benchmark (AOE), a new bilingual benchmark that evaluates how well LLMs reconstruct fragmented information into a single organized table, a task that requires generating a context-specific schema for each input query. It targets the tendency of LLMs to produce chaotic, untraceable paragraph-style answers and provides a systematic way to benchmark structured extraction capabilities.
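
To make the task shape concrete, here is a toy sketch of turning scattered facts into one table whose columns depend on the query. It is not the AOE data format or evaluation code: the `fragments` records, the `build_table` helper, and the field names are all hypothetical, and the schema is passed in as `query_fields`, whereas the benchmark expects the model to derive it from the query.

```python
# Hypothetical fragments: facts about the same entities scattered across a document.
# The values are invented for illustration and are not taken from AOE.
fragments = [
    {"entity": "Model A", "field": "parameters", "value": "7B"},
    {"entity": "Model B", "field": "parameters", "value": "13B"},
    {"entity": "Model A", "field": "license", "value": "Apache-2.0"},
    {"entity": "Model B", "field": "release_year", "value": "2024"},
]


def build_table(fragments, query_fields):
    """Merge scattered facts into one row per entity, with columns set by the query."""
    rows = {}
    for frag in fragments:
        # Keep only the facts that the query asks about.
        if frag["field"] in query_fields:
            row = rows.setdefault(frag["entity"], {})
            row[frag["field"]] = frag["value"]
    # Fill missing cells with "" so every row follows the same schema.
    return [
        {"entity": name, **{f: row.get(f, "") for f in query_fields}}
        for name, row in rows.items()
    ]


table = build_table(fragments, ["parameters", "license"])
for row in table:
    print(row)
```

A different query, such as `["release_year"]`, yields a table with different columns from the same fragments, which is the sense in which the schema is context-specific.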

Business Value

Helps businesses assess how well LLMs can extract and organize critical information from large volumes of unstructured or semi-structured documents (e.g., reports, contracts, research papers), which in turn supports data-driven decision-making and operational efficiency.