Abstract
With the emergence of large language models (LLMs), there is an expectation
that LLMs can effectively extract explicit information from complex real-world
documents (e.g., papers, reports). However, most LLMs generate paragraph-style
answers that are chaotic, disorganized, and untraceable. To bridge this gap, we
introduce the Arranged and Organized Extraction Benchmark (AOE), a new
bilingual benchmark with data and documents of varying lengths designed to
systematically evaluate the ability of LLMs to comprehend fragmented documents
and reconstruct isolated information into one organized table. Unlike
conventional text-to-table tasks, which rely on fixed schemas and narrow task
domains, AOE includes 11 carefully crafted tasks across three diverse domains,
requiring models to generate context-specific schemas tailored to varied input
queries. In our experiments, we evaluate both open-source and closed-source state-of-the-art LLMs. The results show that even the most advanced models struggle significantly with this task. The benchmark is available at
https://anonymous.4open.science/r/AOE-Benchmark/.
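
To make the task concrete, here is a minimal, hypothetical sketch of how an AOE-style query might be posed to an LLM and its tabular answer parsed. The prompt wording, the `call_llm` stub, and the Markdown-table output format are illustrative assumptions, not the benchmark's actual interface; see the repository linked above for the real data and protocol.

```python
# A minimal, hypothetical sketch of an AOE-style extraction step.
# The prompt wording, the `call_llm` stub, and the Markdown-table output
# format are illustrative assumptions, not the benchmark's actual interface.

def build_prompt(query: str, documents: list[str]) -> str:
    """Ask the model to infer a schema from the query, then fill one table."""
    docs = "\n\n".join(f"[Doc {i + 1}]\n{d}" for i, d in enumerate(documents))
    return (
        "Read the documents below and answer the query as ONE Markdown table.\n"
        "First decide which columns (the schema) the query requires, then fill\n"
        "one row per entity, citing the source document for each row.\n\n"
        f"Query: {query}\n\n{docs}"
    )

def parse_markdown_table(response: str) -> list[dict[str, str]]:
    """Parse the first Markdown table in a model response into row dicts."""
    lines = [ln.strip() for ln in response.splitlines() if ln.strip().startswith("|")]
    if len(lines) < 3:  # need header, separator, and at least one data row
        return []
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[2:]:  # lines[1] is the |---|---| separator
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows

if __name__ == "__main__":
    def call_llm(prompt: str) -> str:
        # Stand-in for a real model call; replace with your LLM client.
        return "| System | Domain | Source |\n|---|---|---|\n| ExampleNet | Finance | Doc 1 |"

    prompt = build_prompt(
        "List each system mentioned and the domain it targets.",
        ["ExampleNet is a hypothetical system for financial reports ..."],
    )
    print(parse_markdown_table(call_llm(prompt)))
    # -> [{'System': 'ExampleNet', 'Domain': 'Finance', 'Source': 'Doc 1'}]
```

The point the benchmark stresses is that the schema (the table's columns) is not fixed in advance: the model must derive it from each query, which is what the prompt above asks for.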
Authors (12)
Tianyun Zhong
Guozhao Mo
Yanjiang Liu
Yihan Chen
Lingdi Kong
Xuanang Chen
Key Contributions
This paper introduces the Arranged and Organized Extraction Benchmark (AOE), a new bilingual benchmark that evaluates LLMs' ability to reconstruct fragmented information into organized tables, requiring models to generate context-specific schemas. It addresses the tendency of LLMs to produce chaotic, untraceable paragraph-style answers and provides a systematic way to benchmark structured extraction capabilities.
Business Value
Enables businesses to extract and organize critical information from large volumes of unstructured or semi-structured documents (e.g., reports, contracts, research papers) more effectively, supporting better data-driven decision-making and greater operational efficiency.