📄 Abstract
Table-to-text generation (insight generation from tables) is a challenging
task that requires precision in analyzing the data. In addition, the evaluation
of existing benchmarks is affected by contamination of Large Language Model
(LLM) training data as well as domain imbalance. We introduce FreshTab, an
on-the-fly method for generating table-to-text benchmarks from Wikipedia, to
combat the LLM data contamination problem and enable domain-sensitive
evaluation. While non-English table-to-text datasets are limited, FreshTab
collects datasets in different languages on demand (we experiment with German,
Russian and French in addition to English). We find that insights generated by
LLMs from recent tables collected by our method score clearly worse on
automatic metrics, but this gap does not carry over to LLM-based and human
evaluations. Domain effects are visible in all evaluations, showing that a
domain-balanced benchmark is more challenging.
Key Contributions
Introduces FreshTab, an on-the-fly method for generating table-to-text benchmarks from Wikipedia, designed to combat LLM data contamination and enable domain-sensitive evaluation. It addresses the limitations of existing benchmarks by providing fresh data and supporting multilingual generation, revealing domain effects in LLM performance.
Business Value
Provides a more reliable and representative evaluation framework for table-to-text generation models, which is crucial for applications requiring accurate data interpretation and insight generation. This leads to better-performing AI systems for data analysis and reporting.