Abstract
Large Language Models (LLMs) have revolutionized various Natural Language
Generation (NLG) tasks, including Argument Summarization (ArgSum), a key
subfield of Argument Mining. This paper investigates the integration of
state-of-the-art LLMs into ArgSum systems and their evaluation. In particular,
we propose a novel prompt-based evaluation scheme, and validate it through a
novel human benchmark dataset. Our work makes three main contributions: (i) the
integration of LLMs into existing ArgSum systems, (ii) the development of two
new LLM-based ArgSum systems, benchmarked against prior methods, and (iii) the
introduction of an advanced LLM-based evaluation scheme. We demonstrate that
the use of LLMs substantially improves both the generation and evaluation of
argument summaries, achieving state-of-the-art results and advancing the field
of ArgSum. We also show that among the four LLMs integrated in (i) and (ii),
Qwen-3-32B, despite having the fewest parameters, performs best, even
surpassing GPT-4o.
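To make the prompt-based evaluation idea more concrete, the sketch below shows what an LLM-as-judge scoring step for an argument summary might look like. This is an illustrative assumption, not the paper's actual protocol: the criteria names (coverage, faithfulness, fluency), the prompt wording, and the call_llm helper are placeholders for demonstration.

```python
# Illustrative sketch only: a prompt-based evaluation of an argument summary
# in the LLM-as-judge style. Criteria, prompt wording, and call_llm are
# assumptions, not the scheme proposed in the paper.

import json

EVAL_PROMPT = """You are evaluating a summary of a set of arguments.

Arguments:
{arguments}

Summary:
{summary}

Rate the summary from 1 (poor) to 5 (excellent) on each criterion:
- coverage: does it capture the main argumentative points?
- faithfulness: does it avoid claims not supported by the arguments?
- fluency: is it well written?

Answer with a JSON object, e.g. {{"coverage": 4, "faithfulness": 5, "fluency": 5}}."""


def evaluate_summary(arguments: list[str], summary: str, call_llm) -> dict:
    """Score a candidate argument summary with an LLM judge.

    `call_llm` is a placeholder for whatever chat-completion client is
    available; it takes a prompt string and returns the model's text reply.
    """
    prompt = EVAL_PROMPT.format(
        arguments="\n".join(f"- {a}" for a in arguments),
        summary=summary,
    )
    raw = call_llm(prompt)
    return json.loads(raw)  # expects the judge to reply with valid JSON scores
```

Requesting structured JSON scores keeps the judge's output machine-readable, which makes it straightforward to aggregate scores across systems and compare them against a human benchmark, as the paper does for its own evaluation scheme.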
Key Contributions
Investigates the integration of LLMs into Argument Summarization (ArgSum) systems and proposes a novel prompt-based evaluation scheme validated by a new human benchmark dataset. The paper develops two new LLM-based ArgSum systems, demonstrating substantial improvements in both generation and evaluation, and identifies Qwen-3-32B as a top performer.
Business Value
Enhances the ability to automatically generate high-quality argument summaries, which is useful for legal analysis, policy making, debate preparation, and understanding complex discussions, making information more accessible and digestible.