LeCoDe: A Benchmark Dataset for Interactive Legal Consultation Dialogue Evaluation

Abstract

Legal consultation is essential for safeguarding individual rights and ensuring access to justice, yet it remains costly and inaccessible to many due to the shortage of legal professionals. While recent advances in Large Language Models (LLMs) offer a promising path toward scalable, low-cost legal assistance, current systems fall short in handling the interactive and knowledge-intensive nature of real-world consultations. To address these challenges, we introduce LeCoDe, a real-world multi-turn benchmark dataset comprising 3,696 legal consultation dialogues with 110,008 dialogue turns, designed to evaluate and improve LLMs' legal consultation capability. To construct LeCoDe, we collect live-streamed consultations from short-video platforms, yielding authentic multi-turn legal consultation dialogues, and rigorous annotation by legal experts further enriches the dataset with professional insights and expertise. We also propose a comprehensive evaluation framework that assesses LLMs' consultation capabilities along two dimensions, (1) clarification capability and (2) professional advice quality, using 12 metrics in total. Through extensive experiments on various general and domain-specific LLMs, our results reveal significant challenges in this task: even state-of-the-art models like GPT-4 achieve only 39.8% recall for clarification and a 59% overall score for advice quality, highlighting the complexity of professional consultation scenarios. Based on these findings, we further explore several strategies to enhance LLMs' legal consultation abilities. Our benchmark contributes to advancing research in legal-domain dialogue systems, particularly in simulating more realistic user-expert interactions.
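
The framework above reports a recall-style score for clarification behavior. As a rough illustration of how such a metric could be computed against expert-annotated clarification points, here is a minimal Python sketch; the function name, field layout, and keyword-matching rule are assumptions for illustration, not the paper's official evaluation code.

```python
# Minimal sketch (assumed, not LeCoDe's official code): recall of expert-annotated
# clarification points covered by a model's follow-up questions.

def clarification_recall(model_questions: list[str], gold_points: list[str]) -> float:
    """Fraction of gold clarification points touched by at least one model question.

    Matching here is a naive case-insensitive keyword check, used only to
    illustrate the shape of a recall-style metric.
    """
    if not gold_points:
        return 1.0  # nothing needed clarifying
    hit = sum(
        1 for point in gold_points
        if any(point.lower() in q.lower() for q in model_questions)
    )
    return hit / len(gold_points)


if __name__ == "__main__":
    questions = [
        "When did the employer terminate your contract?",
        "Do you have a written employment agreement?",
    ]
    gold = ["terminate", "written employment agreement", "severance"]
    print(f"clarification recall = {clarification_recall(questions, gold):.1%}")  # 66.7%
```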
Authors (9)
Weikang Yuan
Kaisong Song
Zhuoren Jiang
Junjie Cao
Yujie Zhang
Jun Lin
+3 more
Submitted: May 26, 2025
arXiv Category: cs.CL

Key Contributions

Introduces LeCoDe, a real-world multi-turn benchmark dataset for evaluating LLMs on legal consultation dialogues. The dataset comprises 3,696 dialogues with 110,008 turns, drawn from live-streamed consultations on short-video platforms and annotated by legal experts. LeCoDe targets the challenges LLMs face with the interactive, knowledge-intensive nature of legal consultations; a sketch of how such a dialogue record might be represented follows below.
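
To make the dataset description concrete, the sketch below shows one plausible way to represent a multi-turn consultation record and assemble prompt context for a model under evaluation. The field names (`speaker`, `turns`, `expert_annotations`) are illustrative assumptions, not LeCoDe's published schema.

```python
# Hypothetical record layout for a multi-turn legal consultation dialogue.
# Field names are assumptions for illustration, not LeCoDe's published schema.
from dataclasses import dataclass, field


@dataclass
class Turn:
    speaker: str  # e.g. "client" or "lawyer"
    text: str


@dataclass
class Dialogue:
    dialogue_id: str
    turns: list[Turn]
    expert_annotations: dict = field(default_factory=dict)  # e.g. clarification points, reference advice


def prompt_context(dialogue: Dialogue, last_k: int = 6) -> str:
    """Concatenate the last_k turns into a plain-text context for an LLM under test."""
    recent = dialogue.turns[-last_k:]
    return "\n".join(f"{t.speaker}: {t.text}" for t in recent)
```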

Business Value

Facilitates the development of more capable and accessible AI-powered legal assistance, potentially lowering costs and increasing access to justice for individuals and businesses.