
The Massive Legal Embedding Benchmark (MLEB)

Abstract

We present the Massive Legal Embedding Benchmark (MLEB), the largest, most diverse, and most comprehensive open-source benchmark for legal information retrieval to date. MLEB consists of ten expert-annotated datasets spanning multiple jurisdictions (the US, UK, EU, Australia, Ireland, and Singapore), document types (cases, legislation, regulatory guidance, contracts, and literature), and task types (search, zero-shot classification, and question answering). Seven of the datasets in MLEB were newly constructed to fill domain and jurisdictional gaps in the open-source legal information retrieval landscape. We document our methodology in building MLEB and creating the new constituent datasets, and release our code, results, and data openly to assist with reproducible evaluations.
Authors: Umar Butler, Abdur-Rahman Butler, Adrian Lucas Malec
Submitted: October 22, 2025
arXiv Category: cs.CL

Key Contributions

Presents the Massive Legal Embedding Benchmark (MLEB), the largest and most diverse open-source benchmark for legal information retrieval to date. MLEB comprises ten expert-annotated datasets spanning multiple jurisdictions, document types, and task types, seven of which were newly constructed to fill domain and jurisdictional gaps; the code, results, and data are released openly to support reproducible evaluation of legal embedding models, as sketched below.
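
Because MLEB's tasks are all framed as embedding-based retrieval (search, zero-shot classification, and question answering recast as ranking), a minimal evaluation loop has the shape sketched below. This is an illustrative assumption, not the authors' harness: the encoder is a generic off-the-shelf model, and the corpus, queries, and relevance labels are invented; MLEB's actual datasets, loaders, and metrics ship with the released code.

```python
# Minimal sketch of an embedding-retrieval evaluation in the style MLEB describes.
# All names and data below are hypothetical stand-ins, not MLEB identifiers.
from sentence_transformers import SentenceTransformer, util

# Hypothetical corpus of legal passages, a query, and relevance judgments.
corpus = {
    "d1": "The tenant must provide 30 days' written notice before vacating.",
    "d2": "Directors owe a fiduciary duty of care to the corporation.",
    "d3": "A contract requires offer, acceptance, and consideration.",
}
queries = {"q1": "What notice period applies before a tenant moves out?"}
qrels = {"q1": {"d1"}}  # relevant document IDs per query

model = SentenceTransformer("all-MiniLM-L6-v2")  # generic stand-in encoder
doc_ids = list(corpus)
doc_emb = model.encode([corpus[d] for d in doc_ids], convert_to_tensor=True)

for qid, qtext in queries.items():
    q_emb = model.encode(qtext, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, doc_emb)[0]  # cosine similarity to each doc
    ranked = sorted(zip(doc_ids, scores.tolist()), key=lambda x: -x[1])
    hit = ranked[0][0] in qrels[qid]  # toy top-1 relevance check
    print(qid, "top-1:", ranked[0][0], "relevant:", hit)
```

In a real run, the toy top-1 check would be replaced by a ranking metric (retrieval benchmarks of this kind typically report NDCG@10), aggregated over each of MLEB's ten datasets.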

Business Value

Accelerates the development and adoption of AI tools in the legal sector by providing a standardized, comprehensive benchmark for evaluating and comparing legal embedding models.