
ToMMeR -- Efficient Entity Mention Detection from Large Language Models

📄 Abstract

Identifying which text spans refer to entities -- mention detection -- is both foundational for information extraction and a known performance bottleneck. We introduce ToMMeR, a lightweight model (<300K parameters) that probes mention detection capabilities from early LLM layers. Across 13 NER benchmarks, ToMMeR achieves 93% recall zero-shot, with over 90% precision using an LLM as a judge, showing that ToMMeR rarely produces spurious predictions despite its high recall. Cross-model analysis reveals that diverse architectures (14M-15B parameters) converge on similar mention boundaries (Dice > 75%), confirming that mention detection emerges naturally from language modeling. When extended with span classification heads, ToMMeR achieves near-SOTA NER performance (80-87% F1 on standard benchmarks). Our work provides evidence that structured entity representations exist in early transformer layers and can be efficiently recovered with minimal parameters.
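The abstract does not spell out the probe's architecture, so the following is a minimal sketch of what a sub-300K-parameter mention probe over frozen early-layer hidden states could look like: a bilinear start/end scoring head over all candidate spans. The class name, layer choice, and dimensions are illustrative assumptions, not the paper's exact design.

```python
# Hypothetical sketch (PyTorch): a lightweight mention-detection probe reading
# a frozen LLM's early-layer hidden states. Architecture and names are
# illustrative assumptions, not ToMMeR's published design.
import torch
import torch.nn as nn

class MentionProbe(nn.Module):
    """Scores every candidate span (start, end) as a potential entity mention."""

    def __init__(self, hidden_dim: int, probe_dim: int = 64, max_width: int = 8):
        super().__init__()
        self.max_width = max_width
        # Two small projections keep the head tiny: for hidden_dim = 2048,
        # 2 * (2048 * 64 + 64) ≈ 262K parameters, under the paper's 300K budget.
        self.start_proj = nn.Linear(hidden_dim, probe_dim)
        self.end_proj = nn.Linear(hidden_dim, probe_dim)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (seq_len, hidden_dim), taken from an early transformer layer.
        starts = self.start_proj(hidden)   # (seq_len, probe_dim)
        ends = self.end_proj(hidden)       # (seq_len, probe_dim)
        scores = starts @ ends.T           # score for each (start, end) pair
        # Mask invalid spans: end must not precede start, width is capped.
        idx = torch.arange(hidden.size(0), device=hidden.device)
        width = idx[None, :] - idx[:, None]           # width[i, j] = j - i
        valid = (width >= 0) & (width < self.max_width)
        return scores.masked_fill(~valid, float("-inf"))
```

Thresholding `torch.sigmoid(scores)` on the valid entries would then yield zero-shot mention proposals, with no gradients ever flowing into the backbone LLM.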
Authors (4): Victor Morand, Nadi Tomeh, Josiane Mothe, Benjamin Piwowarski
Submitted: October 22, 2025
arXiv Category: cs.CL

Key Contributions

Introduces ToMMeR, a lightweight model (<300K parameters) that efficiently detects entity mentions from early LLM layers. It achieves high recall and precision in zero-shot settings and demonstrates that mention detection capabilities emerge naturally in transformers, providing a more efficient approach to a foundational NLP task.
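
The cross-model agreement claim (Dice > 75%) compares the sets of spans that two models predict on the same text. A minimal illustration of that computation, with made-up span sets:

```python
# Dice overlap between the mention spans predicted by two models on the same
# text. Spans are (start, end) token offsets; the example sets are made up.
def dice(spans_a: set[tuple[int, int]], spans_b: set[tuple[int, int]]) -> float:
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|), ranging over [0, 1]."""
    if not spans_a and not spans_b:
        return 1.0  # two empty predictions agree perfectly
    return 2 * len(spans_a & spans_b) / (len(spans_a) + len(spans_b))

model_a = {(0, 2), (5, 6), (9, 12), (14, 15)}
model_b = {(0, 2), (5, 6), (9, 12), (20, 21)}
print(dice(model_a, model_b))  # 0.75: the models share 3 of their 4 spans
```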

Business Value

Enables more efficient and accurate information extraction from text, which can be applied to various business intelligence and data analysis tasks. Reduces computational costs for NLP pipelines.