Technical Report / Research Paper. Intended audience: AI researchers, NLP engineers, computer vision engineers, and developers of document processing solutions.

PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

Abstract

In this report, we propose PaddleOCR-VL, a SOTA and resource-efficient model tailored for document parsing. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that integrates a NaViT-style dynamic resolution visual encoder with the ERNIE-4.5-0.3B language model to enable accurate element recognition. This innovative model efficiently supports 109 languages and excels in recognizing complex elements (e.g., text, tables, formulas, and charts), while maintaining minimal resource consumption. Through comprehensive evaluations on widely used public benchmarks and in-house benchmarks, PaddleOCR-VL achieves SOTA performance in both page-level document parsing and element-level recognition. It significantly outperforms existing solutions, exhibits strong competitiveness against top-tier VLMs, and delivers fast inference speeds. These strengths make it highly suitable for practical deployment in real-world scenarios. Code is available at https://github.com/PaddlePaddle/PaddleOCR.
Authors (18)
Cheng Cui
Ting Sun
Suyin Liang
Tingquan Gao
Zelun Zhang
Jiaxuan Liu
and 12 more
Submitted
October 16, 2025
arXiv Category
cs.CV

Key Contributions

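PaddleOCR-VL introduces PaddleOCR-VL-0.9B, an ultra-compact VLM that pairs a NaViT-style dynamic-resolution visual encoder with the ERNIE-4.5-0.3B language model for accurate multilingual document parsing. The model supports 109 languages, recognizes complex elements such as text, tables, formulas, and charts, and achieves SOTA results on public and in-house benchmarks for both page-level document parsing and element-level recognition, while keeping resource consumption low and inference fast.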
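The abstract names a NaViT-style dynamic-resolution visual encoder but does not describe its preprocessing. As a rough illustration of the general NaViT idea (patching each image at roughly its native aspect ratio instead of forcing a fixed square size, then packing the variable-length patch sequences into one sequence with attention restricted to each image), here is a minimal Python sketch. The patch size of 14, the 1024-patch budget, and every function and class name below are hypothetical choices for illustration; they are not values or APIs taken from PaddleOCR-VL or its code.

```python
import math
from dataclasses import dataclass


@dataclass
class PatchGrid:
    rows: int
    cols: int
    resized_h: int
    resized_w: int

    @property
    def num_patches(self) -> int:
        return self.rows * self.cols


def dynamic_patch_grid(height: int, width: int, patch: int = 14,
                       max_patches: int = 1024) -> PatchGrid:
    """Hypothetical helper: choose a patch grid that keeps the aspect ratio.

    Each side is snapped to a multiple of `patch`; only if the result would
    exceed `max_patches` is the grid scaled down uniformly to fit the budget.
    """
    if height <= 0 or width <= 0 or max_patches < 1:
        raise ValueError("dimensions and budget must be positive")
    rows = max(1, round(height / patch))
    cols = max(1, round(width / patch))
    if rows * cols > max_patches:
        scale = math.sqrt(max_patches / (rows * cols))
        rows = max(1, math.floor(rows * scale))
        # Clamp each side so extreme aspect ratios still respect the budget.
        cols = max(1, min(math.floor(cols * scale), max_patches // rows))
        rows = min(rows, max_patches // cols)
    return PatchGrid(rows, cols, rows * patch, cols * patch)


def pack_sequences(grids: list[PatchGrid]) -> tuple[list[int], list[int]]:
    """Concatenate several images' patch sequences into one packed sequence.

    Returns (image_ids, cu_seqlens): image_ids[i] is the image that token i
    came from; cu_seqlens holds cumulative boundaries between images.
    """
    image_ids: list[int] = []
    cu_seqlens = [0]
    for idx, grid in enumerate(grids):
        image_ids.extend([idx] * grid.num_patches)
        cu_seqlens.append(cu_seqlens[-1] + grid.num_patches)
    return image_ids, cu_seqlens


def same_image_mask(image_ids: list[int]) -> list[list[bool]]:
    """Block-diagonal mask: token i may attend to token j only within one image."""
    return [[a == b for b in image_ids] for a in image_ids]


if __name__ == "__main__":
    page = dynamic_patch_grid(1400, 1000)   # tall, full document page
    snippet = dynamic_patch_grid(56, 280)   # short, wide text-line crop
    ids, cu = pack_sequences([page, snippet])
    print(page, snippet, cu)
```

In this sketch a full page is downscaled to fit the patch budget while a small text-line crop keeps its native resolution, and packing lets both share one forward pass without padding; the block-diagonal mask stops patches from one image attending to another.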

Business Value

Enables efficient and accurate processing of diverse multilingual documents, automating tasks like data entry, information retrieval, and knowledge management for global businesses.