Redirecting to original paper in 30 seconds...
Click below to go immediately or wait for automatic redirect
📄 Abstract
Abstract: We present DeepSeek-OCR as an initial investigation into the feasibility of
compressing long contexts via optical 2D mapping. DeepSeek-OCR consists of two
components: DeepEncoder and DeepSeek3B-MoE-A570M as the decoder. Specifically,
DeepEncoder serves as the core engine, designed to maintain low activations
under high-resolution input while achieving high compression ratios to ensure
an optimal and manageable number of vision tokens. Experiments show that when
the number of text tokens is within 10 times that of vision tokens (i.e., a
compression ratio < 10x), the model can achieve decoding (OCR) precision of
97%. Even at a compression ratio of 20x, the OCR accuracy still remains at
about 60%. This shows considerable promise for research areas such as
historical long-context compression and memory forgetting mechanisms in LLMs.
Beyond this, DeepSeek-OCR also demonstrates high practical value. On
OmniDocBench, it surpasses GOT-OCR2.0 (256 tokens/page) using only 100 vision
tokens, and outperforms MinerU2.0 (6000+ tokens per page on average) while
utilizing fewer than 800 vision tokens. In production, DeepSeek-OCR can
generate training data for LLMs/VLMs at a scale of 200k+ pages per day (a
single A100-40G). Codes and model weights are publicly accessible at
http://github.com/deepseek-ai/DeepSeek-OCR.
Authors (3)
Haoran Wei
Yaofeng Sun
Yukun Li
Submitted
October 21, 2025
Key Contributions
Introduces DeepSeek-OCR, a novel approach for compressing long contexts using optical 2D mapping, enabling efficient OCR. The DeepEncoder maintains low activations under high resolution for high compression ratios, achieving 97% OCR precision at <10x compression, showing promise for historical documents and LLM memory.
Business Value
Enables efficient processing and analysis of vast amounts of textual data, particularly historical documents or lengthy reports, unlocking insights and improving accessibility for research and business intelligence.