
Practical Code RAG at Scale: Task-Aware Retrieval Design Choices under Compute Budgets

📄 Abstract

We study retrieval design for code-focused generation tasks under realistic compute budgets. Using two complementary tasks from Long Code Arena -- code completion and bug localization -- we systematically compare retrieval configurations across various context window sizes along three axes: (i) chunking strategy, (ii) similarity scoring, and (iii) splitting granularity. (1) For PL-PL, sparse BM25 with word-level splitting is the most effective and practical option, significantly outperforming dense alternatives while being an order of magnitude faster. (2) For NL-PL, proprietary dense encoders (the Voyage-3 family) consistently beat sparse retrievers, albeit at roughly 100x higher latency. (3) Optimal chunk size scales with the available context: 32-64 line chunks work best at small budgets, while whole-file retrieval becomes competitive at 16,000 tokens. (4) Simple line-based chunking matches syntax-aware splitting across budgets. (5) Retrieval latency varies by up to 200x across configurations; BPE-based splitting is needlessly slow, and BM25 with word splitting offers the best quality-latency trade-off. Thus, we provide evidence-based recommendations for implementing effective code-oriented RAG systems based on task requirements, model constraints, and computational efficiency.
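
As a rough illustration of finding (1), the sketch below wires up the recommended PL-PL configuration -- sparse BM25 scoring, word-level splitting, and simple fixed-size line chunks -- using the rank_bm25 package. The chunk size and helper names are illustrative assumptions, not the paper's implementation.

```python
# Sketch: BM25 retrieval with word-level splitting over fixed-size line chunks.
# Assumes the `rank_bm25` package; chunk size and helper names are illustrative.
import re
from rank_bm25 import BM25Okapi


def chunk_by_lines(text: str, lines_per_chunk: int = 64) -> list[str]:
    """Split a file into consecutive blocks of N lines (no syntax awareness)."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]


def word_split(text: str) -> list[str]:
    """Word-level splitting: lowercased alphanumeric/underscore tokens."""
    return re.findall(r"\w+", text.lower())


def bm25_retrieve(query: str, files: dict[str, str], top_k: int = 5) -> list[str]:
    """Rank line-based chunks from a repository against a query with BM25."""
    chunks = [c for content in files.values() for c in chunk_by_lines(content)]
    bm25 = BM25Okapi([word_split(c) for c in chunks])
    return bm25.get_top_n(word_split(query), chunks, n=top_k)
```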
Authors (3)
Timur Galimzyanov
Olga Kolomyttseva
Egor Bogomolov
Submitted: October 23, 2025
arXiv Category: cs.LG
arXiv PDF

Key Contributions

Provides a systematic study of retrieval design choices for code-focused RAG under realistic compute budgets, comparing chunking strategy, similarity scoring, and splitting granularity. BM25 proves most effective for PL-PL tasks and dense encoders for NL-PL, with the optimal chunk size scaling with the available context window.
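
For NL-PL queries such as bug reports, where dense encoders win, retrieval reduces to ranking chunks by embedding similarity. The sketch below shows that generic cosine-similarity ranking; `embed` is a hypothetical placeholder for whatever dense encoder is used (the paper's best results come from proprietary Voyage-3-family models), not an actual API.

```python
# Sketch: dense retrieval for NL-PL queries (e.g., bug report -> code chunks).
# `embed` is a hypothetical placeholder for a dense encoder; this is generic
# cosine-similarity ranking, not the paper's exact pipeline.
import numpy as np


def embed(texts: list[str]) -> np.ndarray:
    """Placeholder: call a dense encoder and return an (n, d) embedding matrix."""
    raise NotImplementedError("plug in a dense encoder here")


def dense_retrieve(query: str, chunks: list[str], top_k: int = 5) -> list[str]:
    """Rank chunks by cosine similarity between query and chunk embeddings."""
    vectors = embed([query] + chunks)                      # embed query and chunks
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = vectors[1:] @ vectors[0]                      # cosine similarities
    best = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in best]
```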

Business Value

Enables more efficient and cost-effective deployment of RAG systems for code-related tasks, improving developer productivity and reducing infrastructure costs.