Abstract
We study retrieval design for code-focused generation tasks under realistic
compute budgets. Using two complementary tasks from Long Code Arena -- code
completion and bug localization -- we systematically compare retrieval
configurations across various context window sizes along three axes: (i)
chunking strategy, (ii) similarity scoring, and (iii) splitting granularity.
Our main findings: (1) For PL-PL (code-to-code) retrieval, sparse BM25 with
word-level splitting is the most effective and practical choice, significantly
outperforming dense alternatives while being an order of magnitude faster.
(2) For NL-PL (natural language-to-code) retrieval, proprietary dense encoders
(the Voyager-3 family) consistently outperform sparse retrievers, though at
roughly 100x higher latency. (3) Optimal chunk size scales with the available
context: chunks of 32-64 lines work best at small budgets, while whole-file
retrieval becomes competitive at a 16,000-token budget. (4) Simple line-based
chunking matches syntax-aware splitting
across budgets. (5) Retrieval latency varies by up to 200x across
configurations; BPE-based splitting is needlessly slow, and BM25 + word
splitting offers the best quality-latency trade-off. Thus, we provide
evidence-based recommendations for implementing effective code-oriented RAG
systems based on task requirements, model constraints, and computational
efficiency.
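To make finding (1) concrete, the following is a minimal sketch of the recommended PL-PL configuration: line-based chunking, word-level splitting, and sparse BM25 scoring. The helper names, the 32-line chunk size, and the use of the rank_bm25 package are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of the PL-PL setup: line-based chunks + word-level tokens + BM25.
# Assumptions: 32-line chunks, regex word splitting, rank_bm25 for scoring.
import re
from rank_bm25 import BM25Okapi

def chunk_by_lines(source: str, chunk_size: int = 32) -> list[str]:
    """Split a file into fixed-size line-based chunks (no syntax awareness)."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + chunk_size])
            for i in range(0, len(lines), chunk_size)]

def word_split(text: str) -> list[str]:
    """Word-level splitting: lowercase alphanumeric/underscore tokens."""
    return re.findall(r"\w+", text.lower())

def build_index(files: dict[str, str]) -> tuple[BM25Okapi, list[str]]:
    """Chunk every file in the repository and index the chunks with BM25."""
    chunks = [c for src in files.values() for c in chunk_by_lines(src)]
    return BM25Okapi([word_split(c) for c in chunks]), chunks

def retrieve(bm25: BM25Okapi, chunks: list[str], query: str, k: int = 5) -> list[str]:
    """Rank chunks against the query, e.g. the code prefix being completed."""
    return bm25.get_top_n(word_split(query), chunks, n=k)
```

Because both chunking and tokenization here are trivial string operations, the whole pipeline stays fast, which is the quality-latency trade-off that finding (5) highlights.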
Authors
Timur Galimzyanov
Olga Kolomyttseva
Egor Bogomolov
Submitted
October 23, 2025
Key Contributions
Provides a systematic study of retrieval design choices for code-focused RAG tasks under realistic compute budgets, comparing chunking strategy, similarity scoring, and splitting granularity. Finds BM25 most effective for PL-PL tasks and dense encoders for NL-PL, with optimal chunk size scaling with the available context.
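For the NL-PL side, a correspondingly minimal sketch of dense retrieval: embed each chunk and the natural-language query (e.g., a bug report), then rank by cosine similarity. The `embed` function referenced in the usage comment is a hypothetical stand-in for whichever dense encoder is chosen; only the ranking step below is concrete.

```python
# Generic dense-retrieval ranking for NL-PL: cosine similarity between a
# query embedding and precomputed chunk embeddings. Any encoder can supply
# the vectors; `embed` in the usage comment is a placeholder, not a real API.
import numpy as np

def cosine_rank(query_vec: np.ndarray, chunk_vecs: np.ndarray, k: int = 5) -> np.ndarray:
    """Return the indices of the k chunks most similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    return np.argsort(-(c @ q))[:k]

# Usage, assuming embed(texts) -> np.ndarray of shape [n_texts, dim]:
#   chunk_vecs = embed(chunks)                       # computed once, offline
#   top = cosine_rank(embed([bug_report])[0], chunk_vecs)
#   relevant = [chunks[i] for i in top]
```

Note that each query requires an encoder call, which is where the roughly 100x latency gap relative to BM25 comes from.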
Business Value
Enables more efficient and cost-effective deployment of RAG systems for code-related tasks, improving developer productivity and reducing infrastructure costs.