Introduces LongCodeBench (LCB), a benchmark designed to evaluate LLM coding abilities in long-context scenarios (up to 1M tokens). LCB includes realistic code comprehension (LongCodeQA) and bug-fixing (LongSWE-Bench) tasks derived from GitHub issues, enabling evaluation of models ranging from 14B parameters to flagship scale.
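To make the LongCodeQA setup concrete, below is a minimal sketch of how a long-context comprehension item might be scored. The record fields (`repo_context`, `question`, `choices`, `answer`), the `QAItem` type, and the `query_model` callable are illustrative assumptions, not the benchmark's published schema or API.

```python
# Sketch of a LongCodeQA-style multiple-choice evaluation loop.
# All names here are hypothetical; they illustrate the task shape only.

from dataclasses import dataclass


@dataclass
class QAItem:
    repo_context: str   # concatenated repository source, potentially up to ~1M tokens
    question: str       # comprehension question about the codebase
    choices: list[str]  # candidate answers (multiple choice)
    answer: int         # index of the correct choice


def build_prompt(item: QAItem) -> str:
    """Pack the long repository context, question, and options into one prompt."""
    options = "\n".join(f"({chr(65 + i)}) {c}" for i, c in enumerate(item.choices))
    return (
        f"Repository:\n{item.repo_context}\n\n"
        f"Question: {item.question}\n{options}\n"
        "Answer with the letter of the correct option."
    )


def score(items: list[QAItem], query_model) -> float:
    """Accuracy of `query_model` (any callable mapping prompt str -> reply str)."""
    correct = 0
    for item in items:
        reply = query_model(build_prompt(item)).strip().upper()
        predicted = ord(reply[0]) - ord("A") if reply else -1
        correct += int(predicted == item.answer)
    return correct / len(items) if items else 0.0
```

The same harness shape applies across context lengths: only `repo_context` grows, which is what stresses the model's long-context ability rather than the task format itself.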
Helps developers and organizations choose and optimize LLMs for software development tasks, leading to improved code quality, faster debugging, and more efficient development cycles.