Abstract: Large Language Models (LLMs) have shown great potential in supporting automated code review due to their impressive capabilities in context understanding and reasoning. However, these capabilities remain limited compared to human-level cognition because they are heavily shaped by the training data. Recent research has demonstrated significantly improved performance through fine-tuning LLMs with code review data. Yet, unlike human reviewers, who often analyze multiple dimensions of a code change simultaneously to better identify issues, the full potential of these methods is hampered by the limited or vague information used to fine-tune the models. This paper contributes MelcotCR, a chain-of-thought (COT) fine-tuning approach that equips LLMs with the reasoning ability to analyze multiple dimensions of code review by harnessing long COT techniques to provide rich, structured information. To address the context loss and reasoning-logic loss that frequently occur when LLMs process long COT prompts, MelcotCR combines the Maximum Entropy (ME) modeling principle with pre-defined reasoning pathways, enabling more effective use of the in-context knowledge within long COT prompts while strengthening the logical tightness of the reasoning process. Empirical evaluations on our curated MelcotCR dataset and the public CodeReviewer dataset show that a low-parameter base model such as Qwen2.5-14B, once fine-tuned with MelcotCR, can surpass state-of-the-art methods in the accuracy of detecting and describing code issues, with performance remarkably on par with that of the 671B DeepSeek-R1 model.
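To make the idea of structured long-COT supervision more concrete, the sketch below shows one way a fine-tuning sample with pre-defined reasoning pathways could be assembled. It is a minimal illustration under stated assumptions: the pathway names, prompt layout, and the `build_long_cot_sample` helper are hypothetical and are not the exact MelcotCR data format described in the paper.

```python
# Illustrative sketch only: the pathway names and prompt layout below are
# assumptions, not the exact MelcotCR format from the paper.
import json

# Hypothetical pre-defined reasoning pathways the long-COT target response
# must walk through, in order, before emitting the final review comment.
REASONING_PATHWAYS = [
    "Change intent",             # what the diff is trying to accomplish
    "Functional correctness",
    "Security and robustness",
    "Readability and style",
    "Final review comment",      # the comment a reviewer would actually post
]


def build_long_cot_sample(diff: str, reasoning_steps: dict, review_comment: str) -> dict:
    """Assemble one supervised fine-tuning sample whose target response is a
    structured long chain of thought followed by the review comment."""
    prompt = (
        "You are a code reviewer. Analyze the following diff along each "
        "dimension, then write the review comment.\n\n"
        f"Diff:\n{diff}\n"
    )
    # The target walks the pre-defined pathways so the reasoning stays
    # structured and the model cannot skip a review dimension.
    sections = []
    for pathway in REASONING_PATHWAYS[:-1]:
        sections.append(f"### {pathway}\n{reasoning_steps.get(pathway, 'N/A')}")
    sections.append(f"### {REASONING_PATHWAYS[-1]}\n{review_comment}")
    return {"prompt": prompt, "response": "\n\n".join(sections)}


if __name__ == "__main__":
    sample = build_long_cot_sample(
        diff="- if user == None:\n+ if user is None:",
        reasoning_steps={
            "Change intent": "Replace equality check with identity check for None.",
            "Functional correctness": "Behavior is equivalent here; no defect introduced.",
            "Security and robustness": "No security impact.",
            "Readability and style": "Matches PEP 8 guidance; improves idiomatic style.",
        },
        review_comment="LGTM; `is None` is the idiomatic, PEP 8-recommended check.",
    )
    print(json.dumps(sample, indent=2))
```

Structuring every target response around the same ordered pathways is one plausible way to realize the "rich structured information" the abstract refers to, since it forces each training sample to cover every review dimension explicitly before the final comment.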