arxiv_ai · Research Paper · Healthcare Providers, Medical Coders, AI Researchers in Healthcare, LLM Developers

Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding


📄 Abstract

Diagnosis-Related Group (DRG) codes are essential for hospital reimbursement and operations but require labor-intensive assignment. Large Language Models (LLMs) struggle with DRG coding due to the out-of-distribution (OOD) nature of the task: pretraining corpora rarely contain private clinical or billing data. We introduce DRG-Sapphire, which uses large-scale reinforcement learning (RL) for automated DRG coding from clinical notes. Built on Qwen2.5-7B and trained with Group Relative Policy Optimization (GRPO) using rule-based rewards, DRG-Sapphire introduces a series of RL enhancements to address domain-specific challenges not seen in previous mathematical tasks. Our model achieves state-of-the-art accuracy on the MIMIC-IV benchmark and generates physician-validated reasoning for DRG assignments, significantly enhancing explainability. Our study further sheds light on broader challenges of applying RL to knowledge-intensive, OOD tasks. We observe that RL performance scales approximately linearly with the logarithm of the number of supervised fine-tuning (SFT) examples, suggesting that RL effectiveness is fundamentally constrained by the domain knowledge encoded in the base model. For OOD tasks like DRG coding, strong RL performance requires sufficient knowledge infusion prior to RL. Consequently, scaling SFT may be more effective and computationally efficient than scaling RL alone for such tasks.
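The abstract names GRPO with rule-based rewards but does not spell out the reward or the advantage computation. As a rough illustration only, the sketch below shows one common way such a setup is written: a binary reward that checks an extracted answer against the gold DRG code, and advantages normalized within a group of sampled completions. The `<answer>` tag format, the function names, the population standard deviation, and the placeholder labels "A" and "B" are all assumptions made for this example and are not taken from the paper.

```python
import re
from statistics import mean, pstdev


def rule_based_reward(completion: str, gold_drg: str) -> float:
    """Hypothetical binary reward: 1.0 if the tagged answer equals the gold code."""
    # Assumed output format: the final code is wrapped in <answer>...</answer>.
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0  # malformed output earns no reward
    return 1.0 if match.group(1).strip() == gold_drg else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of completions sampled for the same clinical note (placeholder text).
group = [
    "reasoning ... <answer>A</answer>",
    "reasoning ... <answer>B</answer>",
    "reasoning with no tagged answer",
    "reasoning ... <answer> A </answer>",
]
rewards = [rule_based_reward(c, gold_drg="A") for c in group]
advantages = group_relative_advantages(rewards)
```

Completions that score above their group's mean receive a positive advantage and the rest a negative one. When every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.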

Authors (7)

Hanyin Wang

Zhenbang Wu

Gururaj Kolar

Hariprasad Korsapati

Brian Bartlett

Bryan Hull

+1 more

Submitted

May 28, 2025

arXiv Category

cs.LG


Key Contributions

Introduces DRG-Sapphire, a system using large-scale Reinforcement Learning (RL) for automated DRG coding from clinical notes, achieving state-of-the-art accuracy on MIMIC-IV. It enhances explainability by generating physician-validated reasoning and addresses challenges of applying RL to knowledge-intensive, OOD tasks.

Business Value

Automates the complex and labor-intensive DRG coding process, leading to significant cost savings, improved billing accuracy, and better operational efficiency in healthcare institutions.

Paper Metadata

Innovation Type

Novel Application of RL and LLMs

Deployment Feasibility

Requires integration with clinical data systems and potentially physician validation workflows. The use of RL adds complexity but the results suggest feasibility.

Limitations Addressed

Addresses LLMs' struggle with Out-of-Distribution (OOD) tasks like DRG coding, where pretraining data lacks private clinical/billing information. It also tackles the labor-intensive nature of manual DRG assignment and improves explainability.

Performance Gains

Achieves state-of-the-art accuracy on MIMIC-IV.

Technical Tags

Reinforcement Learning (RL), Out-of-Distribution (OOD) Reasoning, Diagnosis-Related Group (DRG) Coding, Clinical Notes, Automated Coding, Explainability, Group Relative Policy Optimization (GRPO), Rule-Based Rewards, Qwen2.5-7B

Research Topics

LLM Applications, Reinforcement Learning, Out-of-Distribution Generalization, Healthcare AI, Explainable AI

Methods & Architectures

Reinforcement Learning (RL), Group Relative Policy Optimization (GRPO), Rule-Based Reward Design, Fine-tuning LLMs, Clinical Data Analysis, Qwen2.5-7B, Large Language Models (LLMs)

Applications & Tasks

Healthcare, Medical Billing, Clinical Informatics, Out-of-Distribution Generalization, Automated Medical Coding, Knowledge-Intensive Task Learning, DRG Coding from Clinical Notes, Improving LLM Reasoning on OOD tasks

Datasets & Benchmarks

Datasets

MIMIC-IV

Benchmarks

MIMIC-IV accuracy: state-of-the-art


Related Fields

Healthcare AI, Medical Informatics, Reinforcement Learning, Natural Language Processing, Explainable AI

Keywords

LLM, Reinforcement Learning, DRG Coding, Healthcare, Clinical Notes, OOD, Explainability, Automated Coding, Qwen, GRPO

Academic Context

#LLM Applications, #Reinforcement Learning, #Out-of-Distribution Generalization, #Healthcare AI, #Explainable AI

Commercial Potential

Potential Products

Automated Medical Coding Software, AI-powered Clinical Documentation Tools

Target Industries

Healthcare, Hospitals, Medical Billing Services

Use Case Examples

Automating the assignment of DRG codes for patient stays based on their clinical notes, streamlining hospital reimbursement processes.

Competitive Edge

Aims to outperform existing manual coding processes and potentially other automated methods by leveraging RL for OOD generalization and providing explainable reasoning.

Market Opportunity

Significant market for healthcare administrative automation and AI solutions.

Revenue Models

SaaS for healthcare providers; licensing to EMR vendors.

Resource Requirements

Data Requirements

Requires access to clinical notes and associated DRG codes (e.g., MIMIC-IV).

Deployment Constraints

Integration into existing hospital workflows, regulatory compliance (HIPAA), and the need for physician validation of generated reasoning.

Scalability

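The paper reports that RL performance scales approximately linearly with the logarithm of the number of supervised fine-tuning (SFT) examples, and suggests that scaling SFT may be more effective and computationally efficient than scaling RL alone for OOD tasks like DRG coding.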
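The abstract describes RL performance as approximately linear in the logarithm of the number of SFT examples, i.e. a relationship of the form score ≈ a + b · ln(n). A minimal sketch of how such a trend could be fitted with ordinary least squares follows. The function name and the input points are hypothetical: the points are synthetic, generated from a known line purely to check the fitting code, and are not results from the paper.

```python
import math


def fit_log_linear(n_examples, scores):
    """Least-squares fit of score = intercept + slope * ln(n)."""
    xs = [math.log(n) for n in n_examples]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(scores) / len(scores)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, scores))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Synthetic check data (NOT from the paper): points lying exactly on a known line.
true_intercept, true_slope = 0.10, 0.05
sft_sizes = [1_000, 10_000, 100_000]
synthetic_scores = [true_intercept + true_slope * math.log(n) for n in sft_sizes]

intercept, slope = fit_log_linear(sft_sizes, synthetic_scores)
```

Under a relationship of this shape, each tenfold increase in SFT examples adds the same fixed increment, slope · ln(10), to the score.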

Regulatory Considerations

HIPAA compliance for handling patient data.

Production Readiness

Maturity Level

Research/Demonstration

Time to Market

Medium-term, pending validation and integration.
